t
> we don't have so many variants and parameters. Otherwise, as we add more
> parameters for event time support and type information for serdes, it'll
> get really difficult to both discover the appropriate window variant and
> understand a window specification once written.
>
wing first and then separately
specify the aggregation operation (e.g. group by key, combine, sum,
etc.). This saves from the combinatorial explosion of keyed / global,
unwindowed (batch) / session / fixed / tumbling / etc., GBK / sum /
count / etc.
- Chris
On Tue, Jun 20, 2017 at 10:46 AM, Chris Pett
it for each job,
>>removing the current JobRunner interface
>>6. We also need a StreamTaskApplication class that allows user to
>>create task-level applications, by mandate the constructor with a
>> parameter
>>of StreamTaskFactory
>>
>>
taking one input source emits watermark at 1min interval and the
>> task taking another input source emits watermark at 5min interval, how does
>> the downstream consumer reconcile the watermarks?
>>
>> Watermark propagation does not require synchronization. Chris's equati
oo and I dropped the ControlMessage type in
> my
> > code. I also moved taskName, taskCount to the parent ControlMessage
> class.
> > Just updated the SEP-6. Please take a look again.
> >
> > Thanks,
> > Xinyu
> >
> > On Tue, May 30, 2017 at 9:12 AM, Chri
MessageType and ControlMessage.Type look redundant. You could either use
"ControlMessage" as the type in MessageType or drop ControlMessage.Type.
On Fri, May 26, 2017 at 5:14 PM, xinyu liu wrote:
> Thanks a lot for the comments. I updated the SEP with more details and
>
ke Kafka,
> artifactory or http server. For Kafka, it's straightforward but we will
> have the size limit or cut it by ourselves. For the other two, we need to
> investigate whether we can easily upload jars to our artifactory and
> localizing it with Yarn. Any opinions on this?
>
.run(), the operator user classes/lamdas in the StreamGraph need to
> > be serialized. As today, the existing option is to serialize to a stream,
> > either the coordinator stream or the pipeline control stream, which will
> > have the size limit per message. Do you see RPC as an
That should have been:
For #1, Beam doesn't have a hard requirement...
On Thu, Apr 27, 2017 at 9:07 AM, Chris Pettitt <cpett...@linkedin.com>
wrote:
> For #1, I doesn't have a hard requirement for any change from Samza. A
> very nice to have would be to allow the input systems
For #1, I doesn't have a hard requirement for any change from Samza. A very
nice to have would be to allow the input systems to be set up at the same
time as the rest of the StreamGraph. An even nicer to have would be to do
away with the callback based approach and treat graph building as a
, Apr 20, 2017 at 3:52 PM, Chris Pettitt <cpett...@linkedin.com>
wrote:
> It might be worth taking a look at how Beam does test streams. The API is
> more powerful than just passing in a queue, e.g.:
>
> TestStream source = TestStream.create(StringUtf8Coder.of())
> .addElement
It might be worth taking a look at how Beam does test streams. The API is
more powerful than just passing in a queue, e.g.:
TestStream source = TestStream.create(StringUtf8Coder.of())
.addElements(TimestampedValue.of("this", start))
.addElements(TimestampedValue.of("that", start))
(line 25)
<https://reviews.apache.org/r/53282/#comment224228>
We have a HighResolutionClock that does this. I think you can use it here.
- Chris Pettitt
On Nov. 2, 2016, 5:56 p.m., Xinyu Liu wrote:
>
> ---
> This is a
need an atomic
ref - a volatile would be sufficient. However, if the code can be run in a
multi-threaded path, you would indeed want to use CAS to dequeue the exception
in one operation.
- Chris Pettitt
On Sept. 29, 2016, 7:04 p.m., Xinyu Liu wrote
> On Sept. 14, 2016, 7:03 p.m., Chris Pettitt wrote:
> > This review turned out not to be so massive as I expected (lots of
> > deletes). I don't see any serious issues. There are some minor cosmetic
> > issues and some subjective stuff that you can take or leave. I t
35/#comment216488>
This doesn't appear to be testing anything (other than that types check).
samza-operator/src/test/java/org/apache/samza/task/AssembleCallGraphTask.java
(line 141)
<https://reviews.apache.org/r/47835/#comment216489>
Minor: indentation is off here.
> On Aug. 26, 2016, 7:51 p.m., Chris Pettitt wrote:
> > samza-api/src/main/java/org/apache/samza/system/IncomingMessageEnvelope.java,
> > line 31
> > <https://reviews.apache.org/r/51346/diff/4/?file=1486375#file1486375line31>
> >
> > How likely are we
, the documentation reads like it is
OK to do synchronous IO by just throwing more threads at the problem. This is
not a good idea. The ability to do synchronous IO is a transition step towards
async, not an end state.
- Chris Pettitt
On July 27, 2016, 11:05 p.m., Xinyu Liu wrote
>
> (Updated July 19, 2016, 1:05 a.m.)
>
>
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data
> Infrastructure).
>
>
> Repository: samza
>
>
> Description
> ---
>
from T to a Collection of R instead of having a separate
flat map function type.
This comment can be applied generally wherever we're providing "functions"
- Chris Pettitt
On July 13, 2016, 8:54 a.m., Yi Pan (Data Infra
/SamzaContainer.scala
18c09224bbae959342daf9b2b7a7d971cc224f48
samza-core/src/test/java/org/apache/samza/util/TestThrottlingExecutor.java
26590507b9c72a8c64171aeb1e5b7c3d5c24c41a
Diff: https://reviews.apache.org/r/50082/diff/
Testing
---
gradle test
Thanks,
Chris Pettitt
his is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50082/#review142459
---
On July 15, 2016, 7:59 p.m., Chris Pettitt wrote:
>
> ---
>
/SamzaContainer.scala
18c09224bbae959342daf9b2b7a7d971cc224f48
samza-core/src/test/java/org/apache/samza/util/TestThrottlingExecutor.java
26590507b9c72a8c64171aeb1e5b7c3d5c24c41a
Diff: https://reviews.apache.org/r/50082/diff/
Testing (updated)
---
gradle test
Thanks,
Chris Pettitt
18c09224bbae959342daf9b2b7a7d971cc224f48
samza-core/src/test/java/org/apache/samza/util/TestThrottlingExecutor.java
26590507b9c72a8c64171aeb1e5b7c3d5c24c41a
Diff: https://reviews.apache.org/r/50082/diff/
Testing
---
Thanks,
Chris Pettitt
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48213/#review142418
---
Ship it!
- Chris Pettitt
On July 14, 2016, 12:43 a.m., Xinyu
r/host/PosixCommandBasedStatisticsGetter.java
(lines 47 - 55)
<https://reviews.apache.org/r/49877/#comment207972>
Not required, but you could simplify this with try-with-resources:
https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html
- Chris Pettitt
On July 14, 20
> On July 12, 2016, 6:43 p.m., Chris Pettitt wrote:
> > samza-core/src/main/java/org/apache/samza/processor/StreamProcessor.java,
> > lines 174-176
> > <https://reviews.apache.org/r/48356/diff/8/?file=1441900#file1441900line174>
> >
> > Don't we need t
> On June 27, 2016, 6:53 p.m., Chris Pettitt wrote:
> > A few more thoughts below.
> >
> > Still not a fan of the direction we're going with the config. I know it is
> > status quo, but it further locks us into a limited model. One other benefit
> > of
rently? If so, let's document
this.
- Chris Pettitt
On July 12, 2016, midnight, Navina Ramesh wrote:
>
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.
> On July 11, 2016, 6:47 p.m., Chris Pettitt wrote:
> > Very high level question: I assume you looked at `ps -o rss` and
> > disqualified it for some reason. Could you elaborate as to why? `ps` itself
> > is certainly more portable than procfs (though `-o rss` is not p
343>
In the code you copied this from I prefix the thread name with "Samza-".
The reason to do that is it becomes much easier when looking at a thread dump
to determine if the thread is from Samza or thirdy party code. I'd recommend
indicating this is a Samza thread someh
` and disqualified
it for some reason. Could you elaborate as to why? `ps` itself is certainly
more portable than procfs (though `-o rss` is not part of the POSIX standard) -
it works on RHEL and OSX. It also gives you the actual memory usage vs the
number of pages.
- Chris Pettitt
On July 11, 2016
che/samza/configbuilder/TestStandaloneConfigBuilder.java
(line 50)
<https://reviews.apache.org/r/48356/#comment204900>
To future proof this a bit you could use
AllSspToSingleTaskGrouperFactory.class.getName. Same for the one below.
- Ch
multi-threaded context.
samza-core/src/main/java/org/apache/samza/task/TaskCallbackManager.java (line
94)
<https://reviews.apache.org/r/48243/#comment203223>
final
- Chris Pettitt
On June 15, 2016, 11:41 p.m., Xin
> On June 15, 2016, 3:08 p.m., Chris Pettitt wrote:
> > samza-core/src/main/java/org/apache/samza/task/AsyncRunLoop.java, line 188
> > <https://reviews.apache.org/r/48243/diff/2/?file=1412994#file1412994line188>
> >
> > Do we need to handle the case th
nt as above re. run loop termination. You potentially could even
have a abortRunLoop(Throwable) function that would make this super clear.
We should sync up on how to coordinate the disk quota changes and this change.
- Chris Pettitt
On June
his is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48243/
> ---
>
> (Updated June 9, 2016, 7:49 p.m.)
>
>
> Review request for samza, Chris Pettitt, Navina Ram
eature to be < 150ns for Linux and OSX
Thanks,
Chris Pettitt
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48182/#review136879
---
Ship it!
Ship It!
- Chris Pettitt
On June 9, 2016, 12:33
generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48182/
> ---
>
> (Updated June 9, 2016, 12:33 a.m.)
>
>
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data
> Infrastructure).
>
>
>
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48213/#review136875
---
Ship it!
Ship It!
- Chris Pettitt
On June 8, 2016, 11:53
Container.scala
(lines 617 - 619)
<https://reviews.apache.org/r/48356/#comment201973>
I hope this is temporary. If not, if would be nicer to provide proper
lifecycle / shutdown mechansisms. For example, on stop, I may want to flush a
cache to disk.
- Chris Pettitt
On June 8, 2016, 9
as is, are a bit too heavy.
samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala (lines 71 -
75)
<https://reviews.apache.org/r/48243/#comment201795>
Same comment as for the async run loop.
- Chris Pettitt
O
es close correctly). Another benefit is that you reduce the number of
volatile reads you need to make.
- Chris Pettitt
On June 3, 2016, 10:09 p.m., Xinyu Liu wrote:
>
> ---
> This is an automatically generated e-mai
m the main
thread? It will give you a much more helpful error message.
- Chris Pettitt
On June 3, 2016, 9:30 p.m., Xinyu Liu wrote:
>
> ---
> This is an automatically generated e-mail. To reply, visit:
> http
le.
samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemProducer.scala
(line 129)
<https://reviews.apache.org/r/48213/#comment201105>
same re. lock / volatile
- Chris Pettitt
On June 3, 2016, 5:
a race here if this code is invoked before the store.put on
line 492?
- Chris Pettitt
On June 2, 2016, 6:33 p.m., Xinyu Liu wrote:
>
> ---
> This is an automatically generated e-mail. To
reducing them a few lines
down in the sleep call. The sleep call is returning the amount of error on the
sleep (which may be negative). Previously I actually measured the error here
instead of in the sleep call and did the decrement inline. I'm not wed to doing
it in the sleep
> On June 1, 2016, 3:20 p.m., Chris Pettitt wrote:
> > samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala, line
> > 179
> > <https://reviews.apache.org/r/48109/diff/1/?file=1402973#file1402973line179>
> >
> > I think you could even
existing tests with gradle test
- Verified throttling behavior and instrumentation with local deployment
- Verified average latency impact of feature to be < 150ns for Linux and OSX
Thanks,
Chris Pettitt
/coordinator/stream/CoordinatorStreamSystemConsumer.java
(line 66)
<https://reviews.apache.org/r/47197/#comment197035>
You actually don't need to wrap emptySet because it's already immutable.
- Chris Pettitt
On May 11, 2016, 8:13 p.m., Jake Maes
ent196969>
You only need volatile here (vs. AtomicReference) since you're not using
any CAS operation.
For full safety, you need to wrap the set in an unmodifiable wrapper.
Otherwise it would be possible to modify the set via "read only" methods like
getBootstrappedStr
elieve
javadoc is not generated for private members by default).
- Chris Pettitt
On May 5, 2016, 9:13 p.m., Navina Ramesh wrote:
>
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.
working with history (e.g. git bisect).
- Chris Pettitt
On April 25, 2016, 5:25 p.m., Navina Ramesh wrote:
>
> ---
> This is an automatically generated e-mail. To reply, visi
> On May 3, 2016, 5:23 p.m., Chris Pettitt wrote:
> > samza-core/src/main/java/org/apache/samza/coordinator/StreamPartitionCountMonitor.java,
> > line 107
> > <https://reviews.apache.org/r/46856/diff/2/?file=1367775#file1367775line107>
> >
> > This is
ient. I think you're
checking the terminated flag for test, but the test code is already doing a
timed wait, so you could switch it to
`assertTrue(monitor.awaitTermination(...))` for the same effect.
- Chris Pettitt
On April 29, 201
an implicit assumption that retryBackoff is providing
a happens-before constraint between an invocation of the exception handler in
retryBackoff and a subsequent invocation of the loop. This likely always holds,
but a CAS for numRetries would cover cases where that does not hold.
- Chris Pettit
; This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44920/
> ---
>
> (Updated April 12, 2016, 11:45 p.m.)
>
>
> Review request for samza, Boris Shkolnik, Chris Pettitt, Jake Mae
r than killing off several LoC) is that you get a timeout
for the whole process versus timeouts on individual steps.
Looks very good! I especially like the more intuitive host names.
- Chris Pettitt
On April 8, 2016, 3:02 p.m
.
Thanks,
Chris Pettitt
> On March 16, 2016, 11:10 p.m., Chris Pettitt wrote:
> > Some more comments
Sorry, did not mean to create issues for all of the below. I think #1 and #2
are the most interesting to look at of the group.
.org/r/44920/#comment186274>
Minor: may as well use a diamond here since you're using it on the next
line.
- Chris Pettitt
On March 16, 2016, 6:23 p.m., Jagadish Venkatraman wrote:
>
> ---
> This is an automatically gen
://reviews.apache.org/r/44920/#comment186254>
This should be volatile, especially as I believe it is being used to convey
state across threads (e.g. whatever calls onContainerCompleted and the main
thread in ClusterBasedJobCoordinator).
- Chris Pettitt
On March 16, 2016, 6:23 p.m., Jagadi
> On March 16, 2016, 8:48 p.m., Chris Pettitt wrote:
> > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaAppState.java,
> > line 36
> > <https://reviews.apache.org/r/44920/diff/1/?file=1301270#file1301270line36>
> >
> > Agreed. This i
64 matches
Mail list logo