I agree. IMO, application scheduling is not part of a streaming engine's
functionality, and there are plenty of other projects that can help with
it. Whether in a batch or a streaming use case, a streaming engine needs
to support:
- watermarks and triggers (mostly supported by Apex, with a few exceptions)
- effective resource utilization (contributors who want to help with this
functionality are welcome)
Thank you,
Vlad
On 6/14/17 07:32, Amol Kekre wrote:
The only thing missing is a way to kick off a job, in case the ask is to
use resources the batch way: "use and terminate once done". An operator
that keeps watch and has the ability to kick off a job suffices. A batch
job can be kicked off via any of the following:
1. Files
-> Start after all data has arrived. Usually a .done file in a directory
triggers processing of the entire directory.
-> Start as soon as possible and end on .done
2. Message (a start message)
I think batch use cases are mainly #1. Technically this is not a batch
vs. stream issue, just the scheduler (Oozie-like) part of batch.
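A minimal sketch of option 1 (start only after all data has arrived), written in plain Java rather than against any Apex API. The class name DoneMarkerBatch and the exact marker name ".done" are assumptions for illustration. The batch is reported as empty until the marker appears; after that, every other regular file in the directory is returned, sorted by path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not an Apex or Malhar class) for the ".done marker" convention.
class DoneMarkerBatch {
    // Assumed marker name; a real pipeline may use a different convention.
    static final String MARKER = ".done";

    // Returns an empty list until the marker exists, then all data files
    // in the directory (marker excluded), sorted by path.
    static List<Path> readyBatch(Path dir) {
        if (!Files.exists(dir.resolve(MARKER))) {
            return Collections.emptyList(); // not all data has arrived yet
        }
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                boolean isMarker = p.getFileName().toString().equals(MARKER);
                if (Files.isRegularFile(p) && !isMarker) {
                    files.add(p);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        files.sort(null); // Path is Comparable; null means natural ordering
        return files;
    }
}
```

The "start as soon as possible and end on .done" variant would instead process files as they appear and treat the marker only as the end-of-batch signal.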
Thks
Amol
E: a...@datatorrent.com | M: 510-449-2606 | Twitter: @amolhkekre
www.datatorrent.com
On Tue, Jun 13, 2017 at 11:47 PM, Ganelin, Ilya
<ilya.gane...@capitalone.com> wrote:
I think it's a very relevant use case. In the Apex formulation
this would work as follows. An operator runs continuously and
maintains internal state that tracks processed files or an offset
(e.g. in Kafka). As more data becomes available, the operator
performs the appropriate operation and then returns to waiting. In
this fashion, batched data is processed as soon as it becomes
available, but the process overall is still a batch process, since
it is bounded by when the source batches are produced.
There are a couple of examples of this in Malhar, such as
AbstractFileInputOperator.
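The pattern described above (keep state about what has already been processed, pick up only what is new on each poll, then go back to waiting) can be sketched in plain Java. This is not Malhar's AbstractFileInputOperator, which is considerably more complete; the class IncrementalFileTracker and its method are hypothetical, and in a real Apex operator the processed set would have to be part of the checkpointed state.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "track what has been processed, pick up only what is new".
// Not Malhar's AbstractFileInputOperator; it only illustrates the idea.
class IncrementalFileTracker {
    // Internal state: names of files already handed off for processing.
    // A real operator would need this state to survive failures (checkpointing).
    private final Set<String> processed = new HashSet<>();

    // Called repeatedly (e.g. once per polling interval); returns only unseen files.
    List<Path> pollNewFiles(Path dir) {
        List<Path> fresh = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                // Set.add returns true only the first time a name is recorded.
                if (Files.isRegularFile(p) && processed.add(p.getFileName().toString())) {
                    fresh.add(p);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        fresh.sort(null); // deterministic order within one poll
        return fresh;
    }
}
```

A file is recorded as processed as soon as it is listed, so a file still being written when first seen would be picked up only partially; a done-marker or rename-on-completion convention avoids that.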
Your earlier comment regarding your motivation is interesting. Can
you elaborate on the load reduction you get with your approach? A
series of small batched writes to a DB may prove more efficient,
from a latency or database-utilization standpoint, than infrequent
large batch writes, particularly if they involve index updates.
------------------------------------------------------------------------
*From:* dashi...@yahoo.com
*Sent:* Tuesday, June 13, 2017 6:36:29 PM
*To:* guilhermeh...@gmail.com; users@apex.apache.org
*Subject:* Re: Is there a way to schedule an operator?
I have input operators that reach out to Google, Facebook, Bing,
Yahoo, etc. once a day or once an hour and download marketing-spend
statistics. Apex promises that batch and streaming are equally
first-class citizens. How is this equality achieved if there's no
scheduler for batch jobs to rely on? I want the DAG to take the data
stream from a batch pipeline and affect the streaming pipelines
running alongside it. Do you not see this as a valid use case?
On Tue, Jun 13, 2017 at 5:29 PM, Guilherme Hott
<guilhermeh...@gmail.com> wrote:
Hi guys,
Is there a way to schedule an operator? I need an
operator to start the DAG once a day at 00:00 (midnight).
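If the trigger has to live inside the application rather than in an external scheduler such as cron or Oozie, the core arithmetic is the delay until the next 00:00. A minimal sketch using only java.time and java.util.concurrent; MidnightTrigger is a hypothetical name, and how the task would hand work to an Apex operator is deliberately left out.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: run a task every day at 00:00 local time.
class MidnightTrigger {
    // Time from 'now' until the next 00:00 (a full day if 'now' is exactly midnight).
    static Duration untilNextMidnight(LocalDateTime now) {
        LocalDateTime nextMidnight = now.toLocalDate().plusDays(1).atStartOfDay();
        return Duration.between(now, nextMidnight);
    }

    // Schedules 'task' at the next midnight and every 24 hours after that.
    // A fixed 24h period ignores daylight-saving shifts.
    static ScheduledFuture<?> scheduleDaily(ScheduledExecutorService executor, Runnable task) {
        long initialDelayMs = untilNextMidnight(LocalDateTime.now()).toMillis();
        long periodMs = TimeUnit.DAYS.toMillis(1);
        return executor.scheduleAtFixedRate(task, initialDelayMs, periodMs, TimeUnit.MILLISECONDS);
    }
}
```

Because the period is a fixed 24 hours, the firing time drifts by an hour across daylight-saving changes; recomputing the delay after each run (or using a cron-style scheduler) avoids that.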
Best
--
*Guilherme Hott*
/Software Engineer/
Skype: guilhermehott
@guilhermehott
https://www.linkedin.com/in/guilhermehott