I agree. IMO, application scheduling is not part of a streaming engine's functionality, and there are plenty of other projects that can help with it. A streaming engine, whether in a batch or a streaming use case, needs to support:

- watermarks and triggers (mostly supported by Apex, with a few exceptions)
- effective resource utilization (contributors to help with the functionality are welcome)

Thank you,

Vlad

On 6/14/17 07:32, Amol Kekre wrote:

The only thing missing is a way to kick off a job, in case the ask is to use resources the batch way: "use and terminate once done". An operator that keeps watch and has the ability to kick off a job suffices. Kicking off a batch job can be done via any of the following:

1. Files
   -> Start after all data has arrived. Usually a .done file in a dir, which triggers the entire dir to be processed
   -> Start asap and end on .done
2. Message (a start message)

I think batch use cases are mainly #1. Technically this is not a batch vs. stream use case, just the scheduler (Oozie-like) part of batch.
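
As a rough illustration of option #1, here is a minimal plain-Java sketch of a .done-file check. It does not use any Apex API; the class name DoneFileTrigger, its method names, and the exact marker name ".done" are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper: decides whether a directory is ready for a batch run. */
final class DoneFileTrigger {
    // Assumed marker name; the thread only says "a .done file in a dir".
    static final String MARKER = ".done";

    /** True once the marker file exists in the directory. */
    static boolean isComplete(Path dir) {
        return Files.isRegularFile(dir.resolve(MARKER));
    }

    /** Lists the regular files in the directory, excluding the marker, sorted by path. */
    static List<Path> dataFiles(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (Stream<Path> entries = Files.list(dir)) {
            entries.filter(Files::isRegularFile)
                   .filter(p -> !p.getFileName().toString().equals(MARKER))
                   .forEach(files::add);
        }
        Collections.sort(files);
        return files;
    }

    /** "Start after all data has arrived": empty until the marker shows up. */
    static List<Path> batchIfReady(Path dir) throws IOException {
        return isComplete(dir) ? dataFiles(dir) : Collections.<Path>emptyList();
    }
}
```

For the "start asap and end on .done" variant, a caller could instead process dataFiles(dir) on every poll and stop polling once isComplete(dir) returns true.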

Thks
Amol



E: a...@datatorrent.com | M: 510-449-2606 | Twitter: @amolhkekre

www.datatorrent.com


On Tue, Jun 13, 2017 at 11:47 PM, Ganelin, Ilya <ilya.gane...@capitalone.com> wrote:

    I think it's a very relevant use case. In the Apex formulation
    this would work as follows. An operator runs continuously and
    maintains an internal state that tracks processed files or an
    offset (e.g. in Kafka). As more data becomes available, the
    operator performs the appropriate operation and then returns to
    waiting. In this fashion, batched data is processed as soon as it
    becomes available, but the process overall is still a batch
    process since it's limited by the production of the source
    batches.

    There are a couple of examples of this in Malhar, for example the
    AbstractFileInputOperator.
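
The "runs continuously and tracks processed files" idea can be sketched in plain Java as below. This is not the Malhar AbstractFileInputOperator and uses no Apex API; the class NewFileScanner and its poll() method are hypothetical names chosen for the example, and the in-memory set stands in for whatever state a real operator would need to keep across restarts.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

/** Hypothetical scanner that reports each file in a directory only once. */
final class NewFileScanner {
    private final Path dir;
    // Internal state: names of files already handed out for processing.
    private final Set<String> processed = new HashSet<>();

    NewFileScanner(Path dir) {
        this.dir = dir;
    }

    /** Returns the files not returned by an earlier call, and remembers them. */
    List<Path> poll() throws IOException {
        List<Path> fresh = new ArrayList<>();
        try (Stream<Path> entries = Files.list(dir)) {
            entries.filter(Files::isRegularFile)
                   .sorted()
                   .forEach(p -> {
                       // Set.add returns false for names seen before.
                       if (processed.add(p.getFileName().toString())) {
                           fresh.add(p);
                       }
                   });
        }
        return fresh;
    }
}
```

A caller would invoke poll() in a loop, process whatever comes back, and go back to waiting when the list is empty.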

    Your earlier comment with regard to your motivation is
    interesting. Can you elaborate on the load reduction you get with
    your approach? A number of batched small writes to a DB may prove
    to be more efficient from a latency or database utilization
    standpoint when compared with infrequent large batch writes,
    particularly if they involve index updates.




    ------------------------------------------------------------------------
    *From:* dashi...@yahoo.com
    *Sent:* Tuesday, June 13, 2017 6:36:29 PM
    *To:* guilhermeh...@gmail.com; users@apex.apache.org
    *Subject:* Re: Is there a way to schedule an operator?

    I have input operators that reach out to Google, Facebook, Bing,
    Yahoo etc. once a day or once an hour and download marketing spend
    statistics. Apex promises that batch and streaming are equal
    first-class citizens. How is this equality achieved if there's no
    scheduler for batch jobs to rely on? I want the DAG to take a data
    stream from the batch pipeline and have it affect the streaming
    pipelines running alongside. Do you not see this as a valid use
    case?


        On Tue, Jun 13, 2017 at 5:29 PM, Guilherme Hott
        <guilhermeh...@gmail.com> wrote:
        Hi guys,

        Is there a way to schedule an operator? I need an
        operator to start the DAG once a day at 00:00 (midnight).
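
If the daily kick-off is driven from outside the DAG, one building block is the delay until the next midnight. The plain-Java sketch below computes it with java.time; the class and method names are hypothetical, and it works on local date-times, so daylight-saving shifts are not handled.

```java
import java.time.Duration;
import java.time.LocalDateTime;

/** Hypothetical helper for a once-a-day launch at 00:00. */
final class MidnightDelay {
    /** Time remaining from 'now' until the next 00:00 on the local calendar. */
    static Duration untilNextMidnight(LocalDateTime now) {
        // Start of the following day; exactly at midnight this yields 24 hours.
        LocalDateTime next = now.toLocalDate().plusDays(1).atStartOfDay();
        return Duration.between(now, next);
    }
}
```

The result could serve as the initial delay of a java.util.concurrent.ScheduledExecutorService task with a 24-hour period, or the whole job could be left to an external scheduler such as cron or Oozie.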

        Best

        -- *Guilherme Hott*
        /Software Engineer/
        Skype: guilhermehott
        @guilhermehott
        https://www.linkedin.com/in/guilhermehott




