Hi Jim,

Yes this is what I had in mind, The manage state needs to have
separate input for each of the 5 operators. The platform does not
support connecting multiple output port to a single input port, but
you could achieve similar effect using stream merge operator
(https://github.com/apache/apex-malhar/blob/3ce83708f795b081d564be357a8333928154398e/library/src/main/java/com/datatorrent/lib/stream/StreamMerger.java)

- Tushar.


On Thu, Sep 1, 2016 at 10:37 PM, Jim <[email protected]> wrote:
> Tushar,
>
> Funny that you described it that way, as that is exactly what I was thinking 
> about this morning.
>
>
> So the flow would be:
>
>
>                                                                               
>                         Router Operator
>
>                                                                               
>                                      |
>
>                                                                               
>                  Managed State Operator
>
>                                                                               
>                                      |
>                                 
> ---------------------------------------------------------------------------------------------------------------------------------------
>                                |                                              
>              |                                                              | 
>                                                      |
>
>          General Acknowledgement             Detailed Acknowledgement         
>               Ship Notification                                  Invoice
>
>                                |                                              
>              |                                                              | 
>                                                      |
>                                 
> ---------------------------------------------------------------------------------------------------------------------------------------
>                                                                               
>                                     |
>                               
> ------------------------------------------------------------------------------------------------------------------------------------------
>                            /    each of the 4 operators at the end of 
> processing emits a  record back to Managed State Operator      /
>                           
> ------------------------------------------------------------------------------------------------------------------------------------------
>
>
> In this scenario, would the managed state operator just have 1 input, that 
> all the other operators emit to, or would it need to have separate inputs for 
> each
> of the 5 operators that would be emitting to it?
>
> This is what you were describing too, correct?
>
> Thanks,
>
> Jim
>
> -----Original Message-----
> From: Tushar Gosavi [mailto:[email protected]]
> Sent: Thursday, September 1, 2016 11:49 AM
> To: [email protected]
> Subject: Re: HDHT question - looking for the datatorrent gurus!
>
> Hi Jim,
>
> Currently HDHT is accessible only to single operator in a DAG. Single HDHT 
> store can not be managed by two different operator at a time which could 
> cause metadata corruption. Theoretically HDHT bucket could be read from 
> multiple operators, but only one writer is allowed.
>
> In your case a stage in transaction is processed completely by different 
> operator and then only next stage can start. It could still be achieved by 
> using a single operator which manages HDHT state, and having a loop in DAG to 
> send completed transaction ids to sequencer.
>
> - Sequence operator will emit transaction to transaction processing operator.
> - If it receives an out of order transaction it will note it down in HDHT.
> - The processing operator will send completed transaction id on a port which 
> is connected back to sequence operator.
> - On receiving data on this loopback port, sequence operator will update HDHT 
> and search for next transaction in order, which could be stored in HDHT and 
> will emit to next processing operator.
>
> - Tushar.
>
>
> On Sat, Aug 27, 2016 at 1:31 AM, Jim <[email protected]> wrote:
>> Good afternoon,
>>
>>
>>
>> I have an apex application where I may receive edi transactions, but
>> sometimes they arrive out of order and I want to hold any out of
>> sequence transactions till the correct time in the flow to process them.
>>
>>
>>
>> For example for a standard order, we will receive from the remote vendor:
>>
>>
>>
>> 1.)    General Acknowledgement
>>
>> 2.)    Detailed Acknowledgement
>>
>> 3.)    Ship Notification
>>
>> 4.)    Invoice
>>
>>
>>
>> They are supposed to be sent and received in that order.
>>
>>
>>
>> However sometimes vendors systems have problems, etc. so they send the
>> all of these at the same time, and then we can receive them out of sequence.
>> Data packets for these are very small, say from 1 to 512 bytes, and
>> the only time they will be out of sequence, we will receive them very
>> closely together.
>>
>>
>>
>> I am trying to think of the best way to do this in my datatorrent /
>> Hadoop / yarn facilities, instead of creating a datatable in
>> postgreSQl and using that.
>>
>>
>>
>> Can I create a flow that works like this (I am not sure if this makes
>> sense, or is the best way to solve my problem, while keeping state,
>> etc. maintained for all the operators):
>>
>>
>>
>> 1.)    In the inbound transaction router, check the hdht store for the order
>> number, if it doesn’t exist, this means it is a new order, if the
>> transaction trying to process is the general acknowledgment, emit the
>> data to the general acknowledgement operator; if it is not – store the
>> transaction data into the correct bucket identifying the transaction
>> is it for, as well as the next step to be the general acknowledgement
>> in HDHT by order number.
>>
>> 2.)    Say the next transaction is the ship notification, in the router, we
>> would check the HDHT store, see this is not the next expected
>> transaction (say it is supposed to be the detail acknowledgement), so
>> we would just post the data for the ship notification into HDHT the store 
>> and say we are done.
>>
>> 3.)    Say we now receive the detailed acknowledgement for an order whose
>> next step IS the detailed acknowledgement, we would see this is the
>> correct next transaction, emit it to the detailed acknowledgement
>> operator, and update the HDHT store to show that the next transaction
>> should be the ship notification.  NOTE:  we can’t emit the ship
>> notification yet, till we have confirmed that the detailed ackkowledgment 
>> has been completed.
>>
>> 4.)    In each of the 4 transaction operators at the end of the processing,
>> we would update the HDHT store to show the next expected step, and if
>> we already received data for the next expected step pull it from the
>> HDHT store, and write the transaction into our SQS queue which is the
>> input into the inbound transaction router at the beginning of the
>> application, so it processes through the system.
>>
>>
>>
>> I believe HDHT can be used to pass data throughout an entire
>> application, and is not limited to just a per operator basis, correct?
>>
>>
>>
>> Any comments / feedback?
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Jim

Reply via email to