Hi Jim, Yes this is what I had in mind, The manage state needs to have separate input for each of the 5 operators. The platform does not support connecting multiple output port to a single input port, but you could achieve similar effect using stream merge operator (https://github.com/apache/apex-malhar/blob/3ce83708f795b081d564be357a8333928154398e/library/src/main/java/com/datatorrent/lib/stream/StreamMerger.java)
- Tushar. On Thu, Sep 1, 2016 at 10:37 PM, Jim <[email protected]> wrote: > Tushar, > > Funny that you described it that way, as that is exactly what I was thinking > about this morning. > > > So the flow would be: > > > > Router Operator > > > | > > > Managed State Operator > > > | > > --------------------------------------------------------------------------------------------------------------------------------------- > | > | | > | > > General Acknowledgement Detailed Acknowledgement > Ship Notification Invoice > > | > | | > | > > --------------------------------------------------------------------------------------------------------------------------------------- > > | > > ------------------------------------------------------------------------------------------------------------------------------------------ > / each of the 4 operators at the end of > processing emits a record back to Managed State Operator / > > ------------------------------------------------------------------------------------------------------------------------------------------ > > > In this scenario, would the managed state operator just have 1 input, that > all the other operators emit to, or would it need to have separate inputs for > each > of the 5 operators that would be emitting to it? > > This is what you were describing too, correct? > > Thanks, > > Jim > > -----Original Message----- > From: Tushar Gosavi [mailto:[email protected]] > Sent: Thursday, September 1, 2016 11:49 AM > To: [email protected] > Subject: Re: HDHT question - looking for the datatorrent gurus! > > Hi Jim, > > Currently HDHT is accessible only to single operator in a DAG. Single HDHT > store can not be managed by two different operator at a time which could > cause metadata corruption. Theoretically HDHT bucket could be read from > multiple operators, but only one writer is allowed. > > In your case a stage in transaction is processed completely by different > operator and then only next stage can start. It could still be achieved by > using a single operator which manages HDHT state, and having a loop in DAG to > send completed transaction ids to sequencer. > > - Sequence operator will emit transaction to transaction processing operator. > - If it receives an out of order transaction it will note it down in HDHT. > - The processing operator will send completed transaction id on a port which > is connected back to sequence operator. > - On receiving data on this loopback port, sequence operator will update HDHT > and search for next transaction in order, which could be stored in HDHT and > will emit to next processing operator. > > - Tushar. > > > On Sat, Aug 27, 2016 at 1:31 AM, Jim <[email protected]> wrote: >> Good afternoon, >> >> >> >> I have an apex application where I may receive edi transactions, but >> sometimes they arrive out of order and I want to hold any out of >> sequence transactions till the correct time in the flow to process them. >> >> >> >> For example for a standard order, we will receive from the remote vendor: >> >> >> >> 1.) General Acknowledgement >> >> 2.) Detailed Acknowledgement >> >> 3.) Ship Notification >> >> 4.) Invoice >> >> >> >> They are supposed to be sent and received in that order. >> >> >> >> However sometimes vendors systems have problems, etc. so they send the >> all of these at the same time, and then we can receive them out of sequence. >> Data packets for these are very small, say from 1 to 512 bytes, and >> the only time they will be out of sequence, we will receive them very >> closely together. >> >> >> >> I am trying to think of the best way to do this in my datatorrent / >> Hadoop / yarn facilities, instead of creating a datatable in >> postgreSQl and using that. >> >> >> >> Can I create a flow that works like this (I am not sure if this makes >> sense, or is the best way to solve my problem, while keeping state, >> etc. maintained for all the operators): >> >> >> >> 1.) In the inbound transaction router, check the hdht store for the order >> number, if it doesn’t exist, this means it is a new order, if the >> transaction trying to process is the general acknowledgment, emit the >> data to the general acknowledgement operator; if it is not – store the >> transaction data into the correct bucket identifying the transaction >> is it for, as well as the next step to be the general acknowledgement >> in HDHT by order number. >> >> 2.) Say the next transaction is the ship notification, in the router, we >> would check the HDHT store, see this is not the next expected >> transaction (say it is supposed to be the detail acknowledgement), so >> we would just post the data for the ship notification into HDHT the store >> and say we are done. >> >> 3.) Say we now receive the detailed acknowledgement for an order whose >> next step IS the detailed acknowledgement, we would see this is the >> correct next transaction, emit it to the detailed acknowledgement >> operator, and update the HDHT store to show that the next transaction >> should be the ship notification. NOTE: we can’t emit the ship >> notification yet, till we have confirmed that the detailed ackkowledgment >> has been completed. >> >> 4.) In each of the 4 transaction operators at the end of the processing, >> we would update the HDHT store to show the next expected step, and if >> we already received data for the next expected step pull it from the >> HDHT store, and write the transaction into our SQS queue which is the >> input into the inbound transaction router at the beginning of the >> application, so it processes through the system. >> >> >> >> I believe HDHT can be used to pass data throughout an entire >> application, and is not limited to just a per operator basis, correct? >> >> >> >> Any comments / feedback? >> >> >> >> Thanks, >> >> >> >> Jim
