It's more of a persisted service at the moment. I'll take a look at defining this the way you spoke of. Thanks!

On Fri, Jul 25, 2014 at 12:11 PM, Siddharth Seth <[email protected]> wrote:

> Doing something like that would involve writing new Outputs / Inputs, or
> modifying the existing ones to write to a different sink. We have
> prototyped such changes in the past - to write to HDFS as an example, and
> the changes are not very complicated.
> This involves changing how the existing Outputs write data, modifying
> DataMovementEvent payloads to contain relevant data (where to fetch from),
> and changing the Inputs to process this DataMovement payload to actually
> fetch the data.
> One thing to look at though - is that if you're writing directly to your
> own service - will the data be persisted there, until it's read by the
> downstream vertex - or does the data effectively need to be streamed
> through (consumers and producer tasks running independently of each other,
> or consumers and producer tasks must run at the same time).
>
>
> On Fri, Jul 25, 2014 at 12:03 PM, David Capwell <[email protected]>
> wrote:
>
>> Was looking into saying that when two vertexes share data, that they can
>> choose to share that data over disk, or over our internal system (so share
>> over network). In the cases where data persistence isn't needed and the
>> vertexes can be on the same node, then to ignore this system.
>>
>> The use-case isn't really fleshed out at the moment. Looking to
>> prototype to see how it would all play together.
>>
>>
>> On Fri, Jul 25, 2014 at 11:53 AM, Siddharth Seth <[email protected]>
>> wrote:
>>
>>> DataSourceType isn't really used at the moment. Eventually, it would
>>> serve more as a scheduling and failure recovery mechanism than as a way
>>> of deciding how data gets persisted between stages. (This property could
>>> potentially be used by some of the Inputs/Outputs to alter the way they
>>> persist data - but that isn't currently on the cards).
>>> This primarily applies to data written on Edges - are you somehow
>>> looking to modify that, or use the data generated by an intermediate Vertex
>>> in a separate process ?
>>> Getting a little more info on the use case would be helpful in figuring
>>> out how Tez can be used. Are you looking to read data from this internal
>>> service, publish to it, or something else ?
>>>
>>>
>>> On Fri, Jul 25, 2014 at 11:36 AM, David Capwell <[email protected]>
>>> wrote:
>>>
>>>> Sorry, copy/paste issue. I was looking at DataSourceType and trying to
>>>> see how data gets saved and read between tasks. The use-case is that we
>>>> have an internal service that might be helpful for us, so wanted to
>>>> prototype how possible it would be to share data over a different
>>>> mechanism.
>>>>
>>>>
>>>> On Fri, Jul 25, 2014 at 10:36 AM, Hitesh Shah <[email protected]>
>>>> wrote:
>>>>
>>>>> DataMovementEvent is a construct defined for an Input/Output pair to
>>>>> communicate with each other. The actual information being passed between
>>>>> the 2 is not understood by the framework except in that, it is a byte
>>>>> payload to be handed off from the source to the destination. Users are not
>>>>> expected to create derived classes of this type but to use the payload
>>>>> within the object to pass information around.
>>>>>
>>>>> For example, most of the currently implemented Input-Output pairs (
>>>>> for shuffle/broadcast edges ) use the payload to pass the url specifying
>>>>> the location of the data to be fetched.
>>>>>
>>>>> thanks
>>>>> -- Hitesh
>>>>>
>>>>> On Jul 25, 2014, at 10:23 AM, David Capwell <[email protected]>
>>>>> wrote:
>>>>>
>>>>> > So going through the code and not sure where the real logic of
>>>>> > DataMovementType gets used.
>>>>> >
>>>>> > I see that in DagTypeConverts it can convert between
>>>>> > DataMovementType and PlanEdgeDataMovementType, but once that happens I
>>>>> > don't really see a way to implement any of these types. Where are the
>>>>> > implementations defined? Is there any way to define my own impls?
>>>>> >
>>>>> > Thanks for your time.
>>>>>
>>>>>
>>>>
>>>
>>
>
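
A rough sketch of the payload idea described in the thread above: the Output side packs "where to fetch from" into an opaque byte payload, and the Input side unpacks it before fetching. This is only an illustration under stated assumptions. The class FetchLocationPayload and its fields (host, port, pathComponent) are hypothetical names, not part of Tez, and the wire layout is made up for the example. In a real Input/Output pair these bytes would travel inside the DataMovementEvent payload, which the framework hands from source to destination without interpreting it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, NOT a Tez class: models the opaque byte payload that an
// Output could attach to a DataMovementEvent and an Input could decode.
final class FetchLocationPayload {
    final String host;          // assumed: node (or service endpoint) holding the data
    final int port;             // assumed: port of the internal service
    final String pathComponent; // assumed: identifier of the data to fetch

    FetchLocationPayload(String host, int port, String pathComponent) {
        this.host = host;
        this.port = port;
        this.pathComponent = pathComponent;
    }

    // Output side: serialize the fetch location as length-prefixed fields.
    byte[] toBytes() {
        byte[] hostBytes = host.getBytes(StandardCharsets.UTF_8);
        byte[] pathBytes = pathComponent.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + hostBytes.length + 4 + 4 + pathBytes.length);
        buf.putInt(hostBytes.length).put(hostBytes);
        buf.putInt(port);
        buf.putInt(pathBytes.length).put(pathBytes);
        return buf.array();
    }

    // Input side: decode the payload to learn where the data should be fetched from.
    static FetchLocationPayload fromBytes(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        byte[] hostBytes = new byte[buf.getInt()];
        buf.get(hostBytes);
        int port = buf.getInt();
        byte[] pathBytes = new byte[buf.getInt()];
        buf.get(pathBytes);
        return new FetchLocationPayload(
                new String(hostBytes, StandardCharsets.UTF_8),
                port,
                new String(pathBytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Producer task: describe where the downstream task can find its data.
        byte[] payload = new FetchLocationPayload("node-1.example", 9000, "vertex-a/task-3").toBytes();

        // Consumer task: recover the location from the opaque bytes.
        FetchLocationPayload location = FetchLocationPayload.fromBytes(payload);
        System.out.println(location.host + ":" + location.port + "/" + location.pathComponent);
    }
}
```

Because only the Input/Output pair interprets the bytes, switching the sink from local disk to an internal service would mostly be a matter of changing what the Output writes into the payload and how the Input acts on it. Whether the service persists the data until the downstream vertex reads it still has to be decided separately.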
