Re: is there a way to persist the lineages generated by spark?

2017-04-06 Thread kant kodali
Yes, lineage that is actually replayable is what is needed for the validation
process, so that we can address questions like how a system arrived at a state S
at a time T. I guess a good analogy is event sourcing.
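As a rough illustration of the event-sourcing analogy, here is a minimal Python sketch. It assumes the inputs are kept as an append-only log of timestamped events. `Event` and `state_at` are hypothetical names, not Spark APIs. The state S at time T is whatever replaying every event up to T produces, so anyone can re-derive it and check it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """One immutable change (hypothetical schema, for illustration only)."""
    timestamp: int  # event time, e.g. epoch seconds
    key: str        # the entity the event applies to
    delta: int      # the change applied to that entity's value


def state_at(log: List[Event], t: int) -> Dict[str, int]:
    """Rebuild state S at time T by replaying, in time order, every event with timestamp <= t."""
    state: Dict[str, int] = {}
    for event in sorted(log, key=lambda e: e.timestamp):
        if event.timestamp > t:
            break  # events after T cannot have contributed to S
        state[event.key] = state.get(event.key, 0) + event.delta
    return state


log = [
    Event(timestamp=1, key="a", delta=5),
    Event(timestamp=2, key="b", delta=3),
    Event(timestamp=3, key="a", delta=-2),
]
print(state_at(log, 2))  # {'a': 5, 'b': 3}
print(state_at(log, 3))  # {'a': 3, 'b': 3}
```

Because the log is never modified, replaying it to any earlier T gives the same answer every time, which is the property a validation process would rely on.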


On Thu, Apr 6, 2017 at 10:30 PM, Jörn Franke  wrote:



Re: is there a way to persist the lineages generated by spark?

2017-04-06 Thread Jörn Franke
I do think this is the right way. You will have to do testing with test data,
verifying that the output of the calculation is the expected output.
Even if the logical plan is correct, your calculation might not be. E.g., there
can be bugs in Spark or in the UI, or (what happens very often) the client
describes a calculation, but in the end the description is wrong.
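The testing approach described above can be sketched as a small test, assuming the dashboard figure is a per-customer revenue total. The function, data, and names are hypothetical. In a real pipeline the function under test would be the Spark job run on a small test DataFrame; a plain-Python stand-in keeps the example self-contained.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def revenue_per_customer(orders: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Hypothetical stand-in for the calculation behind a dashboard tile."""
    totals: Dict[str, float] = defaultdict(float)
    for customer, amount in orders:
        totals[customer] += amount
    return dict(totals)


def test_revenue_per_customer() -> None:
    # Small, hand-built input whose correct answer is worked out independently,
    # ideally from the client's own description of the calculation.
    orders = [("alice", 10.0), ("bob", 4.5), ("alice", 2.5)]
    expected = {"alice": 12.5, "bob": 4.5}
    assert revenue_per_customer(orders) == expected


test_revenue_per_customer()
print("ok")
```

The key design point is that `expected` comes from outside the code being tested; a test that derives its expected values from the job itself would only confirm that the job agrees with itself.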

> On 4. Apr 2017, at 05:19, kant kodali  wrote:




Re: is there a way to persist the lineages generated by spark?

2017-04-06 Thread Gourav Sengupta
Hi,

I think that every client wants a validation process, but showing lineage
is an approach that they are not asking for, and it may not be the right way to
prove it.


Regards,
Gourav

On Tue, Apr 4, 2017 at 4:19 AM, kant kodali  wrote:



Re: is there a way to persist the lineages generated by spark?

2017-04-03 Thread ayan guha
How about writing the logical plans (or toDebugString, in the case of RDDs) to an
external file on the driver?
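A minimal sketch of the "write the plan to a file on the driver" idea, in Python. It assumes the plan text has already been obtained as a string or bytes; in PySpark, `RDD.toDebugString()` returns the lineage as bytes. The function name `record_lineage`, the JSON-lines layout, and the SHA-256 fingerprint are illustrative choices, not part of Spark.

```python
import hashlib
import json
import time
from typing import Union


def record_lineage(path: str, job_name: str, plan: Union[str, bytes]) -> str:
    """Append one lineage record (plan text plus metadata) to a JSON-lines file.

    `plan` is the text of a plan or lineage dump, e.g. the value returned by
    RDD.toDebugString(), which PySpark returns as bytes.
    Returns a SHA-256 fingerprint of the plan text (an illustrative choice) so
    that it can be displayed next to the dashboard result it produced.
    """
    text = plan.decode("utf-8") if isinstance(plan, bytes) else plan
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    record = {
        "job": job_name,
        "recorded_at": time.time(),  # driver wall-clock time, epoch seconds
        "plan_sha256": digest,
        "plan": text,
    }
    # Append-only: earlier records are never rewritten.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return digest
```

On the driver this would be called as, for example, `record_lineage("lineage.jsonl", "daily_revenue", rdd.toDebugString())`. For DataFrames, `df.explain(True)` prints the parsed, analyzed, optimized, and physical plans to stdout rather than returning them, so from Python that output would need to be captured separately. A stored plan records what Spark was asked to compute, not whether the result is correct.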

On Tue, Apr 4, 2017 at 1:19 PM, kant kodali  wrote:




-- 
Best Regards,
Ayan Guha


is there a way to persist the lineages generated by spark?

2017-04-03 Thread kant kodali
Hi All,

I am wondering if there is a way to persist the lineages that Spark generates
underneath. Some of our clients want us to prove that the result of the
computation we are showing on a dashboard is correct, and for that, if we
can show the lineage of transformations that were executed to get to the
result, then that can be the Q.E.D. moment. But I am not even sure if this
is possible with Spark.

Thanks,
kant