Hi,

The problem is that every time a job fails or performs poorly at a certain stage, you need to study your data distribution just before THAT stage. An overall look at the input data set doesn't help much when there are so many transformations going on in the DAG. I always end up writing complicated typed code, separate from the actual job, to identify this. Shouldn't there be a Spark API to examine this in a better way? After all, Spark does go through all the records (in most cases) to perform a transformation or action, so it could gather statistics as a side job when instructed.
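The closest thing today is probably an accumulator updated inside a map, which piggy-backs stats collection on a pass that already touches every record. A minimal sketch of that pattern in plain Python (just an illustration of the idea, not an existing Spark API; `Stats` and `stat_map` are hypothetical names):

```python
from dataclasses import dataclass

@dataclass
class Stats:
    """Running summary updated as records stream by (accumulator-style)."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, x):
        self.count += 1
        self.total += x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0

def stat_map(records, fn, stats):
    """Apply fn to each record, updating stats as a side effect --
    analogous to a Spark map with an accumulator, so the distribution
    just before the next stage is captured without a second pass."""
    for r in records:
        stats.add(r)
        yield fn(r)

stats = Stats()
out = list(stat_map([3, 1, 4, 1, 5], lambda x: x * 2, stats))
print(out)         # transformed records
print(stats.mean)  # distribution info gathered on the same pass
```

In Spark itself this would be an `Accumulator` (or `AccumulatorV2` in 2.x) updated inside the transformation right before the suspect stage; the point is just that the stats ride along with the pass the job is already making.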
Thanks