Re: Spark Beginner: Correct approach for use case

2017-03-08 Thread Allan Richards
... that do aggregations on samples of the data (cf. https://jornfranke.wordpress.com/2015/06/28/big-data-what-is-next-oltp-olap-predictive-analytics-sampling-and-probabilistic-databases). E.g. Hive has had tablesample functionality for a long time. > > On 5 Mar 2017, at 21:49, Allan R

Spark Beginner: Correct approach for use case

2017-03-05 Thread Allan Richards
Hi, I am looking to use Spark to help execute queries against a reasonably large dataset (1 billion rows). I'm a bit lost with all the different libraries and add-ons for Spark, and am looking for some direction as to what I should look at and what may be helpful. A couple of relevant points: - The