First, it depends on what exactly you want to do. Second, Hive > 1.2 with Tez 
as the execution engine (I recommend >= 0.8) and ORC as the storage format can 
be pretty quick, depending on your use case. Additionally, you may want to 
employ compression, which is a performance boost once you understand how 
storage indexes and bloom filters work. You also need to think about how you 
sort the data, because storage indexes are most effective when the values you 
filter on are clustered together. Cf. also
https://snippetessay.wordpress.com/2015/07/25/hive-optimizations-with-indexes-bloom-filters-and-statistics/
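
To illustrate why sorting matters for storage indexes, here is a minimal toy
model in Python. It is not how ORC is implemented; the stripe size, the data
and the function names (build_stripes, stripes_to_read) are made up for this
sketch. It only shows the idea that a per-stripe min/max lets a reader skip
stripes, and that this works far better on sorted data.

```python
# Illustrative only: a toy model of ORC-style min/max "storage indexes".
# Stripe size, data, and function names are assumptions made for this sketch.

def build_stripes(values, stripe_size):
    """Split values into fixed-size stripes and record min/max per stripe."""
    stripes = []
    for start in range(0, len(values), stripe_size):
        chunk = values[start:start + stripe_size]
        stripes.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return stripes


def stripes_to_read(stripes, wanted):
    """Count stripes whose [min, max] range could contain the wanted value."""
    return sum(1 for s in stripes if s["min"] <= wanted <= s["max"])


# 1000 distinct ids, interleaved so each stripe spans nearly the full range.
unsorted_ids = [(i * 37) % 1000 for i in range(1000)]
sorted_ids = sorted(unsorted_ids)

unsorted_stripes = build_stripes(unsorted_ids, stripe_size=100)
sorted_stripes = build_stripes(sorted_ids, stripe_size=100)

# A filter like "WHERE id = 512" must touch every stripe that may hold 512.
print("stripes read, unsorted:", stripes_to_read(unsorted_stripes, 512))
print("stripes read, sorted:  ", stripes_to_read(sorted_stripes, 512))
```

With the interleaved data every stripe covers almost the whole id range, so
none can be skipped; with sorted data only one stripe overlaps the filter.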

However, you have to rethink how you define your technical data model. A lot of 
prejoined data in one big flat table can be more performant with storage 
indexes and bloom filters than standard indexes and dimensional modeling.
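
For readers who have not met bloom filters before, a minimal sketch in Python
of the idea follows. This is not ORC's implementation: the class name, the bit
count, the number of hashes and the use of SHA-256 are all assumptions chosen
for readability. The point is only that a small bit array can answer "this
value is definitely not in this stripe", which lets a reader skip the stripe
even when the column is not sorted.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter; sizes and hashing scheme are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False is definitive; True can be a false positive.
        return all(self.bits[pos] for pos in self._positions(key))


# One filter per stripe, built from the values stored in that stripe.
stripe_filter = BloomFilter()
for customer in ["alice", "bob", "carol"]:
    stripe_filter.add(customer)

print(stripe_filter.might_contain("bob"))  # True: added keys are never missed
# For a key that was never added, False means the stripe can be skipped.
print(stripe_filter.might_contain("zed"))
```

In Hive the corresponding knob is, as far as I know, the ORC table property
orc.bloom.filter.columns; please check the documentation for your version.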

Besides Tez, you can also use another execution engine in your session (e.g. 
Spark) if this makes sense.

Finally, you have to review how YARN manages resources, including preemption, 
fair vs. capacity scheduler, etc.

Btw, the same also holds for relational database appliances such as Exadata. 
There, the standard approach of dimensional modeling plus standard indexes is 
often no longer the most performant.



> On 05 Nov 2015, at 20:04, Andrés Ivaldi <iaiva...@gmail.com> wrote:
> 
> Hello, 
> I was looking at Hive as an OLAP alternative, but I've read that it is quite 
> slow for that. Does anybody have experience with this, or know of a Hive 
> alternative for OLAP? Kylin is not an option because we need dynamic OLAP, 
> like ROLAP.
> 
> Regards,
> 
> -- 
> Ing. Ivaldi Andres
