RE: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database

2015-12-31 Thread Mich Talebzadeh
Thanks for the tip. I installed Apache Drill and need to access Hive :) hduser@rhes564::/usr/lib/apache-drill-1.4.0> bin/drill-embedded /work/tmp/libnetty-transport-native-epoll2451215308710744204.so: /lib64/libc.so.6: version `GLIBC_2.10' not found (required by

Re: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database

2015-12-31 Thread Jörn Franke
You are using an old version of Spark that cannot leverage all of Hive's optimizations, so I think your conclusion is not as straightforward as you might think. > On 31 Dec 2015, at 19:34, Mich Talebzadeh wrote: > Ok guys. > I have not succeeded in installing TEZ. Yet

RE: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database

2015-12-31 Thread Mich Talebzadeh
I agree, but Hive on Spark 1.3.1 is the only combination I have managed to get working. Still, it is twice as fast as Hive on MapReduce. Just to clarify, my understanding is that the optimiser is provided by Hive and is the same for both execution engines. Is there anything specific that Spark 1.3.1

RE: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database

2015-12-31 Thread Mich Talebzadeh
Ok guys. I have not yet succeeded in installing TEZ so that I can try the query on TEZ as well. Just a reminder that the query used is pretty common: get the total amount sold for each calendar month from sales (1 billion rows) and times. SELECT t.calendar_month_desc,
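The query described above joins the sales fact table to the times dimension on time_id and sums the amount sold per calendar month. A minimal in-memory Python sketch of that join-then-aggregate plan follows. The column name amount_sold, the tuple layout and the sample rows are hypothetical, since the original query text is truncated; the sketch only shows the shape of the work (hash the small dimension table, stream the large fact table once), not how Hive, Spark or Sybase IQ actually execute it.

```python
from collections import defaultdict


def monthly_totals(sales, times):
    """Inner hash join of sales to times on time_id, then sum amount_sold per month.

    sales: iterable of (time_id, amount_sold) tuples (hypothetical layout)
    times: iterable of (time_id, calendar_month_desc) tuples
    """
    # Build side: the small dimension table is loaded into a hash map.
    month_of = {time_id: month for time_id, month in times}
    totals = defaultdict(float)
    # Probe side: the large fact table is streamed exactly once.
    for time_id, amount_sold in sales:
        month = month_of.get(time_id)
        if month is not None:  # inner join: rows with no matching time_id are dropped
            totals[month] += amount_sold
    # Sorted by month for readability.
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    times = [(1, "1998-01"), (2, "1998-01"), (3, "1998-02")]
    sales = [(1, 10.0), (2, 5.5), (3, 7.25), (4, 99.0)]
    print(monthly_totals(sales, times))  # {'1998-01': 15.5, '1998-02': 7.25}
```

With a 1 billion row fact table, the probe loop dominates the cost, which is why scan speed and data skipping matter far more here than the size of the times table.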

Re: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database

2015-12-30 Thread Jörn Franke
Do Sybase and Hive both run on the same host? Maybe this is the reason. I think that with older versions of Spark, such as the one you have, predicate push-down (for ORC and/or Parquet) does not work. This is another huge performance penalty. Additionally, many optimizations probably do not work. I
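Predicate push-down lets a columnar reader such as ORC or Parquet skip whole chunks of a file by checking per-chunk min/max statistics against the filter before reading any values. The Python sketch below models that idea with a hypothetical Stripe class; it is a simplified illustration of the technique, not the ORC or Parquet reader API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stripe:
    """Hypothetical column chunk: its values plus min/max recorded at write time."""
    values: List[int]
    min_val: int
    max_val: int


def write_stripes(column: List[int], stripe_size: int) -> List[Stripe]:
    """Split a column into stripes and record min/max statistics for each one."""
    stripes = []
    for start in range(0, len(column), stripe_size):
        chunk = column[start:start + stripe_size]
        stripes.append(Stripe(chunk, min(chunk), max(chunk)))
    return stripes


def scan_with_pushdown(stripes: List[Stripe], lo: int, hi: int) -> Tuple[List[int], int]:
    """Evaluate lo <= v <= hi, consulting stripe statistics before reading values.

    Returns the matching values and the number of stripes actually read.
    """
    hits: List[int] = []
    stripes_read = 0
    for stripe in stripes:
        # The statistics prove no row in this stripe can match: skip it entirely.
        if stripe.max_val < lo or stripe.min_val > hi:
            continue
        stripes_read += 1
        hits.extend(v for v in stripe.values if lo <= v <= hi)
    return hits, stripes_read


if __name__ == "__main__":
    stripes = write_stripes(list(range(100)), stripe_size=10)
    hits, read = scan_with_pushdown(stripes, 25, 34)
    print(hits, read)  # 2 of the 10 stripes are read
```

The sample column is sorted, which is the best case; on data that is not clustered on the filtered column, the min/max ranges overlap and far fewer stripes can be skipped.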

Re: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database

2015-12-30 Thread Marcin Tustin
I'm afraid I use the HDP distribution, so I haven't yet had to compile anything. (Incidentally, this isn't a recommendation of HDP over anything else.) On Wed, Dec 30, 2015 at 3:33 PM, Mich Talebzadeh wrote: > Thanks Marcin > Trying to build TEZ 0.7 in

Re: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database

2015-12-30 Thread Jörn Franke
Hmm, I think the TEZ execution engine (currently) has the most optimizations for Hive. What about your hardware? Is it the same? Do you also have compression enabled on Sybase? Alternatively, you will need to wait for Hive interactive analytics (TEZ 0.8 + LLAP). > On 30 Dec 2015, at 13:47, Mich

RE: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database

2015-12-30 Thread Mich Talebzadeh
Thanks again, Jorn. Both Hive and Sybase IQ are running on the same host. Yes, for Sybase IQ I have compression enabled. The FACT table in IQ (sales) has LF (read: bitmap) indexes on the time_id column. For the dimension table (times), I have time_id defined as the primary key. Also, Sybase IQ
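The LF indexes mentioned above are bitmap-style indexes. As a rough illustration of the general technique, the Python sketch below keeps one bitmap per distinct time_id value (a Python int serves as an arbitrary-length bitmap) and answers an IN-list filter by OR-ing bitmaps. It is a sketch of bitmap indexing in general; how Sybase IQ actually stores and compresses its LF indexes is not described in this thread and is not modelled here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_bitmap_index(column: Iterable[int]) -> Dict[int, int]:
    """Map each distinct value to a bitmap of the row positions that hold it.

    A Python int serves as the bitmap: bit r is set when row r has that value.
    """
    index: Dict[int, int] = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def rows_matching(index: Dict[int, int], wanted: Iterable[int]) -> List[int]:
    """OR the bitmaps of the wanted values together, then decode the row positions."""
    bits = 0
    for value in wanted:
        bits |= index.get(value, 0)
    rows = []
    row = 0
    while bits:
        if bits & 1:
            rows.append(row)
        bits >>= 1
        row += 1
    return rows


if __name__ == "__main__":
    time_id = [3, 1, 3, 2, 1, 3]  # hypothetical time_id values for six fact rows
    idx = build_bitmap_index(time_id)
    print(bin(idx[3]))                 # 0b100101: rows 0, 2 and 5
    print(rows_matching(idx, [1, 2]))  # [1, 3, 4]
```

Bitmaps work best on low-cardinality columns such as a time key in a fact table, where each distinct value covers many rows and filters reduce to cheap bitwise operations.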

RE: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database

2015-12-30 Thread Mich Talebzadeh
Thanks Marcin. Trying to build TEZ 0.7 in /usr/lib/apache-tez-0.7.0-src using mvn -X clean package -DskipTests=true -Dmaven.javadoc.skip=true with mvn version 3.2.5 (as opposed to 3.3), as I read that it builds OK with 3.2.5, but I am getting the same error as below mvn

Re: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database

2015-12-30 Thread Jörn Franke
HDP should have TEZ on board by default. > On 30 Dec 2015, at 21:42, Marcin Tustin wrote: > I'm afraid I use the HDP distribution so I haven't yet had to compile anything. (Incidentally, this isn't a recommendation of HDP over anything else). >> On

Re: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database

2015-12-30 Thread Marcin Tustin
Yes, that's why I haven't had to compile anything. On Wed, Dec 30, 2015 at 4:16 PM, Jörn Franke wrote: > Hdp Should have TEZ already on-Board bye default. > > On 30 Dec 2015, at 21:42, Marcin Tustin wrote: > > I'm afraid I use the HDP distribution

Re: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database

2015-12-30 Thread Jörn Franke
Have you tried it with Hive on TEZ? It (currently) contains more optimizations than Hive on Spark. I assume you use the latest Hive version. Additionally, you may want to consider calculating statistics (depending on your configuration, you need to trigger it); I am not sure if Spark can use

RE: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database

2015-12-30 Thread Mich Talebzadeh
Hi Jorn, thanks for your reply. My Hive version is 1.2.1 on Spark 1.3.1. I have not tried it on TEZ. I tried the query on the MR engine and it did not fare better. I also ran it without the STDDEV function and found that the function did not slow it down. I tried a simple query as follows
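The finding that STDDEV did not slow the query down is plausible because a standard deviation can be accumulated in a single pass over each group, alongside SUM, with constant state per group. The minimal Python sketch below uses Welford's algorithm to show this. It is a swapped-in illustration: the thread does not say how Hive implements STDDEV, and the sketch assumes population (not sample) standard deviation.

```python
import math
from typing import Iterable


def stddev_pop(values: Iterable[float]) -> float:
    """Population standard deviation in one pass (Welford's algorithm)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n == 0:
        raise ValueError("stddev of an empty group is undefined")
    return math.sqrt(m2 / n)


if __name__ == "__main__":
    print(stddev_pop([2, 4, 4, 4, 5, 5, 7, 9]))
```

Because each group only needs three running numbers (n, mean, m2), the extra cost over a plain SUM is a few arithmetic operations per row, which is small next to scanning and joining 1 billion rows.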

Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database

2015-12-29 Thread Mich Talebzadeh
Hi, I have a fact table in Hive with 1 billion rows, imported from Sybase IQ via SQOOP, as follows: show create table sales; +--- +--+ |createtab_stmt |