Thanks for the tip
Installed Apache Drill and need to access Hive :)
hduser@rhes564::/usr/lib/apache-drill-1.4.0> bin/drill-embedded
/work/tmp/libnetty-transport-native-epoll2451215308710744204.so:
/lib64/libc.so.6: version `GLIBC_2.10' not found (required by
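The error above means the netty native-epoll library bundled with Drill needs the `GLIBC_2.10` symbol version, which the host's `/lib64/libc.so.6` does not provide, so the system glibc is older than 2.10. `ldd --version` on the host shows the installed version. The short Python sketch below does the same check programmatically. It is only an illustration: `parse_glibc` and `glibc_satisfies` are hypothetical helper names, and it assumes `platform.libc_ver()` (which inspects the libc the Python interpreter is linked against) reports the same system glibc that Drill loads.

```python
import platform


def parse_glibc(version: str) -> tuple:
    """Turn 'GLIBC_2.10' or '2.10' into a comparable tuple such as (2, 10)."""
    number = version.split("_")[-1]  # drop an optional 'GLIBC_' prefix
    return tuple(int(part) for part in number.split("."))


def glibc_satisfies(required: str, installed: str) -> bool:
    """True if the installed glibc is at least the required symbol version."""
    # Compare numerically per component: (2, 5) < (2, 10), whereas the plain
    # strings would compare the other way round ("2.5" > "2.10").
    return parse_glibc(installed) >= parse_glibc(required)


if __name__ == "__main__":
    # platform.libc_ver() returns ('glibc', '<version>') on glibc systems
    # and empty strings when it cannot detect the C library.
    lib, installed = platform.libc_ver()
    if lib != "glibc" or not installed:
        print("Could not detect glibc on this host")
    else:
        ok = glibc_satisfies("GLIBC_2.10", installed)
        print(f"glibc {installed}: {'OK' if ok else 'too old'} for GLIBC_2.10")
```

The numeric tuple comparison matters here: compared as plain strings, an old 2.5 would wrongly look newer than 2.10.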
You are using an old version of Spark and it cannot leverage all optimizations
of Hive, so I think your conclusion is not as straightforward as you might think.
> On 31 Dec 2015, at 19:34, Mich Talebzadeh wrote:
>
> Ok guys.
>
> I have not succeeded in installing TEZ. Yet
I agree, but Hive on Spark 1.3.1 is the only combination I have managed to get
working. Still, it is twice as fast as Hive on MapReduce.
Just to clarify: my understanding is that the optimiser is provided by Hive and
is the same for both execution engines. Is there anything specific that Spark
1.3.1
Ok guys.
I have not yet succeeded in installing TEZ so that I can try the query on TEZ
as well.
Just a reminder that the query used is a pretty common one: get the total amount
sold for each calendar month from sales (1 billion rows) and times.
SELECT t.calendar_month_desc,
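The SQL itself is cut off in the archive, but the task as described is an equi-join of the sales fact table to the times dimension followed by a GROUP BY on `calendar_month_desc`. The plain-Python sketch below models only that logical work as a hash join, assuming the join key is `time_id` (the key mentioned later in the thread). The `amount_sold` column name and all row values are hypothetical, and the sketch says nothing about how Hive, Spark or Sybase IQ actually plan or execute the query.

```python
from collections import defaultdict

# Hypothetical miniature tables: only the columns the query needs are modelled.
# time_id and calendar_month_desc appear in the thread; amount_sold is assumed.
times = [
    {"time_id": 1, "calendar_month_desc": "1998-01"},
    {"time_id": 2, "calendar_month_desc": "1998-01"},
    {"time_id": 3, "calendar_month_desc": "1998-02"},
]
sales = [
    {"time_id": 1, "amount_sold": 100},
    {"time_id": 2, "amount_sold": 50},
    {"time_id": 3, "amount_sold": 70},
    {"time_id": 3, "amount_sold": 30},
    {"time_id": 99, "amount_sold": 5},  # no matching time_id: dropped by the inner join
]


def monthly_totals(sales, times):
    """Sum amount_sold per calendar_month_desc via an inner hash join on time_id."""
    # Build side: hash the small dimension table on its key.
    month_of = {t["time_id"]: t["calendar_month_desc"] for t in times}
    totals = defaultdict(int)
    # Probe side: stream the large fact table once, join, and aggregate.
    for row in sales:
        month = month_of.get(row["time_id"])
        if month is not None:
            totals[month] += row["amount_sold"]
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    print(monthly_totals(sales, times))  # {'1998-01': 150, '1998-02': 100}
```

Hashing the small dimension table and streaming the large fact table once is the same idea behind a map-side join; whether an engine picks that strategy for a 1-billion-row table depends on its optimiser and the statistics available to it.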
Do both Sybase and Hive run on the same host? Maybe this is the reason.
I think that with older versions of Spark, such as the one you have, predicate
pushdown (for ORC and/or Parquet) does not work. This is another huge
performance penalty. Additionally, many optimizations probably do not work. I
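To make the pushdown point concrete: ORC keeps min/max column statistics per stripe (and per row group), and Parquet does the same per row group. A reader with predicate pushdown checks the filter against those statistics and skips blocks that cannot match; a reader without it decodes every block and filters afterwards. The toy model below illustrates only that skipping idea. The `Stripe` class, `make_stripes` and `scan` are hypothetical names, not the ORC, Parquet or Spark API, and real readers also use finer-grained row-group indexes and, optionally, bloom filters.

```python
from dataclasses import dataclass, field


@dataclass
class Stripe:
    """A toy stand-in for an ORC stripe: its values plus min/max statistics."""
    values: list
    min_val: int = field(init=False)
    max_val: int = field(init=False)

    def __post_init__(self):
        self.min_val = min(self.values)
        self.max_val = max(self.values)


def make_stripes(values, stripe_size):
    """Split a column into fixed-size stripes, recording statistics for each."""
    return [Stripe(values[i:i + stripe_size])
            for i in range(0, len(values), stripe_size)]


def scan(stripes, lo, hi, pushdown=True):
    """Return (values with lo <= v <= hi, number of stripes actually read)."""
    matches, stripes_read = [], 0
    for stripe in stripes:
        # With pushdown, the statistics alone prove a stripe cannot match.
        if pushdown and (stripe.max_val < lo or stripe.min_val > hi):
            continue
        stripes_read += 1
        matches.extend(v for v in stripe.values if lo <= v <= hi)
    return matches, stripes_read


if __name__ == "__main__":
    stripes = make_stripes(list(range(100)), stripe_size=10)
    print(scan(stripes, 25, 34, pushdown=True)[1])   # → 2
    print(scan(stripes, 25, 34, pushdown=False)[1])  # → 10
```

Both scans return the same rows; only the amount of data read differs. The saving depends on the filter column being clustered, as it is in this sorted toy data: if matching values are spread across every stripe, the statistics cannot rule any stripe out.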
I'm afraid I use the HDP distribution so I haven't yet had to compile
anything. (Incidentally, this isn't a recommendation of HDP over anything
else).
On Wed, Dec 30, 2015 at 3:33 PM, Mich Talebzadeh
wrote:
> Thanks Marcin
>
> Trying to build TEZ 0.7 in
Hmm, I think the TEZ execution engine (currently) has the most optimizations in
Hive. What about your hardware: is it the same? Do you also have compression
enabled on Sybase?
Alternatively, you will need to wait for Hive interactive analytics (TEZ 0.8 +
LLAP).
> On 30 Dec 2015, at 13:47, Mich
Thanks again Jorn.
Both Hive and Sybase IQ are running on the same host. Yes, for Sybase IQ I have
compression enabled. The FACT table in IQ (sales) has LF (i.e. bitmap) indexes
on the time_id column. For the dimension table (times) I have time_id defined
as the primary key. Also Sybase IQ
Thanks Marcin
Trying to build TEZ 0.7 in
/usr/lib/apache-tez-0.7.0-src
using
mvn -X clean package -DskipTests=true -Dmaven.javadoc.skip=true
with mvn version 3.2.5 (as opposed to 3.3), as I read that it builds OK with
3.2.5. I get the same error as below:
mvn
HDP should already have TEZ on board by default.
> On 30 Dec 2015, at 21:42, Marcin Tustin wrote:
>
> I'm afraid I use the HDP distribution so I haven't yet had to compile
> anything. (Incidentally, this isn't a recommendation of HDP over anything
> else).
>
>> On
Yes, that's why I haven't had to compile anything.
On Wed, Dec 30, 2015 at 4:16 PM, Jörn Franke wrote:
> HDP should already have TEZ on board by default.
>
> On 30 Dec 2015, at 21:42, Marcin Tustin wrote:
>
> I'm afraid I use the HDP distribution
Have you tried it with Hive on TEZ? It (currently) contains more optimizations
than Hive on Spark.
I assume you use the latest Hive version.
Additionally, you may want to think about calculating statistics (depending on
your configuration, you need to trigger it); I am not sure if Spark can use
Hi Jorn,
Thanks for your reply. My Hive version is 1.2.1 on Spark 1.3.1. I have not
tried it on TEZ. I tried the query on the MR engine and it did not fare better.
I also ran it without the STDDEV function and found that the function did not
slow it down.
I tried a simple query as follows
Hi,
I have a fact table in Hive, imported from Sybase IQ via SQOOP, with 1 billion
rows, as follows:
show create table sales;
+-----------------+--+
| createtab_stmt  |