> I have an idea to put temp tables in Tachyon to speed up queries.
> I found a JIRA issue that is similar to my idea.
> https://issues.apache.org/jira/browse/HIVE-7313

It is easy from a logical standpoint: temporary tables can be
hosted in any Hadoop FS location.

create temporary table x(x int) location 'tachyon://tmp/_tmp.db/x';


But Tachyon is inherently unreliable as storage: its LRU eviction policy
removes blocks from Tachyon entirely (i.e. final-tier eviction).

You have to figure out a way to recompute part of a temp-table when
Tachyon throws away a block.
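To make the recompute problem concrete, here is a minimal Python sketch of a
memory tier with LRU eviction and nothing behind it. It is not Tachyon's API:
LruBlockStore and the recompute callback (standing in for whatever lineage
information would let you rebuild part of a temp-table) are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable


class LruBlockStore:
    """Toy memory tier with LRU eviction and no lower tier (final-tier eviction)."""

    def __init__(self, capacity: int, recompute: Callable[[str], bytes]):
        self.capacity = capacity      # max number of blocks held in memory
        self.recompute = recompute    # hypothetical lineage hook that rebuilds a lost block
        self.blocks: "OrderedDict[str, bytes]" = OrderedDict()
        self.recomputed = 0           # reads that found their block evicted

    def put(self, block_id: str, data: bytes) -> None:
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        while len(self.blocks) > self.capacity:
            # Evict the least recently used block; no other copy exists.
            self.blocks.popitem(last=False)

    def get(self, block_id: str) -> bytes:
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)
            return self.blocks[block_id]
        # The block was thrown away: the only way back is to recompute it.
        self.recomputed += 1
        data = self.recompute(block_id)
        self.put(block_id, data)
        return data


def rebuild(block_id: str) -> bytes:
    return f"rows for {block_id}".encode()


store = LruBlockStore(capacity=2, recompute=rebuild)
for b in ["b0", "b1", "b2"]:
    store.put(b, rebuild(b))
store.get("b0")          # b0 was evicted when b2 arrived, so it is recomputed
print(store.recomputed)  # 1
```

Without a recompute path, that get() would simply have no data to return.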

Or are you going to pin everything in memory and potentially fill it
with junk temp-tables? That is probably a bad idea.
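A minimal sketch of why pinning is risky, assuming a hypothetical
PinnedMemoryTier in which nothing is ever evicted: once forgotten temp-tables
occupy the space, new tables cannot be written until something is explicitly
dropped. Sizes are plain byte counts here only for simplicity.

```python
class PinnedMemoryTier:
    """Toy memory tier where every table is pinned, so nothing is evicted."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.tables: dict = {}

    def write_table(self, name: str, size_bytes: int) -> bool:
        # With pinning, space is only freed by an explicit drop.
        if self.used_bytes + size_bytes > self.capacity_bytes:
            return False  # no room: the write fails (or must go elsewhere)
        self.tables[name] = size_bytes
        self.used_bytes += size_bytes
        return True

    def drop_table(self, name: str) -> None:
        self.used_bytes -= self.tables.pop(name)


tier = PinnedMemoryTier(capacity_bytes=100)
# Leftover temp tables that nobody dropped.
tier.write_table("junk_1", 60)
tier.write_table("junk_2", 30)
print(tier.write_table("useful", 20))  # False: pinned junk blocks the new table
```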

The patch you're looking at is only half of what Hive does.

The HDFS in-memory implementation massively improves write throughput,
since we can write into memory faster than into any disk; the data is then
flushed asynchronously to disk as a 2nd replica within a few seconds.
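A rough Python model of that write path, assuming the in-memory copy is
acknowledged immediately and a background thread persists it later.
LazyPersistWriter, the dicts standing in for RAM and disk, and the short flush
delay are all illustrative; this is not how HDFS is implemented internally.

```python
import queue
import threading
import time


class LazyPersistWriter:
    """Toy model: acknowledge after the memory write, persist in the background."""

    def __init__(self, flush_delay_s: float = 0.01):
        self.memory: dict = {}              # fast replica, written synchronously
        self.disk: dict = {}                # durable replica, written asynchronously
        self.flush_delay_s = flush_delay_s  # stands in for the "few seconds" lag
        self._pending: "queue.Queue[str]" = queue.Queue()
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def write(self, block_id: str, data: bytes) -> None:
        self.memory[block_id] = data  # the caller only waits for this
        self._pending.put(block_id)   # durability happens later

    def _flush_loop(self) -> None:
        while True:
            block_id = self._pending.get()
            time.sleep(self.flush_delay_s)
            self.disk[block_id] = self.memory[block_id]  # 2nd replica on disk
            self._pending.task_done()

    def wait_for_flush(self) -> None:
        self._pending.join()


w = LazyPersistWriter()
w.write("blk_1", b"temp table rows")  # returns at memory speed
w.wait_for_flush()                    # after this, the disk copy exists
```

The trade-off is the window between write() returning and the flush: a crash
in that window loses the only copy.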

The LLAP in-memory implementation manages the life cycle of the table in
memory while it is being processed. It can cache only a fraction of the
table (e.g. a single column), or just the ORC bloom-filter indexes instead
of the whole file. A filter clause is evaluated against this bloom filter
before it can produce a cache miss: if the bloom filter says "don't read
this", the data read is skipped entirely.
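The bloom-filter check can be sketched as follows. This is a generic bloom
filter (k SHA-256-derived positions over an m-bit array), not ORC's on-disk
bloom filter format, and read_stripe_if_needed is a hypothetical helper. The
point is that a cached filter answering "definitely absent" lets the reader
skip the I/O, while "possibly present" still pays for the read.

```python
import hashlib


class BloomFilter:
    """Generic bloom filter: k hashed positions over an m-bit array."""

    def __init__(self, m_bits: int = 1024, k: int = 3):
        self.m = m_bits
        self.k = k
        self.bits = bytearray(m_bits)  # one byte per bit, for readability

    def _positions(self, value: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, value: str) -> None:
        for p in self._positions(value):
            self.bits[p] = 1

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True only means "possibly present".
        return all(self.bits[p] for p in self._positions(value))


def read_stripe_if_needed(bloom: BloomFilter, key: str, read_stripe):
    """Hypothetical reader: consult the cached filter before the real read."""
    if not bloom.might_contain(key):
        return None          # filter says "don't read this": no I/O at all
    return read_stripe()     # possible match: take the cache miss and read


reads = []


def read_stripe():
    reads.append(1)          # count how often we actually touch the data
    return ["...rows..."]


bloom = BloomFilter()
for user in ["alice", "bob", "carol"]:
    bloom.add(user)

read_stripe_if_needed(bloom, "alice", read_stripe)    # always reads
read_stripe_if_needed(bloom, "mallory", read_stripe)  # skipped unless a false positive
```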

When LLAP evicts, it only evicts the 3rd replica of the dataset, not the
source of truth (the disk replica, which has checksums).
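The contrast with the Tachyon case can be sketched as a cache sitting in front
of a checksummed source of truth. This simplifies the picture to two copies
(cache and disk), uses zlib.crc32 as a stand-in for the disk layer's checksum,
and the class names are hypothetical: evicting from the cache costs a re-read,
never data.

```python
import zlib
from collections import OrderedDict


class ChecksummedDisk:
    """Source of truth: every block is stored with a CRC32 checksum."""

    def __init__(self):
        self._blocks = {}

    def write(self, block_id: str, data: bytes) -> None:
        self._blocks[block_id] = (data, zlib.crc32(data))

    def read(self, block_id: str) -> bytes:
        data, crc = self._blocks[block_id]
        if zlib.crc32(data) != crc:
            raise OSError(f"checksum mismatch on {block_id}")
        return data


class EvictingCache:
    """Extra in-memory copy; evicting from it never loses data."""

    def __init__(self, disk: ChecksummedDisk, capacity: int):
        self.disk = disk
        self.capacity = capacity
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.misses = 0

    def read(self, block_id: str) -> bytes:
        if block_id in self.entries:
            self.entries.move_to_end(block_id)
            return self.entries[block_id]
        self.misses += 1
        data = self.disk.read(block_id)       # fall back to the verified disk copy
        self.entries[block_id] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drops only the cached copy
        return data


disk = ChecksummedDisk()
for i in range(3):
    disk.write(f"b{i}", f"data{i}".encode())

cache = EvictingCache(disk, capacity=1)
cache.read("b0")
cache.read("b1")          # b0 is evicted from the cache here
print(cache.read("b0"))   # b'data0', re-read from disk
print(cache.misses)       # 3
```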

Cheers,
Gopal

