> But Tachyon is inherently unreliable with an LRU eviction policy which removes
> blocks from Tachyon (i.e. final tier eviction).


I found that Tachyon can change a file's status to pinned. When a file is
pinned, its blocks will not be evicted.


[email protected]
 
From: Gopal Vijayaraghavan
Date: 2015-10-28 11:16
To: user
CC: [email protected]
Subject: Re: Allow session-level temp-tables store in tachyon?
 
> I have an idea to put temp tables in Tachyon to speed up queries.
> I found a JIRA which is similar to my idea.
> https://issues.apache.org/jira/browse/HIVE-7313
 
It is an easy thing from a logical standpoint - temporary tables can be
hosted in any Hadoop fs location.
 
create temporary table x(x int) location 'tachyon://tmp/_tmp.db/x';
 
 
But Tachyon is inherently unreliable with an LRU eviction policy which
removes blocks from Tachyon (i.e. final tier eviction).
 
You have to figure out a way to recompute part of a temp-table when
Tachyon throws away a block.
 
Or are you going to pin everything into memory and potentially fill it
with junk temp-tables? That is probably a bad idea.
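
To make the trade-off concrete, here is a minimal sketch of an LRU block
cache that skips pinned blocks. It is a toy model, not Tachyon's API: the
class name PinnedLRUCache and its methods are hypothetical. It shows that
an unpinned block can vanish (the caller has to recompute it), and that
once everything is pinned there is no room left for new data.

```python
from collections import OrderedDict


class PinnedLRUCache:
    """Toy block cache (hypothetical, not Tachyon's API): LRU eviction that skips pinned blocks."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_id -> data, least recently used first
        self.pinned = set()

    def pin(self, block_id):
        self.pinned.add(block_id)

    def get(self, block_id):
        if block_id not in self.blocks:
            return None  # cache miss: the caller must recompute or re-read
        self.blocks.move_to_end(block_id)  # mark as most recently used
        return self.blocks[block_id]

    def put(self, block_id, data):
        if block_id in self.blocks:
            self.blocks[block_id] = data
            self.blocks.move_to_end(block_id)
            return True
        while len(self.blocks) >= self.capacity:
            # Pick the least recently used block that is not pinned.
            victim = next((b for b in self.blocks if b not in self.pinned), None)
            if victim is None:
                return False  # everything is pinned: no room for new data
            del self.blocks[victim]
        self.blocks[block_id] = data
        return True
```

With a capacity of 2, pinning "a" and then writing "b" and "c" evicts "b";
pinning "c" as well makes the next put() fail, which is the "fill it with
junk" case.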
 
The patch you're looking at is only half of what Hive does.
 
The HDFS in-mem implementation massively improves the write throughput
(since we can write faster than any disk into it); the data is flushed to
disk as a 2nd replica asynchronously, within a few seconds.
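
The write path described above can be sketched as a write-behind store.
This is only an illustration under simplifying assumptions, not the HDFS
implementation: WriteBehindStore and its methods are hypothetical names,
and flush() is called explicitly here where a real system would run it in
the background.

```python
import os
import tempfile


class WriteBehindStore:
    """Toy write-behind store (hypothetical): memory first, disk replica later."""

    def __init__(self, directory=None):
        # Assumption: a scratch directory stands in for the disk replica.
        self.directory = directory or tempfile.mkdtemp()
        self.memory = {}   # name -> bytes, the fast in-memory replica
        self.pending = []  # names written to memory but not yet on disk

    def write(self, name, data):
        self.memory[name] = data  # returns at memory speed
        self.pending.append(name)

    def flush(self):
        """Persist pending writes; a real system would do this asynchronously."""
        while self.pending:
            name = self.pending.pop(0)
            with open(os.path.join(self.directory, name), "wb") as f:
                f.write(self.memory[name])

    def evict(self, name):
        if name in self.pending:
            raise RuntimeError("not flushed yet: evicting would lose data")
        self.memory.pop(name, None)  # safe: the disk replica still exists

    def read(self, name):
        if name in self.memory:
            return self.memory[name]
        with open(os.path.join(self.directory, name), "rb") as f:
            return f.read()
```

The point of the design is that eviction only becomes safe after the
flush: until then the in-memory copy is the only one.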
 
The LLAP in-mem implementation handles the life cycle of the table
in-memory while it's being processed, caching only a fraction of the table
(like only 1 column) into memory or just caching the ORC bloom-filter
indexes into memory instead of the whole file. A filter clause is
evaluated against this bloom filter before it produces a cache miss - if
the bloom filter says "don't read this", it just skips the data read
entirely.
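
A rough sketch of that bloom-filter check, for illustration only. It is
not ORC's on-disk format or LLAP's code: the BloomFilter class, the scan()
helper, the bit-array size and the SHA-256 based hashing are all
assumptions made to keep the example self-contained.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter (hypothetical, not the ORC implementation)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, value):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))


def scan(chunks, wanted):
    """chunks: list of (bloom, rows). Returns (matching rows, chunks actually read)."""
    matches, reads = [], 0
    for bloom, rows in chunks:
        if not bloom.might_contain(wanted):
            continue  # the filter says "don't read this": skip the data read
        reads += 1  # only here would the data be fetched (a possible cache miss)
        matches.extend(r for r in rows if r == wanted)
    return matches, reads
```

A bloom filter can return false positives but never false negatives, so
skipping a chunk is always safe; the worst case is an unnecessary read.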
 
When LLAP evicts, it is only evicting the 3rd replica of the data-set, not
the source of truth (which is the disk replica, which has checksums).
 
Cheers,
Gopal
 
 
