Re: sparkSQL thread safe?

2014-07-13 Thread Reynold Xin
Ian, The LZFOutputStream's large byte buffer is sort of annoying. It is much smaller if you use the Snappy one. The downside of the Snappy one is slightly less compression (I've seen 10 - 20% larger sizes). If we can find a compression scheme implementation that doesn't do very large buffers,

Re: on shark, is tachyon less efficient than memory_only cache strategy ?

2014-07-13 Thread Haoyuan Li
Qingyang, Are you asking about Spark or Shark? (The first email was about Shark, the last email was about Spark.) Best, Haoyuan On Wed, Jul 9, 2014 at 7:40 PM, qingyang li liqingyang1...@gmail.com wrote: Could I set some cache policy so that Spark loads data from Tachyon only once for all SQL queries? for

Re: EC2 clusters ready in launch time + 30 seconds

2014-07-13 Thread Shivaram Venkataraman
It should be possible to improve cluster launch time if we are careful about what commands we run during setup. One way to do this would be to walk down the list of things we do for cluster initialization and see if there is anything we can do to make things faster. Unfortunately this might be pretty

Re: on shark, is tachyon less efficient than memory_only cache strategy ?

2014-07-13 Thread qingyang li
Shark, thanks for replying. Let me clarify my question. -- I create a table using create table xxx1 tblproperties(shark.cache=tachyon) as select * from xxx2. When executing some SQL (for example, select * from xxx1) using Shark, Shark will read