That's my interpretation.

On Mon, May 2, 2016 at 9:45 AM, Buntu Dev <buntu...@gmail.com> wrote:
> Thanks Ted, I thought the avg. block size was already low and less than
> the usual 128mb. If I need to reduce it further via parquet.block.size, it
> would mean an increase in the number of blocks and that should increase the
> number of tasks/executors. Is that the correct way to interpret this?
>
> On Mon, May 2, 2016 at 6:21 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Please consider decreasing block size.
>>
>> Thanks
>>
>> > On May 1, 2016, at 9:19 PM, Buntu Dev <buntu...@gmail.com> wrote:
>> >
>> > I got a 10g limitation on the executors and am operating on a parquet
>> dataset with block size 70M and 200 blocks. I keep hitting the memory
>> limits when doing a 'select * from t1 order by c1 limit 1000000' (i.e., 1M).
>> It works if I limit to, say, 100k. What are the options to save a large
>> dataset without running into memory issues?
>> >
>> > Thanks!
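The reasoning in the thread, that a smaller parquet.block.size means more blocks, and each Parquet row group becomes one input split/task, can be sanity-checked with plain arithmetic. The sketch below is hypothetical (no Spark required, and the `num_splits` helper is mine, not a Spark API); the figures match the thread: roughly 200 blocks of ~70 MB each, i.e. ~14 GB of data.

```python
# Hypothetical sketch: how shrinking the Parquet block (row-group) size
# increases the number of input splits, and hence the number of tasks
# Spark can schedule, so each task holds less data in memory.

MB = 1024 * 1024

def num_splits(total_bytes: int, block_size_bytes: int) -> int:
    """Roughly one split/task per Parquet row group (ceiling division)."""
    return -(-total_bytes // block_size_bytes)

total = 200 * 70 * MB  # ~14 GB dataset, as described in the thread

for block_mb in (128, 70, 32):
    print(f"{block_mb:>3} MB blocks -> {num_splits(total, block_mb * MB)} splits")
```

On the write side, the block size is a Hadoop-level Parquet property, so (if I recall the configuration correctly) it would be set via something like `sc.hadoopConfiguration.setInt("parquet.block.size", 32 * 1024 * 1024)` before writing, though the exact mechanism depends on the Spark version.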