Re: Inconsistent read performance with Spark

2019-02-13 Thread Hao Hao
Hi Faraz, What is the order of your primary key? Is it (datetime, ID) or (ID, datetime)? On the contrary, I suspect your scan performance got better for the same query because compaction happened in between, and thus there were fewer blocks to scan. Also, would you mind sharing a screenshot of …
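The primary-key-order question above can be illustrated with a minimal, self-contained sketch (plain Python, not the Kudu API; the toy schema and data are invented for illustration). Rows are stored sorted by primary key, so a time-range predicate covers one contiguous span under PK (datetime, ID) but a scattered span under PK (ID, datetime):

```python
from datetime import datetime, timedelta

base = datetime(2019, 1, 1)
# Toy table: 4 IDs, one row per ID per hour, for 10 days (960 rows).
rows = [(base + timedelta(hours=h), i) for h in range(240) for i in range(4)]

pk_time_first = sorted(rows, key=lambda r: (r[0], r[1]))  # PK (datetime, ID)
pk_id_first = sorted(rows, key=lambda r: (r[1], r[0]))    # PK (ID, datetime)

def scan_span(ordered, lo, hi):
    """Rows a sequential scan must cover: first match through last match."""
    idx = [i for i, (ts, _) in enumerate(ordered) if lo <= ts < hi]
    return idx[-1] - idx[0] + 1

lo, hi = base + timedelta(days=3), base + timedelta(days=4)  # one-day query
print(scan_span(pk_time_first, lo, hi))  # 96: exactly the matching rows
print(scan_span(pk_id_first, lo, hi))    # 744: most of the table
```

With datetime leading, the one-day query touches only the 96 matching rows; with ID leading, the matches are scattered across every ID's run, so a scan spans 744 of the 960 rows.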

Re: Changing number of Kudu worker threads

2019-02-13 Thread Jean-Daniel Cryans
Some comments on the original problem: "we need to process 1000s of operations per second and noticed that our Kudu 1.5 cluster was only using 10 threads while our application spins up 50 clients/threads" I wouldn't directly infer that 20 threads won't be enough to match your needs. The time it …
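The point above — that a small server-side thread count does not by itself cap throughput — can be sketched in plain Python (not Kudu; all numbers here are illustrative assumptions). Many client submissions funnel through a pool of only 10 worker threads, and with ~1 ms of service time per operation the small pool still clears thousands of ops/sec:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

completed = 0
lock = threading.Lock()

def handle_op(_):
    """Simulated server-side handling of one client operation."""
    global completed
    time.sleep(0.001)  # assumed per-operation service time (~1 ms)
    with lock:
        completed += 1

start = time.monotonic()
with ThreadPoolExecutor(max_workers=10) as workers:  # "server" threads
    list(workers.map(handle_op, range(500)))         # 500 queued client ops
elapsed = time.monotonic() - start
throughput = completed / elapsed  # ops/sec
```

Whether 10 (or 20) threads suffice depends on per-operation latency, not on matching the client's thread count one-to-one.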

Re: Inconsistent read performance with Spark

2019-02-13 Thread Faraz Mateen
Thanks a lot for the help, Hao. Responses inline:

> You can use the tablet server web UI scans dashboard (/scans) to get a better understanding of the ongoing/past queries. The flag 'scan_history_count' is used to configure the size of the buffer. From there, you can get information such as the …
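Putting the two pieces above together might look roughly like this (a sketch: the /scans page and 'scan_history_count' flag come from the email; the value 100 and port 8050 — the usual tablet server web UI port — are assumptions):

```shell
# Keep the last 100 completed scans in the buffer shown on the
# tablet server's /scans page (value chosen for illustration):
kudu-tserver --scan_history_count=100 ...
# then browse http://<tserver-host>:8050/scans
```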