Hi Faraz,
Yes, the order can help with both write and scan performance in your case.
When the inserts are random (as you said the order of IDs is random), there
will be many rowsets that overlap in primary key bounds, which the
maintenance manager needs to allocate resources to compact. And you will
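To make the overlap point concrete, here is a toy model (my own sketch, not Kudu's actual flush/compaction code; the flush size and function names are made up): inserts are chunked into "rowsets", each covering the [min, max] range of the keys it received. Random-order inserts make every rowset span nearly the whole key space, so their bounds all overlap and compaction has work to do; in-order inserts produce disjoint bounds.

```python
import random

def rowset_bounds(keys, flush_size):
    # Chunk the insert stream into "rowsets" of flush_size rows each and
    # record each rowset's (min, max) primary-key bounds, loosely mimicking
    # a MemRowSet flush. Toy model only, not Kudu internals.
    return [(min(chunk), max(chunk))
            for chunk in (keys[i:i + flush_size]
                          for i in range(0, len(keys), flush_size))]

def overlapping_pairs(bounds):
    # Count pairs of rowsets whose key ranges overlap; overlapping rowsets
    # are the ones a compaction would need to merge.
    return sum(1
               for i in range(len(bounds))
               for j in range(i + 1, len(bounds))
               if bounds[i][0] <= bounds[j][1] and bounds[j][0] <= bounds[i][1])

random.seed(42)
keys = list(range(10_000))
sequential = rowset_bounds(keys, 1000)            # inserts in PK order
randomized = rowset_bounds(random.sample(keys, len(keys)), 1000)

print(overlapping_pairs(sequential))   # 0: disjoint bounds, nothing to compact
print(overlapping_pairs(randomized))   # 45: all 10 rowsets overlap each other
```

With 10 rowsets of random keys, every rowset spans roughly the full [0, 10000) range, so all 45 pairs overlap; the sequential stream yields 10 disjoint ranges and no overlap at all.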
Hao,
The order of my primary key is (ID, datetime). My query has a 'WHERE' clause
on both of these keys. How exactly does the order affect scan performance?
I think restarting the tablet server removed all previous records from the
scans dashboard. I can't find any query that took too long to complete.
On
Hi Faraz,
What is the order of your primary key? Is it (datetime, ID) or (ID,
datetime)?
On the contrary, I suspect your scan performance got better for the same
query because compaction happened in between, and thus there were fewer
blocks to scan. Also, would you mind sharing the screenshot of
Thanks a lot for the help, Hao.
Responses inline:
> You can use the tablet server web UI scans dashboard (/scans) to get a better
> understanding of the ongoing/past queries. The flag 'scan_history_count' is
> used to configure the size of the buffer. From there, you can get
> information such as the
Hi Faraz,
Answered inline below.
Best,
Hao
On Tue, Feb 12, 2019 at 6:59 AM Faraz Mateen wrote:
> Hi all,
>
> I am using spark to pull data from my single node testing kudu setup and
> publish it to kafka. However, my query time is not consistent.
>
> I am querying a table with around *1.1
Hi all,
I am using spark to pull data from my single node testing kudu setup and
publish it to kafka. However, my query time is not consistent.
I am querying a table with around *1.1 million* packets. Initially my query
was taking *537 seconds to read 51042 records* from kudu and write them to