Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-07-10 Thread Jacek Lewandowski
Given what was said, I propose reframing this functionality as limiting the memory used to execute a query. We will not expose the page size measured in bytes to the client. Instead, an upper limit will act as a guardrail so that we won't fetch more data than it allows. An aggregation query with grouping is a special
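One way to read this proposal, as a minimal sketch: the client still asks for a page size in rows, and the server additionally stops filling a page once a configured byte budget is reached, returning a short page that still reports more data. `BytesGuardrailPager`, `Page` and `fill` are hypothetical names, not Cassandra code, and rows are modelled only by their serialized sizes in bytes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not Cassandra code: fill one page under row and byte limits. */
final class BytesGuardrailPager {
    /** The rows placed on the page (as sizes in bytes) and whether any rows remain. */
    record Page(List<Integer> rowSizes, boolean hasMorePages) {}

    /**
     * Takes rows starting at {@code offset} until either {@code maxRows} rows are on the
     * page or adding the next row would push the page past {@code maxBytes}. At least one
     * row is always taken so that paging keeps making progress. Assumes maxRows >= 1.
     */
    static Page fill(List<Integer> rowSizes, int offset, int maxRows, long maxBytes) {
        List<Integer> taken = new ArrayList<>();
        long usedBytes = 0;
        int i = offset;
        while (i < rowSizes.size() && taken.size() < maxRows) {
            int size = rowSizes.get(i);
            if (!taken.isEmpty() && usedBytes + size > maxBytes) {
                break; // byte guardrail hit: end the page early (a "short page")
            }
            taken.add(size);
            usedBytes += size;
            i++;
        }
        // More pages exist whenever unread rows remain, however short this page is.
        return new Page(taken, i < rowSizes.size());
    }
}
```

With row sizes [100, 100, 500, 50], a 10-row page and a 250-byte budget, the first call returns the two 100-byte rows and reports that more pages remain, even though the row limit was not reached.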

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-13 Thread Benjamin Lerer
> > So my other question: for aggregation with the "group by" clause, we return an aggregated row which is computed from a group of rows. With my current implementation, it is approximated by counting the size of the largest row in that group. I think it is the safest and simplest
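The approximation quoted above can be sketched as follows, assuming each group's input rows are known only by their serialized sizes in bytes. `GroupSizeEstimate` and its methods are hypothetical names; this is not the implementation under discussion.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: estimate the bytes charged for GROUP BY output rows. */
final class GroupSizeEstimate {
    /**
     * Approximates the size of the single aggregated row produced for one group
     * as the size of the largest input row in that group. Returns 0 for an empty group.
     */
    static long estimateAggregatedRowSize(List<Long> rowSizesInGroup) {
        long max = 0;
        for (long size : rowSizesInGroup) {
            max = Math.max(max, size);
        }
        return max;
    }

    /** Sums the per-group estimates to approximate the bytes charged for a page of groups. */
    static long estimatePageSize(Map<String, List<Long>> groups) {
        long total = 0;
        for (List<Long> rows : groups.values()) {
            total += estimateAggregatedRowSize(rows);
        }
        return total;
    }
}
```

Charging each group by its largest row, rather than by the sum of its rows, reflects that only one aggregated row is returned per group; it remains an estimate, not a measurement of the actual output row.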

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Jacek Lewandowski
Josh, that answers my question exactly; thank you. I will not implement limiting the result set in CQL (that is, by the LIMIT clause) and will stick with paging only. Whether the page size is defined in bytes or rows can be determined by a flag; there are many unused bits for that. So my other question:
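A minimal sketch of signalling the page-size unit with a flag bit, as suggested above. `PAGE_SIZE_IN_BYTES = 0x4000_0000` is a hypothetical value chosen for illustration; it is not taken from any native protocol specification, and `PageSizeUnitFlag` is not a real Cassandra or driver class.

```java
/** Hypothetical sketch: encode whether a request's page size counts rows or bytes. */
final class PageSizeUnitFlag {
    /** Hypothetical flag bit; NOT an assigned value in any native protocol spec. */
    static final int PAGE_SIZE_IN_BYTES = 0x4000_0000;

    enum Unit { ROWS, BYTES }

    /** Returns {@code flags} with the unit bit set or cleared; other bits are preserved. */
    static int withUnit(int flags, Unit unit) {
        return unit == Unit.BYTES ? flags | PAGE_SIZE_IN_BYTES : flags & ~PAGE_SIZE_IN_BYTES;
    }

    /** Reads the unit back; an absent bit means the existing row-based semantics. */
    static Unit unitOf(int flags) {
        return (flags & PAGE_SIZE_IN_BYTES) != 0 ? Unit.BYTES : Unit.ROWS;
    }
}
```

Treating an absent bit as rows keeps clients that never set it on the existing row-based paging semantics.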

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Josh McKenzie
> As long as it is valid in the paging protocol to return a short page, but still say “there are more pages”, I think it is fine to do that. Thankfully, the v3-v5 specs all make it clear that clients need to respect what the server has to say about there being more pages:

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Jeremiah Jordan
As long as it is valid in the paging protocol to return a short page, but still say “there are more pages”, I think it is fine to do that. For an actual LIMIT that is part of the user query, I think the server must always have returned all data that fits into the LIMIT when all pages have been
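The client-side consequence of allowing short pages, as a minimal sketch: the loop keeps fetching while the server supplies a paging state and never infers the end of the results from a page holding fewer rows than requested. `ShortPageClient`, `Page` and the `fetchPage` function are hypothetical stand-ins for a driver's page and request call, not a real driver API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of a client that tolerates short (even empty) pages. */
final class ShortPageClient {
    /** A page as the client sees it: its rows plus an opaque paging state (null when done). */
    record Page(List<String> rows, String pagingState) {}

    /**
     * Fetches every page, continuing while the server returns a paging state.
     * The loop never stops just because a page has fewer rows than requested.
     */
    static List<String> fetchAll(Function<String, Page> fetchPage) {
        List<String> all = new ArrayList<>();
        String state = null; // null requests the first page
        do {
            Page page = fetchPage.apply(state);
            all.addAll(page.rows());
            state = page.pagingState();
        } while (state != null);
        return all;
    }
}
```

A simulated server that returns one row, then an empty page, then two rows yields all three rows, because only the missing paging state ends the loop.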

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Josh McKenzie
Yeah, my bad. I have paging on the brain. Seriously. I can't think of a use case in which a LIMIT based on the number of bytes makes sense from a user perspective. On Mon, Jun 12, 2023, at 1:35 PM, Jeff Jirsa wrote: > On Mon, Jun 12, 2023 at 9:50 AM Benjamin Lerer wrote: >> If you have rows that

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Jeff Jirsa
On Mon, Jun 12, 2023 at 9:50 AM Benjamin Lerer wrote: > If you have rows that vary significantly in their size, your latencies could end up being pretty unpredictable using a LIMIT BY . Being able to specify a limit by bytes at the driver / API level would allow app devs to get more

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Benjamin Lerer
> > If you have rows that vary significantly in their size, your latencies could end up being pretty unpredictable using a LIMIT BY . Being able to specify a limit by bytes at the driver / API level would allow app devs to get more deterministic results out of their interaction with the DB if

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Josh McKenzie
> I do not have in mind a scenario where it could be useful to specify a LIMIT in bytes. The LIMIT clause is usually used when you know how many rows you wish to display or use. Unless somebody has a useful scenario in mind, I do not think that there is a need for that feature. If you have

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Jacek Lewandowski
Yes, a LIMIT BY provided by the user in CQL does not make much sense to me either. On Mon, 12 Jun 2023 at 11:20, Benedict wrote: > I agree that this is more suitable as a paging option, and not as a CQL LIMIT option. If it were to be a CQL LIMIT option though, then it should be accurate

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Jacek Lewandowski
Limiting the amount of returned data in bytes, in addition to the row limit, could be helpful when applied transparently by the server as a kind of guardrail. The server could fail the query if it exceeds some administratively imposed limit set at the configuration level. WDYT? On Mon, 12 Jun 2023 at
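A minimal sketch of the server-side guardrail proposed here, assuming one instance per query and a single fail threshold in bytes read from configuration (0 or less disables it). `ResultSizeGuardrail` and `ResultTooLargeException` are hypothetical names, not Cassandra's guardrail classes.

```java
/** Hypothetical sketch: fail a query once its result exceeds a configured byte limit. */
final class ResultSizeGuardrail {
    /** Thrown when a query's result exceeds the configured limit (hypothetical type). */
    static final class ResultTooLargeException extends RuntimeException {
        ResultTooLargeException(long size, long limit) {
            super("Result size " + size + " bytes exceeds the configured limit of " + limit + " bytes");
        }
    }

    private final long failThresholdBytes; // administratively configured; <= 0 disables the check
    private long accumulated;              // bytes charged so far for this query

    ResultSizeGuardrail(long failThresholdBytes) {
        this.failThresholdBytes = failThresholdBytes;
    }

    /** Charges one row against the limit and fails the query once the limit is crossed. */
    void onRow(long rowSizeBytes) {
        accumulated += rowSizeBytes;
        if (failThresholdBytes > 0 && accumulated > failThresholdBytes) {
            throw new ResultTooLargeException(accumulated, failThresholdBytes);
        }
    }

    long accumulatedBytes() {
        return accumulated;
    }
}
```

Failing, rather than truncating, means the client never receives a silently incomplete result; the trade-off is that an oversized query must be rewritten rather than partially served.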

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Benedict
I agree that this is more suitable as a paging option, and not as a CQL LIMIT option. If it were to be a CQL LIMIT option though, then it should be accurate regarding the result set, IMO; there shouldn’t be any further results that could have been returned within the LIMIT. On 12 Jun 2023, at 10:16,

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Benjamin Lerer
Thanks, Jacek, for raising this discussion. I do not have in mind a scenario where it could be useful to specify a LIMIT in bytes. The LIMIT clause is usually used when you know how many rows you wish to display or use. Unless somebody has a useful scenario in mind, I do not think that there is a

[DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Jacek Lewandowski
Hi, I was working on limiting query results by their size expressed in bytes, and some questions arose that I'd like to bring to the mailing list. The semantics of queries (without aggregation): data limits are applied to the raw data returned from replicas. While it works fine for the row