Agreed! On Wed, Jul 12, 2017 at 10:46 AM, Michael Stolz <[email protected]> wrote:
> I'm fine with imposing limits on queries from within our own tooling, but > we cannot impose arbitrary limits on queries that are performed by > application code. > > That would be a silent breaking change to existing behavior at any > customer who has large queries. There is no way to know by examining code > or queries if the query is supposed to return 10,000 rows, so only by > testing every query they have could they determine if the imposed limit > breaks the intent of the query. > > Silent breaking changes to public APIs are not acceptable. > > -- > Mike Stolz > Principal Engineer, GemFire Product Manager > Mobile: +1-631-835-4771 <(631)%20835-4771> > > On Wed, Jul 12, 2017 at 1:29 PM, [email protected] <[email protected]> > wrote: > >> We would like to avoid letting user accidentally issues a query that >> would yield large result set even if they are dumping the result into a >> file for performance reasons. If they want a large result set sent back by >> gfsh, they have to do so consciously by adding a large limit in the query >> themselves. >> >> >> >> -------- Original Message -------- >> Subject: Re: refactor query command >> From: Swapnil Bawaskar >> To: [email protected] >> CC: >> >> >> +1 >> One suggestion I would like to make is that if the user specifies that >> the query results should go to a file, we should not apply the limit clause >> on the server. >> >> On Tue, Jul 11, 2017 at 5:19 PM Jinmei Liao <[email protected]> wrote: >> >>> Basically, our reasoning is client-side pagination is not as useful as >>> people would think, you can either get all the results dumped to the >>> console, and use scroll bar to move back and forth, or dump it into a file, >>> and uses whatever piping mechanism supported by your environment. The >>> server side retrieves everything at once anyway and saves the entire result >>> set in the backend. It's not like we are saving any server side work here. >>> >>> On Tue, Jul 11, 2017 at 4:22 PM, Jinmei Liao <[email protected]> wrote: >>> >>>> Currently the way it's implementing the client-side pagination is >>>> convoluted and doubtfully useful. We are proposing to get rid of the >>>> client-side pagination and only have the server side impose a limit (and >>>> maybe implement pagination on the server side later on). >>>> >>>> The new behavior should look like this: >>>> >>>> gfsh> set APP_FETCH_SIZE 50; >>>> gfsh> query --query="select * from /A" // suppose entry size is 3 >>>> >>>> Result : true >>>> Limit : 50 >>>> Rows : 3 >>>> >>>> Result >>>> -------- >>>> value1 >>>> value2 >>>> value3 >>>> >>>> >>>> gfsh> query --query="select * from /A" // suppose entry size is 1000 >>>> >>>> Result : true >>>> Limit : 50 >>>> Rows : 50 >>>> >>>> Result >>>> -------- >>>> value1 >>>> ... >>>> value50 >>>> >>>> gfsh> query --query="select * from /A limit 100" // suppose entry size >>>> is 1000 >>>> Result : true >>>> Rows : 100 >>>> >>>> Result >>>> -------- >>>> value1 >>>> ... >>>> value100 >>>> >>>> >>>> gfsh> query --query="select * from /A limit 500" --file="output.txt" >>>> // suppose entry size is 1000 >>>> Result : true >>>> Rows : 500 >>>> >>>> Query results output to /var/tempFolder/output.txt >>>> >>>> (And the output.txt content to be: >>>> Result >>>> -------- >>>> value1 >>>> ... >>>> value500) >>>> >>>> >>>> Bear in mind that we are trying to get rid of client side pagination, >>>> so the --page-size or --limit option would not apply anymore. Only the >>>> limit inside the query will be honored by the server side. If they query >>>> does not have a limit clause, the server side will impose a limit (default >>>> to 100). The limit can only be explicitly overridden if user chooses to do >>>> so. So that user would not accidentally execute a query that would result >>>> in a large result set. >>>> >>>> Would this be sufficient to replace the client-side pagination? >>>> >>>> >>>> >>>> >>>> On Tue, Jul 11, 2017 at 2:26 PM, Anilkumar Gingade <[email protected] >>>> > wrote: >>>> >>>>> To make it clear, gfsh could print the query it sent to server in the >>>>> result summary (it shows if it got executed with the limit): >>>>> Query : >>>>> Result : true >>>>> startCount : 0 >>>>> endCount : 20 >>>>> Rows : 1 >>>>> >>>>> -Anil. >>>>> >>>>> >>>>> On Tue, Jul 11, 2017 at 12:48 PM, John Blum <[email protected]> wrote: >>>>> >>>>>> I think it might be worth differentiating the result "LIMIT" (as >>>>>> used in the OQL query statement like so... "SELECT * FROM /Region >>>>>> WHERE ... LIMIT 1000") from what is actually "streamed" back to >>>>>> *Gfsh* as the default (e.g. 100). >>>>>> >>>>>> Clearly sending all the results back is quite expensive depending on >>>>>> the number of results/LIMIT specified. Therefore, whatever "--option" >>>>>> is provided to the `query` command is a further reduction in what is >>>>>> actually streamed back to the client (e.g. *Gfsh*) initially, sort >>>>>> of like paging, therefore ... `gfsh> query --query="SELECT * FROM >>>>>> /Region WHERE ... LIMIT 1000" --page-size=25`... perhaps? >>>>>> >>>>>> Therefore, I think having 2 limits, as in OQL LIMIT and a --limit >>>>>> option would just be confusing to users. LIMIT like sort (ORDER BY) can >>>>>> only be effectively applied to the OQL as it determines what results the >>>>>> query actually returns. >>>>>> >>>>>> >>>>>> On Tue, Jul 11, 2017 at 11:24 AM, Anilkumar Gingade < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> >> Actually a really nice thing would be to put the pagination >>>>>>> feature into the OQL engine where it belongs. >>>>>>> +1 on this. >>>>>>> >>>>>>> >> if they query mode is interactive, it sends the first 20 >>>>>>> (page-size, not configurable) records. and user uses "n" to go to the >>>>>>> next >>>>>>> page, >>>>>>> >> once it hits the last page (showing all 1000 record or get to the >>>>>>> end of the result set), the command finishes. >>>>>>> >>>>>>> We could provide one more option to end user to quit getting to next >>>>>>> page and go-back to gfsh command for new commands (if its not there). >>>>>>> >>>>>>> I think providing multiple options to view large result set, is a >>>>>>> nice feature from tooling perspective (interactive result batching, >>>>>>> dumping >>>>>>> into an external file, etc...) >>>>>>> >>>>>>> >> It’s fairly common in query tooling to be able to set a result >>>>>>> set limit. >>>>>>> Yes...many of the interactive query tools allow pagination/batching >>>>>>> as part of the result display. >>>>>>> >>>>>>> >> gfsh> query --query='select * from /A limit 10' --limit=100 >>>>>>> We need to make sure that user can differentiate query commands from >>>>>>> options provided by tool. >>>>>>> >>>>>>> -Anil. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Jul 11, 2017 at 9:56 AM, William Markito Oliveira < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> The way I read this is: One is limiting on the server side, the >>>>>>>> other is limiting the client side. IOW within the query string is >>>>>>>> acting >>>>>>>> on server side. >>>>>>>> >>>>>>>> On Tue, Jul 11, 2017 at 11:19 AM, Jinmei Liao <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> what if user wants to do: >>>>>>>>> gfsh> query --query='select * from /A limit 10' --limit=100 >>>>>>>>> >>>>>>>>> What's the difference between put it inside the query string or >>>>>>>>> outside? I think eventually it's adding the limit clause to the query. >>>>>>>>> >>>>>>>>> On Tue, Jul 11, 2017 at 8:41 AM, Anthony Baker <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> It’s fairly common in query tooling to be able to set a result >>>>>>>>>> set limit. I would make this a first class option within gfsh >>>>>>>>>> instead of >>>>>>>>>> an environment variable. >>>>>>>>>> >>>>>>>>>> gfsh> set query-limit=1000 >>>>>>>>>> >>>>>>>>>> or >>>>>>>>>> >>>>>>>>>> gfsh> query --query='select * from /A’ --limit=1000 >>>>>>>>>> >>>>>>>>>> The result set limit is semantically different from specifying a >>>>>>>>>> LIMIT on the OQL query itself. >>>>>>>>>> >>>>>>>>>> Anthony >>>>>>>>>> >>>>>>>>>> On Jul 11, 2017, at 7:53 AM, William Markito Oliveira < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>>>>>>>>> +1 for the combination of 1 and 2 as well. It would be >>>>>>>>>> interesting to explore at least a couple output formats, csv being >>>>>>>>>> one of >>>>>>>>>> the most common for people that wants to import or analyze the data >>>>>>>>>> using >>>>>>>>>> other tools. >>>>>>>>>> >>>>>>>>>> On Tue, Jul 11, 2017 at 8:31 AM, Michael Stolz <[email protected] >>>>>>>>>> > wrote: >>>>>>>>>> >>>>>>>>>>> Actually a really nice thing would be to put the pagination >>>>>>>>>>> feature into the OQL engine where it belongs. Clients shouldn't >>>>>>>>>>> have to >>>>>>>>>>> implement pagination. >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Mike Stolz >>>>>>>>>>> Principal Engineer, GemFire Product Manager >>>>>>>>>>> Mobile: +1-631-835-4771 <(631)%20835-4771> >>>>>>>>>>> >>>>>>>>>>> On Tue, Jul 11, 2017 at 12:00 AM, Michael William Dodge < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> I prefer to redirect output to a file when there is any chance >>>>>>>>>>>> that the results might be huge. Thus I find the combination of #1 >>>>>>>>>>>> and #2 to >>>>>>>>>>>> be sufficient for me. >>>>>>>>>>>> >>>>>>>>>>>> Sarge >>>>>>>>>>>> >>>>>>>>>>>> > On 10 Jul, 2017, at 17:13, Jinmei Liao <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>> > >>>>>>>>>>>> > Hi, all gfsh-users, >>>>>>>>>>>> > >>>>>>>>>>>> > In our refactor week, we are trying to refactor how >>>>>>>>>>>> multi-step command is implemented. The currently implementation is >>>>>>>>>>>> hard to >>>>>>>>>>>> understand to begin with. The implementation breaks the OO design >>>>>>>>>>>> principals in multiple ways. It's not thread-safe either. This is >>>>>>>>>>>> an >>>>>>>>>>>> internal command type, and and only our "query" command uses it. >>>>>>>>>>>> > >>>>>>>>>>>> > This is how our current "query" command works: >>>>>>>>>>>> > 1) user issues a "query --query='select * from /A'" command, >>>>>>>>>>>> > 2) server retrieves the first 1000 (fetch-size, not >>>>>>>>>>>> configurable) rows, >>>>>>>>>>>> > 3) if the query mode is NOT interactive, it sends back all >>>>>>>>>>>> the result at one. >>>>>>>>>>>> > 4) if they query mode is interactive, it sends the first 20 >>>>>>>>>>>> (page-size, not configurable) records. and user uses "n" to go to >>>>>>>>>>>> the next >>>>>>>>>>>> page, once it hits the last page (showing all 1000 record or get >>>>>>>>>>>> to the end >>>>>>>>>>>> of the result set), the command finishes. >>>>>>>>>>>> > >>>>>>>>>>>> > we would like to ask how useful is this interactive feature. >>>>>>>>>>>> Is it critical for you? Would the following simplification be >>>>>>>>>>>> sufficient? >>>>>>>>>>>> > >>>>>>>>>>>> > 1) query command always returns the entire fetch size. We can >>>>>>>>>>>> make it configurable through environment variables, default to be >>>>>>>>>>>> 100, and >>>>>>>>>>>> you can also reset it in each individual query command using "query >>>>>>>>>>>> --query='select * from /A limit 10' >>>>>>>>>>>> > >>>>>>>>>>>> > 2) provide an option for you to specify a file where we can >>>>>>>>>>>> dump all the query result in and you can use shell pagination to >>>>>>>>>>>> list the >>>>>>>>>>>> content of the file. >>>>>>>>>>>> > >>>>>>>>>>>> > Please let us know your thoughts/comments. Thanks! >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > -- >>>>>>>>>>>> > Cheers >>>>>>>>>>>> > >>>>>>>>>>>> > Jinmei >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> ~/William >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Cheers >>>>>>>>> >>>>>>>>> Jinmei >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> ~/William >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> -John >>>>>> john.blum10101 (skype) >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Cheers >>>> >>>> Jinmei >>>> >>> >>> >>> >>> -- >>> Cheers >>> >>> Jinmei >>> >> > -- Cheers Jinmei
