I'm fine with imposing limits on queries from within our own tooling, but
we cannot impose arbitrary limits on queries that are performed by
application code.

That would be a silent breaking change to existing behavior for any customer
who has large queries. There is no way to tell by examining code or queries
whether a query is supposed to return 10,000 rows, so only by testing every
query they have could they determine whether the imposed limit breaks the
intent of the query.

Silent breaking changes to public APIs are not acceptable.
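
As a concrete illustration of the distinction being argued in this thread (a
hypothetical Java sketch, not GemFire code; the class, method, and constant
names are mine): the tooling can inject a default limit only when the user's
OQL has no LIMIT clause of its own, so queries issued by application code are
never rewritten.

```java
// Hypothetical sketch (not a GemFire API): impose a default limit only on
// queries issued from the tooling, and only when the user has not written
// a LIMIT clause themselves.
public final class GfshQueryLimiter {
    static final int DEFAULT_LIMIT = 100;

    static String applyDefaultLimit(String oql) {
        // Naive textual check for illustration; a real implementation
        // would consult the compiled query from the OQL parser instead.
        if (oql.toLowerCase().matches(".*\\blimit\\s+\\d+\\s*$")) {
            return oql; // the user stated their intent; honor it
        }
        return oql + " limit " + DEFAULT_LIMIT;
    }

    public static void main(String[] args) {
        // No LIMIT clause: the tooling appends its default.
        System.out.println(applyDefaultLimit("select * from /A"));
        // Explicit LIMIT clause: left untouched.
        System.out.println(applyDefaultLimit("select * from /A limit 500"));
    }
}
```

The key property is that the rewrite is confined to the gfsh path; a query
arriving from application code would never pass through it.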

--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: +1-631-835-4771

On Wed, Jul 12, 2017 at 1:29 PM, [email protected] <[email protected]>
wrote:

> We would like to avoid letting a user accidentally issue a query that
> would yield a large result set, even if they are dumping the results into
> a file, for performance reasons. If they want a large result set sent back
> by gfsh, they have to do so consciously by adding a large limit to the
> query themselves.
>
>
>
> -------- Original Message --------
> Subject: Re: refactor query command
> From: Swapnil Bawaskar
> To: [email protected]
> CC:
>
>
> +1
> One suggestion I would like to make is that if the user specifies that the
> query results should go to a file, we should not apply the limit clause on
> the server.
>
> On Tue, Jul 11, 2017 at 5:19 PM Jinmei Liao <[email protected]> wrote:
>
>> Basically, our reasoning is that client-side pagination is not as useful
>> as people might think: you can either get all the results dumped to the
>> console and use the scroll bar to move back and forth, or dump them into a
>> file and use whatever piping mechanism your environment supports. The
>> server side retrieves everything at once anyway and saves the entire result
>> set in the backend, so it's not like we are saving any server-side work here.
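>>
>> (That piping workflow needs nothing special; for example, a hypothetical
>> sketch, with output.txt standing in for the file gfsh wrote, which you
>> could equally page through with less:)
>>
>> ```shell
>> # Hypothetical: inspect a gfsh query dump with ordinary shell tools.
>> printf 'value1\nvalue2\nvalue3\n' > output.txt  # stand-in for a gfsh dump
>> grep -c '^value' output.txt                     # count result rows
>> ```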
>>
>> On Tue, Jul 11, 2017 at 4:22 PM, Jinmei Liao <[email protected]> wrote:
>>
>>> The way client-side pagination is currently implemented is convoluted
>>> and of doubtful usefulness. We are proposing to get rid of the
>>> client-side pagination and only have the server side impose a limit (and
>>> maybe implement pagination on the server side later on).
>>>
>>> The new behavior should look like this:
>>>
>>> gfsh> set  APP_FETCH_SIZE  50;
>>> gfsh> query --query="select * from /A"  // suppose entry size is 3
>>>
>>> Result : true
>>> Limit  : 50
>>> Rows   : 3
>>>
>>> Result
>>> --------
>>> value1
>>> value2
>>> value3
>>>
>>>
>>> gfsh> query --query="select * from /A"  // suppose entry size is 1000
>>>
>>> Result : true
>>> Limit  : 50
>>> Rows   : 50
>>>
>>> Result
>>> --------
>>> value1
>>> ...
>>> value50
>>>
>>> gfsh> query --query="select * from /A limit 100"  // suppose entry size
>>> is 1000
>>> Result : true
>>> Rows   : 100
>>>
>>> Result
>>> --------
>>> value1
>>> ...
>>> value100
>>>
>>>
>>> gfsh> query --query="select * from /A limit 500" --file="output.txt"  //
>>> suppose entry size is 1000
>>> Result : true
>>> Rows   : 500
>>>
>>> Query results output to /var/tempFolder/output.txt
>>>
>>> (And the output.txt content to be:
>>> Result
>>> --------
>>> value1
>>> ...
>>> value500)
>>>
>>>
>>> Bear in mind that we are trying to get rid of client-side pagination, so
>>> the --page-size or --limit option would not apply anymore. Only the limit
>>> inside the query will be honored by the server side. If the query does not
>>> have a limit clause, the server side will impose a limit (default 100).
>>> The limit can only be overridden if the user explicitly chooses to do so,
>>> so that users do not accidentally execute a query that would return a
>>> large result set.
>>>
>>> Would this be sufficient to replace the client-side pagination?
>>>
>>>
>>>
>>>
>>> On Tue, Jul 11, 2017 at 2:26 PM, Anilkumar Gingade <[email protected]>
>>> wrote:
>>>
>>>> To make it clear, gfsh could print the query it sent to the server in
>>>> the result summary (showing whether it was executed with the limit):
>>>> Query     :
>>>> Result     : true
>>>> startCount : 0
>>>> endCount   : 20
>>>> Rows       : 1
>>>>
>>>> -Anil.
>>>>
>>>>
>>>> On Tue, Jul 11, 2017 at 12:48 PM, John Blum <[email protected]> wrote:
>>>>
>>>>> I think it might be worth differentiating the result "LIMIT" (as used
>>>>> in the OQL query statement like so... "SELECT * FROM /Region WHERE
>>>>> ... LIMIT 1000")  from what is actually "streamed" back to *Gfsh* as
>>>>> the default (e.g. 100).
>>>>>
>>>>> Clearly sending all the results back is quite expensive depending on
>>>>> the number of results/LIMIT specified.  Therefore, whatever "--option"
>>>>> is provided to the `query` command is a further reduction in what is
>>>>> actually streamed back to the client (e.g. *Gfsh*) initially, sort of
>>>>> like paging, therefore ... `gfsh> query --query="SELECT * FROM
>>>>> /Region WHERE ... LIMIT 1000" --page-size=25`... perhaps?
>>>>>
>>>>> Therefore, I think having 2 limits, as in the OQL LIMIT and a --limit
>>>>> option, would just be confusing to users.  LIMIT, like sort (ORDER BY),
>>>>> can only be effectively applied in the OQL, as it determines what
>>>>> results the query actually returns.
>>>>>
>>>>>
>>>>> On Tue, Jul 11, 2017 at 11:24 AM, Anilkumar Gingade <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> >> Actually a really nice thing would be to put the pagination
>>>>>> feature into the OQL engine where it belongs.
>>>>>> +1 on this.
>>>>>>
>>>>>> >> if the query mode is interactive, it sends the first 20
>>>>>> (page-size, not configurable) records, and the user uses "n" to go to
>>>>>> the next page,
>>>>>> >> once it hits the last page (showing all 1000 records or getting to
>>>>>> the end of the result set), the command finishes.
>>>>>>
>>>>>> We could provide one more option for the end user to quit paging and
>>>>>> go back to the gfsh prompt for new commands (if it's not already there).
>>>>>>
>>>>>> I think providing multiple options to view a large result set is a
>>>>>> nice feature from a tooling perspective (interactive result batching,
>>>>>> dumping into an external file, etc.).
>>>>>>
>>>>>> >> It’s fairly common in query tooling to be able to set a result
>>>>>> set limit.
>>>>>> Yes...many of the interactive query tools allow pagination/batching
>>>>>> as part of the result display.
>>>>>>
>>>>>> >> gfsh> query --query='select * from /A limit 10' --limit=100
>>>>>> We need to make sure that the user can differentiate the query itself
>>>>>> from options provided by the tool.
>>>>>>
>>>>>> -Anil.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 11, 2017 at 9:56 AM, William Markito Oliveira <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> The way I read this is: one is limiting on the server side, the
>>>>>>> other is limiting on the client side.  IOW, the limit within the
>>>>>>> query string acts on the server side.
>>>>>>>
>>>>>>> On Tue, Jul 11, 2017 at 11:19 AM, Jinmei Liao <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> What if the user wants to do:
>>>>>>>> gfsh> query --query='select * from /A limit 10' --limit=100
>>>>>>>>
>>>>>>>> What's the difference between putting it inside the query string or
>>>>>>>> outside? Eventually it's adding a limit clause to the query either way.
>>>>>>>>
>>>>>>>> On Tue, Jul 11, 2017 at 8:41 AM, Anthony Baker <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> It’s fairly common in query tooling to be able to set a result set
>>>>>>>>> limit.  I would make this a first class option within gfsh instead of 
>>>>>>>>> an
>>>>>>>>> environment variable.
>>>>>>>>>
>>>>>>>>> gfsh> set query-limit=1000
>>>>>>>>>
>>>>>>>>> or
>>>>>>>>>
>>>>>>>>> gfsh> query --query='select * from /A’ --limit=1000
>>>>>>>>>
>>>>>>>>> The result set limit is semantically different from specifying a
>>>>>>>>> LIMIT on the OQL query itself.
>>>>>>>>>
>>>>>>>>> Anthony
>>>>>>>>>
>>>>>>>>> On Jul 11, 2017, at 7:53 AM, William Markito Oliveira <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>> +1 for the combination of 1 and 2 as well.  It would be
>>>>>>>>> interesting to explore at least a couple of output formats, CSV
>>>>>>>>> being one of the most common for people who want to import or
>>>>>>>>> analyze the data using other tools.
>>>>>>>>>
>>>>>>>>> On Tue, Jul 11, 2017 at 8:31 AM, Michael Stolz <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Actually a really nice thing would be to put the pagination
>>>>>>>>>> feature into the OQL engine where it belongs. Clients shouldn't
>>>>>>>>>> have to implement pagination.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Mike Stolz
>>>>>>>>>> Principal Engineer, GemFire Product Manager
>>>>>>>>>> Mobile: +1-631-835-4771
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 11, 2017 at 12:00 AM, Michael William Dodge <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I prefer to redirect output to a file when there is any chance
>>>>>>>>>>> that the results might be huge. Thus I find the combination of #1
>>>>>>>>>>> and #2 to be sufficient for me.
>>>>>>>>>>>
>>>>>>>>>>> Sarge
>>>>>>>>>>>
>>>>>>>>>>> > On 10 Jul, 2017, at 17:13, Jinmei Liao <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > Hi, all gfsh-users,
>>>>>>>>>>> >
>>>>>>>>>>> > In our refactor week, we are trying to refactor how the
>>>>>>>>>>> multi-step command is implemented. The current implementation is
>>>>>>>>>>> hard to understand to begin with, breaks OO design principles in
>>>>>>>>>>> multiple ways, and is not thread-safe. This is an internal command
>>>>>>>>>>> type, and only our "query" command uses it.
>>>>>>>>>>> >
>>>>>>>>>>> > This is how our current "query" command works:
>>>>>>>>>>> > 1) the user issues a "query --query='select * from /A'" command,
>>>>>>>>>>> > 2) the server retrieves the first 1000 (fetch-size, not
>>>>>>>>>>> configurable) rows,
>>>>>>>>>>> > 3) if the query mode is NOT interactive, it sends back all the
>>>>>>>>>>> results at once.
>>>>>>>>>>> > 4) if the query mode is interactive, it sends the first 20
>>>>>>>>>>> (page-size, not configurable) records, and the user uses "n" to go
>>>>>>>>>>> to the next page; once it hits the last page (showing all 1000
>>>>>>>>>>> records or getting to the end of the result set), the command
>>>>>>>>>>> finishes.
>>>>>>>>>>> >
>>>>>>>>>>> > we would like to ask how useful this interactive feature is.
>>>>>>>>>>> Is it critical for you? Would the following simplification be
>>>>>>>>>>> sufficient?
>>>>>>>>>>> >
>>>>>>>>>>> > 1) the query command always returns up to the fetch size. We
>>>>>>>>>>> can make it configurable through environment variables, defaulting
>>>>>>>>>>> to 100, and you can also override it in each individual query
>>>>>>>>>>> using "query --query='select * from /A limit 10'"
>>>>>>>>>>> >
>>>>>>>>>>> > 2) provide an option for you to specify a file where we can
>>>>>>>>>>> dump all of the query results, and you can use shell pagination to
>>>>>>>>>>> list the content of the file.
>>>>>>>>>>> >
>>>>>>>>>>> > Please let us know your thoughts/comments. Thanks!
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > --
>>>>>>>>>>> > Cheers
>>>>>>>>>>> >
>>>>>>>>>>> > Jinmei
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> ~/William
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>> Jinmei
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ~/William
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> -John
>>>>> john.blum10101 (skype)
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Cheers
>>>
>>> Jinmei
>>>
>>
>>
>>
>> --
>> Cheers
>>
>> Jinmei
>>
>
