Re: refactor query command

Jinmei Liao Wed, 12 Jul 2017 12:31:32 -0700

Agreed!

On Wed, Jul 12, 2017 at 10:46 AM, Michael Stolz <[email protected]> wrote:


> I'm fine with imposing limits on queries from within our own tooling, but
> we cannot impose arbitrary limits on queries that are performed by
> application code.
>
> That would be a silent breaking change to existing behavior at any
> customer who has large queries. There is no way to know by examining code
> or queries if the query is supposed to return 10,000 rows, so only by
> testing every query they have could they determine if the imposed limit
> breaks the intent of the query.
>
> Silent breaking changes to public APIs are not acceptable.
>
> --
> Mike Stolz
> Principal Engineer, GemFire Product Manager
> Mobile: +1-631-835-4771 <(631)%20835-4771>
>
> On Wed, Jul 12, 2017 at 1:29 PM, [email protected] <[email protected]>
> wrote:
>
>> We would like to avoid letting user accidentally issues a query that
>> would yield large result set even if they are dumping the result into a
>> file for performance reasons. If they want a large result set sent back by
>> gfsh, they have to do so consciously by adding a large limit in the query
>> themselves.
>>
>>
>>
>> -------- Original Message --------
>> Subject: Re: refactor query command
>> From: Swapnil Bawaskar
>> To: [email protected]
>> CC:
>>
>>
>> +1
>> One suggestion I would like to make is that if the user specifies that
>> the query results should go to a file, we should not apply the limit clause
>> on the server.
>>
>> On Tue, Jul 11, 2017 at 5:19 PM Jinmei Liao <[email protected]> wrote:
>>
>>> Basically, our reasoning is client-side pagination is not as useful as
>>> people would think, you can either get all the results dumped to the
>>> console, and use scroll bar to move back and forth, or dump it into a file,
>>> and uses whatever piping mechanism supported by your environment. The
>>> server side retrieves everything at once anyway and saves the entire result
>>> set in the backend. It's not like we are saving any server side work here.
>>>
>>> On Tue, Jul 11, 2017 at 4:22 PM, Jinmei Liao <[email protected]> wrote:
>>>
>>>> Currently the way it's implementing the client-side pagination is
>>>> convoluted and doubtfully useful. We are proposing to get rid of the
>>>> client-side pagination and only have the server side impose a limit (and
>>>> maybe implement pagination on the server side later on).
>>>>
>>>> The new behavior should look like this:
>>>>
>>>> gfsh> set  APP_FETCH_SIZE  50;
>>>> gfsh> query --query="select * from /A"  // suppose entry size is 3
>>>>
>>>> Result : true
>>>> Limit  : 50
>>>> Rows   : 3
>>>>
>>>> Result
>>>> --------
>>>> value1
>>>> value2
>>>> value3
>>>>
>>>>
>>>> gfsh> query --query="select * from /A"  // suppose entry size is 1000
>>>>
>>>> Result : true
>>>> Limit  : 50
>>>> Rows   : 50
>>>>
>>>> Result
>>>> --------
>>>> value1
>>>> ...
>>>> value50
>>>>
>>>> gfsh> query --query="select * from /A limit 100"  // suppose entry size
>>>> is 1000
>>>> Result : true
>>>> Rows   : 100
>>>>
>>>> Result
>>>> --------
>>>> value1
>>>> ...
>>>> value100
>>>>
>>>>
>>>> gfsh> query --query="select * from /A limit 500" --file="output.txt"
>>>>  // suppose entry size is 1000
>>>> Result : true
>>>> Rows   : 500
>>>>
>>>> Query results output to /var/tempFolder/output.txt
>>>>
>>>> (And the output.txt content to be:
>>>> Result
>>>> --------
>>>> value1
>>>> ...
>>>> value500)
>>>>
>>>>
>>>> Bear in mind that we are trying to get rid of client side pagination,
>>>> so the --page-size or --limit option would not apply anymore. Only the
>>>> limit inside the query will be honored by the server side. If they query
>>>> does not have a limit clause, the server side will impose a limit (default
>>>> to 100). The limit can only be explicitly overridden if user chooses to do
>>>> so. So that user would not accidentally execute a query that would result
>>>> in a large result set.
>>>>
>>>> Would this be sufficient to replace the client-side pagination?
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jul 11, 2017 at 2:26 PM, Anilkumar Gingade <[email protected]
>>>> > wrote:
>>>>
>>>>> To make it clear, gfsh could print the query it sent to server in the
>>>>> result summary (it shows if it got executed with the limit):
>>>>> Query     :
>>>>> Result     : true
>>>>> startCount : 0
>>>>> endCount   : 20
>>>>> Rows       : 1
>>>>>
>>>>> -Anil.
>>>>>
>>>>>
>>>>> On Tue, Jul 11, 2017 at 12:48 PM, John Blum <[email protected]> wrote:
>>>>>
>>>>>> I think it might be worth differentiating the result "LIMIT" (as
>>>>>> used in the OQL query statement like so... "SELECT * FROM /Region
>>>>>> WHERE ... LIMIT 1000")  from what is actually "streamed" back to
>>>>>> *Gfsh* as the default (e.g. 100).
>>>>>>
>>>>>> Clearly sending all the results back is quite expensive depending on
>>>>>> the number of results/LIMIT specified.  Therefore, whatever "--option"
>>>>>> is provided to the `query` command is a further reduction in what is
>>>>>> actually streamed back to the client (e.g. *Gfsh*) initially, sort
>>>>>> of like paging, therefore ... `gfsh> query --query="SELECT * FROM
>>>>>> /Region WHERE ... LIMIT 1000" --page-size=25`... perhaps?
>>>>>>
>>>>>> Therefore, I think having 2 limits, as in OQL LIMIT and a --limit
>>>>>> option would just be confusing to users.  LIMIT like sort (ORDER BY) can
>>>>>> only be effectively applied to the OQL as it determines what results the
>>>>>> query actually returns.
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 11, 2017 at 11:24 AM, Anilkumar Gingade <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> >> Actually a really nice thing would be to put the pagination
>>>>>>> feature into the OQL engine where it belongs.
>>>>>>> +1 on this.
>>>>>>>
>>>>>>> >> if they query mode is interactive, it sends the first 20
>>>>>>> (page-size, not configurable) records. and user uses "n" to go to the 
>>>>>>> next
>>>>>>> page,
>>>>>>> >> once it hits the last page (showing all 1000 record or get to the
>>>>>>> end of the result set), the command finishes.
>>>>>>>
>>>>>>> We could provide one more option to end user to quit getting to next
>>>>>>> page and go-back to gfsh command for new commands (if its not there).
>>>>>>>
>>>>>>> I think providing multiple options to view large result set, is a
>>>>>>> nice feature from tooling perspective (interactive result batching, 
>>>>>>> dumping
>>>>>>> into an external file, etc...)
>>>>>>>
>>>>>>> >> It’s fairly common in query tooling to be able to set a result
>>>>>>> set limit.
>>>>>>> Yes...many of the interactive query tools allow pagination/batching
>>>>>>> as part of the result display.
>>>>>>>
>>>>>>> >> gfsh> query --query='select * from /A limit 10' --limit=100
>>>>>>> We need to make sure that user can differentiate query commands from
>>>>>>> options provided by tool.
>>>>>>>
>>>>>>> -Anil.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 11, 2017 at 9:56 AM, William Markito Oliveira <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> The way I read this is: One is limiting on the server side, the
>>>>>>>> other is limiting the client side.  IOW within the query string is 
>>>>>>>> acting
>>>>>>>> on server side.
>>>>>>>>
>>>>>>>> On Tue, Jul 11, 2017 at 11:19 AM, Jinmei Liao <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> what if user wants to do:
>>>>>>>>> gfsh> query --query='select * from /A limit 10' --limit=100
>>>>>>>>>
>>>>>>>>> What's the difference between put it inside the query string or
>>>>>>>>> outside? I think eventually it's adding the limit clause to the query.
>>>>>>>>>
>>>>>>>>> On Tue, Jul 11, 2017 at 8:41 AM, Anthony Baker <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> It’s fairly common in query tooling to be able to set a result
>>>>>>>>>> set limit.  I would make this a first class option within gfsh 
>>>>>>>>>> instead of
>>>>>>>>>> an environment variable.
>>>>>>>>>>
>>>>>>>>>> gfsh> set query-limit=1000
>>>>>>>>>>
>>>>>>>>>> or
>>>>>>>>>>
>>>>>>>>>> gfsh> query --query='select * from /A’ --limit=1000
>>>>>>>>>>
>>>>>>>>>> The result set limit is semantically different from specifying a
>>>>>>>>>> LIMIT on the OQL query itself.
>>>>>>>>>>
>>>>>>>>>> Anthony
>>>>>>>>>>
>>>>>>>>>> On Jul 11, 2017, at 7:53 AM, William Markito Oliveira <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> +1 for the combination of 1 and 2 as well.  It would be
>>>>>>>>>> interesting to explore at least a couple output formats, csv being 
>>>>>>>>>> one of
>>>>>>>>>> the most common for people that wants to import or analyze the data 
>>>>>>>>>> using
>>>>>>>>>> other tools.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 11, 2017 at 8:31 AM, Michael Stolz <[email protected]
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Actually a really nice thing would be to put the pagination
>>>>>>>>>>> feature into the OQL engine where it belongs. Clients shouldn't 
>>>>>>>>>>> have to
>>>>>>>>>>> implement pagination.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Mike Stolz
>>>>>>>>>>> Principal Engineer, GemFire Product Manager
>>>>>>>>>>> Mobile: +1-631-835-4771 <(631)%20835-4771>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 11, 2017 at 12:00 AM, Michael William Dodge <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I prefer to redirect output to a file when there is any chance
>>>>>>>>>>>> that the results might be huge. Thus I find the combination of #1 
>>>>>>>>>>>> and #2 to
>>>>>>>>>>>> be sufficient for me.
>>>>>>>>>>>>
>>>>>>>>>>>> Sarge
>>>>>>>>>>>>
>>>>>>>>>>>> > On 10 Jul, 2017, at 17:13, Jinmei Liao <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > Hi, all gfsh-users,
>>>>>>>>>>>> >
>>>>>>>>>>>> > In our refactor week, we are trying to refactor how
>>>>>>>>>>>> multi-step command is implemented. The currently implementation is 
>>>>>>>>>>>> hard to
>>>>>>>>>>>> understand to begin with. The implementation breaks the OO design
>>>>>>>>>>>> principals in multiple ways. It's not thread-safe either. This is 
>>>>>>>>>>>> an
>>>>>>>>>>>> internal command type, and and only our "query" command uses it.
>>>>>>>>>>>> >
>>>>>>>>>>>> > This is how our current "query" command works:
>>>>>>>>>>>> > 1) user issues a "query --query='select * from /A'" command,
>>>>>>>>>>>> > 2) server retrieves the first 1000 (fetch-size, not
>>>>>>>>>>>> configurable) rows,
>>>>>>>>>>>> > 3) if the query mode is NOT interactive, it sends back all
>>>>>>>>>>>> the result at one.
>>>>>>>>>>>> > 4) if they query mode is interactive, it sends the first 20
>>>>>>>>>>>> (page-size, not configurable) records. and user uses "n" to go to 
>>>>>>>>>>>> the next
>>>>>>>>>>>> page, once it hits the last page (showing all 1000 record or get 
>>>>>>>>>>>> to the end
>>>>>>>>>>>> of the result set), the command finishes.
>>>>>>>>>>>> >
>>>>>>>>>>>> > we would like to ask how useful is this interactive feature.
>>>>>>>>>>>> Is it critical for you? Would the following simplification be 
>>>>>>>>>>>> sufficient?
>>>>>>>>>>>> >
>>>>>>>>>>>> > 1) query command always returns the entire fetch size. We can
>>>>>>>>>>>> make it configurable through environment variables, default to be 
>>>>>>>>>>>> 100, and
>>>>>>>>>>>> you can also reset it in each individual query command using "query
>>>>>>>>>>>> --query='select * from /A limit 10'
>>>>>>>>>>>> >
>>>>>>>>>>>> > 2) provide an option for you to specify a file where we can
>>>>>>>>>>>> dump all the query result in and you can use shell pagination to 
>>>>>>>>>>>> list the
>>>>>>>>>>>> content of the file.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Please let us know your thoughts/comments. Thanks!
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> > --
>>>>>>>>>>>> > Cheers
>>>>>>>>>>>> >
>>>>>>>>>>>> > Jinmei
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> ~/William
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Cheers
>>>>>>>>>
>>>>>>>>> Jinmei
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> ~/William
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> -John
>>>>>> john.blum10101 (skype)
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Cheers
>>>>
>>>> Jinmei
>>>>
>>>
>>>
>>>
>>> --
>>> Cheers
>>>
>>> Jinmei
>>>
>>
>


-- 
Cheers

Jinmei

Re: refactor query command

Reply via email to