Agreed! What I explained and meant before was... if a user wants to "limit" the results of the query, then they should do so "explicitly" by using the "LIMIT" OQL keyword within the query itself. This is not difficult to do...
gfsh> query --query="SELECT * FROM /Region WHERE ... LIMIT 1000"

There should not be some superficial *Gfsh* query command option, like '--limit', or a System property (ugh! We really need to get away from this System property nonsense).

Imagine for a moment the user wants to run...

SELECT count(*) FROM /Region

And we impose a default LIMIT of 1000, or 100. Then what?

No. It is simple/common enough in any query language (SQL and the like) to limit the results of a query, if that is what the user wants, without adding arbitrary options to the `query` command in *Gfsh*. Besides, there is nothing preventing a user from circumventing the arbitrary/default limit anyway, by saying...

SELECT * FROM /Region WHERE ... LIMIT 2147483647

or...

gfsh> query --query=".." --limit=2147483647

I believe our users are smart enough, and conscious of the fact that 'SELECT * FROM /Region' (without a predicate) is not a smart query. However, I do think there is value in not streaming the entire result set back to a client "tool" (e.g. *Gfsh*, *Pulse*, etc.). Actual GemFire cache client applications should not be affected by anything a tool does unless it is applied to the system itself as a whole (like Cluster Config).

$0.02
-j

On Wed, Jul 12, 2017 at 10:46 AM, Michael Stolz <[email protected]> wrote:

> I'm fine with imposing limits on queries from within our own tooling, but
> we cannot impose arbitrary limits on queries that are performed by
> application code.
>
> That would be a silent breaking change to existing behavior at any
> customer who has large queries. There is no way to know by examining code
> or queries whether a query is supposed to return 10,000 rows, so only by
> testing every query they have could they determine whether the imposed
> limit breaks the intent of the query.
>
> Silent breaking changes to public APIs are not acceptable.
>
> --
> Mike Stolz
> Principal Engineer, GemFire Product Manager
> Mobile: +1-631-835-4771
>
> On Wed, Jul 12, 2017 at 1:29 PM, [email protected] <[email protected]> wrote:
>
>> We would like to avoid letting a user accidentally issue a query that
>> would yield a large result set, even if they are dumping the result into
>> a file, for performance reasons. If they want a large result set sent
>> back by gfsh, they have to do so consciously, by adding a large limit to
>> the query themselves.
>>
>> -------- Original Message --------
>> Subject: Re: refactor query command
>> From: Swapnil Bawaskar
>> To: [email protected]
>> CC:
>>
>> +1
>> One suggestion I would like to make is that if the user specifies that
>> the query results should go to a file, we should not apply the limit
>> clause on the server.
>>
>> On Tue, Jul 11, 2017 at 5:19 PM Jinmei Liao <[email protected]> wrote:
>>
>>> Basically, our reasoning is that client-side pagination is not as useful
>>> as people would think: you can either get all the results dumped to the
>>> console and use the scroll bar to move back and forth, or dump them into
>>> a file and use whatever piping mechanism is supported by your
>>> environment. The server side retrieves everything at once anyway and
>>> saves the entire result set in the backend, so it's not like we are
>>> saving any server-side work here.
>>>
>>> On Tue, Jul 11, 2017 at 4:22 PM, Jinmei Liao <[email protected]> wrote:
>>>
>>>> Currently, the way client-side pagination is implemented is convoluted
>>>> and of doubtful usefulness. We are proposing to get rid of the
>>>> client-side pagination and only have the server side impose a limit
>>>> (and maybe implement pagination on the server side later on).
>>>>
>>>> The new behavior should look like this:
>>>>
>>>> gfsh> set APP_FETCH_SIZE 50;
>>>> gfsh> query --query="select * from /A"  // suppose the entry count is 3
>>>>
>>>> Result : true
>>>> Limit  : 50
>>>> Rows   : 3
>>>>
>>>> Result
>>>> --------
>>>> value1
>>>> value2
>>>> value3
>>>>
>>>> gfsh> query --query="select * from /A"  // suppose the entry count is 1000
>>>>
>>>> Result : true
>>>> Limit  : 50
>>>> Rows   : 50
>>>>
>>>> Result
>>>> --------
>>>> value1
>>>> ...
>>>> value50
>>>>
>>>> gfsh> query --query="select * from /A limit 100"  // suppose the entry count is 1000
>>>>
>>>> Result : true
>>>> Rows   : 100
>>>>
>>>> Result
>>>> --------
>>>> value1
>>>> ...
>>>> value100
>>>>
>>>> gfsh> query --query="select * from /A limit 500" --file="output.txt"  // suppose the entry count is 1000
>>>>
>>>> Result : true
>>>> Rows   : 500
>>>>
>>>> Query results output to /var/tempFolder/output.txt
>>>>
>>>> (And the output.txt content would be:
>>>> Result
>>>> --------
>>>> value1
>>>> ...
>>>> value500)
>>>>
>>>> Bear in mind that we are trying to get rid of client-side pagination,
>>>> so the --page-size and --limit options would not apply anymore. Only
>>>> the limit inside the query will be honored by the server side. If the
>>>> query does not have a limit clause, the server side will impose a limit
>>>> (defaulting to 100). The limit can only be overridden if the user
>>>> chooses to do so explicitly, so that the user cannot accidentally
>>>> execute a query that yields a large result set.
>>>>
>>>> Would this be sufficient to replace the client-side pagination?
>>>>
>>>> On Tue, Jul 11, 2017 at 2:26 PM, Anilkumar Gingade <[email protected]> wrote:
>>>>
>>>>> To make it clear, gfsh could print the query it sent to the server in
>>>>> the result summary (showing whether it got executed with the limit):
>>>>> Query      :
>>>>> Result     : true
>>>>> startCount : 0
>>>>> endCount   : 20
>>>>> Rows       : 1
>>>>>
>>>>> -Anil.
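[Editor's note] The server-side rule proposed above (honor the query's own LIMIT, otherwise impose a default of 100) could be sketched roughly as follows. This is only an illustration of the idea; the function name and the regex-based clause detection are assumptions, not the actual Geode implementation:

```python
import re

DEFAULT_LIMIT = 100  # the proposed default; the exact value was under discussion

def impose_default_limit(query: str, default: int = DEFAULT_LIMIT) -> str:
    """Append a LIMIT clause only when the user did not write one.

    The user's own LIMIT always wins; otherwise a default is imposed so an
    unbounded 'SELECT * FROM /Region' cannot accidentally stream back a
    huge result set to the tool.
    """
    # OQL keywords are case-insensitive; look for a trailing "LIMIT <n>".
    if re.search(r"\blimit\s+\d+\s*$", query, re.IGNORECASE):
        return query
    return f"{query} LIMIT {default}"
```

Note that, as John points out above, nothing stops a user from writing `LIMIT 2147483647` themselves; the default only guards against the accidental case.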
>>>>>
>>>>> On Tue, Jul 11, 2017 at 12:48 PM, John Blum <[email protected]> wrote:
>>>>>
>>>>>> I think it might be worth differentiating the result "LIMIT" (as
>>>>>> used in the OQL query statement, like so... "SELECT * FROM /Region
>>>>>> WHERE ... LIMIT 1000") from what is actually "streamed" back to
>>>>>> *Gfsh* by default (e.g. 100).
>>>>>>
>>>>>> Clearly, sending all the results back is quite expensive, depending
>>>>>> on the number of results/LIMIT specified. Therefore, whatever
>>>>>> "--option" is provided to the `query` command is a further reduction
>>>>>> in what is actually streamed back to the client (e.g. *Gfsh*)
>>>>>> initially, sort of like paging; therefore... `gfsh> query
>>>>>> --query="SELECT * FROM /Region WHERE ... LIMIT 1000"
>>>>>> --page-size=25`... perhaps?
>>>>>>
>>>>>> Therefore, I think having 2 limits, as in an OQL LIMIT and a --limit
>>>>>> option, would just be confusing to users. LIMIT, like sort (ORDER
>>>>>> BY), can only be effectively applied in the OQL, as it determines
>>>>>> what results the query actually returns.
>>>>>>
>>>>>> On Tue, Jul 11, 2017 at 11:24 AM, Anilkumar Gingade <[email protected]> wrote:
>>>>>>
>>>>>>> >> Actually a really nice thing would be to put the pagination
>>>>>>> feature into the OQL engine where it belongs.
>>>>>>> +1 on this.
>>>>>>>
>>>>>>> >> if the query mode is interactive, it sends the first 20
>>>>>>> (page-size, not configurable) records, and the user uses "n" to go
>>>>>>> to the next page,
>>>>>>> >> once it hits the last page (showing all 1000 records or getting
>>>>>>> to the end of the result set), the command finishes.
>>>>>>>
>>>>>>> We could provide one more option for the end user to quit paging and
>>>>>>> go back to the gfsh prompt for new commands (if it's not there already).
>>>>>>>
>>>>>>> I think providing multiple options to view a large result set is a
>>>>>>> nice feature from a tooling perspective (interactive result
>>>>>>> batching, dumping into an external file, etc...).
>>>>>>>
>>>>>>> >> It's fairly common in query tooling to be able to set a result
>>>>>>> set limit.
>>>>>>> Yes... many of the interactive query tools allow pagination/batching
>>>>>>> as part of the result display.
>>>>>>>
>>>>>>> >> gfsh> query --query='select * from /A limit 10' --limit=100
>>>>>>> We need to make sure that the user can differentiate query clauses
>>>>>>> from options provided by the tool.
>>>>>>>
>>>>>>> -Anil.
>>>>>>>
>>>>>>> On Tue, Jul 11, 2017 at 9:56 AM, William Markito Oliveira <[email protected]> wrote:
>>>>>>>
>>>>>>>> The way I read this is: one is limiting on the server side, the
>>>>>>>> other is limiting on the client side. IOW, the limit within the
>>>>>>>> query string is acting on the server side.
>>>>>>>>
>>>>>>>> On Tue, Jul 11, 2017 at 11:19 AM, Jinmei Liao <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> What if the user wants to do:
>>>>>>>>> gfsh> query --query='select * from /A limit 10' --limit=100
>>>>>>>>>
>>>>>>>>> What's the difference between putting it inside the query string
>>>>>>>>> and outside? I think eventually it's adding the limit clause to
>>>>>>>>> the query.
>>>>>>>>>
>>>>>>>>> On Tue, Jul 11, 2017 at 8:41 AM, Anthony Baker <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> It's fairly common in query tooling to be able to set a result
>>>>>>>>>> set limit. I would make this a first-class option within gfsh
>>>>>>>>>> instead of an environment variable.
>>>>>>>>>>
>>>>>>>>>> gfsh> set query-limit=1000
>>>>>>>>>>
>>>>>>>>>> or
>>>>>>>>>>
>>>>>>>>>> gfsh> query --query='select * from /A' --limit=1000
>>>>>>>>>>
>>>>>>>>>> The result set limit is semantically different from specifying a
>>>>>>>>>> LIMIT on the OQL query itself.
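[Editor's note] The distinction Anthony draws here can be modeled with a short sketch (hypothetical function names; not gfsh internals): an OQL LIMIT bounds what the server computes and ships, while a tool-side `--limit` would only truncate what the client displays. It also illustrates Jinmei's point that combining `limit 10` in the query with `--limit=100` still yields 10 rows:

```python
def server_execute(region_values, oql_limit=None):
    # OQL LIMIT: applied by the server; bounds the result set itself,
    # and therefore also what gets shipped over the wire.
    return region_values if oql_limit is None else region_values[:oql_limit]

def tool_display(server_rows, tool_limit):
    # Tool-side --limit: truncates only what the tool shows the user;
    # the server already computed (and shipped) server_rows in full.
    return server_rows[:tool_limit]
```

Under this model, `--limit` saves no server or network work, which is part of the argument against a separate tool option.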
>>>>>>>>>>
>>>>>>>>>> Anthony
>>>>>>>>>>
>>>>>>>>>> On Jul 11, 2017, at 7:53 AM, William Markito Oliveira <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> +1 for the combination of 1 and 2 as well. It would be
>>>>>>>>>> interesting to explore at least a couple of output formats, csv
>>>>>>>>>> being one of the most common for people who want to import or
>>>>>>>>>> analyze the data using other tools.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 11, 2017 at 8:31 AM, Michael Stolz <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Actually, a really nice thing would be to put the pagination
>>>>>>>>>>> feature into the OQL engine where it belongs. Clients shouldn't
>>>>>>>>>>> have to implement pagination.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Mike Stolz
>>>>>>>>>>> Principal Engineer, GemFire Product Manager
>>>>>>>>>>> Mobile: +1-631-835-4771
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 11, 2017 at 12:00 AM, Michael William Dodge <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I prefer to redirect output to a file when there is any chance
>>>>>>>>>>>> that the results might be huge. Thus I find the combination of
>>>>>>>>>>>> #1 and #2 to be sufficient for me.
>>>>>>>>>>>>
>>>>>>>>>>>> Sarge
>>>>>>>>>>>>
>>>>>>>>>>>> > On 10 Jul, 2017, at 17:13, Jinmei Liao <[email protected]> wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > Hi, all gfsh users,
>>>>>>>>>>>> >
>>>>>>>>>>>> > In our refactor week, we are trying to refactor how the
>>>>>>>>>>>> multi-step command is implemented. The current implementation
>>>>>>>>>>>> is hard to understand to begin with, breaks the OO design
>>>>>>>>>>>> principles in multiple ways, and is not thread-safe either.
>>>>>>>>>>>> This is an internal command type, and only our "query" command
>>>>>>>>>>>> uses it.
>>>>>>>>>>>> >
>>>>>>>>>>>> > This is how our current "query" command works:
>>>>>>>>>>>> > 1) The user issues a "query --query='select * from /A'" command.
>>>>>>>>>>>> > 2) The server retrieves the first 1000 (fetch-size, not
>>>>>>>>>>>> configurable) rows.
>>>>>>>>>>>> > 3) If the query mode is NOT interactive, it sends back all
>>>>>>>>>>>> the results at once.
>>>>>>>>>>>> > 4) If the query mode is interactive, it sends the first 20
>>>>>>>>>>>> (page-size, not configurable) records, and the user uses "n"
>>>>>>>>>>>> to go to the next page; once it hits the last page (showing
>>>>>>>>>>>> all 1000 records or getting to the end of the result set), the
>>>>>>>>>>>> command finishes.
>>>>>>>>>>>> >
>>>>>>>>>>>> > We would like to ask how useful this interactive feature is.
>>>>>>>>>>>> Is it critical for you? Would the following simplification be
>>>>>>>>>>>> sufficient?
>>>>>>>>>>>> >
>>>>>>>>>>>> > 1) The query command always returns the entire fetch size.
>>>>>>>>>>>> We can make it configurable through environment variables,
>>>>>>>>>>>> defaulting to 100, and you can also reset it in each individual
>>>>>>>>>>>> query command using "query --query='select * from /A limit 10'".
>>>>>>>>>>>> >
>>>>>>>>>>>> > 2) Provide an option for you to specify a file where we can
>>>>>>>>>>>> dump all the query results, and you can use shell pagination to
>>>>>>>>>>>> list the content of the file.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Please let us know your thoughts/comments. Thanks!
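[Editor's note] Option 2 above amounts to something like the following sketch (the function names and exact file layout are assumptions, not the actual gfsh code): write every row to a file, mirroring the tabular console output, so standard shell tools handle paging instead of gfsh:

```python
def format_results(rows):
    # Build the file content using the same layout gfsh prints to the
    # console: a "Result" header, a rule, then one value per line.
    return "Result\n--------\n" + "\n".join(str(r) for r in rows) + "\n"

def dump_results(rows, path):
    # Write the formatted results so shell tools (less, grep, awk, wc)
    # can page or filter them; gfsh then only reports the row count.
    with open(path, "w") as f:
        f.write(format_results(rows))
    return len(rows)
```

After the dump, `less output.txt` or `grep value42 output.txt` replaces the interactive "n"-for-next-page loop entirely.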
>>>>>>>>>>>> >
>>>>>>>>>>>> > --
>>>>>>>>>>>> > Cheers
>>>>>>>>>>>> >
>>>>>>>>>>>> > Jinmei
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> ~/William
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Cheers
>>>>>>>>>
>>>>>>>>> Jinmei
>>>>>>>>
>>>>>>>> --
>>>>>>>> ~/William
>>>>>>
>>>>>> --
>>>>>> -John
>>>>>> john.blum10101 (skype)
>>>>
>>>> --
>>>> Cheers
>>>>
>>>> Jinmei
>>>
>>> --
>>> Cheers
>>>
>>> Jinmei

--
-John
john.blum10101 (skype)
