Re: Is there any other way? DataProviders must hit the Db twice for (possible) large datasets

Stefan Fußenegger Thu, 27 Nov 2008 10:29:27 -0800

Hi Igor,

Thanks for implementing this minimal version. I totally agree with your
reasoning. Is there any chance this could go into the 1.3 branch as well?
I'd really appreciate that.


You mentioned that you implemented such a repeater yourself. Didn't you use
any navigation, or did you write that yourself too? Just wondering.

Shall I open a ticket against 1.5 to track this issue/enhancement?

Best regards, Stefan



igor.vaynberg wrote:
> 
> On Thu, Nov 27, 2008 at 12:46 AM, Stefan Fußenegger
> <[EMAIL PROTECTED]> wrote:
>>
>> I don't think IDataProvider is only about databases.
> 
> You started off with your core assumption being wrong. IDataProvider
> was written exclusively for accessing databases. My thinking at the
> time was that 99% of people use Wicket to build applications that
> access databases, and I dare say it was a good guess, because in its
> ~3 years of existence only a handful of people have had a problem with
> the way it works.
> 
>> There are other data sources, and some return the total count and the
>> desired subset at the same time (Lucene being a famous example). Such
>> data sources would really benefit from a single-query approach.
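A minimal sketch of the single-query shape described here. SingleQuerySearch, SearchResult and search(...) are hypothetical names, and the in-memory list only stands in for an index such as Lucene; this is not Lucene's API. One call returns both the total hit count and the requested slice:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a search backend such as Lucene: one call
// yields the total hit count and the requested window of results.
// Not Lucene's API; names are illustrative only.
class SingleQuerySearch {

    static final class SearchResult {
        final int totalHits;      // all matches, regardless of window
        final List<String> items; // only the requested window

        SearchResult(int totalHits, List<String> items) {
            this.totalHits = totalHits;
            this.items = items;
        }
    }

    private final List<String> index;
    int queryCount = 0; // number of round trips to the backend

    SingleQuerySearch(List<String> index) {
        this.index = index;
    }

    SearchResult search(String term, int offset, int limit) {
        queryCount++;
        List<String> hits = new ArrayList<>();
        for (String doc : index) {
            if (doc.contains(term)) {
                hits.add(doc);
            }
        }
        int from = Math.max(0, offset);
        if (from >= hits.size()) {
            // Window past the end: the total is still known, the slice is empty.
            return new SearchResult(hits.size(), Collections.<String>emptyList());
        }
        int to = Math.min(hits.size(), from + Math.max(0, limit));
        return new SearchResult(hits.size(), new ArrayList<>(hits.subList(from, to)));
    }
}
```

Because the total comes back with the slice, a repeater built on this needs no separate count query, provided it already knows which page it wants.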
> 
> I am not disputing this fact. I am simply saying that we are not going
> to fix this right now because it is not a bug. You are trying to use
> the components for something they were not designed for. In 1.5 we
> may address this.
> 
>> I faced this issue myself in a search-centered (read: Lucene)
>> application. I successfully went down the road of implementing a
>> custom repeater.
> 
> I had to do the same myself.
> 
>> But once the repeater was working as desired, I realized that
>> PagingNavigationLink is the real showstopper, not IDataProvider (see my
>> JIRA comment [0]). The fix would be rather trivial, as
>> PagingNavigationLink is doing something it needn't do (checking the
>> requested page against the valid range of pages).
>>
>> Let me answer 2 possible questions in advance:
>>
>> Q: Why is this check in PagingNavigationLink a problem?
>> A: The check needs the total size before the page is set, and
>> obviously you can't fetch size and data in one query as long as the
>> page isn't known.
> 
> The check is there because we code defensively. We do not assume that
> every implementation of IPageable will cull the number when you call
> setCurrentPage(x).
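To illustrate why the link checks the range itself, here is a sketch with hypothetical types (SimplePageable, LaxPageable, DefensiveNavigation) loosely modeled on the page-count and current-page methods of IPageable; it is not Wicket code. An implementation that does no culling of its own stays safe only because the caller clamps first:

```java
// Hypothetical interface loosely modeled on IPageable's paging methods;
// not the Wicket type itself.
interface SimplePageable {
    int getPageCount();
    int getCurrentPage();
    void setCurrentPage(int page);
}

// An implementation that does no range checking of its own.
class LaxPageable implements SimplePageable {
    private final int pageCount;
    private int current;

    LaxPageable(int pageCount) {
        this.pageCount = pageCount;
    }

    @Override public int getPageCount() { return pageCount; }
    @Override public int getCurrentPage() { return current; }
    @Override public void setCurrentPage(int page) { current = page; } // stores anything
}

// Caller-side guard in the spirit of the link's check: clamp before setting,
// so even a lax implementation never lands on a nonexistent page.
class DefensiveNavigation {
    static void goTo(SimplePageable pageable, int requested) {
        // getPageCount() is the call that ultimately needs the total size.
        int last = pageable.getPageCount() - 1;
        int page = Math.max(0, Math.min(requested, last));
        pageable.setCurrentPage(page);
    }
}
```

The price of this guard is the getPageCount() call, which for a DataProvider-backed view means knowing size() before the page is set.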
> 
>> Q: How would a custom repeater that fetches data and size at the same
>> time handle invalid (out-of-range) pages?
>> A: Out-of-range pages will return the size and an empty dataset. In this
>> case, the repeater would change the page number to the last valid one and
>> do a second query. Yes, two queries again, but this should only happen
>> rarely.
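The fallback described in this answer can be sketched as follows, assuming a hypothetical FallbackRepeater and a simulated backend that returns the total and the requested slice in one call. In the common case one query suffices; only an out-of-range request triggers a clamp and a second query:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a single-query repeater: ask for the requested page
// directly; only if it is past the end (total known, rows empty) clamp to
// the last valid page and issue a second query.
class FallbackRepeater {

    static final class Result {
        final int total;
        final List<Integer> rows;

        Result(int total, List<Integer> rows) {
            this.total = total;
            this.rows = rows;
        }
    }

    private final int totalRows; // size of the simulated dataset
    private final int perPage;   // assumed positive
    int queries = 0;             // backend round trips so far
    int renderedPage = -1;       // page actually shown

    FallbackRepeater(int totalRows, int perPage) {
        this.totalRows = totalRows;
        this.perPage = perPage;
    }

    // Simulated backend: one call returns the total and the slice for 'page'.
    private Result query(int page) {
        queries++;
        int from = page * perPage;
        if (from >= totalRows) {
            return new Result(totalRows, Collections.<Integer>emptyList());
        }
        List<Integer> rows = new ArrayList<>();
        for (int i = from; i < Math.min(totalRows, from + perPage); i++) {
            rows.add(i);
        }
        return new Result(totalRows, rows);
    }

    List<Integer> render(int requestedPage) {
        int page = Math.max(0, requestedPage);
        Result first = query(page);
        if (first.rows.isEmpty() && first.total > 0) {
            // Rare path: out of range, so clamp and query a second time.
            int last = (first.total - 1) / perPage;
            renderedPage = last;
            return query(last).rows;
        }
        renderedPage = page;
        return first.rows;
    }
}
```

The sketch follows the answer literally (clamp to the last valid page) and does not model which page the user navigated from.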
> 
> This will change the existing behavior. If you are on page 5 and click
> page 10 (which happens not to exist), with your suggestion you would end
> up back on 5, whereas currently you would properly end up on 9.
> 
> Looking at WICKET-1784, I extracted the code you want into an
> overridable int cullPageNumber(int x). So feel free to subclass the
> link and override that to return x without any extra checks.
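In Wicket itself this would be a subclass of PagingNavigationLink overriding cullPageNumber(int), as described above. To stay runnable without the Wicket jar, the sketch below uses hypothetical stand-in classes (PageLinkModel, UncheckedPageLink); the default clamping shown is an assumption about the kind of check involved, not a copy of Wicket's code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a paging link. The real Wicket class is
// PagingNavigationLink; this model only mimics the overridable hook.
class PageLinkModel {
    final List<String> log = new ArrayList<>(); // records backend work
    private final int totalRows;
    private final int rowsPerPage;

    PageLinkModel(int totalRows, int rowsPerPage) {
        this.totalRows = totalRows;
        this.rowsPerPage = rowsPerPage;
    }

    // Page count needs the total, i.e. a count query in a DB-backed provider.
    int pageCount() {
        log.add("COUNT");
        return (totalRows + rowsPerPage - 1) / rowsPerPage;
    }

    // Default: clamp the requested page into [0, pageCount - 1].
    protected int cullPageNumber(int page) {
        int last = pageCount() - 1;
        return Math.max(0, Math.min(page, last));
    }

    int targetPage(int requested) {
        return cullPageNumber(requested);
    }
}

// Override that returns the number unchanged, leaving out-of-range handling
// to a repeater that learns the total from its own single query.
class UncheckedPageLink extends PageLinkModel {
    UncheckedPageLink(int totalRows, int rowsPerPage) {
        super(totalRows, rowsPerPage);
    }

    @Override
    protected int cullPageNumber(int page) {
        return page;
    }
}
```

With the override, no count is needed just to navigate, so the repeater is free to fetch size and data together.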
> 
> We may properly fix this in 1.5, but for now this is too big a
> refactoring because it changes the basic assumptions the code was
> written under.
> 
> -igor
> 
>>
>> Best regards, Stefan
>>
>> [0] https://issues.apache.org/jira/browse/WICKET-1784?focusedCommentId=12651278#action_12651278
>>
>>
>> igor.vaynberg wrote:
>>>
>>> On Wed, Nov 26, 2008 at 9:32 AM, Wayne Pope
>>> <[EMAIL PROTECTED]> wrote:
>>>>>So you think pushing all that extra data over the network is actually
>>>>>more efficient than doing another query???? wtf.
>>>> The point is I'd rather avoid two calls where one will do.
>>>> AbstractPageableView will do fine, I believe.
>>>
>>> The number of calls by itself is meaningless; I don't comprehend why
>>> people have a hard time understanding this simple fact.
>>>
>>> If you have one call that takes 1000ms and ten calls that each take
>>> 10ms, you should concentrate on the one call that takes a long time
>>> rather than eliminating all ten 10ms calls, which only saves you 100ms.
>>> If you can optimize the 1000ms call and shave off 20%, then your eleven
>>> calls are still faster than the single unoptimized 1000ms call.
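The arithmetic behind this example, taking the baseline as one 1000ms call plus ten 10ms calls:

```latex
\begin{aligned}
T_{\text{baseline}} &= 1000 + 10 \times 10 = 1100\ \text{ms} \\
T_{\text{drop the ten small calls}} &= 1000\ \text{ms} \quad (\text{saves } 100\ \text{ms}) \\
T_{\text{shave 20\% off the big call}} &= 0.8 \times 1000 + 10 \times 10 = 900\ \text{ms} \quad (\text{saves } 200\ \text{ms})
\end{aligned}
```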
>>>
>>> And since connection pools were invented many years ago, there is no
>>> longer any overhead from establishing network connections, just from
>>> pushing bits around. Maybe that is still a problem in PHP, but in Java
>>> it was solved a long time ago.
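A toy illustration of the pooling point, not a real JDBC pool: ToyPool and Conn are hypothetical, and creating a Conn stands in for an expensive network connect. Borrowing and releasing around two queries per request opens only one physical connection:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy illustration, not a real JDBC pool: a "physical" connection is
// created only when no idle one is available, then reused.
class ToyPool {

    static final class Conn {
        final int id;

        Conn(int id) {
            this.id = id;
        }
    }

    private final Deque<Conn> idle = new ArrayDeque<>();
    int opened = 0; // physical connections created so far

    Conn borrow() {
        Conn c = idle.poll();
        if (c == null) {
            opened++;
            c = new Conn(opened); // stands in for the expensive connect
        }
        return c;
    }

    void release(Conn c) {
        idle.push(c); // back to the idle set for the next borrower
    }
}
```

A second connection is only opened when two are borrowed at the same time; sequential count-then-select queries reuse the same one.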
>>>
>>> -igor
>>>
>>>
>>>>
>>>>>I can only assume that you have actually profiled your app and that
>>>>>one select count() call was what was taking a significant chunk of
>>>>>processing time in the database server? To the point where eliminating
>>>>>it will actually reduce enough load on the database to increase your
>>>>>throughput?
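The question here is about measurement. A minimal timing wrapper could record how much measured time goes to the count query versus the page query; QueryTimer is a hypothetical helper for illustration, not a substitute for a profiler or the database's own query statistics:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper: wraps each query, accumulating elapsed wall-clock
// time and call counts per label.
class QueryTimer {
    final Map<String, Long> totalNanos = new LinkedHashMap<>();
    final Map<String, Integer> calls = new LinkedHashMap<>();

    <T> T time(String label, Supplier<T> query) {
        long start = System.nanoTime();
        try {
            return query.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            totalNanos.merge(label, elapsed, Long::sum);
            calls.merge(label, 1, Integer::sum);
        }
    }

    // Fraction (0..1) of all recorded time spent under one label.
    double share(String label) {
        long all = 0;
        for (long n : totalNanos.values()) {
            all += n;
        }
        return all == 0 ? 0.0 : (double) totalNanos.getOrDefault(label, 0L) / all;
    }
}
```

Wrapping the provider's size() and iterator(...) bodies this way gives a rough answer to whether the count is a significant share at all.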
>>>>
>>>> No, I haven't. As mentioned before, I just want to avoid two calls when
>>>> one will do. I have, however, seen several times in production systems
>>>> that waiting on I/O reduces your scalability. I'd rather keep the server
>>>> count down, as money is tight.
>>>> I'll be mindful not to ask 'stupid' questions again.
>>>>
>>>>
>>>>
>>>> On Wed, Nov 26, 2008 at 6:19 PM, Igor Vaynberg
>>>> <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> On Wed, Nov 26, 2008 at 9:06 AM, Wayne Pope
>>>>> <[EMAIL PROTECTED]> wrote:
>>>>> > Hi Igor,
>>>>> >
>>>>> >>What? Why would you ever load the whole dataset?
>>>>> > Just to avoid two calls on smallish datasets, especially when there
>>>>> > are multiple joins and the database isn't on the same box.
>>>>>
>>>>> So you think pushing all that extra data over the network is actually
>>>>> more efficient than doing another query???? wtf.
>>>>>
>>>>> >>Yeah. Because select count() queries are the most expensive queries
>>>>> >>you can run on the database. You are right, it's totally going to
>>>>> >>kill it. You know how all those sites on the internet that have a
>>>>> >>pager above the pageable view that shows you the number of the last
>>>>> >>available page...you know how those work without doing a select
>>>>> >>count()?
>>>>> >
>>>>> > Ouch.
>>>>> > I just want to limit calls to the database where possible, as waiting
>>>>> > for I/O is never great for scalability. I'm not 'having a go' at
>>>>> > Wicket or DataViews or anything, just trying to understand it. I never
>>>>> > claimed to be a guru, far from it.
>>>>>
>>>>> I can only assume that you have actually profiled your app and that
>>>>> one select count() call was what was taking a significant chunk of
>>>>> processing time in the database server? To the point where eliminating
>>>>> it will actually reduce enough load on the database to increase your
>>>>> throughput?
>>>>>
>>>>> -igor
>>>>>
>>>>> >
>>>>> > Wayne
>>>>> >
>>>>> >
>>>>> > On Wed, Nov 26, 2008 at 5:58 PM, Igor Vaynberg
>>>>> > <[EMAIL PROTECTED]> wrote:
>>>>> >
>>>>> >> On Wed, Nov 26, 2008 at 7:32 AM, Wayne Pope
>>>>> >> <[EMAIL PROTECTED]> wrote:
>>>>> >> > I'm sure I must still be missing something, as I can't believe that
>>>>> >> > we need to either a) load the whole data set
>>>>> >>
>>>>> >> What? Why would you ever load the whole dataset?
>>>>> >>
>>>>> >> > b) call count on the DB, then load in the iterator method. That's
>>>>> >> > going to kill the database in prod (or really not help).
>>>>> >>
>>>>> >> Yeah. Because select count() queries are the most expensive queries
>>>>> >> you can run on the database. You are right, it's totally going to
>>>>> >> kill it. You know how all those sites on the internet that have a
>>>>> >> pager above the pageable view that shows you the number of the last
>>>>> >> available page...you know how those work without doing a select
>>>>> >> count()?
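The pager's last-page number is derived from the row count, which is why a count is needed at all. A sketch of that arithmetic, with PagerMath as a hypothetical helper name:

```java
// Hypothetical helper: the numbers a pager shows come from the total row count.
class PagerMath {

    // Number of pages needed for totalRows rows (ceiling division).
    static int pageCount(long totalRows, int rowsPerPage) {
        if (rowsPerPage <= 0) {
            throw new IllegalArgumentException("rowsPerPage must be positive");
        }
        return (int) ((totalRows + rowsPerPage - 1) / rowsPerPage);
    }

    // Offset of the first row on a zero-based page, e.g. for LIMIT/OFFSET.
    static long offset(int page, int rowsPerPage) {
        return (long) page * rowsPerPage;
    }
}
```

For example, 95 rows at 10 per page gives 10 pages, the last one (page 9) starting at offset 90.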
>>>>> >>
>>>>> >> -igor
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> >
>>>>> >> > On Wed, Nov 26, 2008 at 3:58 PM, Michael Sparer
>>>>> >> > <[EMAIL PROTECTED]> wrote:
>>>>> >> >
>>>>> >> >>
>>>>> >> >> have a look at https://issues.apache.org/jira/browse/WICKET-1784
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> Wayne Pope-2 wrote:
>>>>> >> >> >
>>>>> >> >> > Ok,
>>>>> >> >> >
>>>>> >> >> > I was just doing a bit of code clean-up and I realized that in
>>>>> >> >> > our IDataProviders we are loading all rows for a given dataset.
>>>>> >> >> > Looking at the iterator method, I see we can limit the result
>>>>> >> >> > (and the offset). Great, I thought; however, I see that the
>>>>> >> >> > size() method is called as part of getViewSize() in
>>>>> >> >> > AbstractPageableView. Thus I need to call the database here to
>>>>> >> >> > figure out the size.
>>>>> >> >> >
>>>>> >> >> > Am I doing something wrong, or do I have to hit the database
>>>>> >> >> > twice for each DataProvider render?
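The situation described here, sketched under assumptions: RowProvider is a hypothetical interface shaped like Wicket's IDataProvider (size() plus iterator(first, count)); the real interface also declares a model(...) method. FakeDb is an in-memory stand-in that records the SQL it would run. A view that calls size() and then iterator(...) for one render issues two statements:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical interface shaped like IDataProvider's size()/iterator() pair.
interface RowProvider<T> {
    Iterator<T> iterator(int first, int count);
    int size();
}

// In-memory stand-in for a database that records the SQL it would execute.
class FakeDb {
    final List<String> executed = new ArrayList<>();
    private final List<String> rows = new ArrayList<>();

    FakeDb(int n) {
        for (int i = 0; i < n; i++) {
            rows.add("row" + i);
        }
    }

    int count() {
        executed.add("SELECT COUNT(*) FROM t");
        return rows.size();
    }

    List<String> page(int offset, int limit) {
        executed.add("SELECT * FROM t LIMIT " + limit + " OFFSET " + offset);
        int from = Math.max(0, Math.min(offset, rows.size()));
        int to = Math.min(rows.size(), from + Math.max(0, limit));
        return new ArrayList<>(rows.subList(from, to));
    }
}

// Provider in the style the question describes: one statement per method.
class DbProvider implements RowProvider<String> {
    private final FakeDb db;

    DbProvider(FakeDb db) {
        this.db = db;
    }

    @Override
    public int size() {
        return db.count(); // first round trip
    }

    @Override
    public Iterator<String> iterator(int first, int count) {
        return db.page(first, count).iterator(); // second round trip
    }
}
```

Two statements per render is the expected shape here; whether the count is actually costly is a question for profiling.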
>>>>> >> >> >
>>>>> >> >> > Obviously I don't want to hard-code a size. Is there any other way?
>>>>> >> >> >
>>>>> >> >> > Thanks
>>>>> >> >> > Wayne
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> -----
>>>>> >> >> Michael Sparer
>>>>> >> >> http://talk-on-tech.blogspot.com
>>>>> >> >> --
>>>>> >> >> View this message in context:
>>>>> >> >> http://www.nabble.com/Is-there-any-other-way--DataProviders-must-hit-the-Db-twice-for-%28possible%29-large-datasets-tp20701684p20702476.html
>>>>> >> >> Sent from the Wicket - User mailing list archive at Nabble.com.
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> ---------------------------------------------------------------------
>>>>> >> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>>> >> >> For additional commands, e-mail: [EMAIL PROTECTED]
>>>>> >> >>
>>>>> >> >>
>>>>> >> >
>>>>> >>
>>>>> >>
>>>>> ---------------------------------------------------------------------
>>>>> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>>> >> For additional commands, e-mail: [EMAIL PROTECTED]
>>>>> >>
>>>>> >>
>>>>> >
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>> -----
>> -------
>> Stefan Fußenegger
>> http://talk-on-tech.blogspot.com // looking for a nicer domain ;)
>> --
>> View this message in context:
>> http://www.nabble.com/Is-there-any-other-way--DataProviders-must-hit-the-Db-twice-for-%28possible%29-large-datasets-tp20701684p20715382.html
>> Sent from the Wicket - User mailing list archive at Nabble.com.
>>
>>
>>
>>
> 
> 
> 
> 


-----
-------
Stefan Fußenegger
http://talk-on-tech.blogspot.com // looking for a nicer domain ;)
-- 
View this message in context: 
http://www.nabble.com/Is-there-any-other-way--DataProviders-must-hit-the-Db-twice-for-%28possible%29-large-datasets-tp20701684p20723759.html
Sent from the Wicket - User mailing list archive at Nabble.com.


