Re: Is there any other way? DataProviders must hit the Db twice for (possible) large datasets

Igor Vaynberg Thu, 27 Nov 2008 09:35:49 -0800

On Thu, Nov 27, 2008 at 12:46 AM, Stefan Fußenegger
<[EMAIL PROTECTED]> wrote:
>
> I don't think IDataProvider is only about databases.


you started off with your core assumption being wrong. IDataProvider
was written exclusively for accessing databases. my thinking, at the
time, was that 99% of people use wicket to build applications that
access databases, and i dare say it was a good guess because in its ~3
years of existence only a handful of people had a problem with the way
it works.

> There are other data
> sources and some return the total amount and the desired subset at the same
> time (Lucene as a famous example). Such data sources would really benefit from
> a single-query-approach.

i am not disputing this fact. i am simply saying that we are not going
to fix this right now because this is not a bug. you are trying to use
the components for something they were not designed for. in 1.5
we may address this.
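
A minimal sketch of the mismatch being discussed, using hypothetical stand-in types rather than the real Wicket or Lucene APIs. `TwoCallProvider` mirrors the size-then-iterator shape the repeaters expect, `SearchResult` is the kind of result a search index hands back in one go, and `FakeIndex` is an in-memory source that only counts how many "queries" it has served. All of these names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the two-call contract (not the real Wicket interface).
interface TwoCallProvider<T> {
    int size();                                  // e.g. a SELECT COUNT(*) query
    Iterator<T> iterator(int first, int count);  // e.g. a LIMIT/OFFSET query
}

// Hypothetical search-style result: total hit count plus one page, from one query.
final class SearchResult<T> {
    final int totalHits;
    final List<T> page;

    SearchResult(int totalHits, List<T> page) {
        this.totalHits = totalHits;
        this.page = page;
    }
}

// In-memory data source that counts how many "queries" it has served.
final class FakeIndex {
    private final List<String> docs = new ArrayList<>();
    int queries = 0;

    FakeIndex(int n) {
        for (int i = 0; i < n; i++) {
            docs.add("doc" + i);
        }
    }

    int count() {
        queries++;
        return docs.size();
    }

    List<String> slice(int first, int count) {
        queries++;
        return window(first, count);
    }

    // One round trip that yields both the total and the requested window.
    SearchResult<String> search(int first, int count) {
        queries++;
        return new SearchResult<>(docs.size(), window(first, count));
    }

    // Out-of-range windows come back empty instead of throwing.
    private List<String> window(int first, int count) {
        int from = Math.min(Math.max(first, 0), docs.size());
        int to = Math.min(from + Math.max(count, 0), docs.size());
        return new ArrayList<>(docs.subList(from, to));
    }
}

// Adapter that exposes the index through the two-call contract.
final class IndexProvider implements TwoCallProvider<String> {
    private final FakeIndex index;

    IndexProvider(FakeIndex index) {
        this.index = index;
    }

    public int size() {
        return index.count();
    }

    public Iterator<String> iterator(int first, int count) {
        return index.slice(first, count).iterator();
    }
}
```

Rendering one page through `IndexProvider` costs a `count()` plus a `slice()`, while `search()` returns the same information in a single call. That single-call shape is what does not fit the current components.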

> I faced this issue myself in a search (read Lucene)
> centered application. I successfully went down the road of implementing a
> custom repeater.

i had to do the same myself.

> But when the repeater was working as desired, I figured out
> that PagingNavigationLink is the real showstopper, not IDataProvider (see my
> JIRA comment [0]). The fix would be rather trivial, as PagingNavigationLink
> is doing something it needn't do (checking the requested page against the
> valid range of pages).
>
> Let me answer 2 possible questions in advance:
>
> Q: Why is this check in PagingNavigationLink a problem?
> A: Obviously, you can't fetch size and data as long as the page isn't known.

the check is there because we code defensively. we do not assume that
every implementation of IPageable will cull the number when you call
setCurrentPage(x).

> Q: How would a custom repeater that fetches data and size at the same time
> handle invalid (out of range) pages?
> A: Out of range pages will return the size and an empty dataset. In this
> case, the repeater would change the page number to the last valid and do a
> second query. Yeah, two queries again. But this should only happen rarely
> though.

this will change the existing behavior. if you are on page 5 and click
page 10 (which happens to not exist) you would end up back on 5 with
your suggestion whereas currently you would properly end up on 9.

looking at WICKET-1784, i extracted the code you want into an
overridable int cullPageNumber(int x). so feel free to subclass the
link and override that to return x without any extra checks.
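
A rough sketch of what that override could look like. `Pages`, `NavLink` and `UncheckedNavLink` are hypothetical stand-ins written so the example compiles on its own; only the method name and signature `int cullPageNumber(int x)` come from the text above, and the clamp shown in the base class is a simplified guess at the defensive check, not a copy of the real one.

```java
// Hypothetical stand-in for a pageable component: only the page count matters here.
interface Pages {
    int getPageCount();
}

// Hypothetical stand-in for the navigation link; the real class is a Wicket component.
class NavLink {
    protected final Pages pageable;
    private final int requestedPage;

    NavLink(Pages pageable, int requestedPage) {
        this.pageable = pageable;
        this.requestedPage = requestedPage;
    }

    // Defensive clamp into [0, pageCount - 1]. Calling getPageCount() is what
    // forces the size to be known before any data is fetched.
    protected int cullPageNumber(int pageNumber) {
        int last = Math.max(pageable.getPageCount() - 1, 0);
        return Math.min(Math.max(pageNumber, 0), last);
    }

    final int getPageNumber() {
        return cullPageNumber(requestedPage);
    }
}

// Subclass that trusts the pageable to deal with out-of-range pages itself.
class UncheckedNavLink extends NavLink {
    UncheckedNavLink(Pages pageable, int requestedPage) {
        super(pageable, requestedPage);
    }

    @Override
    protected int cullPageNumber(int pageNumber) {
        return pageNumber; // no getPageCount() call, so no size query here
    }
}
```

With nine pages, the base link turns a request for zero-based index 9 (the tenth page) into index 8 (the ninth page), while the subclass passes 9 through untouched and leaves it to the repeater to handle the out-of-range page.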

we may properly fix this in 1.5, but for right now this is too big a
refactor because it changes the basic assumptions with which the code
was written.

-igor

>
> Best regards, Stefan
>
> [0]
> https://issues.apache.org/jira/browse/WICKET-1784?focusedCommentId=12651278#action_12651278
>
>
> igor.vaynberg wrote:
>>
>> On Wed, Nov 26, 2008 at 9:32 AM, Wayne Pope
>> <[EMAIL PROTECTED]> wrote:
>>>>so you think pushing all that extra data over the network is actually
>>>>more efficient than doing another query???? wtf.
>>> The point is I'd rather avoid 2 calls where 1 will do.
>>> AbstractPageableView
>>> will do fine I believe.
>>
>> the number of calls itself is meaningless, i dont comprehend why
>> people have a hard time understanding this simple fact.
>>
>> if you have one call that takes 1000ms and ten calls that each take
>> 10ms you should concentrate on the one call that takes a long time
>> rather than eliminating all ten 10ms calls which only saves you 100ms.
>> if you can optimize the 1000ms and shave off 20% then your eleven
>> calls are still faster than the one call.
>>
>> and since connection pools were invented many years ago there is
>> no more overhead of establishing network connections, just pushing
>> bits around. maybe that is still a problem in php, but in java it has
>> been solved a long time ago.
>>
>> -igor
>>
>>
>>>
>>>>i can only assume that you have actually profiled your app and that
>>>>one select count() call was what was taking a significant chunk of
>>>>processing time in the database server? to the point where eliminating
>>>>it will actually reduce enough load on the database to increase your
>>>>throughput?
>>>
>>> No I haven't, as mentioned before, I just want to avoid 2 calls when one
>>> will do.  I have however seen several times in production systems that
>>> waiting on i/o's reduces your scalability. I'd rather keep server count
>>> down as money is tight.
>>> I'll be mindful not to ask 'stupid' questions again.
>>>
>>>
>>>
>>> On Wed, Nov 26, 2008 at 6:19 PM, Igor Vaynberg
>>> <[EMAIL PROTECTED]> wrote:
>>>
>>>> On Wed, Nov 26, 2008 at 9:06 AM, Wayne Pope
>>>> <[EMAIL PROTECTED]> wrote:
>>>> > Hi Igor,
>>>> >
>>>> >>what? why would you ever load the whole dataset?
>>>> > just to avoid 2 calls on smallish datasets, especially when there are
>>>> > multiple joins and database isnt on the same box.
>>>>
>>>> so you think pushing all that extra data over the network is actually
>>>> more efficient than doing another query???? wtf.
>>>>
>>>> >>yeah. because select count() queries are the most expensive queries
>>>> >>you can run on the database. you are right, its totally going to kill
>>>> >>it. you know how all those sites on the internet that have a pager
>>>> >>above the pageable view that shows you the number of the last
>>>> >>available page...you know how those work without doing a select
>>>> >>count()?
>>>> >
>>>> > Ouch.
>>>> > I just want to limit calls if possible to the database as waiting for
>>>> > i/o's is never great for scalability. I'm not 'having a go' at wicket or
>>>> > DataViews or anything, just trying to understand it. I never claimed to
>>>> > be a guru - far from it.
>>>>
>>>> i can only assume that you have actually profiled your app and that
>>>> one select count() call was what was taking a significant chunk of
>>>> processing time in the database server? to the point where eliminating
>>>> it will actually reduce enough load on the database to increase your
>>>> throughput?
>>>>
>>>> -igor
>>>>
>>>> >
>>>> > Wayne
>>>> >
>>>> >
>>>> > On Wed, Nov 26, 2008 at 5:58 PM, Igor Vaynberg
>>>> > <[EMAIL PROTECTED]> wrote:
>>>> >
>>>> >> On Wed, Nov 26, 2008 at 7:32 AM, Wayne Pope
>>>> >> <[EMAIL PROTECTED]> wrote:
>>>> >> > I'm sure I must be missing something still, as I can't believe that
>>>> >> > we need to either a) load the whole data set
>>>> >>
>>>> >> what? why would you ever load the whole dataset?
>>>> >>
>>>> >> > b) call count on the Db, then load in
>>>> >> > the iterator method. That's going to kill the database in prod (or
>>>> >> > really not help.)
>>>> >>
>>>> >> yeah. because select count() queries are the most expensive queries
>>>> >> you can run on the database. you are right, its totally going to kill
>>>> >> it. you know how all those sites on the internet that have a pager
>>>> >> above the pageable view that shows you the number of the last
>>>> >> available page...you know how those work without doing a select
>>>> >> count()?
>>>> >>
>>>> >> -igor
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> >
>>>> >> > On Wed, Nov 26, 2008 at 3:58 PM, Michael Sparer
>>>> >> > <[EMAIL PROTECTED]> wrote:
>>>> >> >
>>>> >> >>
>>>> >> >> have a look at https://issues.apache.org/jira/browse/WICKET-1784
>>>> >> >>
>>>> >> >>
>>>> >> >> Wayne Pope-2 wrote:
>>>> >> >> >
>>>> >> >> > Ok,
>>>> >> >> >
>>>> >> >> > I was just having a bit of code clean up and I realized that in our
>>>> >> >> > IDataProviders we are loading all rows for a given dataset.
>>>> >> >> > So looking at the iterator method I see we can limit the result (and
>>>> >> >> > the offset). Great I thought - however I see that the size() method
>>>> >> >> > is called as part of the getViewSize() in the AbstractPageableView.
>>>> >> >> > Thus I need to call the database here to figure out the size.
>>>> >> >> >
>>>> >> >> > Am I doing something wrong or have I got to hit the database twice
>>>> >> >> > for each DataProvider render?
>>>> >> >> >
>>>> >> >> > Obviously I don't want to hard code a size. Is there any other way?
>>>> >> >> >
>>>> >> >> > Thanks
>>>> >> >> > Wayne
>>>> >> >> >
>>>> >> >> >
>>>> >> >>
>>>> >> >>
>>>> >> >> -----
>>>> >> >> Michael Sparer
>>>> >> >> http://talk-on-tech.blogspot.com
>>>> >> >> --
>>>> >> >> View this message in context:
>>>> >> >>
>>>> >>
>>>> http://www.nabble.com/Is-there-any-other-way--DataProviders-must-hit-the-Db-twice-for-%28possible%29-large-datasets-tp20701684p20702476.html
>>>> >> >> Sent from the Wicket - User mailing list archive at Nabble.com.
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> ---------------------------------------------------------------------
>>>> >> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>> >> >> For additional commands, e-mail: [EMAIL PROTECTED]
>>>> >> >>
>>>> >> >>
>>>> >> >
>>>> >>
>>>> >>
>>>> >>
>>>> >
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>>
>
>
> -----
> -------
> Stefan Fußenegger
> http://talk-on-tech.blogspot.com // looking for a nicer domain ;)
> --
> View this message in context: 
> http://www.nabble.com/Is-there-any-other-way--DataProviders-must-hit-the-Db-twice-for-%28possible%29-large-datasets-tp20701684p20715382.html
> Sent from the Wicket - User mailing list archive at Nabble.com.
>
>
>
>

