Re: Is there any other way? DataProviders must hit the Db twice for (possible) large datasets

Stefan Fußenegger Thu, 27 Nov 2008 00:46:35 -0800

Hi Igor,

I don't think IDataProvider is only about databases. There are other data
sources, and some return the total count and the desired subset at the same
time (Lucene being a famous example). Such data sources would really benefit
from a single-query approach. I faced this issue myself in a search-centered
(read: Lucene) application. I successfully went down the road of implementing
a custom repeater. But once the repeater was working as desired, I figured out
that PagingNavigationLink is the real showstopper, not IDataProvider (see my
JIRA comment [0]). The fix would be rather trivial, as PagingNavigationLink
is doing something it needn't do (checking the requested page against the
valid range of pages).
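A minimal sketch of the single-query approach, assuming a data source that (like Lucene) returns the total hit count together with the requested slice. PageResult, PageSource, SingleQueryProvider and setWindow() are hypothetical names made up for illustration, not Wicket or Lucene API; the provider only mirrors the size()/iterator(first, count) pair of IDataProvider. It runs the combined query lazily and caches the result, so size() and iterator() together cost one query, provided the window (first, count) is known before size() is called.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical result of one query: the total hit count plus one slice of hits.
final class PageResult<T> {
    final int total;
    final List<T> items;

    PageResult(int total, List<T> items) {
        this.total = total;
        this.items = items;
    }
}

// Hypothetical data source that returns total and slice in one call (Lucene-style).
interface PageSource<T> {
    PageResult<T> query(int first, int count);
}

// Mirrors the size()/iterator(first, count) shape of IDataProvider, but answers
// both from one cached query. Not Wicket API; an illustration only.
final class SingleQueryProvider<T> {
    private final PageSource<T> source;
    private int first;
    private int count;
    private PageResult<T> cached; // null until the first query of this request

    SingleQueryProvider(PageSource<T> source, int first, int count) {
        this.source = source;
        this.first = first;
        this.count = count;
    }

    // Hypothetical hook: the repeater announces the current window before rendering.
    void setWindow(int first, int count) {
        if (first != this.first || count != this.count) {
            this.first = first;
            this.count = count;
            cached = null; // different window, so the cached slice is stale
        }
    }

    int size() {
        return load().total;
    }

    Iterator<T> iterator(int first, int count) {
        setWindow(first, count); // same window as size() used -> no new query
        return load().items.iterator();
    }

    // Counterpart of detach(): forget the result at the end of the request.
    void detach() {
        cached = null;
    }

    private PageResult<T> load() {
        if (cached == null) {
            cached = source.query(first, count);
        }
        return cached;
    }
}

final class SingleQueryDemo {
    static int queries = 0;

    // Renders the third page (rows 20..29) of 95 in-memory hits and reports it.
    static String run() {
        List<String> all = new ArrayList<>();
        for (int i = 0; i < 95; i++) {
            all.add("hit-" + i);
        }
        queries = 0;
        PageSource<String> source = (first, count) -> {
            queries++;
            int from = Math.min(first, all.size());
            int to = Math.min(first + count, all.size());
            return new PageResult<>(all.size(), new ArrayList<>(all.subList(from, to)));
        };
        SingleQueryProvider<String> provider = new SingleQueryProvider<>(source, 20, 10);
        int total = provider.size();
        String firstRow = provider.iterator(20, 10).next();
        return total + " " + firstRow + " queries=" + queries;
    }

    public static void main(String[] args) {
        System.out.println(run()); // 95 hit-20 queries=1
    }
}
```

The catch is the last condition: anything that asks for size() before the current page is settled, such as a check of the requested page against the page count, makes the query run for a window that may then have to be repeated for the actual page.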


Let me answer 2 possible questions in advance:

Q: Why is this check in PagingNavigationLink a problem?
A: Obviously, you can't fetch size and data in one go as long as the page
isn't known, yet this check asks for the size (via the page count) before the
page has been set.

Q: How would a custom repeater that fetches data and size at the same time
handle invalid (out-of-range) pages?
A: Out-of-range pages will return the size and an empty dataset. In this
case, the repeater would change the page number to the last valid one and do
a second query. Yeah, two queries again, but this should happen only rarely.
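The fallback from the last answer can be sketched as follows, under the same assumption of a source that returns the total and one page in a single call. Slice, CountingSource and PageFetcher are hypothetical illustration names. The requested page is not validated up front; a second query is issued only when the first one comes back empty although rows exist, meaning the page lies past the end.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical combined result: total row count plus the rows of one page.
final class Slice {
    final int total;
    final List<Integer> rows;

    Slice(int total, List<Integer> rows) {
        this.total = total;
        this.rows = rows;
    }
}

// Hypothetical single-query source over the rows 0..total-1; counts its queries.
final class CountingSource {
    private final int total;
    int queries = 0;

    CountingSource(int total) {
        this.total = total;
    }

    Slice query(int first, int count) {
        queries++;
        List<Integer> rows = new ArrayList<>();
        for (int i = first; i < Math.min(first + count, total); i++) {
            rows.add(i);
        }
        return new Slice(total, rows);
    }
}

final class PageFetcher {
    // Fetch the requested page without checking it against the page count first.
    // Only an out-of-range page (empty slice although rows exist) costs a second
    // query, for the last valid page.
    static Slice fetchPage(CountingSource source, int page, int perPage) {
        int safePage = Math.max(page, 0); // treat negative page numbers as page 0
        Slice slice = source.query(safePage * perPage, perPage);
        if (slice.rows.isEmpty() && slice.total > 0) {
            int lastPage = (slice.total - 1) / perPage; // zero-based index
            slice = source.query(lastPage * perPage, perPage);
        }
        return slice;
    }

    public static void main(String[] args) {
        CountingSource source = new CountingSource(95);
        Slice valid = fetchPage(source, 2, 10);
        System.out.println(valid.rows.get(0) + " queries=" + source.queries); // 20 queries=1
        Slice tooFar = fetchPage(source, 42, 10);
        System.out.println(tooFar.rows.get(0) + " queries=" + source.queries); // 90 queries=3
    }
}
```

A valid page costs one query; the requested page 42 of a 95-row result falls back to page 9 (rows 90..94) at the price of one extra query.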

Best regards, Stefan

[0]
https://issues.apache.org/jira/browse/WICKET-1784?focusedCommentId=12651278#action_12651278


igor.vaynberg wrote:
> 
> On Wed, Nov 26, 2008 at 9:32 AM, Wayne Pope
> <[EMAIL PROTECTED]> wrote:
>>>so you think pushing all that extra data over the network is actually
>>>more efficient than doing another query???? wtf.
>> The point is I'd rather avoid 2 calls where 1 will do.
>> AbstractPageableView will do fine I believe.
> 
> the number of calls itself is meaningless, i don't comprehend why
> people have a hard time understanding this simple fact.
> 
> if you have one call that takes 1000ms and ten calls that each take
> 10ms you should concentrate on the one call that takes a long time
> rather than eliminating all ten 10ms calls which only saves you 100ms.
> if you can optimize the 1000ms and shave off 20% then your eleven
> calls are still faster than the one call.
> 
> and since connection pools were invented many years ago there is
> no more overhead of establishing network connections, just pushing
> bits around. maybe that is still a problem in php, but in java it has
> been solved a long time ago.
> 
> -igor
> 
> 
>>
>>>i can only assume that you have actually profiled your app and that
>>>one select count() call was what was taking a significant chunk of
>>>processing time in the database server? to the point where eliminating
>>>it will actually reduce enough load on the database to increase your
>>>throughput?
>>
>> No, I haven't. As mentioned before, I just want to avoid 2 calls when one
>> will do. I have, however, seen several times in production systems that
>> waiting on I/Os reduces your scalability. I'd rather keep the server count
>> down, as money is tight.
>> I'll be mindful not to ask 'stupid' questions again.
>>
>>
>>
>> On Wed, Nov 26, 2008 at 6:19 PM, Igor Vaynberg
>> <[EMAIL PROTECTED]> wrote:
>>
>>> On Wed, Nov 26, 2008 at 9:06 AM, Wayne Pope
>>> <[EMAIL PROTECTED]> wrote:
>>> > Hi Igor,
>>> >
>>> >>what? why would you ever load the whole dataset?
>>> > just to avoid 2 calls on smallish datasets, especially when there are
>>> > multiple joins and the database isn't on the same box.
>>>
>>> so you think pushing all that extra data over the network is actually
>>> more efficient than doing another query???? wtf.
>>>
>>> >>yeah. because select count() queries are the most expensive queries
>>> >>you can run on the database. you are right, it's totally going to kill
>>> >>it. you know how all those sites on the internet that have a pager
>>> >>above the pageable view that shows you the number of the last
>>> >>available page...you know how those work without doing a select
>>> >>count()?
>>> >
>>> > Ouch.
>>> > I just want to limit calls to the database if possible, as waiting for
>>> > I/Os is never great for scalability. I'm not 'having a go' at Wicket or
>>> > DataViews or anything, just trying to understand it. I never claimed to
>>> > be a guru, far from it.
>>>
>>> i can only assume that you have actually profiled your app and that
>>> one select count() call was what was taking a significant chunk of
>>> processing time in the database server? to the point where eliminating
>>> it will actually reduce enough load on the database to increase your
>>> throughput?
>>>
>>> -igor
>>>
>>> >
>>> > Wayne
>>> >
>>> >
>>> > On Wed, Nov 26, 2008 at 5:58 PM, Igor Vaynberg
>>> > <[EMAIL PROTECTED]> wrote:
>>> >
>>> >> On Wed, Nov 26, 2008 at 7:32 AM, Wayne Pope
>>> >> <[EMAIL PROTECTED]> wrote:
>>> >> > I'm sure I must still be missing something, as I can't believe that we
>>> >> > need to either a) load the whole data set
>>> >>
>>> >> what? why would you ever load the whole dataset?
>>> >>
>>> >> > b) call count on the Db, then load in the iterator method. That's going
>>> >> > to kill the database in prod (or at least really not help).
>>> >>
>>> >> yeah. because select count() queries are the most expensive queries
>>> >> you can run on the database. you are right, it's totally going to kill
>>> >> it. you know how all those sites on the internet that have a pager
>>> >> above the pageable view that shows you the number of the last
>>> >> available page...you know how those work without doing a select
>>> >> count()?
>>> >>
>>> >> -igor
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> >
>>> >> > On Wed, Nov 26, 2008 at 3:58 PM, Michael Sparer
>>> >> > <[EMAIL PROTECTED]> wrote:
>>> >> >
>>> >> >>
>>> >> >> have a look at https://issues.apache.org/jira/browse/WICKET-1784
>>> >> >>
>>> >> >>
>>> >> >> Wayne Pope-2 wrote:
>>> >> >> >
>>> >> >> > Ok,
>>> >> >> >
>>> >> >> > I was just having a bit of code clean-up and I realized that in our
>>> >> >> > IDataProviders we are loading all rows for a given dataset.
>>> >> >> > So, looking at the iterator method, I see we can limit the result
>>> >> >> > (and the offset). Great, I thought; however, I see that the size()
>>> >> >> > method is called as part of getViewSize() in AbstractPageableView.
>>> >> >> > Thus I need to call the database here to figure out the size.
>>> >> >> >
>>> >> >> > Am I doing something wrong, or do I have to hit the database twice
>>> >> >> > for each DataProvider render?
>>> >> >> >
>>> >> >> > Obviously I don't want to hard-code a size. Is there any other way?
>>> >> >> >
>>> >> >> > Thanks
>>> >> >> > Wayne
>>> >> >> >
>>> >> >> >
>>> >> >>
>>> >> >>
>>> >> >> -----
>>> >> >> Michael Sparer
>>> >> >> http://talk-on-tech.blogspot.com
>>> >> >> --
>>> >> >> View this message in context:
>>> >> >> http://www.nabble.com/Is-there-any-other-way--DataProviders-must-hit-the-Db-twice-for-%28possible%29-large-datasets-tp20701684p20702476.html
>>> >> >> Sent from the Wicket - User mailing list archive at Nabble.com.
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> ---------------------------------------------------------------------
>>> >> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> >> >> For additional commands, e-mail: [EMAIL PROTECTED]
>>> >> >>
>>> >> >>
>>> >> >
>>> >>
>>> >
>>>
>>
> 
> 
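For comparison, the conventional provider described in the original question, sketched against a stand-in table instead of a real database. FakeTable and TwoQueryProvider are hypothetical names; the SQL strings are only logged to show the two round trips per render (LIMIT/OFFSET syntax varies between databases). Neither query loads the whole dataset: size() is a COUNT and iterator() fetches a single page.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a database table; each method is one round trip.
final class FakeTable {
    private final List<String> rows = new ArrayList<>();
    final List<String> log = new ArrayList<>(); // SQL that would be sent (illustrative)

    FakeTable(int n) {
        for (int i = 0; i < n; i++) {
            rows.add("row-" + i);
        }
    }

    int count() {
        log.add("SELECT COUNT(*) FROM t");
        return rows.size();
    }

    List<String> page(int offset, int limit) {
        log.add("SELECT * FROM t LIMIT " + limit + " OFFSET " + offset);
        int from = Math.min(offset, rows.size());
        int to = Math.min(offset + limit, rows.size());
        return new ArrayList<>(rows.subList(from, to));
    }
}

// Same size()/iterator(first, count) pair as IDataProvider: size() costs a COUNT
// query, iterator() a LIMIT/OFFSET query. Two round trips, but only one page of
// rows ever crosses the network.
final class TwoQueryProvider {
    private final FakeTable table;

    TwoQueryProvider(FakeTable table) {
        this.table = table;
    }

    int size() {
        return table.count();
    }

    Iterator<String> iterator(int first, int count) {
        return table.page(first, count).iterator();
    }

    public static void main(String[] args) {
        FakeTable table = new FakeTable(95);
        TwoQueryProvider provider = new TwoQueryProvider(table);
        System.out.println(provider.size() + " " + provider.iterator(20, 10).next());
        table.log.forEach(System.out::println);
    }
}
```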

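The timing argument quoted above (one 1000ms call versus ten 10ms calls), worked through with the same figures. These are the hypothetical numbers from the quote, not measurements, and CallCostExample is only an illustration name.

```java
// Hypothetical figures from the quoted timing argument, not measurements.
final class CallCostExample {
    static final int SLOW_MS = 1000; // the single expensive call
    static final int FAST_MS = 10;   // each of the cheap calls
    static final int FAST_CALLS = 10;

    // All eleven calls as they are: 1000 + 10 * 10 = 1100 ms.
    static int baseline() {
        return SLOW_MS + FAST_CALLS * FAST_MS;
    }

    // Eliminate all ten cheap calls, keep the slow one: 1000 ms.
    static int dropCheapCalls() {
        return SLOW_MS;
    }

    // Keep all eleven calls but shave 20% off the slow one: 800 + 100 = 900 ms.
    static int optimizeSlowCall() {
        int optimizedSlow = SLOW_MS * 80 / 100;
        return optimizedSlow + FAST_CALLS * FAST_MS;
    }

    public static void main(String[] args) {
        System.out.println("baseline:          " + baseline() + " ms");
        System.out.println("drop cheap calls:  " + dropCheapCalls() + " ms");
        System.out.println("optimize slow one: " + optimizeSlowCall() + " ms");
    }
}
```

So the eleven calls with the tuned slow query (900 ms) still beat the single untuned call (1000 ms), which is why the expensive query matters more than the number of round trips.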

-----
-------
Stefan Fußenegger
http://talk-on-tech.blogspot.com // looking for a nicer domain ;)
-- 
View this message in context: 
http://www.nabble.com/Is-there-any-other-way--DataProviders-must-hit-the-Db-twice-for-%28possible%29-large-datasets-tp20701684p20715382.html
Sent from the Wicket - User mailing list archive at Nabble.com.

