
Re: scaling up Exhibit - an early experiment

2008-02-29 Thread David Huynh

Mark Diggory wrote:
> Most vendors appear to be implementing their own as extensions.   
> There is "ORDER BY" in SPARQL. Threads I've read suggest it's not so
> clean a mapping as to what a MIN or MAX URI might be, while COUNT should be
> fairly straightforward.
>
> On a side note, I tracked a lot of the development of XPath/XSLT and
> in that realm, extensions played a heavy part in the evolution of  
> what functions or aggregates were most popular in the language. XSLT  
> 1.1 had the EXSLT project (http://www.exslt.org/) defining extensions  
> like this. Then XSLT 2.0 absorbed them into the standard.
>
> I think it wise of you to use whatever is most efficient directly on  
> Sesame this time.
>   
I see. It seems that the OpenLink folks have already created such an
extension to SPARQL
http://docs.openlinksw.com/virtuoso/rdfsparqlaggregate.html
So there's hope :-)

David

___
General mailing list
General@simile.mit.edu
http://simile.mit.edu/mailman/listinfo/general

Re: scaling up Exhibit - an early experiment

2008-02-27 Thread Mark Diggory

Most vendors appear to be implementing their own as extensions.   
There is "ORDER BY" in SPARQL. Threads I've read suggest it's not so
clean a mapping as to what a MIN or MAX URI might be, while COUNT should be
fairly straightforward.

On a side note, I tracked a lot of the development of XPath/XSLT and
in that realm, extensions played a heavy part in the evolution of  
what functions or aggregates were most popular in the language. XSLT  
1.1 had the EXSLT project (http://www.exslt.org/) defining extensions  
like this. Then XSLT 2.0 absorbed them into the standard.

I think it wise of you to use whatever is most efficient directly on  
Sesame this time.

-Mark

On Feb 10, 2008, at 3:08 PM, David Huynh wrote:

> That's very, very unfortunate. Do you know why those aggregate functions
> are not supported?
>
> Without aggregation performed at the source of the data, more data than
> necessary has to be transferred over to the sink. The larger the data set
> (the more useful aggregation is), the larger the transfer. The lack of
> support for aggregation pretty much cripples any advanced browsing
> functionality on top of SPARQL data sources.
>
> The first draft of SPARQL was in October 2004 [1]. I remember informally
> suggesting adding aggregate functions to it in Summer/Fall 2006.
>
> David
> [1] http://www.w3.org/TR/2004/WD-rdf-sparql-query-20041012/
>
> Brian Caruso wrote:
>> The 2008 SPARQL recommendation definitely doesn't support COUNT and I don't
>> think it supports MIN, MAX or what you would think of as a SQL style
>> GROUP BY.
>>
>>
>> David Huynh wrote:
>>
>>> Right now server-side Backstage formulates its queries to the
>>> triple store by putting together Sesame "query algebra trees". If SPARQL
>>> is as expressive as Sesame's query algebra (supporting GROUP, COUNT,
>>> MIN, MAX), then it shouldn't be hard to swap in a SPARQL end point
>>>
>>>


Re: scaling up Exhibit - an early experiment

2008-02-19 Thread Sabrish Subramanian


Hi,

If I incorporate an offline database such as SQLite/Google Gears in place of
database.js, can I overcome the problem of Exhibit's lowered responsiveness
with large datasets? Kindly help...

thanks
Sabarish S

 

David Huynh-2 wrote:
> 
> Mark Diggory wrote:
>> A few more added comments...
>>
>> On Feb 7, 2008, at 4:44 PM, David Huynh wrote:
>>
>>   
>>> Note that server-side Backstage uses the URL of the exhibit as well as
>>> the set of data link URLs as the key to cache the triple store. This  
>>> is
>>> so that when several users view the same exhibit, only one triple  
>>> store
>>> is instantiated (to save space and time). There are technical  
>>> challenges
>>> here to address the case where the data at any one of those URLs is
>>> changed after the triple store is instantiated and loaded.
>>> 
>>
>> Are you using the Sesame "Context" in terms of the key you are  
>> referring to above?
>>   
> No, there are several separate triple stores, each instantiated and 
> loaded on demand (whenever a user views an exhibit that no other users 
> are currently viewing).
> 
>>> There is another technical challenge here: if the server session
>>> expires, all the interactive sessions get thrown away on the server.  
>>> The
>>> triple store might even get thrown away if no other user is looking at
>>> the same exhibit.
>>> 
>> It makes sense then to keep the store in "memory/on disk" separate
>> from the user's session so that it can always be available for new
>> users. Longwell is designed this way and simply takes an RDF source
>> and loads it into the triplestore on initialization of the server, long
>> before users interact with it.
>>
>> The area of Longwell that does initialize when the first user
>> interacts with it is the facets, which are calculated across the whole
>> triplestore, held in memory and presented in the start pages.
>>   
> The first user to view a particular backstaged exhibit will suffer all 
> that initial cost. Subsequent users will benefit from it.
> 
> Facets found in exhibits can actually be a lot more complicated than 
> facets in Longwell instances. In Longwell, a facet can only be defined 
> by a property (an RDF predicate). In Exhibit, it can be defined by an 
> Exhibit expression.
> 
>>> But you might still have that exhibit shown on your
>>> browser, and it's entirely reasonable to want to resume your  
>>> interaction
>>> with it after leaving it alone for a long time. At this point, your UI
>>> action (e.g., clicking in a facet) will cause client-side Backstage to
>>> call server-side Backstage, who has lost all its states about your
>>> interactive session. Server-side Backstage returns a particular error,
>>> which causes client-side Backstage to send over its whole state so  
>>> that
>>> server-side Backstage can reinitialize itself and pick up where it has
>>> left off.
>>> 
>>
>> When I think about our needs at MIT Libraries for Longwell and DSpace,  
>> the data will always be local or its source very controlled on the  
>> server side.  So the above cases would not be requirements for our  
>> usage of the application. Though I'm sure such "dynamic loading" might  
>> be of strong interest for others in the community.
>>   
> Right. I think what you care about is the ability to customize the site 
> using just HTML rather than hacking Velocity templates, JSP pages, etc.
> 
> Note that a single DSpace instance might still have sub-communities who 
> want to customize their own front pages, or even users who want 
> different pages for certain sub-collections. RDF lets users afford 
> flexible data models, and Exhibit style of UI configuration lets users 
> afford flexible UIs.
> 
> Or even better, a user might want to combine some DSpace data with her 
> own data (stored on her web site)...
> 
> David
> 
> 
> 
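
The session-recovery handshake described in the quoted message above (the server reports that it has lost the session, and the client resends its whole state) can be sketched roughly as follows. This is only an illustration under stated assumptions: `Server`, `Client`, `SessionExpired`, and the list-of-actions state are hypothetical names and shapes, not Backstage's actual classes or wire format.

```python
class SessionExpired(Exception):
    """Stand-in for the 'particular error' the server returns (hypothetical)."""


class Server:
    def __init__(self):
        self.sessions = {}  # session id -> list of applied UI actions

    def apply(self, session_id, action):
        if session_id not in self.sessions:
            raise SessionExpired(session_id)
        self.sessions[session_id].append(action)
        return list(self.sessions[session_id])

    def restore(self, session_id, client_state):
        # Rebuild the interactive session from the state the client kept.
        self.sessions[session_id] = list(client_state)


class Client:
    def __init__(self, server, session_id):
        self.server = server
        self.session_id = session_id
        self.state = []  # the client's own record of actions applied so far

    def do(self, action):
        try:
            result = self.server.apply(self.session_id, action)
        except SessionExpired:
            # Server lost the session: send the whole state, then retry once.
            self.server.restore(self.session_id, self.state)
            result = self.server.apply(self.session_id, action)
        self.state.append(action)
        return result


server = Server()
client = Client(server, "session-1")
client.do("select facet value: 2007")
server.sessions.clear()                      # simulate server-side expiry
resumed = client.do("select facet value: MIT")
```

After the simulated expiry, `resumed` holds both actions, because the client replayed its state before the retry.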

-- 
View this message in context: 
http://www.nabble.com/scaling-up-Exhibit---an-early-experiment-tp15339573p15560503.html
Sent from the SIMILE - General mailing list archive at Nabble.com.


Re: scaling up Exhibit - an early experiment

2008-02-17 Thread Sabrish Subramanian


Hi,

Can anyone suggest what has to be done if the data feed has more than
1000 records? Should I replace database.js with another offline database,
or can it not be resolved at all?


thanks
Sabarish S




David Huynh-2 wrote:
> 
> Mark Diggory wrote:
>> A few more added comments...
>>
>> On Feb 7, 2008, at 4:44 PM, David Huynh wrote:
>>
>>   
>>> Note that server-side Backstage uses the URL of the exhibit as well as
>>> the set of data link URLs as the key to cache the triple store. This  
>>> is
>>> so that when several users view the same exhibit, only one triple  
>>> store
>>> is instantiated (to save space and time). There are technical  
>>> challenges
>>> here to address the case where the data at any one of those URLs is
>>> changed after the triple store is instantiated and loaded.
>>> 
>>
>> Are you using the Sesame "Context" in terms of the key you are  
>> referring to above?
>>   
> No, there are several separate triple stores, each instantiated and 
> loaded on demand (whenever a user views an exhibit that no other users 
> are currently viewing).
> 
>>> There is another technical challenge here: if the server session
>>> expires, all the interactive sessions get thrown away on the server.  
>>> The
>>> triple store might even get thrown away if no other user is looking at
>>> the same exhibit.
>>> 
>> It makes sense then to keep the store in "memory/on disk" separate
>> from the user's session so that it can always be available for new
>> users. Longwell is designed this way and simply takes an RDF source
>> and loads it into the triplestore on initialization of the server, long
>> before users interact with it.
>>
>> The area of Longwell that does initialize when the first user
>> interacts with it is the facets, which are calculated across the whole
>> triplestore, held in memory and presented in the start pages.
>>   
> The first user to view a particular backstaged exhibit will suffer all 
> that initial cost. Subsequent users will benefit from it.
> 
> Facets found in exhibits can actually be a lot more complicated than 
> facets in Longwell instances. In Longwell, a facet can only be defined 
> by a property (an RDF predicate). In Exhibit, it can be defined by an 
> Exhibit expression.
> 
>>> But you might still have that exhibit shown on your
>>> browser, and it's entirely reasonable to want to resume your  
>>> interaction
>>> with it after leaving it alone for a long time. At this point, your UI
>>> action (e.g., clicking in a facet) will cause client-side Backstage to
>>> call server-side Backstage, who has lost all its states about your
>>> interactive session. Server-side Backstage returns a particular error,
>>> which causes client-side Backstage to send over its whole state so  
>>> that
>>> server-side Backstage can reinitialize itself and pick up where it has
>>> left off.
>>> 
>>
>> When I think about our needs at MIT Libraries for Longwell and DSpace,  
>> the data will always be local or its source very controlled on the  
>> server side.  So the above cases would not be requirements for our  
>> usage of the application. Though I'm sure such "dynamic loading" might  
>> be of strong interest for others in the community.
>>   
> Right. I think what you care about is the ability to customize the site 
> using just HTML rather than hacking Velocity templates, JSP pages, etc.
> 
> Note that a single DSpace instance might still have sub-communities who 
> want to customize their own front pages, or even users who want 
> different pages for certain sub-collections. RDF lets users afford 
> flexible data models, and Exhibit style of UI configuration lets users 
> afford flexible UIs.
> 
> Or even better, a user might want to combine some DSpace data with her 
> own data (stored on her web site)...
> 
> David
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/scaling-up-Exhibit---an-early-experiment-tp15339573p15539815.html
Sent from the SIMILE - General mailing list archive at Nabble.com.


Re: scaling up Exhibit - an early experiment

2008-02-13 Thread Mark Diggory

David,

One other question: is Backstage doing any "data caching" outside of
Sesame? For instance, in Longwell the start.vt facets are calculated
from Sesame and cached in memory to enhance performance. This makes
it hard to put Longwell on top of a triplestore that is
managed/updated independently.

-Mark

On Feb 12, 2008, at 7:51 AM, David Huynh wrote:

> Mark Diggory wrote:
>> One other question:
>>
>> Do the statements in your triple-store match your Exhibit data model
>> directly, or would it operate on any RDF statements in the triple-
>> store regardless of schema?
>>
>> http://simile.mit.edu/wiki/Exhibit/Understanding_Exhibit_Database
>>
> The hope is that it would operate on any RDF data set. But we might have
> to enforce one restriction in order to facilitate certain optimization.
> Namely, all objects in RDF statements with the same predicate must all
> be literals, or must all be resources.
>
> David
>


Re: scaling up Exhibit - an early experiment

2008-02-12 Thread Mark Diggory



On Feb 12, 2008, at 7:43 AM, David Huynh wrote:


> Mark Diggory wrote:
>> [snip]
>> To be able to combine data-sets in one server
>> and deliver multiple client configurations on that combined data.
>> I.E. that managing the data and delivering a User experience on that
>> are activities performed by separate but overlapping roles.
>
> I hear you :) And I think "server-side" is where Stefano and Ryan have a
> lot more experience than I do. I'm more focused on integration on the
> client side, which is perhaps less mature than integration on the server
> side.


What I like about the Backstage implementation at this point is how
light it is; it's not too hard to follow the code and see how the
expressions and database are being handled.  Web applications don't
"need" to be heavy and should be focused on that integration layer;
optimization and configurability can come later (and ideally based on
feedback from those trying to actually maintain the tool in production).



>>> Facets found in exhibits can actually be a lot more complicated than
>>> facets in Longwell instances. In Longwell, a facet can only be defined
>>> by a property (an RDF predicate). In Exhibit, it can be defined by an
>>> Exhibit expression.
>>
>> That would be a very powerful capability and one I wish was present
>> in Longwell. It would not only allow us greater control over the inclusion
>> of fields into a facet (and thus possibly improved performance) but
>> also would allow us to make conditional decisions about the inclusion
>> of a value based on the state of items that have > one degree of
>> separation.  To me this means less need to "shape" or "dumb down" the
>> RDF going into the Longwell triple-store so that it renders in facets
>> appropriately.
>
> Yup, it goes along with my little silly slogan
>
>    your data, your mess, your business
>
> meaning that however you model your data, we can make our tools adapt to
> your data model. That's easier said than done. Exhibit does pretty well
> on that front since we're not trying to scale. But it's not the same
> story for Longwell (and Backstage).
>
> What's your plan?



I actually think in the long run, it makes for a cleaner
implementation to have clear delineations between the integration
you're referring to above and the more mundane and standard management
of resources on the backend.


In trying to productionize Longwell for [EMAIL PROTECTED], I've learned it
would be beneficial to be able to run Sesame as a middle-tier service
rather than in the web-application instance.


So, I'd make some recommendations to work on keeping the CRUD
functionality separate from the type of Repository being accessed so
that HttpRepositories or in-process MemoryStore SailRepositories are
a matter of configuration and not coding. I would say this would be
beneficial for both Backstage and Longwell.  Both have the
triplestore hardcoded in process. This doesn't mean that the
middle-tier isn't MemoryStore based, but just that it's left open to
configuration and can be changed independently of the deployed webapp.
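
As a rough illustration of the recommendation above (choosing the repository type by configuration rather than in code), here is a minimal sketch. The classes `InProcessMemoryStore` and `RemoteHttpStore`, the function `open_repository`, and the configuration keys are all hypothetical stand-ins; they are not Sesame's API, and the example URL is a placeholder.

```python
class InProcessMemoryStore:
    """Hypothetical stand-in for a triple store living inside the webapp."""

    def describe(self):
        return "in-process memory store"


class RemoteHttpStore:
    """Hypothetical stand-in for a store reached over HTTP in a middle tier."""

    def __init__(self, server_url, repository_id):
        self.server_url = server_url
        self.repository_id = repository_id

    def describe(self):
        return f"remote repository {self.repository_id} at {self.server_url}"


def open_repository(config):
    """Pick the repository implementation from configuration alone."""
    kind = config["type"]
    if kind == "memory":
        return InProcessMemoryStore()
    if kind == "http":
        return RemoteHttpStore(config["server_url"], config["repository_id"])
    raise ValueError(f"unknown repository type: {kind!r}")


# The calling code stays the same; only the configuration differs.
local_repo = open_repository({"type": "memory"})
remote_repo = open_repository({
    "type": "http",
    "server_url": "http://example.org/sesame",  # placeholder URL
    "repository_id": "collections",
})
```

The CRUD code would then depend only on whatever interface both classes share, so swapping the deployment from in-process to middle-tier becomes a configuration change.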


-Mark

~
Mark R. Diggory - DSpace Systems Manager
MIT Libraries, Technology Research and Development
Massachusetts Institute of Technology




Re: scaling up Exhibit - an early experiment

2008-02-12 Thread Brian Caruso

I have no idea why they left it out.  Many implementations have extensions
that support count and group so don't lose hope yet.

Jena's ARQ has one:
http://seaborne.blogspot.com/2007/09/counting-and-group-by.html

David Huynh wrote:
> That's very, very unfortunate. Do you know why those aggregate functions 
> are not supported?
>
> Without aggregation performed at the source of the data, more data than 
> necessary has to be transferred over to the sink. The larger the data set
> (the more useful aggregation is), the larger the transfer. The lack of 
> support for aggregation pretty much cripples any advanced browsing 
> functionality on top of SPARQL data sources.
>
> The first draft of SPARQL was in October 2004 [1]. I remember informally 
> suggesting adding aggregate functions to it in Summer/Fall 2006.
>
> David
> [1] http://www.w3.org/TR/2004/WD-rdf-sparql-query-20041012/
>
>
>   

Re: scaling up Exhibit - an early experiment

2008-02-12 Thread Mark Diggory


On Feb 12, 2008, at 7:51 AM, David Huynh wrote:

> Mark Diggory wrote:
>> One other question:
>>
>> Do the statements in your triple-store match your Exhibit data model
>> directly, or would it operate on any RDF statements in the triple-
>> store regardless of schema?
>>
>> http://simile.mit.edu/wiki/Exhibit/Understanding_Exhibit_Database
>>
> The hope is that it would operate on any RDF data set. But we might have
> to enforce one restriction in order to facilitate certain optimization.
> Namely, all objects in RDF statements with the same predicate must all
> be literals, or must all be resources.
>
> David

Since we maintain our own RDF'ization tooling, I think we can live  
with that restriction.

-Mark

Re: scaling up Exhibit - an early experiment

2008-02-12 Thread David Huynh

Mark Diggory wrote:
> One other question:
>
> Do the statements in your triple-store match your Exhibit data model  
> directly, or would it operate on any RDF statements in the triple- 
> store regardless of schema?
>
> http://simile.mit.edu/wiki/Exhibit/Understanding_Exhibit_Database
>   
The hope is that it would operate on any RDF data set. But we might have 
to enforce one restriction in order to facilitate certain optimization. 
Namely, all objects in RDF statements with the same predicate must all 
be literals, or must all be resources.
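
To make the restriction above concrete, here is a minimal sketch of a check that finds predicates whose objects mix literals and resources. The triple representation (a tuple whose object is tagged `"literal"` or `"resource"`) and the function name are assumptions made for illustration, not how Backstage or Sesame represents statements.

```python
from collections import defaultdict


def predicates_with_mixed_objects(triples):
    """Return predicates whose objects are not all literals or all resources."""
    kinds_by_predicate = defaultdict(set)
    for _subject, predicate, (kind, _value) in triples:
        kinds_by_predicate[predicate].add(kind)
    return sorted(p for p, kinds in kinds_by_predicate.items() if len(kinds) > 1)


# Hypothetical data: ex:author points at a resource once and a literal once.
triples = [
    ("ex:item1", "ex:author", ("resource", "ex:person1")),
    ("ex:item2", "ex:author", ("literal", "Jane Doe")),
    ("ex:item1", "ex:title", ("literal", "A title")),
]

violations = predicates_with_mixed_objects(triples)
```

Here `violations` contains only `ex:author`, the one predicate that would break the restriction.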

David


Re: scaling up Exhibit - an early experiment

2008-02-12 Thread David Huynh

Mark Diggory wrote:
> [snip]
>
> Based on your comments to Neil and on comments Stefano and others  
> have made about maybe refactoring Longwell into a "Service" and  
> separate "Clients". I'd say the important use-case for those of us  
> who are already running java enterprise / web-applications is that we  
> would like to have flexibility and configurability on the server side  
> as much as possible.  To be able to combine data-sets in one server  
> and deliver multiple client configurations on that combined data.  
> I.E. that managing the data and delivering a User experience on that  
> are activities performed by separate but overlapping roles.
>   
I hear you :) And I think "server-side" is where Stefano and Ryan have a 
lot more experience than I do. I'm more focused on integration on the 
client side, which is perhaps less mature than integration on the server 
side.

>> Facets found in exhibits can actually be a lot more complicated than
>> facets in Longwell instances. In Longwell, a facet can only be defined
>> by a property (an RDF predicate). In Exhibit, it can be defined by an
>> Exhibit expression.
>> 
> That would be a very powerful capability and one I wish was present
> in Longwell. It would not only allow us greater control over the inclusion
> of fields into a facet (and thus possibly improved performance) but
> also would allow us to make conditional decisions about the inclusion
> of a value based on the state of items that have > one degree of
> separation.  To me this means less need to "shape" or "dumb down" the
> RDF going into the Longwell triple-store so that it renders in facets
> appropriately.
>   
Yup, it goes along with my little silly slogan

your data, your mess, your business

meaning that however you model your data, we can make our tools adapt to 
your data model. That's easier said than done. Exhibit does pretty well 
on that front since we're not trying to scale. But it's not the same 
story for Longwell (and Backstage).

> [snip]
> I hope you can glean from what I've said above, that what we are  
> working to accomplish is much more than theme and branding HTML/CSS,  
> we are actually trying to give the user greater ability to explore a  
> metadata space that is managed by more than one role...
>
> For instance, with the following roles on a Digital Object (Admin,  
> Manager, Submitter and Viewer) We may have one "Item" with statements  
> made by the different users in those different roles, I.E. Users or  
> Submitters may make a "Comment" about the "Item", while Managers and  
> Admin would be able to change the "Item" more directly.  And even  
> then, we might retain a "History" of those changes that did happen to  
> the "Item".
>
> After all these roles are creating statements in their various
> domains. We then need to expose the whole space in a way that can be
> both explored by user choice and also have controlled access by
> users' rights in the system.
>   
As a seasoned professor here loves to say, "one bottomless pit at a 
time!" :-) I think the most important thing now is to figure out the 
order in which you explore these bottomless pits. What's your plan?

David


Re: scaling up Exhibit - an early experiment

2008-02-11 Thread Mark Diggory

One other question:

Do the statements in your triple-store match your Exhibit data model  
directly, or would it operate on any RDF statements in the triple- 
store regardless of schema?

http://simile.mit.edu/wiki/Exhibit/Understanding_Exhibit_Database

-Mark

On Feb 10, 2008, at 3:08 PM, David Huynh wrote:

> That's very, very unfortunate. Do you know why those aggregate functions
> are not supported?
>
> Without aggregation performed at the source of the data, more data than
> necessary has to be transferred over to the sink. The larger the data set
> (the more useful aggregation is), the larger the transfer. The lack of
> support for aggregation pretty much cripples any advanced browsing
> functionality on top of SPARQL data sources.
>
> The first draft of SPARQL was in October 2004 [1]. I remember informally
> suggesting adding aggregate functions to it in Summer/Fall 2006.
>
> David
> [1] http://www.w3.org/TR/2004/WD-rdf-sparql-query-20041012/
>
> Brian Caruso wrote:
>> The 2008 SPARQL recommendation definitely doesn't support COUNT and I don't
>> think it supports MIN, MAX or what you would think of as a SQL style
>> GROUP BY.
>>
>>
>> David Huynh wrote:
>>
>>> Right now server-side Backstage formulates its queries to the
>>> triple store by putting together Sesame "query algebra trees". If SPARQL
>>> is as expressive as Sesame's query algebra (supporting GROUP, COUNT,
>>> MIN, MAX), then it shouldn't be hard to swap in a SPARQL end point
>>>
>>>


Re: scaling up Exhibit - an early experiment

2008-02-11 Thread Mark Diggory

On Feb 8, 2008, at 2:03 PM, David Huynh wrote:

> Mark Diggory wrote:
>> A few more added comments...
>>
>> On Feb 7, 2008, at 4:44 PM, David Huynh wrote:
>>> ...
>>> here to address the case where the data at any one of those URLs is
>>> changed after the triple store is instantiated and loaded.
>>>
>> Are you using the Sesame "Context" in terms of the key you are
>> referring to above?
>>
> No, there are several separate triple stores, each instantiated and
> loaded on demand (whenever a user views an exhibit that no other users
> are currently viewing).

Certainly a valid approach that works as long as you have very
separate "exhibits". In our case we want to provide a shared
search/discovery space across all our collections, so it sounds like we would
want a system that used one triple-store so that queries could be run
across data coming from all collections. This is what Longwell is
currently giving us.
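
For contrast with the single shared store, the per-exhibit approach quoted above (separate triple stores, loaded on demand and shared by everyone viewing the same exhibit) can be sketched as a cache keyed on the exhibit URL plus the set of data link URLs. The class name `TripleStoreCache`, the `loader` callback, and the example URLs are hypothetical; a plain dict stands in for a real triple store.

```python
class TripleStoreCache:
    """Share one store among all viewers of the same exhibit (sketch only)."""

    def __init__(self, loader):
        self._loader = loader   # callable that builds a store from data URLs
        self._stores = {}

    @staticmethod
    def make_key(exhibit_url, data_urls):
        # frozenset: the order in which data links are listed should not matter
        return (exhibit_url, frozenset(data_urls))

    def get(self, exhibit_url, data_urls):
        key = self.make_key(exhibit_url, data_urls)
        if key not in self._stores:
            # The first viewer pays the loading cost; later viewers reuse it.
            self._stores[key] = self._loader(data_urls)
        return self._stores[key]


load_calls = []


def fake_loader(data_urls):
    load_calls.append(sorted(data_urls))
    return {"loaded_from": sorted(data_urls)}


cache = TripleStoreCache(fake_loader)
first = cache.get("http://example.org/exhibit.html",
                  ["http://example.org/a.json", "http://example.org/b.json"])
second = cache.get("http://example.org/exhibit.html",
                   ["http://example.org/b.json", "http://example.org/a.json"])
```

Both calls return the same store and the loader runs once. The sketch deliberately ignores the open problem mentioned in the thread: data at one of those URLs changing after the store has been loaded.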

Based on your comments to Neil and on comments Stefano and others
have made about maybe refactoring Longwell into a "Service" and
separate "Clients", I'd say the important use-case for those of us
who are already running Java enterprise / web-applications is that we
would like to have flexibility and configurability on the server side
as much as possible: to be able to combine data-sets in one server
and deliver multiple client configurations on that combined data,
i.e., that managing the data and delivering a user experience on that
are activities performed by separate but overlapping roles.

> Facets found in exhibits can actually be a lot more complicated than
> facets in Longwell instances. In Longwell, a facet can only be defined
> by a property (an RDF predicate). In Exhibit, it can be defined by an
> Exhibit expression.

That would be a very powerful capability and one I wish was present
in Longwell. It would not only allow us greater control over the inclusion
of fields into a facet (and thus possibly improved performance) but
also would allow us to make conditional decisions about the inclusion
of a value based on the state of items that have > one degree of
separation.  To me this means less need to "shape" or "dumb down" the
RDF going into the Longwell triple-store so that it renders in facets
appropriately.
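
The difference between a facet defined by a single property and one defined by an expression reaching more than one degree of separation can be sketched as below. The list-of-properties path is only a loose stand-in for an Exhibit expression, not Exhibit's actual expression syntax, and the data and function names are invented for illustration.

```python
from collections import Counter

# Hypothetical data: papers point to authors, authors have affiliations.
ITEMS = {
    "paper1": {"author": ["alice"], "year": ["2007"]},
    "paper2": {"author": ["bob"], "year": ["2008"]},
    "alice": {"affiliation": ["MIT"]},
    "bob": {"affiliation": ["Cornell"]},
}


def follow_path(items, start_id, path):
    """Walk a chain of properties from one item and return the values reached."""
    current = [start_id]
    for prop in path:
        reached = []
        for item_id in current:
            reached.extend(items.get(item_id, {}).get(prop, []))
        current = reached
    return current


def facet(items, item_ids, path):
    """Count how many of the given items reach each value along the path."""
    counts = Counter()
    for item_id in item_ids:
        for value in set(follow_path(items, item_id, path)):
            counts[value] += 1
    return dict(counts)


papers = ["paper1", "paper2"]
by_property = facet(ITEMS, papers, ["year"])                    # one hop
by_expression = facet(ITEMS, papers, ["author", "affiliation"])  # two hops
```

The one-hop facet corresponds to a facet defined by a single predicate; the two-hop facet groups papers by their authors' affiliations without reshaping the underlying data first.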

>>> ...
>>
>> When I think about our needs at MIT Libraries for Longwell and  
>> DSpace,
>> the data will always be local or its source very controlled on the
>> server side.  So the above cases would not be requirements for our
>> usage of the application. Though I'm sure such "dynamic loading"  
>> might
>> be of strong interest for others in the community.
>>
> Right. I think what you care about is the ability to customize the  
> site
> using just HTML rather than hacking Velocity templates, JSP pages,  
> etc.

I hope you can glean from what I've said above that what we are
working to accomplish is much more than theming and branding HTML/CSS;
we are actually trying to give the user greater ability to explore a
metadata space that is managed by more than one role...

For instance, with the following roles on a Digital Object (Admin,
Manager, Submitter and Viewer), we may have one "Item" with statements
made by the different users in those different roles, i.e., Users or
Submitters may make a "Comment" about the "Item", while Managers and
Admins would be able to change the "Item" more directly.  And even
then, we might retain a "History" of those changes that did happen to
the "Item".

After all these roles are creating statements in their various
domains. We then need to expose the whole space in a way that can be
both explored by user choice and also have controlled access by
users' rights in the system.

> Note that a single DSpace instance might still have sub-communities  
> who
> want to customize their own front pages, or even users who want
> different pages for certain sub-collections. RDF lets users afford
> flexible data models, and Exhibit style of UI configuration lets users
> afford flexible UIs.
>
>
> Or even better, a user might want to combine some DSpace data with her
> own data (stored on her web site)...

Sure. To allow more adventurous users to create their own exhibits, we
might expose our Backstage as a service that their externally
maintained exhibit can connect to. But that's well outside our current
roadmap.

This is mostly just an exercise; our current push for a production
prototype is Longwell focused. But it's good to explore this other
option and see if it may inform our final solution.  Anyway,
Backstage looks like a great tool with a promising future.

Cheers,
Mark

Re: scaling up Exhibit - an early experiment

2008-02-10 Thread David Huynh

That's very, very unfortunate. Do you know why those aggregate functions 
are not supported?

Without aggregation performed at the source of the data, more data than 
necessary has to be transferred over to the sink. The larger the data set
(the more useful aggregation is), the larger the transfer. The lack of 
support for aggregation pretty much cripples any advanced browsing 
functionality on top of SPARQL data sources.
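
The transfer-size argument above can be illustrated with a small sketch that compares counting facet values at the data source with shipping every binding to the sink and counting there. The in-memory `ROWS` list stands in for a triple store, and both function names are hypothetical.

```python
from collections import Counter

# Hypothetical data set: nine items, each with a "type" property.
ROWS = [{"id": i, "type": "paper" if i % 3 else "thesis"} for i in range(9)]


def facet_counts_at_source(rows, prop):
    """Aggregate where the data lives; only the small summary is transferred."""
    return dict(Counter(row[prop] for row in rows))


def facet_counts_at_sink(rows, prop):
    """Without source-side aggregates, every binding crosses the wire first."""
    transferred = [(row["id"], row[prop]) for row in rows]
    counts = dict(Counter(value for _item_id, value in transferred))
    return counts, len(transferred)


source_counts = facet_counts_at_source(ROWS, "type")
sink_counts, rows_transferred = facet_counts_at_sink(ROWS, "type")
```

Both routes give the same counts, but the source-side summary has one entry per distinct value (two here) while the sink-side route transfers one row per item (nine here), and that gap grows with the size of the data set.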

The first draft of SPARQL was in October 2004 [1]. I remember informally 
suggesting adding aggregate functions to it in Summer/Fall 2006.

David
[1] http://www.w3.org/TR/2004/WD-rdf-sparql-query-20041012/

Brian Caruso wrote:
> The 2008 SPARQL recommendation definitely doesn't support COUNT and I don't
> think it supports MIN, MAX or what you would think of as a SQL style
> GROUP BY.
>
>
> David Huynh wrote:
>   
>> Right now server-side Backstage formulates its queries to the 
>> triple store by putting together Sesame "query algebra trees". If SPARQL 
>> is as expressive as Sesame's query algebra (supporting GROUP, COUNT, 
>> MIN, MAX), then it shouldn't be hard to swap in a SPARQL end point
>>   
>> 
>   


Re: scaling up Exhibit - an early experiment

2008-02-10 Thread David Huynh

Hi Neil,

There are clearly different user groups who need different tools and 
I'll be glad to address as many needs as possible. The principle of 
light-weight publishing should be re-interpreted for each case.

Longwell 2 has already eased database setup and configuration hassles: 
you just run it pointing to a directory of RDF files and you can right 
away get faceted browsing, albeit in a generic way. Longwell's other 
rough edges hide away this novelty, unfortunately.

Longwell-CSI makes it possible to integrate that generic faceted 
browsing UI into any web app framework that you have by positioning the 
point of integration on the client side in HTML and Javascript. We 
started to allow for configuring Longwell-CSI's UI in Javascript, but 
that hasn't gotten far.

Backstage was aimed to resolve 2 unknowns
1. how to spawn and set up database-backed web apps on the fly (which is 
not what you desire the most since you want to have your own database 
already set up)
2. how to bring the Exhibit style of configuration (HTML-based) back 
into a database-backed web app (which you want)
The demo showed #1 and just a little bit of #2. The rest of #2 involves 
supporting lens templates and richer views like maps and timelines.

So, Exhibit alone offers the small-scale, zero-setup option. Exhibit + 
public Backstage service offer the medium-scale, zero-setup option. And 
Exhibit + private Backstage installation offer the large-scale, 
some-setup option.

If you count David Karger's suggestion to build Backstage into a Firefox 
extension, that's another option, particularly for boosting performance 
of medium-scale exhibits whose authors do not use the public Backstage 
service. For example, if you have a medium-sized local data file that 
cannot leave your computer, then that extension will be an easier 
solution than running Backstage separately. In a sense, that's coming 
full circle to Piggy Bank [1], our earliest attempt to bring data 
browsing technologies to casual users. We've learned some lessons along 
the way :-)

David
[1] http://simile.mit.edu/wiki/Piggy_Bank

Neil Ireson wrote:
> Hi David,
>
> Wow, that sounds fantastic.
>
> Regarding your mentioning of...
> Regarding using Backstage on licensed/private data, it is possible to 
> install Backstage on your own server and tell Exhibit to use that 
> instance rather than a public Backstage service. The experiment right 
> now does not include that option, but there is no technical challenge 
> there. You actually might even want to run Backstage yourself and 
> connect it directly to a local Sesame store if your data is too large to 
> transfer as a JSON file.
>  
> I would be very interested in this functionality (the linking to a 
> local Sesame store). I know in some ways this goes against your 
> principles of "light-weight" publishing. However, I would have thought 
> it only natural that, as users demand more "advanced" functionality 
> and customisation, they might reasonably expect to have to do more 
> than alter some parameters, or even overload a function.
>
> Though I fear I would be little or no help in terms of development, I 
> would gladly help in any way my time and ability allow (e.g. user 
> testing).
>
> N

Re: scaling up Exhibit - an early experiment

2008-02-08 Thread Brian Caruso

The 2008 SPARQL recommendation definitely doesn't support COUNT and I don't
think it supports MIN, MAX or what you would think of as a SQL style
GROUP BY.


David Huynh wrote:
> Right now server-side Backstage formulates its queries to the 
> triple store by putting together Sesame "query algebra trees". If SPARQL 
> is as expressive as Sesame's query algebra (supporting GROUP, COUNT, 
> MIN, MAX), then it shouldn't be hard to swap in a SPARQL end point
>   

Re: scaling up Exhibit - an early experiment

2008-02-08 Thread David Huynh

Mark Diggory wrote:
> A few more added comments...
>
> On Feb 7, 2008, at 4:44 PM, David Huynh wrote:
>
>   
>> Note that server-side Backstage uses the URL of the exhibit as well as
>> the set of data link URLs as the key to cache the triple store. This  
>> is
>> so that when several users view the same exhibit, only one triple  
>> store
>> is instantiated (to save space and time). There are technical  
>> challenges
>> here to address the case where the data at any one of those URLs is
>> changed after the triple store is instantiated and loaded.
>> 
>
> Are you using the Sesame "Context" in terms of the key you are  
> referring to above?
>   
No, there are several separate triple stores, each instantiated and 
loaded on demand (whenever a user views an exhibit that no other users 
are currently viewing).
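
As a rough illustration of that arrangement, here is a sketch in Python rather than Backstage's actual Java code. `StoreCache`, `StoreEntry`, and the `loader` callback are hypothetical names, and evicting a store once nobody is viewing it is an assumption drawn from this thread, not a confirmed detail. The key is the exhibit URL plus the set of data link URLs:

```python
from dataclasses import dataclass, field


@dataclass
class StoreEntry:
    """Stand-in for one triple store instance (hypothetical)."""
    triples: list = field(default_factory=list)
    viewers: int = 0


class StoreCache:
    """Caches one store per (exhibit URL, set of data link URLs)."""

    def __init__(self, loader):
        # loader: callable taking a sorted list of data URLs and
        # returning the loaded triples (assumed, for illustration).
        self._loader = loader
        self._entries = {}

    @staticmethod
    def key(exhibit_url, data_urls):
        # frozenset makes the key independent of link order.
        return (exhibit_url, frozenset(data_urls))

    def acquire(self, exhibit_url, data_urls):
        k = self.key(exhibit_url, data_urls)
        entry = self._entries.get(k)
        if entry is None:
            # The first viewer pays the loading cost.
            entry = StoreEntry(triples=self._loader(sorted(data_urls)))
            self._entries[k] = entry
        entry.viewers += 1
        return entry

    def release(self, exhibit_url, data_urls):
        k = self.key(exhibit_url, data_urls)
        entry = self._entries[k]
        entry.viewers -= 1
        if entry.viewers == 0:
            # Nobody is looking at this exhibit any more.
            del self._entries[k]

    def __len__(self):
        return len(self._entries)
```

Two users opening the same exhibit share one entry, and the store is dropped when the last of them leaves. The sketch does not address the harder case where the data at one of the URLs changes after the store has been loaded.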

>> There is another technical challenge here: if the server session
>> expires, all the interactive sessions get thrown away on the server.  
>> The
>> triple store might even get thrown away if no other user is looking at
>> the same exhibit.
>> 
> It makes sense then to keep the store in "memory/on disk" separate  
> from the user's session so that it can always be available for new  
> users. Longwell is designed this way: it simply takes an RDF source  
> and loads it into the triplestore on initialization of the server, long  
> before users interact with it.
>
> The part of Longwell that does initialize when the first user  
> interacts with it is the facets, which are calculated across the whole  
> triplestore, held in memory, and presented in the start pages.
>   
The first user to view a particular backstaged exhibit will suffer all 
that initial cost. Subsequent users will benefit from it.

Facets found in exhibits can actually be a lot more complicated than 
facets in Longwell instances. In Longwell, a facet can only be defined 
by a property (an RDF predicate). In Exhibit, it can be defined by an 
Exhibit expression.
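
To make the difference concrete, here is a small Python sketch of facet counting. It is not Exhibit's or Longwell's actual code: `path_facet` and `facet_counts` are hypothetical names, and a chain of property names is used only as a simplified stand-in for an Exhibit expression. A one-step path behaves like a Longwell-style property facet, while a longer path follows links between items before counting:

```python
from collections import Counter


def path_facet(*names):
    """Build a facet from a chain of property names (hypothetical helper)."""
    def values(item, items_by_id):
        nodes = [item]
        # Follow every step except the last through to other items.
        for name in names[:-1]:
            nodes = [items_by_id[v]
                     for node in nodes
                     for v in node.get(name, [])
                     if v in items_by_id]
        # The last step yields the facet values themselves.
        return [v for node in nodes for v in node.get(names[-1], [])]
    return values


def facet_counts(items_by_id, facet):
    """Count, for each facet value, how many items have it."""
    counts = Counter()
    for item in items_by_id.values():
        # set() so an item counts once per distinct value.
        for value in set(facet(item, items_by_id)):
            counts[value] += 1
    return counts


items = {
    "paper1": {"author": ["alice"], "year": ["2007"]},
    "paper2": {"author": ["alice", "bob"], "year": ["2008"]},
    "alice": {"affiliation": ["MIT"]},
    "bob": {"affiliation": ["MIT"]},
}
by_year = facet_counts(items, path_facet("year"))
by_affiliation = facet_counts(items, path_facet("author", "affiliation"))
```

Here `by_year` counts one paper per year, and `by_affiliation` counts both papers under "MIT". This per-value counting is the kind of GROUP/COUNT work that the query layer has to support on the server side.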

>> But you might still have that exhibit shown on your
>> browser, and it's entirely reasonable to want to resume your  
>> interaction
>> with it after leaving it alone for a long time. At this point, your UI
>> action (e.g., clicking in a facet) will cause client-side Backstage to
>> call server-side Backstage, which has lost all its state about your
>> interactive session. Server-side Backstage returns a particular error,
>> which causes client-side Backstage to send over its whole state so  
>> that
>> server-side Backstage can reinitialize itself and pick up where it has
>> left off.
>> 
>
> When I think about our needs at MIT Libraries for Longwell and DSpace,  
> the data will always be local or its source very controlled on the  
> server side.  So the above cases would not be requirements for our  
> usage of the application. Though I'm sure such "dynamic loading" might  
> be of strong interest for others in the community.
>   
Right. I think what you care about is the ability to customize the site 
using just HTML rather than hacking Velocity templates, JSP pages, etc.

Note that a single DSpace instance might still have sub-communities who 
want to customize their own front pages, or even users who want 
different pages for certain sub-collections. RDF affords users 
flexible data models, and the Exhibit style of UI configuration affords 
them flexible UIs.

Or even better, a user might want to combine some DSpace data with her 
own data (stored on her web site)...

David


Re: scaling up Exhibit - an early experiment

2008-02-08 Thread David Huynh

Mark Diggory wrote:
> On Feb 7, 2008, at 4:44 PM, David Huynh wrote:
>
>   
>> The JSONP protocol will be pretty specific to Backstage. If there's a
>> desire to load data through SPARQL query, the right place to hook in
>> would be between the server-side code of Backstage and the SPARQL end
>> point. Right now server-side Backstage formulates its queries to the
>> triple store by putting together Sesame "query algebra trees". If  
>> SPARQL
>> is as expressive as Sesame's query algebra (supporting GROUP, COUNT,
>> MIN, MAX), then it shouldn't be hard to swap in a SPARQL end point.
>> 
>
> If I recall correctly, Longwell itself doesn't use SPARQL queries  
> right now, as that's a recent addition to Sesame.  It doesn't sound like  
> your work is that far removed from the way that Longwell initializes a  
> Sesame memory or native store off a source of RDF data.
>
> Although a tangent... I thought it would be a great idea to expose the  
> SPARQL engine of the Sesame instance in Longwell as an endpoint.  This  
> would allow for a rather powerful and flexible exposure of a DSpace  
> instance's content within the semantic web.  On top of this, if  
> services can be implemented on top of that engine, then they can be  
> much more encapsulated and reusable. For instance, a mapping between  
> SRU/CQL and SPARQL and/or OAI and SPARQL would give the sysadmin  
> greater configurability over how content is represented in those  
> interfaces.
>   
Yes, exposing a SPARQL end point would be trivial as we use Sesame 
underneath. For a fixed, large data set, that would be a conventional 
thing to do. For dynamic data sets, that's kinda fun: you "rent" a 
triple store just to do SPARQL on data at some URL.

David


RE: scaling up Exhibit - an early experiment

2008-02-08 Thread C Anthony Lewis

David,

I'm seeing this when I follow the link you gave in your initial email (as 
below)... I'm not yet at the stage to get my grey matter around trying it on my 
own exhibit, in fact I'm still getting to grips with Exhibit itself!

Anthony

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David Huynh
Sent: 08 February 2008 16:00
To: General List
Subject: Re: scaling up Exhibit - an early experiment

Anthony,

I'm not sure about that bug you're seeing. Is this on the demo URL or
are you trying Backstage on your own exhibit? Backstage doesn't work on
an arbitrary exhibit right now.

David

C Anthony Lewis wrote:
> David,
>
> This sounds really interesting but I keep getting an alert box telling me:
> Caught exception: Error firing event of name onRootCollectionSet to wildcard 
> handler
> Details: TypeError : stategroupDoms[groupLevel - 1] has no properties
>
> Maybe I'm doing something wrong (I'm using Windows/Firefox 2.0.0.12)?
>
> Keep up the good work... I've played with a number of Simile projects so far 
> and they're all fantastic!
>
> Anthony
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David Huynh
> Sent: 07 February 2008 17:49
> To: General List
> Subject: scaling up Exhibit - an early experiment
>
> Hi all,
>
> Some people have expressed a desire to use Exhibit on larger data sets,
> and I have mentioned that there is an effort to address that need. This
> is not a trivial engineering effort--it'll take months. But I'd like to
> show you a very, very early experiment (codenamed Backstage) to explain
> where we're heading.
>
> Point your Firefox browser at:
> http://people.csail.mit.edu/dfhuynh/misc/backstage-demo.html
> (I will keep this demo up for 1 day only as this is running on my
> own development machine.)
> Note that there are 2383 items (only 20 are displayed, but the facets
> are complete).
>

< --- Snip --->


Re: scaling up Exhibit - an early experiment

2008-02-08 Thread David Huynh

Anthony,

I'm not sure about that bug you're seeing. Is this on the demo URL or 
are you trying Backstage on your own exhibit? Backstage doesn't work on 
an arbitrary exhibit right now.

David

C Anthony Lewis wrote:
> David,
>
> This sounds really interesting but I keep getting an alert box telling me:
> Caught exception: Error firing event of name onRootCollectionSet to wildcard 
> handler
> Details: TypeError : stategroupDoms[groupLevel - 1] has no properties
>
> Maybe I'm doing something wrong (I'm using Windows/Firefox 2.0.0.12)?
>
> Keep up the good work... I've played with a number of Simile projects so far 
> and they're all fantastic!
>
> Anthony
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David Huynh
> Sent: 07 February 2008 17:49
> To: General List
> Subject: scaling up Exhibit - an early experiment
>
> Hi all,
>
> Some people have expressed a desire to use Exhibit on larger data sets,
> and I have mentioned that there is an effort to address that need. This
> is not a trivial engineering effort--it'll take months. But I'd like to
> show you a very, very early experiment (codenamed Backstage) to explain
> where we're heading.
>
> Point your Firefox browser at:
> http://people.csail.mit.edu/dfhuynh/misc/backstage-demo.html
> (I will keep this demo up for 1 day only as this is running on my
> own development machine.)
> Note that there are 2383 items (only 20 are displayed, but the facets
> are complete).
>
> Take a look at the HTML source code. You'll see the usual simplicity
> found in exhibits' HTML source code. Right now 2 different APIs are included
>
> <script src="http://static.simile.mit.edu/exhibit/api-2.0/exhibit-api.js?autoCreate=false"></script>
> <script src="http://dfhuynh.csail.mit.edu:8181/backstage/api/backstage-api.js"></script>
>
> The Backstage API consists of Javascript code as well as Java code
> running on my machine. In the future, the two APIs will be blended
> together so that you'll only need to include exhibit-api.js and set a
> flag, e.g.,
>
> <script src="http://static.simile.mit.edu/exhibit/api-2.0/exhibit-api.js?backstage=true"></script>
>
> But for now, the 2 APIs actually serve to make a point. There are 3
> parties involved
> - the data comes from wingerz.com
> - the configuration of the exhibit comes from people.csail.mit.edu
> - the actual computation (think facets) comes from
> dfhuynh.csail.mit.edu:8181
> This is an advanced form of mash-up where you "borrow" data from one
> party (just by linking to it), "delegate" computations to another party,
> and tie it all together with some simple HTML code. "Delegation" is done
> automatically for you, and those computational resources you get for
> free actually include a real database, spawned and configured on the fly
> to meet your needs.
>
> The current performance should be better than Exhibit for this data set,
> but it has not been optimized, especially for several concurrent users,
> and especially because I have an old machine. But it's conceivable that
> we'll have a farm of fast machines all running Backstage, to which
> exhibits with large data sets can delegate automatically.
>
> (I can explain the inner technical workings of Backstage in a subsequent
> email if anyone is interested to know.)
>
> Cheers,
>
> David
>

Re: scaling up Exhibit - an early experiment

2008-02-08 Thread Mark Diggory

A few more added comments...

On Feb 7, 2008, at 4:44 PM, David Huynh wrote:

> Note that server-side Backstage uses the URL of the exhibit as well as
> the set of data link URLs as the key to cache the triple store. This  
> is
> so that when several users view the same exhibit, only one triple  
> store
> is instantiated (to save space and time). There are technical  
> challenges
> here to address the case where the data at any one of those URLs is
> changed after the triple store is instantiated and loaded.

Are you using the Sesame "Context" in terms of the key you are  
referring to above?

> There is another technical challenge here: if the server session
> expires, all the interactive sessions get thrown away on the server.  
> The
> triple store might even get thrown away if no other user is looking at
> the same exhibit.

It makes sense then to keep the store in "memory/on disk" separate  
from the user's session so that it can always be available for new  
users. Longwell is designed this way: it simply takes an RDF source  
and loads it into the triplestore on initialization of the server, long  
before users interact with it.

The part of Longwell that does initialize when the first user  
interacts with it is the facets, which are calculated across the whole  
triplestore, held in memory, and presented in the start pages.

> But you might still have that exhibit shown on your
> browser, and it's entirely reasonable to want to resume your  
> interaction
> with it after leaving it alone for a long time. At this point, your UI
> action (e.g., clicking in a facet) will cause client-side Backstage to
> call server-side Backstage, which has lost all its state about your
> interactive session. Server-side Backstage returns a particular error,
> which causes client-side Backstage to send over its whole state so  
> that
> server-side Backstage can reinitialize itself and pick up where it has
> left off.
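
The recovery handshake described in the quoted paragraph can be sketched roughly as follows. This is an illustrative Python model, not Backstage's real protocol: `SessionLost`, `Server`, `Client`, and their methods are hypothetical names, and facet selections stand in for whatever state Backstage actually tracks.

```python
class SessionLost(Exception):
    """The 'particular error' the server returns when it has no state."""


class Server:
    def __init__(self):
        # interactive session ID -> {facet: set of selected values}
        self.sessions = {}

    def expire_all(self):
        # Models the server session timing out.
        self.sessions.clear()

    def restore(self, session_id, state):
        # Rebuild server-side state from what the client sends over.
        self.sessions[session_id] = {f: set(v) for f, v in state.items()}

    def select(self, session_id, facet, value):
        if session_id not in self.sessions:
            raise SessionLost(session_id)
        self.sessions[session_id].setdefault(facet, set()).add(value)
        return {f: sorted(v) for f, v in self.sessions[session_id].items()}


class Client:
    def __init__(self, server, session_id):
        self.server = server
        self.session_id = session_id
        self.state = {}       # the client's own copy of the state
        self.recoveries = 0
        server.restore(session_id, self.state)

    def click(self, facet, value):
        try:
            result = self.server.select(self.session_id, facet, value)
        except SessionLost:
            # Send the whole state so the server can pick up where it left off.
            self.recoveries += 1
            self.server.restore(self.session_id, self.state)
            result = self.server.select(self.session_id, facet, value)
        self.state = result
        return result
```

The point of the design is that the client always holds enough state to rebuild the session, so an expiry costs one extra round trip rather than a lost interaction.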

When I think about our needs at MIT Libraries for Longwell and DSpace,  
the data will always be local or its source very controlled on the  
server side.  So the above cases would not be requirements for our  
usage of the application. Though I'm sure such "dynamic loading" might  
be of strong interest for others in the community.

Cheers,
Mark

Re: scaling up Exhibit - an early experiment

2008-02-08 Thread David Huynh

Stefano Mazzocchi wrote:
> David Huynh wrote:
> [snip]
>   
>> The "interactive session" is different from the normal server session. 
>> If you open two browser tabs or two browser windows pointing to 2 
>> different backstaged exhibits, you have only 1 server session but 2 
>> "interactive sessions". If there is no interactive session concept, your 
>> interactions with those 2 exhibits will get mixed up. This is a 
>> technical challenge not too often encountered in web applications.
>> 
> What's the difference between such 'interactive sessions' and a webapp 
> continuation?
>   
I think they both boil down to the same implementation--basically some 
ID to identify a particular ongoing transaction. Conceptually, however, 
it's easier for me personally to think about interactive sessions than 
about continuations. It shouldn't be too hard to re-factor it to use 
continuations.
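
A minimal Python sketch of that idea, under stated assumptions: `ServerSession`, `new_interactive_id`, and the stored fields are hypothetical names, not Backstage's real implementation. Each browser tab picks its own random ID, and the single server session keeps a separate state per ID:

```python
import secrets


def new_interactive_id():
    """Client side: each tab or window picks a random ID."""
    return secrets.token_hex(8)


class ServerSession:
    """One per browser; holds many interactive sessions."""

    def __init__(self):
        self.interactive = {}

    def open_interactive(self, session_id, exhibit_url):
        self.interactive[session_id] = {
            "exhibit": exhibit_url,
            "selections": [],
        }

    def select(self, session_id, facet, value):
        # The ID routes the action to the right tab's state.
        state = self.interactive[session_id]
        state["selections"].append((facet, value))
        return list(state["selections"])
```

With two tabs open on two different exhibits, selections made in one never leak into the other, because every call carries its own interactive session ID.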

David


Re: scaling up Exhibit - an early experiment

2008-02-08 Thread Mark Diggory

On Feb 7, 2008, at 4:44 PM, David Huynh wrote:

> The JSONP protocol will be pretty specific to Backstage. If there's a
> desire to load data through SPARQL query, the right place to hook in
> would be between the server-side code of Backstage and the SPARQL end
> point. Right now server-side Backstage formulates its queries to the
> triple store by putting together Sesame "query algebra trees". If  
> SPARQL
> is as expressive as Sesame's query algebra (supporting GROUP, COUNT,
> MIN, MAX), then it shouldn't be hard to swap in a SPARQL end point.

If I recall correctly, Longwell itself doesn't use SPARQL queries  
right now, as that's a recent addition to Sesame.  It doesn't sound like  
your work is that far removed from the way that Longwell initializes a  
Sesame memory or native store off a source of RDF data.

Although a tangent... I thought it would be a great idea to expose the  
SPARQL engine of the Sesame instance in Longwell as an endpoint.  This  
would allow for a rather powerful and flexible exposure of a DSpace  
instance's content within the semantic web.  On top of this, if  
services can be implemented on top of that engine, then they can be  
much more encapsulated and reusable. For instance, a mapping between  
SRU/CQL and SPARQL and/or OAI and SPARQL would give the sysadmin  
greater configurability over how content is represented in those  
interfaces.

-Mark

~
Mark R. Diggory - DSpace Developer and Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology


RE: scaling up Exhibit - an early experiment

2008-02-08 Thread C Anthony Lewis

David,

This sounds really interesting but I keep getting an alert box telling me:
Caught exception: Error firing event of name onRootCollectionSet to wildcard 
handler
Details: TypeError : stategroupDoms[groupLevel - 1] has no properties

Maybe I'm doing something wrong (I'm using Windows/Firefox 2.0.0.12)?

Keep up the good work... I've played with a number of Simile projects so far 
and they're all fantastic!

Anthony


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David Huynh
Sent: 07 February 2008 17:49
To: General List
Subject: scaling up Exhibit - an early experiment

Hi all,

Some people have expressed a desire to use Exhibit on larger data sets,
and I have mentioned that there is an effort to address that need. This
is not a trivial engineering effort--it'll take months. But I'd like to
show you a very, very early experiment (codenamed Backstage) to explain
where we're heading.

Point your Firefox browser at:
http://people.csail.mit.edu/dfhuynh/misc/backstage-demo.html
(I will keep this demo up for 1 day only as this is running on my
own development machine.)
Note that there are 2383 items (only 20 are displayed, but the facets
are complete).

Take a look at the HTML source code. You'll see the usual simplicity
found in exhibits' HTML source code. Right now 2 different APIs are included

<script src="http://static.simile.mit.edu/exhibit/api-2.0/exhibit-api.js?autoCreate=false"></script>
<script src="http://dfhuynh.csail.mit.edu:8181/backstage/api/backstage-api.js"></script>

The Backstage API consists of Javascript code as well as Java code
running on my machine. In the future, the two APIs will be blended
together so that you'll only need to include exhibit-api.js and set a
flag, e.g.,

<script src="http://static.simile.mit.edu/exhibit/api-2.0/exhibit-api.js?backstage=true"></script>

But for now, the 2 APIs actually serve to make a point. There are 3
parties involved
- the data comes from wingerz.com
- the configuration of the exhibit comes from people.csail.mit.edu
- the actual computation (think facets) comes from
dfhuynh.csail.mit.edu:8181
This is an advanced form of mash-up where you "borrow" data from one
party (just by linking to it), "delegate" computations to another party,
and tie it all together with some simple HTML code. "Delegation" is done
automatically for you, and those computational resources you get for
free actually include a real database, spawned and configured on the fly
to meet your needs.

The current performance should be better than Exhibit for this data set,
but it has not been optimized, especially for several concurrent users,
and especially because I have an old machine. But it's conceivable that
we'll have a farm of fast machines all running Backstage, to which
exhibits with large data sets can delegate automatically.

(I can explain the inner technical workings of Backstage in a subsequent
email if anyone is interested to know.)

Cheers,

David


Re: scaling up Exhibit - an early experiment

2008-02-07 Thread Herman Tolentino

David,

This is way cool! I am also interested to know how Backstage works.

Herman

On Feb 7, 2008 12:49 PM, David Huynh <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> Some people have expressed a desire to use Exhibit on larger data sets,
> and I have mentioned that there is an effort to address that need. This
> is not a trivial engineering effort--it'll take months. But I'd like to
> show you a very, very early experiment (codenamed Backstage) to explain
> where we're heading.
>
> Point your Firefox browser at:
> http://people.csail.mit.edu/dfhuynh/misc/backstage-demo.html
> (I will keep this demo up for 1 day only as this is running on my
> own development machine.)
> Note that there are 2383 items (only 20 are displayed, but the facets
> are complete).
>
> Take a look at the HTML source code. You'll see the usual simplicity
> found in exhibits' HTML source code. Right now 2 different APIs are included
>
> <script src="http://static.simile.mit.edu/exhibit/api-2.0/exhibit-api.js?autoCreate=false"></script>
> <script src="http://dfhuynh.csail.mit.edu:8181/backstage/api/backstage-api.js"></script>
>
> The Backstage API consists of Javascript code as well as Java code
> running on my machine. In the future, the two APIs will be blended
> together so that you'll only need to include exhibit-api.js and set a
> flag, e.g.,
>
> <script src="http://static.simile.mit.edu/exhibit/api-2.0/exhibit-api.js?backstage=true"></script>
>
> But for now, the 2 APIs actually serve to make a point. There are 3
> parties involved
> - the data comes from wingerz.com
> - the configuration of the exhibit comes from people.csail.mit.edu
> - the actual computation (think facets) comes from
> dfhuynh.csail.mit.edu:8181
> This is an advanced form of mash-up where you "borrow" data from one
> party (just by linking to it), "delegate" computations to another party,
> and tie it all together with some simple HTML code. "Delegation" is done
> automatically for you, and those computational resources you get for
> free actually include a real database, spawned and configured on the fly
> to meet your needs.
>
> The current performance should be better than Exhibit for this data set,
> but it has not been optimized, especially for several concurrent users,
> and especially because I have an old machine. But it's conceivable that
> we'll have a farm of fast machines all running Backstage, to which
> exhibits with large data sets can delegate automatically.
>
> (I can explain the inner technical workings of Backstage in a subsequent
> email if anyone is interested to know.)
>
> Cheers,
>
> David
>

Re: scaling up Exhibit - an early experiment

2008-02-07 Thread Neil Ireson


Hi David,

Wow, that sounds fantastic. 

Regarding your mentioning of...

> Regarding using Backstage on licensed/private data, it is possible to 
> install Backstage on your own server and tell Exhibit to use that 
> instance rather than a public Backstage service. The experiment right 
> now does not include that option, but there is no technical challenge 
> there. You actually might even want to run Backstage yourself and 
> connect it directly to a local Sesame store if your data is too large to 
> transfer as a JSON file.

I would be very interested in this functionality (the linking to a local 
Sesame store). I know in some ways this goes against your principles of 
"light-weight" publishing. However, I would have thought it only natural 
that, as users demand more "advanced" functionality and customisation, 
they might reasonably expect to have to do more than alter some parameters, or 
even overload a function.

Though I fear I would be little or no help in terms of development, I would 
gladly help in any way my time and ability allow (e.g. user testing).

N




Re: scaling up Exhibit - an early experiment

2008-02-07 Thread Stefano Mazzocchi

David Huynh wrote:

[snip]

> The "interactive session" is different from the normal server session. 
> If you open two browser tabs or two browser windows pointing to 2 
> different backstaged exhibits, you have only 1 server session but 2 
> "interactive sessions". If there is no interactive session concept, your 
> interactions with those 2 exhibits will get mixed up. This is a 
> technical challenge not too often encountered in web applications.

What's the difference between such 'interactive sessions' and a webapp 
continuation?

-- 
Stefano.


Re: scaling up Exhibit - an early experiment

2008-02-07 Thread Robert E. Moran

Ditto

Scott Longberry wrote:
> Very nice David!
>
>   
>> (I can explain the inner technical workings of Backstage in a  
>> subsequent
>> email if anyone is interested to know.)
>>
>> 
>
> Please put my name on the list for when this email goes out. I'm very  
> interested in knowing the details.
>
> We currently use Exhibit to display data that could potentially get  
> into the thousands of records and I limit the number of records  
> returned to a couple of hundred to avoid performance problems (the  
> records are fairly big and contain a bit of data, so a couple of  
> hundred is a low number but anything larger tends to lock up the  
> browser).
>
> Thanks,
>
> Scott
>
>

Re: scaling up Exhibit - an early experiment

2008-02-07 Thread David Huynh

Please let me try to address all the questions that came up in one shot 
here. First, as I said, this is a very, very early experiment, so there 
are still many technical unknowns and few design decisions totally 
committed [waving hands] :-) I'm glad to see there's so much interest in 
this scalability issue.

The goal of Backstage from my perspective is to support larger data 
sets, compared to Exhibit, while retaining the same ease of authoring as 
Exhibit. This translates to "no server-side software installation, no 
database to set up/configure, only a bit of HTML, etc." The demo you see 
is technological progress on this front--the conclusion to draw is that 
this goal is achievable. :-)

Regarding the transition from small to large data sets, when everything 
is ready, for the common case, hopefully all you have to do is append 
?backstage=true to exhibit-api.js.

Regarding the relationship with Longwell and Longwell-CSI, that's 
harder/too early to say. Well, each project was created for a different 
purpose with different criteria, etc.

Regarding using Backstage on licensed/private data, it is possible to 
install Backstage on your own server and tell Exhibit to use that 
instance rather than a public Backstage service. The experiment right 
now does not include that option, but there is no technical challenge 
there. You actually might even want to run Backstage yourself and 
connect it directly to a local Sesame store if your data is too large to 
transfer as a JSON file.

David Karger suggests building Backstage into a Firefox extension...

-

Now onto the technical details...

When backstage-demo.html is loaded onto your browser, the Javascript 
code of Exhibit and Backstage gets loaded and executed.

Client-side Backstage randomizes an "interactive session ID" and 
requests an interactive session with server-side Backstage at
http://dfhuynh.csail.mit.edu:8181/
through a JSONP transport. That just means that to call the server 
portion of Backstage, the client portion of Backstage appends

Re: scaling up Exhibit - an early experiment

2008-02-07 Thread Jon Crump

David,

This is great news! I too am interested in the relationship between 
Backstage and Longwell-CSI. I presume you'll be suggesting migration paths 
from one to the other? Do you intend for Backstage to have the same kind 
of timeline support as is present in Exhibit 2.0?

I will follow your explanations of the inner workings of backstage with 
interest.

Great work!
Best,
Jon

On Thu, 7 Feb 2008, David Huynh wrote:

> Hi all,
>
> Some people have expressed a desire to use Exhibit on larger data sets,
> and I have mentioned that there is an effort to address that need. This
> is not a trivial engineering effort--it'll take months. But I'd like to
> show you a very, very early experiment (codenamed Backstage) to explain
> where we're heading.
>
> Point your Firefox browser at:
> http://people.csail.mit.edu/dfhuynh/misc/backstage-demo.html
> (I will keep this demo up for 1 day only as this is running on my
> own development machine.)
> Note that there are 2383 items (only 20 are displayed, but the facets
> are complete).
>
> Take a look at the HTML source code. You'll see the usual simplicity
> found in exhibits' HTML source code. Right now 2 different APIs are included
>
> <script src="http://static.simile.mit.edu/exhibit/api-2.0/exhibit-api.js?autoCreate=false"></script>
> <script src="http://dfhuynh.csail.mit.edu:8181/backstage/api/backstage-api.js"></script>
>
> The Backstage API consists of Javascript code as well as Java code
> running on my machine. In the future, the two APIs will be blended
> together so that you'll only need to include exhibit-api.js and set a
> flag, e.g.,
>
> <script src="http://static.simile.mit.edu/exhibit/api-2.0/exhibit-api.js?backstage=true"></script>
>
> But for now, the 2 APIs actually serve to make a point. There are 3
> parties involved
>- the data comes from wingerz.com
>- the configuration of the exhibit comes from people.csail.mit.edu
>- the actual computation (think facets) comes from
> dfhuynh.csail.mit.edu:8181
> This is an advanced form of mash-up where you "borrow" data from one
> party (just by linking to it), "delegate" computations to another party,
> and tie it all together with some simple HTML code. "Delegation" is done
> automatically for you, and those computational resources you get for
> free actually include a real database, spawned and configured on the fly
> to meet your needs.
>
> The current performance should be better than Exhibit for this data set,
> but it has not been optimized, especially for several concurrent users,
> and especially because I have an old machine. But it's conceivable that
> we'll have a farm of fast machines all running Backstage, to which
> exhibits with large data sets can delegate automatically.
>
> (I can explain the inner technical workings of Backstage in a subsequent
> email if anyone is interested to know.)
>
> Cheers,
>
> David
>

RE: scaling up Exhibit - an early experiment

2008-02-07 Thread Smith, Christopher (GE Indust, ConsInd)

Very nice.

David,

Outstanding, just can't wait for this to be in production!

Btw, I am indeed interested in the details of the technical inner
workings of the Backstage API.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of David Huynh
Sent: Thursday, February 07, 2008 12:49 PM
To: General List
Subject: scaling up Exhibit - an early experiment

Hi all,

Some people have expressed a desire to use Exhibit on larger data sets,
and I have mentioned that there is an effort to address that need. This
is not a trivial engineering effort--it'll take months. But I'd like to
show you a very, very early experiment (codenamed Backstage) to explain
where we're heading.

Point your Firefox browser at:
http://people.csail.mit.edu/dfhuynh/misc/backstage-demo.html
(I will keep this demo up for 1 day only as this is running on my
own development machine.) Note that there are 2383 items (only 20 are
displayed, but the facets are complete).

Take a look at the HTML source code. You'll see the usual simplicity
found in exhibits' HTML source code. Right now 2 different APIs are
included

<script src="http://static.simile.mit.edu/exhibit/api-2.0/exhibit-api.js?autoCreate=false"></script>
<script src="http://dfhuynh.csail.mit.edu:8181/backstage/api/backstage-api.js"></script>

The Backstage API consists of Javascript code as well as Java code
running on my machine. In the future, the two APIs will be blended
together so that you'll only need to include exhibit-api.js and set a
flag, e.g.,