Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-26 Thread helix84
For running from cron it's easier to specify a start date and duration rather than end date (which you have to calculate). Although we'd need to make sure that plays well with differing lengths of months, i.e. we should guarantee that if you specify 31 days in a month that has 28, the 3 days that
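
A minimal sketch of the start-date-plus-duration idea, in Python purely for illustration. `export_window` is a hypothetical helper, not part of any DSpace tool. It only shows that a 31-day window starting in a 28-day February runs three days into March; how an export tool should treat those extra days is left open here.

```python
from datetime import date, timedelta
from typing import Tuple


def export_window(start: date, days: int) -> Tuple[date, date]:
    """Return the half-open window [start, end) that covers `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    return start, start + timedelta(days=days)


# 31 days from 1 February 2015 (a 28-day month): the exclusive end is 4 March,
# so the window covers 1-3 March as well as all of February.
window_start, window_end = export_window(date(2015, 2, 1), 31)
print(window_start.isoformat(), window_end.isoformat())
```

A cron job would then only need to pass the start date and a day count, and consecutive runs can chain one window's end into the next window's start.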

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-26 Thread Tim Donohue
I wonder how hard the incremental export is to implement? If it's really not that complex overall, then it seems like it'd be a quick win for just doing the Solr Stats backups in general. If I want to ensure my Solr Stats are safe in DSpace 5, my only real option is to back them up via a CSV

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-26 Thread Mark H. Wood
This is a great tool for recovering from a sticky situation of our own making. Thank you! I think, though, that at this point we ought to consider it a one-time repair rather than a routine maintenance tool. Leaving Solr as the primary long-term storage for these data seems unwise, and if we

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-26 Thread Andrea Schweer
Hi again, just a quick update before my weekend starts -- I've updated my pull request with code that does a lossless reindex and also uses the time field in the export queries. It can't actually do incremental exports yet, and the new reindex functionality has to be run using dsrun for now

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-26 Thread Andrea Schweer
Hi, On 27/03/15 02:31, Tim Donohue wrote: "I wonder how hard the incremental export is to implement? If it's really not that complex overall, then it seems like it'd be a quick win for just doing the Solr Stats backups in general." It's easy-ish, I just can't think of a way that isn't at

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-26 Thread helix84
On Thu, Mar 26, 2015 at 2:31 PM, Tim Donohue tdono...@duraspace.org wrote: As a sidenote: with regards to the Authority index, it seems like the data in that index is possible to *repopulate* from the existing metadata in the database (using ./dspace index-authority). So, it seems like that

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-26 Thread helix84
As discussed in DevMtg yesterday, I summed up the current state and possible development directions of statistics on the wiki: https://wiki.duraspace.org/display/DSPACE/DSpace+statistics+-+current+status+and+future+development Regards, ~~helix84

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-25 Thread helix84
Hi Tim, CSV export may be adequate for backup, but one important thing suggested here was an event consumer that would write to a persistent store (which could be CSV files). We currently don't have a persistent store. Regards, ~~helix84

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-25 Thread Mark H. Wood
On Tue, Mar 24, 2015 at 11:34:44AM -0400, Peter Dietz wrote: Also. What are people thinking would be a safe preservation location for usage events? i.e. for people concerned about resources. What I've been thinking is duplicated DVD-ROMs in fire-insulated storages, right alongside of content

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-25 Thread Tim Donohue
Hi All, Just to bring this thread back to the original question of how we use Solr to store statistics (and also authority info for that matter). Personally, I agree that having statistics and authority information stored *solely* in Solr is dangerous. As mentioned, Solr is primarily meant as an

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-25 Thread Monika C. Mevenkamp
I second many of helix’s points. Storing stats data in an external product relies on that product to be around, something that Google does not guarantee. Whether to use Google Analytics for anything other than getting nice stats right now very much depends on whether Google provides a data

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-25 Thread helix84
On Tue, Mar 24, 2015 at 5:27 PM, TAYLOR Robin robin.tay...@ed.ac.uk wrote: Hi Peter, The short answer is I don't know, but a quick bit of investigation suggests possibly maybe :) . There does appear to be an import facility https://support.google.com/analytics/answer/3191589?hl=en-GB ,

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-25 Thread Tim Donohue
Hi helix84, So, it seems like there's two possible routes to take here: 1. An event consumer writes directly to Solr. The persistent store is then simply a dump from Solr to CSV. 2. An event consumer writes directly to CSV. Solr then indexes those CSVs. So, my question is whether #2 is really
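
Route 2 could be sketched roughly as follows, in Python for illustration only. The `record_event` helper, the column names in `FIELDS`, and the one-file-per-day naming are all assumptions made for this sketch, not the actual DSpace event model or file layout. The point is only that the CSV files are the persistent record, written append-only, and that Solr would be filled from them afterwards.

```python
import csv
from datetime import timezone
from pathlib import Path

# Hypothetical column set; the real usage-event fields may differ.
FIELDS = ["time", "type", "id", "ip"]


def record_event(log_dir: Path, event: dict) -> Path:
    """Append one usage event to a per-day CSV file and return that file's path."""
    log_dir.mkdir(parents=True, exist_ok=True)
    moment = event["time"].astimezone(timezone.utc)
    path = log_dir / f"usage-{moment:%Y-%m-%d}.csv"
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            # Write the header once, when the day's file is created.
            writer.writeheader()
        writer.writerow(dict(event, time=moment.isoformat()))
    return path
```

Because each file is only ever appended to, a plain file-level backup is enough to protect the events, and a later step can replay the files into Solr.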

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-25 Thread helix84
Hi Tim, sorry, I think you missed my point about the persistent store. Your 1) is not a persistent store, it's a snapshot of a cache. Yes, I know we've been treating it as if it were a permanent store, but that's what this whole issue is about. It's really not as important what the form or

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-25 Thread Mark H. Wood
On Wed, Mar 25, 2015 at 09:54:02AM -0500, Tim Donohue wrote: Hi helix84, So, it seems like there's two possible routes to take here: 1. An event consumer writes directly to Solr. The persistent store is then simply a dump from Solr to CSV. 2. An event consumer writes directly to CSV.

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-25 Thread Andrea Schweer
Hi all, On 26/03/15 03:07, Tim Donohue wrote: In DSpace 5, we obviously already have a basic version of a backup to CSV for statistics:

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-25 Thread Andrea Schweer
Hi again, On 26/03/15 11:19, Andrea Schweer wrote: Another gap in my code is incremental exports. At the moment, the export part of my code dumps all of the data. I think it would be nice for back-up purposes to be able to specify a start date from which to export, so that people can
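
One way an export restricted by start date might build its Solr filter, sketched in Python for illustration. `solr_time_filter` is a hypothetical helper and not the code in the pull request; the `time` field name is taken from the thread, and the range syntax is the standard Solr one (`[` inclusive, `}` exclusive).

```python
from datetime import datetime, timezone
from typing import Optional


def solr_time_filter(start: datetime, end: Optional[datetime] = None,
                     field: str = "time") -> str:
    """Build a Solr range filter selecting events from `start` up to `end`."""

    def fmt(moment: datetime) -> str:
        # Solr expects UTC timestamps in this ISO 8601 form.
        return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    if end is None:
        # Open-ended: everything from `start` onwards.
        return f"{field}:[{fmt(start)} TO *]"
    # Inclusive start, exclusive end, so back-to-back exports do not overlap.
    return f"{field}:[{fmt(start)} TO {fmt(end)}}}"


print(solr_time_filter(datetime(2015, 3, 1, tzinfo=timezone.utc)))
```

Passing the previous export's end as the next export's start gives incremental dumps that neither skip nor duplicate events, assuming timezone-aware datetimes are supplied.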

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-24 Thread Peter Dietz
Just to ask a follow-up question about Google Analytics. Say I have all of my data (comm, coll, item views, bitstream downloads) for as long as I've been collecting it in Solr or Elasticsearch (many years). Is it possible to write a converter, and push this legacy information to Google

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-24 Thread TAYLOR Robin
Hi Peter, The short answer is I don't know, but a quick bit of investigation suggests possibly maybe :) . There does appear to be an import facility https://support.google.com/analytics/answer/3191589?hl=en-GB , but what is not clear to me at first reading is whether it just allows you to

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-17 Thread TAYLOR Robin
Hi Andrea, You are quite right about the download stats, I had forgotten that. Cheers. Robin Taylor Main Library University of Edinburgh From: Andrea Schweer schw...@waikato.ac.nz Sent: 16 March 2015 02:31 To: TAYLOR Robin Cc: DSpace Developers Subject:

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-15 Thread Andrea Schweer
Hi Robin, On 14/03/15 05:22, TAYLOR Robin wrote: Just a wee point about GA stats with apologies if I am stating the obvious. You can present data going back as long as you have been collecting it, not just from the moment you enable the DSpace GA Stats XMLUI aspect. As long as you have been

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-13 Thread TAYLOR Robin
Hi Andrea, Just a wee point about GA stats with apologies if I am stating the obvious. You can present data going back as long as you have been collecting it, not just from the moment you enable the DSpace GA Stats XMLUI aspect. Cheers, Robin. Robin Taylor Main Library University of Edinburgh

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-12 Thread Peter Dietz
ES is equally guilty of being a statistics data source, by storing original/raw. So, statistics is something that complicates DSpace's role in preserving assets, since stats are a value-add, and not a core repository function. But, since repo managers enjoy statistics, we can't not offer

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-12 Thread Andrea Schweer
Hi Peter, all, On 13/03/15 07:35, Peter Dietz wrote: ES is equally guilty of being a statistics data source, by storing original/raw. So, statistics is something that complicates DSpace's role in preserving assets, since stats are a value-add, and not a core repository function. But, since

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-12 Thread Hilton Gibson
On 12 March 2015 at 20:35, Peter Dietz pe...@longsight.com wrote: But, since repo managers enjoy statistics, we can't not offer statistics. I would however like to offload the role of stats to a third party, such as Google Analytics though. Or Piwik, see: http://piwik.org *Hilton Gibson*

Re: [Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-12 Thread Brian Freels-Stendel
I've always been leery of statistics like those DSpace keeps. They're more akin to a library patron picking a book off a shelf and setting it on a table, rather than actually using it for anything. (I know, getting rid of them would bring masses of pitchfork-toting authors to all our doors.)

[Dspace-devel] We need to think a bit more about how we use the 'statistics' Solr core

2015-03-11 Thread Mark H. Wood
Several recent issues (DS-2337, DS-2487, and perhaps DS-2488) suggest that we should step back and take a long look at how we are using the Solr 'statistics' core. Solr seems designed for use as a cache. That's how the other cores are used: they can be refreshed from data in the database and