For running from cron it's easier to specify a start date and a
duration rather than an end date (which you'd have to calculate). Although
we'd need to make sure that plays well with differing lengths of
months, i.e. we should guarantee that if you specify 31 days in a
month that has 28, the 3 days that
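For what it's worth, the start-date-plus-duration arithmetic is cheap to get right; a minimal sketch assuming GNU `date` (the `START`/`DAYS` variable names are hypothetical, not anything in DSpace):

```shell
# Hypothetical cron-style invocation: take a start date and a duration
# in days, and compute the end date instead of asking the operator for it.
START=2015-01-31
DAYS=31

# GNU date does the month-length arithmetic for us: 31 days past
# 31 January 2015 is 3 March, not a nonexistent 31 February.
END=$(date -u -d "$START + $DAYS days" +%Y-%m-%d)
echo "$END"   # → 2015-03-03
```

Letting `date` carry the days across month boundaries is exactly the "plays well with differing lengths of months" guarantee asked for above.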
I wonder how hard the incremental export is to implement? If it's
really not that complex overall, then it seems like it'd be a quick win
for just doing the Solr Stats backups in general.
If I want to ensure my Solr Stats are safe in DSpace 5, my only real
option is to back them up via a CSV
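A CSV backup like that can be sketched with Solr's own CSV response writer (`wt=csv`); everything here — the stock localhost URL, the row cap, the output file name — is an assumption, not DSpace's actual export code:

```shell
# Build the Solr select URL for a CSV dump. wt=csv asks Solr's
# built-in CSV response writer to do the formatting.
solr_dump_url() {
  printf '%s/select?q=*:*&rows=%s&wt=csv' "$1" "$2"
}

# Stock DSpace 5 Solr location assumed; harmless no-op without a live Solr.
curl -sS "$(solr_dump_url http://localhost:8080/solr/statistics 10000)" \
  -o statistics-dump.csv \
  || echo "Solr not reachable here; this is only a sketch"
```

A real backup would page through the result set (`start=`/`rows=`) rather than rely on one oversized request.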
This is a great tool for recovering from a sticky situation of our own
making. Thank you!
I think, though, that at this point we ought to consider it a one-time
repair rather than a routine maintenance tool. Leaving Solr as the
primary long-term storage for these data seems unwise, and if we
Hi again,
just a quick update before my weekend starts -- I've updated my pull
request with code that does a lossless reindex and also uses the time
field in the export queries. It can't actually do incremental exports
yet, and the new reindex functionality has to be run using dsrun for now
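The time-field queries mentioned above presumably amount to a filter query on the statistics core's `time` field; a hedged sketch (the URL and the date window are made up for illustration):

```shell
# Export one month of usage events by filtering on the statistics
# core's `time` field. Dates and Solr URL are illustrative assumptions.
FROM="2015-01-01T00:00:00Z"
TO="2015-02-01T00:00:00Z"

curl -sS "http://localhost:8080/solr/statistics/select" \
  --data-urlencode "q=*:*" \
  --data-urlencode "fq=time:[$FROM TO $TO]" \
  --data-urlencode "wt=csv" \
  -o stats-2015-01.csv \
  || echo "Solr not reachable here; this is only a sketch"
```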
Hi,
On 27/03/15 02:31, Tim Donohue wrote:
I wonder how hard the incremental export is to implement? If it's
really not that complex overall, then it seems like it'd be a quick
win for just doing the Solr Stats backups in general.
It's easy-ish, I just can't think of a way that isn't at
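One common shape for an incremental export is a marker file: remember when the last successful export ran and only ask Solr for newer records. This is a hypothetical pattern, not anything in DSpace; the marker location is an assumption:

```shell
# Hypothetical incremental-export skeleton using a marker file.
MARKER=./stats-last-export            # assumed location
LAST=$(cat "$MARKER" 2>/dev/null || echo 1970-01-01T00:00:00Z)
NOW=$(date -u +%Y-%m-%dT%H:%M:%SZ)

echo "would export time:[$LAST TO $NOW]"
# ...perform the actual Solr export for that range here...

# Advance the marker only once the export has succeeded, so a failed
# run simply retries the same range on the next cron invocation.
echo "$NOW" > "$MARKER"
```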
On Thu, Mar 26, 2015 at 2:31 PM, Tim Donohue tdono...@duraspace.org wrote:
As a sidenote: with regards to the Authority index, it seems like the
data in that index is possible to *repopulate* from the existing
metadata in the database (using ./dspace index-authority). So, it seems
like that
As discussed in DevMtg yesterday, I summed up the current state and
possible development directions of statistics on the wiki:
https://wiki.duraspace.org/display/DSPACE/DSpace+statistics+-+current+status+and+future+development
Regards,
~~helix84
Hi Tim,
CSV export may be adequate for backup, but one important thing
suggested here was an event consumer that would write to a
persistent store (which could be CSV files). We currently don't have
a persistent store.
Regards,
~~helix84
Compulsory reading: DSpace Mailing List Etiquette
On Tue, Mar 24, 2015 at 11:34:44AM -0400, Peter Dietz wrote:
Also. What are people thinking would be a safe preservation location for
usage events? i.e. for people concerned about resources.
What I've been thinking is duplicated DVD-ROMs in fire-insulated
storage, right alongside the content
Hi All,
Just to bring this thread back to the original question of how we use
Solr to store statistics (and also authority info for that matter).
Personally, I agree that having statistics and authority information
stored *solely* in Solr is dangerous. As mentioned, Solr is primarily
meant as an
I second many of helix’s points
Storing stats data in an external product relies on that product being around -
something that Google does not guarantee.
Whether to use Google Analytics for anything other than getting nice stats
right now very much depends on whether Google provides a data
On Tue, Mar 24, 2015 at 5:27 PM, TAYLOR Robin robin.tay...@ed.ac.uk wrote:
Hi Peter,
The short answer is I don't know, but a quick bit of investigation suggests
possibly maybe :) . There does appear to be an import facility
https://support.google.com/analytics/answer/3191589?hl=en-GB ,
Hi helix84,
So, it seems like there's two possible routes to take here:
1. An event consumer writes directly to Solr. The persistent store is
then simply a dump from Solr to CSV.
2. An event consumer writes directly to CSV. Solr then indexes those CSVs.
So, my question is whether #2 is really
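For what it's worth, the read-back half of route #2 needn't be complex: Solr can index CSV directly through its update handler when the request is sent as `text/csv`. A sketch, with the file pattern and the stock DSpace 5 URL as assumptions:

```shell
# Load consumer-written CSV files back into the statistics core.
# File name pattern and Solr URL are assumptions, not DSpace conventions.
for f in stats-*.csv; do
  [ -e "$f" ] || continue             # glob matched nothing: no-op
  curl -sS -H 'Content-Type: text/csv' \
    --data-binary @"$f" \
    "http://localhost:8080/solr/statistics/update?commit=true" \
    || echo "Solr not reachable here; this is only a sketch"
done
```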
Hi Tim,
sorry, I think you missed my point about the persistent store. Your 1)
is not a persistent store, it's a snapshot of a cache. Yes, I know
we've been treating it as if it were a permanent store, but that's
what this whole issue is about.
It's really not as important what the form or
On Wed, Mar 25, 2015 at 09:54:02AM -0500, Tim Donohue wrote:
Hi helix84,
So, it seems like there's two possible routes to take here:
1. An event consumer writes directly to Solr. The persistent store is
then simply a dump from Solr to CSV.
2. An event consumer writes directly to CSV.
Hi all,
On 26/03/15 03:07, Tim Donohue wrote:
In DSpace 5, we obviously already have a basic version of a backup to
CSV for statistics:
Hi again,
On 26/03/15 11:19, Andrea Schweer wrote:
Another gap in my code is incremental exports. At the moment, the
export part of my code dumps all of the data. I think it would be nice
for back-up purposes to be able to specify a start date from which to
export, so that people can
Just to ask a follow up question about Google Analytics. Say I have all of
my data (comm, coll, item views, bitstream downloads) for as long as I've
been collecting it in Solr or Elasticsearch (many years). Is it possible
to write a converter, and push this legacy information to Google
Hi Peter,
The short answer is I don't know, but a quick bit of investigation suggests
possibly maybe :) . There does appear to be an import facility
https://support.google.com/analytics/answer/3191589?hl=en-GB , but what is not
clear to me at first reading is whether it just allows you to
Hi Andrea,
You are quite right about the download stats, I had forgotten that.
Cheers.
Robin Taylor
Main Library
University of Edinburgh
From: Andrea Schweer schw...@waikato.ac.nz
Sent: 16 March 2015 02:31
To: TAYLOR Robin
Cc: DSpace Developers
Subject:
Hi Robin,
On 14/03/15 05:22, TAYLOR Robin wrote:
Just a wee point about GA stats with apologies if I am stating the obvious.
You can present data going back as long as you have been collecting it, not
just from the moment you enable the DSpace GA Stats XMLUI aspect.
As long as you have been
Hi Andrea,
Just a wee point about GA stats with apologies if I am stating the obvious. You
can present data going back as long as you have been collecting it, not just
from the moment you enable the DSpace GA Stats XMLUI aspect.
Cheers, Robin.
Robin Taylor
Main Library
University of Edinburgh
ES is equally guilty of being a statistics data source, by storing
original/raw. So, statistics is something that complicates DSpace's role in
preserving assets, since stats are a value-add, and not a core repository
function. But, since repo managers enjoy statistics, we can't not offer
Hi Peter, all,
On 13/03/15 07:35, Peter Dietz wrote:
ES is equally guilty of being a statistics data source, by storing
original/raw. So, statistics is something that complicates DSpace's
role in preserving assets, since stats are a value-add, and not a core
repository function. But, since
On 12 March 2015 at 20:35, Peter Dietz pe...@longsight.com wrote:
But, since repo managers enjoy statistics, we can't not offer statistics.
I would however like to offload the role of stats to a third party, such as
Google Analytics though.
Or Piwik, see: http://piwik.org
*Hilton Gibson*
I've always been leery of statistics like those DSpace keeps. They're more
akin to a library patron picking a book off a shelf and setting it on a table,
rather than actually using it for anything. (I know, getting rid of them would
bring masses of pitchfork-toting authors to all our doors.)
Several recent issues (DS-2337, DS-2487, and perhaps DS-2488) suggest
that we should step back and take a long look at how we are using the
Solr 'statistics' core.
Solr seems designed for use as a cache. That's how the other cores
are used: they can be refreshed from data in the database and