Re: redo log for solr

2014-08-29 Thread Dmitry Kan
@Shawn: your suggestion to write a custom UpdateProcessor (UP) has proven
really useful. Thanks!

It is also a great chance to learn Solr's internal API for searching
documents (I need to find documents in Solr and store them in a paper
trail before they get deleted).
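
A minimal sketch of the "find before delete" step. It assumes a delete-by-query is first expanded into the concrete documents it matches, and those are saved before anything is removed. This is a toy in plain Java: `SearchBeforeDelete`, `deleteByQuery`, `paperTrail` and the `Predicate` standing in for a Solr query are hypothetical names, not Solr's API. A real UpdateProcessor would presumably run the query against the index instead of a `HashMap`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical toy index illustrating "search first, then delete".
// The Predicate stands in for a Solr query; none of these names are Solr APIs.
class SearchBeforeDelete {
    final Map<String, Map<String, Object>> docs = new HashMap<>();
    // Paper trail: id -> document as it was just before deletion.
    final Map<String, Map<String, Object>> paperTrail = new HashMap<>();

    void add(String id, Map<String, Object> doc) {
        docs.put(id, Map.copyOf(doc));
    }

    // Returns the number of documents deleted.
    int deleteByQuery(Predicate<Map<String, Object>> query) {
        // 1. Search: collect every document the query matches,
        // 2. and save each one to the paper trail before deleting.
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> e : docs.entrySet()) {
            if (query.test(e.getValue())) {
                matched.add(e.getKey());
                paperTrail.put(e.getKey(), e.getValue());
            }
        }
        // 3. Only now remove the matched documents from the index.
        matched.forEach(docs::remove);
        return matched.size();
    }
}
```

Expanding the query before deleting is what makes the trail possible: once the delete has been committed, the matched documents can no longer be found.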

Dmitry


On Thu, Aug 28, 2014 at 4:25 PM, Dmitry Kan solrexp...@gmail.com wrote:

 It may mean that I wasn't clear enough :)

 The idea is to build a paper trail system (without the negative
 connotation!), so that if, for instance, a user deleted some data _by
 mistake_ and we had already hard-committed to Solr (at which point the
 tlog is truncated), we would have saved the document to the paper trail
 before the delete and could restore it.

 So while the tlog is meant to make soft commits durable, this feature
 would serve more as undo functionality, persisting the _history_ of
 modifications.

 I'm currently investigating what you suggested over IRC: the
 UpdateProcessor. It looks like the way to go.

 Thanks,

 Dmitry


 On Thu, Aug 28, 2014 at 4:16 PM, Shawn Heisey s...@elyograg.org wrote:

 On 8/28/2014 3:10 AM, Dmitry Kan wrote:
  We have a use case where any action a user performs on a Solr shard
  should be recorded for possible later replay. In effect, we are looking
  for a per-user replay feature, so that if a user did something wrong,
  whether by accident or because of a system-level bug, we could restore
  a previous state.
 
  Two actions are available:
 
  1. INSERT a new Solr document
  2. DELETE an existing Solr document
 
  If a user wants to update an existing document, we first delete it and
  then insert a new one with the modified fields.
 
  Are there any existing components / solutions in the Solr universe that
  could help implement this?

 I'm wondering what functionality you need beyond what Solr already
 provides ... because it sounds like Solr already does a lot of what you
 are implementing.

 Solr already includes a transaction log that records all changes to the
 index.  Each individual log is closed when you do a hard commit.  Enough
 transaction logs are kept so that Solr can replay at least the last 100
 transactions.  The entire transaction log is replayed when Solr is
 restarted or a core is reloaded.

 What you describe, deleting an existing document before inserting a new
 one, is already built into Solr through the uniqueKey: adding a document
 whose uniqueKey matches an existing one replaces the old document. That
 capability is further extended by the Atomic Update functionality.

 You're not new around here, so I don't think I'm telling you anything
 you don't already know ... which may mean that I'm missing something. :)

 Thanks,
 Shawn




 --
 Dmitry Kan
 Blog: http://dmitrykan.blogspot.com
 Twitter: http://twitter.com/dmitrykan
 SemanticAnalyzer: www.semanticanalyzer.info




-- 
Dmitry Kan
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


redo log for solr

2014-08-28 Thread Dmitry Kan
Hello Solr users!

We have a use case where any action a user performs on a Solr shard
should be recorded for possible later replay. In effect, we are looking
for a per-user replay feature, so that if a user did something wrong,
whether by accident or because of a system-level bug, we could restore a
previous state.

Two actions are available:

1. INSERT a new Solr document
2. DELETE an existing Solr document

If a user wants to update an existing document, we first delete it and
then insert a new one with the modified fields.

Are there any existing components / solutions in the Solr universe that
could help implement this?
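
A minimal sketch of such a per-user redo log. It assumes every INSERT and DELETE is appended to one ordered list and that an update is recorded as a delete followed by an insert, as described above. `RedoLog`, `Op` and `OpType` are hypothetical plain-Java names, not Solr components. Replaying a prefix of the log rebuilds the state as it was before a given action.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical operation types mirroring the two available actions.
enum OpType { INSERT, DELETE }

// One recorded action: who did it, what kind, and the document involved.
record Op(String user, OpType type, String docId, Map<String, Object> fields) {}

// Hypothetical append-only redo log; not a Solr component.
class RedoLog {
    private final List<Op> ops = new ArrayList<>();

    void insert(String user, String docId, Map<String, Object> fields) {
        ops.add(new Op(user, OpType.INSERT, docId, Map.copyOf(fields)));
    }

    void delete(String user, String docId) {
        ops.add(new Op(user, OpType.DELETE, docId, Map.of()));
    }

    // An update is recorded as delete + insert, as in the post.
    void update(String user, String docId, Map<String, Object> fields) {
        delete(user, docId);
        insert(user, docId, fields);
    }

    int size() { return ops.size(); }

    // Replays the first `upTo` operations and returns the resulting documents.
    Map<String, Map<String, Object>> replay(int upTo) {
        Map<String, Map<String, Object>> state = new HashMap<>();
        for (Op op : ops.subList(0, upTo)) {
            if (op.type() == OpType.INSERT) state.put(op.docId(), op.fields());
            else state.remove(op.docId());
        }
        return state;
    }

    // Per-user view: only the operations one user performed.
    List<Op> opsFor(String user) {
        return ops.stream().filter(o -> o.user().equals(user)).toList();
    }
}
```

With an insert, an update and a delete of document "1" (four recorded operations), `replay(3)` rebuilds the state just before the accidental delete. Such a log would presumably have to live outside the index so that it survives hard commits.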

Dmitry



Re: redo log for solr

2014-08-28 Thread Shawn Heisey
On 8/28/2014 3:10 AM, Dmitry Kan wrote:
 We have a use case where any action a user performs on a Solr shard
 should be recorded for possible later replay. In effect, we are looking
 for a per-user replay feature, so that if a user did something wrong,
 whether by accident or because of a system-level bug, we could restore
 a previous state.
 
 Two actions are available:
 
 1. INSERT a new Solr document
 2. DELETE an existing Solr document
 
 If a user wants to update an existing document, we first delete it and
 then insert a new one with the modified fields.
 
 Are there any existing components / solutions in the Solr universe that
 could help implement this?

I'm wondering what functionality you need beyond what Solr already
provides ... because it sounds like Solr already does a lot of what you
are implementing.

Solr already includes a transaction log that records all changes to the
index.  Each individual log is closed when you do a hard commit.  Enough
transaction logs are kept so that Solr can replay at least the last 100
transactions.  The entire transaction log is replayed when Solr is
restarted or a core is reloaded.
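
A toy model of the retention rule described above, to show why the transaction log cannot serve as a full history. It is not Solr's UpdateLog code. `ToyTransactionLog` is hypothetical, `keep` stands in for the "at least the last 100" threshold, and only closed logs are counted toward it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy model (NOT Solr's UpdateLog): the current log is closed on each
// hard commit, and older closed logs are dropped once the newer closed
// logs alone hold at least `keep` records.
class ToyTransactionLog {
    private final int keep;
    private final Deque<List<String>> closed = new ArrayDeque<>(); // newest first
    private List<String> current = new ArrayList<>();

    ToyTransactionLog(int keep) { this.keep = keep; }

    void record(String update) { current.add(update); }

    void hardCommit() {
        closed.addFirst(current);
        current = new ArrayList<>();
        // Walk newest to oldest; once `keep` records are covered,
        // every older log is discarded.
        int covered = 0;
        Deque<List<String>> retained = new ArrayDeque<>();
        for (List<String> log : closed) {
            if (covered >= keep) break;
            retained.addLast(log);
            covered += log.size();
        }
        closed.clear();
        closed.addAll(retained);
    }

    // All updates still available for replay, oldest first.
    List<String> retainedUpdates() {
        List<String> all = new ArrayList<>();
        Iterator<List<String>> it = closed.descendingIterator();
        while (it.hasNext()) all.addAll(it.next());
        all.addAll(current);
        return all;
    }
}
```

With `keep = 3`, a third hard commit containing three updates makes the first two logs unnecessary, so the earliest updates ("a" to "d" in the test) can no longer be replayed.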

What you describe, deleting an existing document before inserting a new
one, is already built into Solr through the uniqueKey: adding a document
whose uniqueKey matches an existing one replaces the old document. That
capability is further extended by the Atomic Update functionality.
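
A toy illustration of the uniqueKey behaviour, assuming the uniqueKey field is called `id`. `ToyIndex` and `setFields` are hypothetical. In real Solr an atomic update is sent as a partial document with a modifier, for example `{"id":"1","author":{"set":"sh"}}`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy index keyed by a uniqueKey field assumed to be "id".
class ToyIndex {
    private final Map<String, Map<String, Object>> docs = new HashMap<>();

    // Adding a document whose id already exists replaces it entirely,
    // so no explicit delete is needed. Returns the replaced doc, or null.
    Map<String, Object> add(Map<String, Object> doc) {
        String id = (String) doc.get("id");
        return docs.put(id, Map.copyOf(doc));
    }

    // Atomic-update style "set": change some fields, keep the others.
    void setFields(String id, Map<String, Object> changes) {
        Map<String, Object> existing = docs.get(id);
        if (existing == null) throw new IllegalArgumentException("no document with id " + id);
        Map<String, Object> merged = new HashMap<>(existing);
        merged.putAll(changes);
        docs.put(id, Map.copyOf(merged));
    }

    Map<String, Object> get(String id) { return docs.get(id); }

    int size() { return docs.size(); }
}
```

A plain `add` with the same id replaces the whole document, so any field not resent (here `author`) is lost. The atomic-style `setFields` changes only the fields it is given.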

You're not new around here, so I don't think I'm telling you anything
you don't already know ... which may mean that I'm missing something. :)

Thanks,
Shawn



Re: redo log for solr

2014-08-28 Thread Dmitry Kan
It may mean that I wasn't clear enough :)

The idea is to build a paper trail system (without the negative
connotation!), so that if, for instance, a user deleted some data _by
mistake_ and we had already hard-committed to Solr (at which point the
tlog is truncated), we would have saved the document to the paper trail
before the delete and could restore it.

So while the tlog is meant to make soft commits durable, this feature
would serve more as undo functionality, persisting the _history_ of
modifications.

I'm currently investigating what you suggested over IRC: the
UpdateProcessor. It looks like the way to go.
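
A minimal sketch of the paper-trail idea as one link in an update chain. It assumes the link can look up the current version of a document before forwarding an add or a delete. `UpdateStep`, `IndexStep` and `PaperTrailStep` are hypothetical plain-Java stand-ins. Solr's own base class is `UpdateRequestProcessor`, and its API is not reproduced here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one link in an update-processor chain:
// do something, then pass the command on to the next link.
interface UpdateStep {
    void add(String id, Map<String, Object> doc);
    void delete(String id);
}

// Terminal link: the toy "index" itself.
class IndexStep implements UpdateStep {
    final Map<String, Map<String, Object>> docs = new HashMap<>();
    public void add(String id, Map<String, Object> doc) { docs.put(id, Map.copyOf(doc)); }
    public void delete(String id) { docs.remove(id); }
}

// Paper-trail link: before a document is overwritten or deleted, save its
// current version to a per-id history, then forward the command unchanged.
class PaperTrailStep implements UpdateStep {
    private final IndexStep index;   // used to look up current versions
    private final UpdateStep next;   // next link in the chain
    final Map<String, List<Map<String, Object>>> history = new HashMap<>();

    PaperTrailStep(IndexStep index, UpdateStep next) {
        this.index = index;
        this.next = next;
    }

    private void snapshot(String id) {
        Map<String, Object> current = index.docs.get(id);
        if (current != null) history.computeIfAbsent(id, k -> new ArrayList<>()).add(current);
    }

    public void add(String id, Map<String, Object> doc) { snapshot(id); next.add(id, doc); }

    public void delete(String id) { snapshot(id); next.delete(id); }

    // Undo: re-add the most recently saved version, if there is one.
    // This goes straight to `next`, so the restore itself is not recorded.
    boolean restore(String id) {
        List<Map<String, Object>> versions = history.get(id);
        if (versions == null || versions.isEmpty()) return false;
        next.add(id, versions.remove(versions.size() - 1));
        return true;
    }
}
```

Because the history is kept outside the index, it survives hard commits, unlike the tlog. A fuller version might also record the version that a restore overwrites.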

Thanks,

Dmitry
