Re: Regarding Transaction logging

2011-10-26 Thread Simon Willnauer
I uploaded a patch to LUCENE-3424 which implements sequence ids for IW. Add, update and delete each return a long seqID for every operation, and commit returns the largest committed seq id. When writing transaction logs or a journal (however you wanna call it) - the biggest problem here is that in a

Re: Regarding Transaction logging

2011-09-11 Thread Michael McCandless
I agree: we should figure out just how an app would effectively make use of this seq ID, in order to understand if this really is gonna work end to end. Else we shouldn't change Lucene's core APIs. EG: could ES remove its lock array if Lucene returned a seq ID? How bad is it that

Re: Regarding Transaction logging

2011-09-10 Thread Simon Willnauer
On Thu, Sep 8, 2011 at 5:35 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Sep 8, 2011 at 11:26 AM, Michael McCandless luc...@mikemccandless.com wrote: Returning a long seqID seems the least invasive change to make this total ordering possible?  Especially since the DWDQ already

Re: Regarding Transaction logging

2011-09-09 Thread Simon Willnauer
I created LUCENE-3424 for this. But I still would like to keep the discussion open here rather than moving this entirely to an issue. There is more about this than only the seq. ids. simon On Thu, Sep 8, 2011 at 5:35 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Sep 8, 2011 at

Re: Regarding Transaction logging

2011-09-09 Thread Andrzej Bialecki
On 09/09/2011 11:00, Simon Willnauer wrote: I created LUCENE-3424 for this. But I still would like to keep the discussion open here rather than moving this entirely to an issue. There is more about this than only the seq. ids. I'm concerned also about the content of the transaction log. In

Re: Regarding Transaction logging

2011-09-09 Thread eks dev
+1 indeed! All possibilities are needed. One might do wild things if it is somehow typed. For example, dictionary compression for fields that are tokenized (not only stored), as we already have Term dictionary supporting ord-s. Keeping just a map Token - ord with transaction log... On

Re: Regarding Transaction logging

2011-09-09 Thread Andrzej Bialecki
On 09/09/2011 12:07, eks dev wrote: +1 indeed! All possibilities are needed. One might do wild things if it is somehow typed. For example, dictionary compression for fields that are tokenized (not only stored), as we already have Term dictionary supporting ord-s. Keeping just a map Token-

Re: Regarding Transaction logging

2011-09-09 Thread eks dev
I didn't think, it was just a spontaneous reaction :) At the moment I am using static dictionaries to at least get a grip on size of stored fields (escaping encoded terms) Re: Global Maybe the trick would be to somehow use term dictionary as it must be *eventually* updated? An idea is to write

Re: Regarding Transaction logging

2011-09-09 Thread Andrzej Bialecki
On 09/09/2011 13:20, eks dev wrote: I didn't think, it was just a spontaneous reaction :) At the moment I am using static dictionaries to at least get a grip on size of stored fields (escaping encoded terms) Re: Global Maybe the trick would be to somehow use term dictionary as it must be

Re: Regarding Transaction logging

2011-09-09 Thread Simon Willnauer
On Fri, Sep 9, 2011 at 11:19 AM, Andrzej Bialecki a...@getopt.org wrote: On 09/09/2011 11:00, Simon Willnauer wrote: I created LUCENE-3424 for this. But I still would like to keep the discussion open here rather than moving this entirely to an issue. There is more about this than only the

Regarding Transaction logging

2011-09-08 Thread Simon Willnauer
hey folks, we already have transaction logging on Solr side so I should have started this discussion earlier. However, I want to bring this up to the list since I think this is a very valuable feature also for plain Lucene users and eventually this should also be available to them. I don't think

Re: Regarding Transaction logging

2011-09-08 Thread Andrzej Bialecki
On 08/09/2011 11:35, Simon Willnauer wrote: hey folks, we already have transaction logging on Solr side so I should have started this discussion earlier. However, I want to bring this up to the list since I think this is a very valuable feature also for plain Lucene users and eventually this

Re: Regarding Transaction logging

2011-09-08 Thread Yonik Seeley
On Thu, Sep 8, 2011 at 5:35 AM, Simon Willnauer simon.willna...@googlemail.com wrote: I don't think this needs to be a core feature at all but I think we need to provide the necessary hooks in Lucene core to make this reliable and consistent. I've thought about it a little - it would be really

Re: Regarding Transaction logging

2011-09-08 Thread Jason Rutherglen
The delete by query is solved by recording the primary / UID of the document(s) deleted. It's only expensive if the transaction log implementation is not designed properly. :) On Thu, Sep 8, 2011 at 5:35 AM, Simon Willnauer simon.willna...@googlemail.com wrote: hey folks, we already have
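A rough sketch of the idea in the message above, for illustration only: a delete-by-query is resolved to the UIDs of the matching documents first, and the log records those UIDs rather than the query. Everything here is hypothetical (the class DeleteByQueryLogSketch, the in-memory map standing in for an index, the "ADD"/"DEL" log lines); none of it is Lucene or Solr API, and it says nothing about how costly the UID lookup would be on a real index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for an index plus its transaction log.
final class DeleteByQueryLogSketch {
    private final Map<String, String> docsByUid = new LinkedHashMap<>();
    private final List<String> log = new ArrayList<>();

    // Adds a document and logs the operation under its UID.
    void add(String uid, String body) {
        docsByUid.put(uid, body);
        log.add("ADD " + uid);
    }

    // Resolves the query to matching UIDs, then logs one delete per UID,
    // so a replay does not need to re-run the query.
    List<String> deleteByQuery(Predicate<String> query) {
        List<String> deleted = new ArrayList<>();
        for (Map.Entry<String, String> e : docsByUid.entrySet()) {
            if (query.test(e.getValue())) {
                deleted.add(e.getKey());
            }
        }
        for (String uid : deleted) {
            docsByUid.remove(uid);
            log.add("DEL " + uid);
        }
        return deleted;
    }

    // Returns a copy of the log lines in the order they were written.
    List<String> logLines() {
        return new ArrayList<>(log);
    }

    int size() {
        return docsByUid.size();
    }
}
```

Replaying such a log only needs per-UID adds and deletes; the price is that the matching UIDs must be collected at delete time.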

Re: Regarding Transaction logging

2011-09-08 Thread Simon Willnauer
On Thu, Sep 8, 2011 at 4:21 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: The delete by query is solved by recording the primary / UID of the document(s) deleted.  It's only expensive if the transaction log implementation is not designed properly.  :) phew I don't think this is

Re: Regarding Transaction logging

2011-09-08 Thread Jason Rutherglen
This isn't a new problem. Databases have been around for what, 30+ years? On Thu, Sep 8, 2011 at 11:01 AM, Simon Willnauer simon.willna...@googlemail.com wrote: On Thu, Sep 8, 2011 at 4:21 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: The delete by query is solved by recording the

Re: Regarding Transaction logging

2011-09-08 Thread Simon Willnauer
On Thu, Sep 8, 2011 at 2:54 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Sep 8, 2011 at 5:35 AM, Simon Willnauer simon.willna...@googlemail.com wrote: I don't think this needs to be a core feature at all but I think we need to provide the necessary hooks in Lucene core to make

Re: Regarding Transaction logging

2011-09-08 Thread Michael McCandless
+1 for having a contrib/transactionlog that apps could use, outside of Solr/ElasticSearch. And it sounds like one cannot build such a thing unless one forces an order above Lucene (like ElasticSearch), or, we make it possible to see/control the order of ops inside IW? Even ES's approach is

Re: Regarding Transaction logging

2011-09-08 Thread Yonik Seeley
On Thu, Sep 8, 2011 at 11:26 AM, Michael McCandless luc...@mikemccandless.com wrote: Returning a long seqID seems the least invasive change to make this total ordering possible?  Especially since the DWDQ already computes this order... +1 This seems like the most powerful option. -Yonik
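To make the seqID proposal concrete, here is a minimal sketch of how an application might use it, assuming a writer whose add, update and delete calls each return a long sequence id and whose commit returns the largest id it covers. SeqIdWriterSketch and TxLogSketch are hypothetical names invented for this illustration; this is not the LUCENE-3424 patch and not IndexWriter's API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a writer that hands out sequence ids.
// NOT Lucene's IndexWriter; all names here are assumptions.
final class SeqIdWriterSketch {
    private final AtomicLong lastAssigned = new AtomicLong();

    long addDocument(String uid) { return lastAssigned.incrementAndGet(); }
    long updateDocument(String uid) { return lastAssigned.incrementAndGet(); }
    long deleteDocument(String uid) { return lastAssigned.incrementAndGet(); }

    // Returns the largest sequence id covered by this commit.
    long commit() { return lastAssigned.get(); }
}

// Hypothetical application-side log keyed by the ids the writer returned.
final class TxLogSketch {
    static final class Entry {
        final long seqId;
        final String op;
        final String uid;

        Entry(long seqId, String op, String uid) {
            this.seqId = seqId;
            this.op = op;
            this.uid = uid;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Threads may append in any order; the seqId carries the real order.
    synchronized void append(long seqId, String op, String uid) {
        entries.add(new Entry(seqId, op, uid));
    }

    // Entries not covered by the given commit, sorted by sequence id for replay.
    synchronized List<Entry> pendingAfter(long committedSeqId) {
        List<Entry> pending = new ArrayList<>();
        for (Entry e : entries) {
            if (e.seqId > committedSeqId) {
                pending.add(e);
            }
        }
        pending.sort(Comparator.comparingLong(e -> e.seqId));
        return pending;
    }
}
```

After a crash, the application would replay pendingAfter(lastCommittedSeqId) in order; the log itself never has to impose an ordering above the writer.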