D23008: [baloo_file_extractor] Be more resistant to multiple QSocketNotifier events

2019-08-08 Thread Igor Poboiko
poboiko added a comment.


  In D23008#508782 , @bruns wrote:
  
  > @poboiko - is the problem you describe purely theoretical, or did you 
actually see it happen?
  >
  > Multiple triggers of `slotNewInput` are **not** a problem, the code flow 
reaches the `m_tr = new Transaction(...)` only iff a batch has been read. And 
batches are only issued when the  previous one has been signaled completed with 
the 'B' batch end marker.
  
  
  To be completely honest, it //is// theoretical. I do see it should not 
happen. Provided everything works as expected.
  There were just too many examples (even inside baloo), when critical things 
didn't work as expected, and got silently ignored due to asserts. I thought it 
won't hurt getting rid of those, replacing them with proper checks. Even if 
it's checks of stuff that's impossible.
  
  I did see extractor hanging though, without reliable way to reproduce it, and 
didn't have time to debug it when it happened.
  That's why I've started lurking around extractors code in the first place, 
looking for stuff like that.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D23008

To: poboiko, #baloo, bruns, ngraham
Cc: anthonyfieroni, broulik, kde-frameworks-devel, LeGast00n, fbampaloukas, 
domson, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, abrahams


D23008: [baloo_file_extractor] Be more resistant to multiple QSocketNotifier events

2019-08-08 Thread Stefan Brüns
bruns added a comment.


  @poboiko - is the problem you describe purely theoretical, or did you 
actually see it happen?
  
  Multiple triggers of `slotNewInput` are **not** a problem, the code flow 
reaches the `m_tr = new Transaction(...)` only iff a batch has been read. And 
batches are only issued when the  previous one has been signaled completed with 
the 'B' batch end marker.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D23008

To: poboiko, #baloo, bruns, ngraham
Cc: anthonyfieroni, broulik, kde-frameworks-devel, LeGast00n, fbampaloukas, 
domson, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, abrahams


D23008: [baloo_file_extractor] Be more resistant to multiple QSocketNotifier events

2019-08-08 Thread Igor Poboiko
poboiko added inline comments.

INLINE COMMENTS

> anthonyfieroni wrote in app.cpp:76-81
> processNextFile should be called till m_ids ends, why it does not happen, 
> probably that's why @bruns reject your patch

Well, if that happens, something went wrong, and we have to do something.
As far as I can see, the options are:

1. Ignore new batch (that would be simply `return`)
2. Ignore old batch (that is done now - committing something that was extracted 
from the old one if there is any)
3. Try to merge batches and process everything (that would be `m_ids.append(new 
batch)` and do not create a new transaction). This might require some 
additional housekeeping though, as we do not want resulting transaction to be 
larger than `batchSize`. And we probably do not want to do our own splitting 
here - as it would duplicate work done in the parent `baloo_file` process.

However, since ignored batch do not get removed from `ContentIndexingDB`, our 
parent process `baloo_file` will retry those documents eventually. I believe it 
should be pretty safe to just drop one of the batches here.

> broulik wrote in app.h:50
> It doens't have an EXPORT macro to it, so I suppose not?

It's not. It's only used inside `main.cpp` of `baloo_file_extractor`.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D23008

To: poboiko, #baloo, bruns, ngraham
Cc: anthonyfieroni, broulik, kde-frameworks-devel, LeGast00n, fbampaloukas, 
domson, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, abrahams


D23008: [baloo_file_extractor] Be more resistant to multiple QSocketNotifier events

2019-08-07 Thread Anthony Fieroni
anthonyfieroni added inline comments.

INLINE COMMENTS

> app.cpp:76-81
> +if (m_tr) {
> +// Parent proccess yielded us a new batch while previous have not 
> yet finished
> +// This should not happen, but better play safe
> +m_tr->commit();
> +delete m_tr;
>  }

processNextFile should be called till m_ids ends, why it does not happen, 
probably that's why @bruns reject your patch

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D23008

To: poboiko, #baloo, bruns, ngraham
Cc: anthonyfieroni, broulik, kde-frameworks-devel, LeGast00n, fbampaloukas, 
domson, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, abrahams


D23008: [baloo_file_extractor] Be more resistant to multiple QSocketNotifier events

2019-08-07 Thread Stefan Brüns
bruns requested changes to this revision.
This revision now requires changes to proceed.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D23008

To: poboiko, #baloo, bruns, ngraham
Cc: broulik, kde-frameworks-devel, LeGast00n, fbampaloukas, domson, 
ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, abrahams


D23008: [baloo_file_extractor] Be more resistant to multiple QSocketNotifier events

2019-08-07 Thread Nathaniel Graham
ngraham accepted this revision.
ngraham added a comment.
This revision is now accepted and ready to land.


  I can't reproduce the issue either, but this doesn't seem to cause any 
regressions that I can see, and the code makes sense to me.

REPOSITORY
  R293 Baloo

BRANCH
  extractor-transaction (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D23008

To: poboiko, #baloo, bruns, ngraham
Cc: broulik, kde-frameworks-devel, LeGast00n, fbampaloukas, domson, 
ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, abrahams


D23008: [baloo_file_extractor] Be more resistant to multiple QSocketNotifier events

2019-08-07 Thread Kai Uwe Broulik
broulik added inline comments.

INLINE COMMENTS

> ngraham wrote in app.h:50
> This isn't public to outside users, right?

It doens't have an EXPORT macro to it, so I suppose not?

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D23008

To: poboiko, #baloo, bruns, ngraham
Cc: broulik, kde-frameworks-devel, LeGast00n, fbampaloukas, domson, 
ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, abrahams


D23008: [baloo_file_extractor] Be more resistant to multiple QSocketNotifier events

2019-08-07 Thread Nathaniel Graham
ngraham added inline comments.

INLINE COMMENTS

> app.h:50
>  public:
> -explicit App(QObject* parent = nullptr);
>  

This isn't public to outside users, right?

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D23008

To: poboiko, #baloo, bruns, ngraham
Cc: kde-frameworks-devel, LeGast00n, fbampaloukas, domson, ashaposhnikov, 
michaelh, astippich, spoorun, ngraham, bruns, abrahams


D23008: [baloo_file_extractor] Be more resistant to multiple QSocketNotifier events

2019-08-07 Thread Igor Poboiko
poboiko created this revision.
poboiko added reviewers: Baloo, bruns, ngraham.
Herald added projects: Frameworks, Baloo.
poboiko requested review of this revision.

REVISION SUMMARY
  When a new batch arrives at stdin (as notified by QSocketNotifier), extractor
  creates a new transaction. If code, which reacts to notification, gets 
executed
  multiple times, we end up having two write transaction, which is a bad idea.
  
  Current implementation disables `QSocketNotifier`, so that we won't receive 
new
  notifications. It blocks further notifications, but it does not eliminate the
  possibility when several notifications already appeared in the event queue
  (which can happen due to race condition).
  It shouldn't happen, normally, but the only guard for that case currently is a
  single `Q_ASSERT`, which gets silently ignored in the non-debug build.
  
  This patch replaces assert with proper check, which instead commits & deletes
  the transaction if there is any (just in case).
  It also removes attempts to open the database multiple times (for each batch),
  which is a bad idea, as LMDB documentation suggests. These attempts will be
  ignored anyway (because of the check inside Database::open), yet semantically
  it's better to move the code outside, to the main(), as it is in `baloo_file`.

TEST PLAN
  I didn't manage to cause such race condition "in the laboratory environment", 
  so I've only checked it does not break anything

REPOSITORY
  R293 Baloo

BRANCH
  extractor-transaction (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D23008

AFFECTED FILES
  src/file/extractor/app.cpp
  src/file/extractor/app.h
  src/file/extractor/main.cpp

To: poboiko, #baloo, bruns, ngraham
Cc: kde-frameworks-devel, LeGast00n, fbampaloukas, domson, ashaposhnikov, 
michaelh, astippich, spoorun, ngraham, bruns, abrahams