D16266: [Extractor] Make extractor crash resilient

2018-10-25 Thread Stefan Brüns
This revision was automatically updated to reflect the committed changes.
Closed by commit R293:1509ca51c5ed: [Extractor] Make extractor crash resilient 
(authored by bruns).

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D16266?vs=43777=44212

REVISION DETAIL
  https://phabricator.kde.org/D16266

AFFECTED FILES
  src/file/extractorprocess.cpp
  src/file/extractorprocess.h
  src/file/filecontentindexer.cpp
  src/file/filecontentindexerprovider.cpp
  src/file/filecontentindexerprovider.h

To: bruns, #baloo, #frameworks, poboiko, ngraham
Cc: broulik, apol, kde-frameworks-devel, ashaposhnikov, michaelh, astippich, 
spoorun, ngraham, bruns, abrahams


D16266: [Extractor] Make extractor crash resilient

2018-10-25 Thread Nathaniel Graham
ngraham accepted this revision.
ngraham added a comment.
This revision is now accepted and ready to land.


  Yeah go ahead I think.

REPOSITORY
  R293 Baloo

BRANCH
  mimetype_handling

REVISION DETAIL
  https://phabricator.kde.org/D16266

To: bruns, #baloo, #frameworks, poboiko, ngraham
Cc: broulik, apol, kde-frameworks-devel, ashaposhnikov, michaelh, astippich, 
spoorun, ngraham, bruns, abrahams


D16266: [Extractor] Make extractor crash resilient

2018-10-24 Thread Stefan Brüns
bruns marked an inline comment as done.
bruns added inline comments.

INLINE COMMENTS

> poboiko wrote in filecontentindexer.cpp:91
> OK, we can simply count how many times we got `finishedIndexingFile`, and 
> just go to the corresponding position in the batch.
> It's just the binary search here does look a bit unnecessary to me...

Counting would give a good hint which one failed, but still no guarantee. Files 
may be added or removed during the indexer run, so the position is only 
approximate. The indexer may also have sent a "finished" message, and crash 
afterwards in a destructor call.

Doing a binary search is straight forward and avoids any dependencies or 
assumptions about other parts of the code.

Also, this code should be only temporary anyway - if the extractor is run in a 
separate process which only receives the file using a readonly file descriptor 
(for sandboxing) and passes back the result, the problematic documents id is 
known by the parent process.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D16266

To: bruns, #baloo, #frameworks, poboiko, ngraham
Cc: broulik, apol, kde-frameworks-devel, ashaposhnikov, michaelh, astippich, 
spoorun, ngraham, bruns, abrahams


D16266: [Extractor] Make extractor crash resilient

2018-10-20 Thread Igor Poboiko
poboiko added inline comments.

INLINE COMMENTS

> bruns wrote in filecontentindexer.cpp:91
> But it is racy - if the file is replaced in the meantime, inode and filename 
> no longer match. This is not completely unlikely when dealing with temporary 
> files.
> 
> It would make the code also significantly more complex, and I want it simple 
> here. It should only be hit in exceptional cases.
> 
> Also, it is not to uncommon to have a batch of size one from the start, i.e. 
> when adding/modifying files.

OK, we can simply count how many times we got `finishedIndexingFile`, and just 
go to the corresponding position in the batch.
It's just the binary search here does look a bit unnecessary to me...

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D16266

To: bruns, #baloo, #frameworks, poboiko, ngraham
Cc: broulik, apol, kde-frameworks-devel, ashaposhnikov, michaelh, astippich, 
spoorun, ngraham, bruns, abrahams


D16266: [Extractor] Make extractor crash resilient

2018-10-17 Thread Stefan Brüns
bruns marked 3 inline comments as done.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D16266

To: bruns, #baloo, #frameworks, poboiko, ngraham
Cc: broulik, apol, kde-frameworks-devel, ashaposhnikov, michaelh, astippich, 
spoorun, ngraham, bruns, abrahams


D16266: [Extractor] Make extractor crash resilient

2018-10-17 Thread Stefan Brüns
bruns marked 3 inline comments as done.
bruns added inline comments.

INLINE COMMENTS

> poboiko wrote in filecontentindexer.cpp:91
> There is `Transaction::documentId(const QByteArray& path)`, which can resolve 
> it using `DocumentUrlDB`.
> I believe it still would be cheaper than reindexing several files multiple 
> times.

But it is racy - if the file is replaced in the meantime, inode and filename no 
longer match. This is not completely unlikely when dealing with temporary files.

It would make the code also significantly more complex, and I want it simple 
here. It should only be hit in exceptional cases.

Also, it is not to uncommon to have a batch of size one from the start, i.e. 
when adding/modifying files.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D16266

To: bruns, #baloo, #frameworks, poboiko, ngraham
Cc: broulik, apol, kde-frameworks-devel, ashaposhnikov, michaelh, astippich, 
spoorun, ngraham, bruns, abrahams


D16266: [Extractor] Make extractor crash resilient

2018-10-17 Thread Igor Poboiko
poboiko added inline comments.

INLINE COMMENTS

> bruns wrote in filecontentindexer.cpp:91
> Yes, its a binary search.
> Because we only have IDs here, and the progress reporting works on strings.

There is `Transaction::documentId(const QByteArray& path)`, which can resolve 
it using `DocumentUrlDB`.
I believe it still would be cheaper than reindexing several files multiple 
times.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D16266

To: bruns, #baloo, #frameworks, poboiko, ngraham
Cc: broulik, apol, kde-frameworks-devel, ashaposhnikov, michaelh, astippich, 
spoorun, ngraham, bruns, abrahams


D16266: [Extractor] Make extractor crash resilient

2018-10-17 Thread Stefan Brüns
bruns added inline comments.

INLINE COMMENTS

> poboiko wrote in filecontentindexer.cpp:91
> Do I understand correctly, that's a binary-search-like way to find the file 
> actually causing `extractor` to crash, and it will reindex some of the files 
> in a batch ~log2(batchsize) times?
> Why don't we rely instead on its progress reporting, via 
> `startedIndexingFile` / `finishedIndexingFile`?

Yes, its a binary search.
Because we only have IDs here, and the progress reporting works on strings.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D16266

To: bruns, #baloo, #frameworks, poboiko, ngraham
Cc: broulik, apol, kde-frameworks-devel, ashaposhnikov, michaelh, astippich, 
spoorun, ngraham, bruns, abrahams


D16266: [Extractor] Make extractor crash resilient

2018-10-17 Thread Kai Uwe Broulik
broulik added inline comments.

INLINE COMMENTS

> poboiko wrote in filecontentindexer.cpp:75
> Is it OK to use `QObject::connect` inside a `while` loop? Those are not 
> `Qt::UniqueConnection`, won't they fire multiple times (more and more, 
> actually)?

given `loop` lives on the stack it is destroyed when the scope is left and the 
connection severed

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D16266

To: bruns, #baloo, #frameworks, poboiko, ngraham
Cc: broulik, apol, kde-frameworks-devel, ashaposhnikov, michaelh, astippich, 
spoorun, ngraham, bruns, abrahams


D16266: [Extractor] Make extractor crash resilient

2018-10-17 Thread Igor Poboiko
poboiko added inline comments.

INLINE COMMENTS

> filecontentindexer.cpp:75
> +bool hadErrors = false;
> +connect(, ::failed, , [, 
> ]() { hadErrors = true; loop.quit(); });
> +

Is it OK to use `QObject::connect` inside a `while` loop? Those are not 
`Qt::UniqueConnection`, won't they fire multiple times (more and more, 
actually)?

> filecontentindexer.cpp:91
> +batchSize = idList.size() / 2;
> +}
> +process.start();

Do I understand correctly, that's a binary-search-like way to find the file 
actually causing `extractor` to crash, and it will reindex some of the files in 
a batch ~log2(batchsize) times?
Why don't we rely instead on its progress reporting, via `startedIndexingFile` 
/ `finishedIndexingFile`?

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D16266

To: bruns, #baloo, #frameworks, poboiko, ngraham
Cc: apol, kde-frameworks-devel, ashaposhnikov, michaelh, astippich, spoorun, 
ngraham, bruns, abrahams


D16266: [Extractor] Make extractor crash resilient

2018-10-16 Thread Stefan Brüns
bruns updated this revision to Diff 43777.
bruns added a comment.


  rebase

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D16266?vs=43769=43777

BRANCH
  mimetype_handling

REVISION DETAIL
  https://phabricator.kde.org/D16266

AFFECTED FILES
  src/file/extractorprocess.cpp
  src/file/extractorprocess.h
  src/file/filecontentindexer.cpp
  src/file/filecontentindexerprovider.cpp
  src/file/filecontentindexerprovider.h

To: bruns, #baloo, #frameworks, poboiko, ngraham
Cc: apol, kde-frameworks-devel, ashaposhnikov, michaelh, astippich, spoorun, 
ngraham, bruns, abrahams


D16266: [Extractor] Make extractor crash resilient

2018-10-16 Thread Stefan Brüns
bruns updated this revision to Diff 43769.
bruns marked 2 inline comments as done.
bruns added a comment.


  coding style

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D16266?vs=43768=43769

BRANCH
  mimetype_handling

REVISION DETAIL
  https://phabricator.kde.org/D16266

AFFECTED FILES
  src/file/extractorprocess.cpp
  src/file/extractorprocess.h
  src/file/filecontentindexer.cpp
  src/file/filecontentindexerprovider.cpp
  src/file/filecontentindexerprovider.h

To: bruns, #baloo, #frameworks, poboiko, ngraham
Cc: apol, kde-frameworks-devel, ashaposhnikov, michaelh, astippich, spoorun, 
ngraham, bruns, abrahams


D16266: [Extractor] Make extractor crash resilient

2018-10-16 Thread Stefan Brüns
bruns marked 3 inline comments as done.
bruns added inline comments.

INLINE COMMENTS

> apol wrote in extractorprocess.cpp:35
> Shouldn't it check the exitCode too?

the exitCode is "only valid for normal exits" (Qt docu 
), and 0 otherwise.

> apol wrote in extractorprocess.cpp:54
> Do you really need to waitForStarted?

Copied from the old code. Should not hurt, as done from a runner thread.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D16266

To: bruns, #baloo, #frameworks, poboiko, ngraham
Cc: apol, kde-frameworks-devel, ashaposhnikov, michaelh, astippich, spoorun, 
ngraham, bruns, abrahams


D16266: [Extractor] Make extractor crash resilient

2018-10-16 Thread Aleix Pol Gonzalez
apol added inline comments.

INLINE COMMENTS

> extractorprocess.cpp:35
> +connect(_extractorProcess, QOverload QProcess::ExitStatus>::of(::finished),
> +[=](int exitCode, QProcess::ExitStatus exitStatus)
> +{

Shouldn't it check the exitCode too?

> extractorprocess.cpp:54
> +m_extractorProcess.start(QIODevice::Unbuffered | QIODevice::ReadWrite);
> +m_extractorProcess.waitForStarted();
> +m_extractorProcess.setReadChannel(QProcess::StandardOutput);

Do you really need to waitForStarted?

> filecontentindexer.cpp:74
>  
> +bool hadErrors{false};
> +connect(, ::failed, , [, 
> ]() { hadErrors = true; loop.quit(); });

`= false;`

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D16266

To: bruns, #baloo, #frameworks, poboiko, ngraham
Cc: apol, kde-frameworks-devel, ashaposhnikov, michaelh, astippich, spoorun, 
ngraham, bruns, abrahams


D16266: [Extractor] Make extractor crash resilient

2018-10-16 Thread Stefan Brüns
bruns created this revision.
bruns added reviewers: Baloo, Frameworks, poboiko, ngraham.
Herald added projects: Frameworks, Baloo.
Herald added a subscriber: kde-frameworks-devel.
bruns requested review of this revision.

REVISION SUMMARY
  Connect to QProcess::finished to detect the exit status. In case the
  process has crashed, signal the indexer.
  
  On a crash, restart the process and feed it a smaller batch. If the
  crashing batch contains only a single file, mark the file as failed, i.e.
  add it to the "failedid" db and remove it from the content indexing db
  to avoid further indexing attempts.
  
  CCBUG: 375131

TEST PLAN
  start `balooctl monitor`
  add a file known to crash the extractor to an indexable path
  touch an unproblematic file
  -> indexer crashes on first file and continues with the second

REPOSITORY
  R293 Baloo

BRANCH
  mimetype_handling

REVISION DETAIL
  https://phabricator.kde.org/D16266

AFFECTED FILES
  src/file/extractorprocess.cpp
  src/file/extractorprocess.h
  src/file/filecontentindexer.cpp
  src/file/filecontentindexerprovider.cpp
  src/file/filecontentindexerprovider.h

To: bruns, #baloo, #frameworks, poboiko, ngraham
Cc: kde-frameworks-devel, ashaposhnikov, michaelh, astippich, spoorun, ngraham, 
bruns, abrahams