D11828: Simplify orPostingIterator and make it faster

2019-04-11 Thread Stefan Brüns
This revision was automatically updated to reflect the committed changes.
Closed by commit R293:8fcd690fe853: Simplify orPostingIterator and make it 
faster (authored by bruns).

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D11828?vs=53527=56021

REVISION DETAIL
  https://phabricator.kde.org/D11828

AFFECTED FILES
  src/engine/orpostingiterator.cpp
  src/engine/orpostingiterator.h

To: bruns, #baloo, #frameworks, poboiko, ngraham
Cc: ngraham, fvogt, kde-frameworks-devel, #frameworks, gennad, domson, 
ashaposhnikov, michaelh, astippich, spoorun, bruns, abrahams


D11828: Simplify orPostingIterator and make it faster

2019-04-11 Thread Nathaniel Graham
ngraham accepted this revision.
ngraham added a comment.
This revision is now accepted and ready to land.


  Let's get this in.

REPOSITORY
  R293 Baloo

BRANCH
  speedup_orpostingoperator

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, #frameworks, poboiko, ngraham
Cc: ngraham, fvogt, kde-frameworks-devel, #frameworks, gennad, domson, 
ashaposhnikov, michaelh, astippich, spoorun, bruns, abrahams


D11828: Simplify orPostingIterator and make it faster

2019-04-11 Thread Stefan Brüns
bruns marked 5 inline comments as done.
bruns added a comment.


  Ping ...

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, #frameworks, poboiko
Cc: fvogt, kde-frameworks-devel, #frameworks, gennad, domson, ashaposhnikov, 
michaelh, astippich, spoorun, ngraham, bruns, abrahams


D11828: Simplify orPostingIterator and make it faster

2019-03-09 Thread Stefan Brüns
bruns added a comment.


  Ping!

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, #frameworks, poboiko
Cc: fvogt, kde-frameworks-devel, #frameworks, gennad, domson, ashaposhnikov, 
michaelh, astippich, spoorun, ngraham, bruns, abrahams


D11828: Simplify orPostingIterator and make it faster

2019-03-09 Thread Stefan Brüns
bruns updated this revision to Diff 53527.
bruns added a comment.


  rebase

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D11828?vs=36944=53527

BRANCH
  speedup_orpostingoperator

REVISION DETAIL
  https://phabricator.kde.org/D11828

AFFECTED FILES
  src/engine/orpostingiterator.cpp
  src/engine/orpostingiterator.h

To: bruns, #baloo, #frameworks, poboiko
Cc: fvogt, kde-frameworks-devel, #frameworks, gennad, domson, ashaposhnikov, 
michaelh, astippich, spoorun, ngraham, bruns, abrahams


D11828: Simplify orPostingIterator and make it faster

2018-10-08 Thread Stefan Brüns
bruns added a comment.


  In D11828#339379 , @poboiko wrote:
  
  > Looks fine to me. But do we really need to optimize it? I mean, I didn't 
see it running more than ~20 ms, and with this patch for small queries it runs 
like ~16 ms. Worst case is when user types something in KRunner, but again, the 
lag is negligible there.
  
  
  KRunner currently executes each search term 8 times, once for each of type in 
[ Audio, Video, Document, ...]. I think especially for krunner lower latency is 
important, as it does search-as-you-type.

INLINE COMMENTS

> poboiko wrote in orpostingiterator.cpp:79
> I'm not really sure, do we actually need to set it to `nullptr`? It becomes 
> inaccessible after next line

It makes debugging a little bit nicer, erase is implemented as swap, and when 
setting it the vector only contains nullptrs after size() elements.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, #frameworks, poboiko
Cc: fvogt, kde-frameworks-devel, #frameworks, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns, abrahams


D11828: Simplify orPostingIterator and make it faster

2018-10-08 Thread Igor Poboiko
poboiko added a comment.


  Looks fine to me. But do we really need to optimize it? I mean, I didn't see 
it running more than ~20 ms, and with this patch for small queries it runs like 
~16 ms. Worst case is when user types something in KRunner, but again, the lag 
is negligible there.

INLINE COMMENTS

> orpostingiterator.cpp:73
>  
> -// First or last element
> -if (iter->docId() == 0 && iter->next() == 0) {
> -delete iter;
> -*it = nullptr;
> -continue;
> +auto docId = iter->docId();
> +if (docId <= m_docId) {

Usage of `auto` keyword (at least for docId) somehow made it a bit harder to 
read for me. Looks like Qt Coding Conventions 
 agrees 
(it's fine for iterators, whose definition has a long-long-name and it's clear 
from the line that it's an iterator. But for ordinary `docId` I've began to 
think it's not `quint64`, but some weird class with large definition or 
something).

> orpostingiterator.cpp:79
> +delete iter;
> +*it = nullptr;
> +it = m_iterators.erase(it);

I'm not really sure, do we actually need to set it to `nullptr`? It becomes 
inaccessible after next line

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, #frameworks, poboiko
Cc: fvogt, kde-frameworks-devel, #frameworks, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns, abrahams


D11828: Simplify orPostingIterator and make it faster

2018-10-06 Thread Stefan Brüns
bruns added a reviewer: poboiko.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, #frameworks, poboiko
Cc: fvogt, kde-frameworks-devel, #frameworks, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns, abrahams


D11828: Simplify orPostingIterator and make it faster

2018-08-18 Thread Stefan Brüns
bruns marked an inline comment as done.
bruns added a comment.


  Ping!

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, michaelh, #frameworks
Cc: fvogt, kde-frameworks-devel, #frameworks, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns, abrahams


D11828: Simplify orPostingIterator and make it faster

2018-08-18 Thread Stefan Brüns
bruns removed a reviewer: michaelh.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, #frameworks
Cc: fvogt, kde-frameworks-devel, #frameworks, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns, abrahams


D11828: Simplify orPostingIterator and make it faster

2018-06-30 Thread Stefan Brüns
bruns updated this revision to Diff 36944.
bruns added a comment.


  remove extra parentheses

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D11828?vs=31623=36944

BRANCH
  speedup_orpostingoperator

REVISION DETAIL
  https://phabricator.kde.org/D11828

AFFECTED FILES
  src/engine/orpostingiterator.cpp
  src/engine/orpostingiterator.h

To: bruns, #baloo, michaelh, #frameworks
Cc: fvogt, kde-frameworks-devel, #frameworks, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns, abrahams


D11828: Simplify orPostingIterator and make it faster

2018-06-29 Thread Stefan Brüns
bruns added inline comments.

INLINE COMMENTS

> fvogt wrote in orpostingiterator.cpp:86
> Looks like most of those parens  are unnecessary.

only one pair is, the others support readability, see Qt Coding Style:

> Use parentheses to group expressions

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, michaelh, #frameworks
Cc: fvogt, kde-frameworks-devel, #frameworks, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns, abrahams


D11828: Simplify orPostingIterator and make it faster

2018-06-28 Thread Fabian Vogt
fvogt added inline comments.

INLINE COMMENTS

> orpostingiterator.cpp:86
> +// check if the docId is the new lowest docId
> +if (((docId < m_nextId)) || (m_nextId == 0)) {
> +m_nextId = docId;

Looks like most of those parens  are unnecessary.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, michaelh, #frameworks
Cc: fvogt, kde-frameworks-devel, #frameworks, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns, abrahams


D11828: Simplify orPostingIterator and make it faster

2018-06-26 Thread Stefan Brüns
bruns marked an inline comment as done.
bruns added a comment.
Restricted Application added a subscriber: kde-frameworks-devel.


  Can someone please review this change?

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, michaelh, #frameworks
Cc: kde-frameworks-devel, #frameworks, ashaposhnikov, michaelh, astippich, 
spoorun, ngraham, bruns, abrahams


D11828: Simplify orPostingIterator and make it faster

2018-04-26 Thread Stefan Brüns
bruns added a reviewer: Frameworks.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, michaelh, #frameworks
Cc: #frameworks, ashaposhnikov, michaelh, astippich, spoorun, bruns


D11828: Simplify orPostingIterator and make it faster

2018-04-21 Thread Stefan Brüns
bruns marked 8 inline comments as done.
bruns added inline comments.

INLINE COMMENTS

> michaelh wrote in orpostingiterator.cpp:28
> `, m_nextId(ULONG_LONG_MAX)` ?

Currently the only reserved value is `0`, and I prefer to keep it as is.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, michaelh
Cc: #frameworks, ashaposhnikov, michaelh, astippich, spoorun, bruns


D11828: Simplify orPostingIterator and make it faster

2018-04-09 Thread Michael Heidelbach
michaelh added a comment.


  Thank you once more. This is complicated stuff (for me), it will take  some 
time to sink in.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, michaelh
Cc: #frameworks, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, 
alexeymin


D11828: Simplify orPostingIterator and make it faster

2018-04-09 Thread Stefan Brüns
bruns added a comment.


  In D11828#242402 , @michaelh wrote:
  
  > I'm not very familiar with the concept of iterators (yet). To me it looks 
like `auto i = new OrPostingIterator(iters); i->docId();` will return 0 and 
`i->next()` returns a valid docId. After that `i->docId();` is also valid.
  >  Is this how iterators work? Naively I would expect `i->docId();` to be 
valid after construction or both `i->docId();` and `i->next()` to be invalid.
  >
  > I have to admit, it's very difficult to understand (and I don't, really) 
what baloo is doing here with all this iterating over vectors of inherited 
iterator classes, pointer-to-pointer stuff of which some might not even exist. 
My brain hurts, terribly! All I can presently do is ask some simple questions. 
Sorry, I'm of no help here. Please take my comments as a device for better 
understanding and I hope it's not to much of a bother.
  
  
  Each PostingIterator represents a "set" of document ids. PostingIterators are 
not related to STL or Qt container iterators (although the "subiterators" are 
implemented using QVectors<>).
  
  The `PI::next()` call returns the smallest element and removes it from the 
set, `PI::docId()` just returns the smallest element.
  
  The result set of the OrPostingIterator is the union of its "subsets" 
(represented by its "subiterators", i.e. the QVector 
constructor argument). Or works a little bit like an insertion sort. To return 
the smallest element, it creates the union of the smallest element of all 
subsets, and yields the smallest element of the union. It then calls 
`PI::next()` on all subsets which returned the smallest element, removing it 
from all subsets and thus from the union.
  
  In C++, you can use iterators in two different ways, either PreIncrement or 
PostIncrement. `PostingIterator::next()` will behave like `*(++it)`, i.e. it is 
incremented first, then the current value is returned.
  
  I don't know the original justification for this design. One possible idea is 
to delay the actual database query until `next()` is called the first time. For 
e.g. a query "a AND b AND c", one can execute "a", if not empty do "b", 
intersect (i.e. calling the Iterators of Result("a") and Result("b") until both 
return the same value), and if the intersection is non-empty, do "c".
  
  Currently, it does not implement this optimization. D12025 
 implements one case, i.e. stopping if a 
leaf query returns an empty set, and propagating it upwards.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, michaelh
Cc: #frameworks, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, 
alexeymin


D11828: Simplify orPostingIterator and make it faster

2018-04-09 Thread Michael Heidelbach
michaelh added a comment.


  :-) Thank you for the lesson.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, michaelh
Cc: #frameworks, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, 
alexeymin


D11828: Simplify orPostingIterator and make it faster

2018-04-09 Thread Stefan Brüns
bruns added inline comments.

INLINE COMMENTS

> michaelh wrote in orpostingiterator.cpp:71
> Newbie question: Does std::unique_ptr make sense here?

When writing from scratch today, yes.
But this would not disallow nullptr 
(`std::make_unique(nullptr)` is fully valid)

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, michaelh
Cc: #frameworks, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, 
alexeymin


D11828: Simplify orPostingIterator and make it faster

2018-04-09 Thread Stefan Brüns
bruns added a comment.


  In D11828#242372 , @michaelh wrote:
  
  > While reading in IDE realized `(*it)` resolves to `PostingIterator**` I got 
a little dizzy at first, then started to play with this code and came up with 
this.
  >  In Constructor:
  >
  >   for (PostingIterator* it : m_iterators) {
  > if (it == nullptr) {
  >   m_iterators.erase();
  
  
  Thats not how QVector<>::iterator works
  
  - you can not create an QVector<>::iterator from an PostingIterator*. The 
latter is just a (pointer to) a PostingIterator, it does not know about the 
container (QVector) it is member of [1]
  - deleting an element from a container using `.erase(it)` typically 
invalidates the `it`, thats the reason `.erase(it)` returns an iterator to the 
next element
  - `it = m_iterators.erase(it)` thus implicitly moves the iterator forward
  
  >} else {
  >  auto docId = it->next();
  >  if (docId && (docId < m_nextId || m_nextId == 0)) {
  >m_nextId = docId;
  >  }
  >   }
  > 
  > }
  > 
  >   In `OrPostingIterator::next()`
  
  does not work for the same reason ...
  
  What could work is deleting the PostingIterator immediately and setting the 
iterator value to `nullptr`, and then do a cleanup pass afterwards with 
`QVector<>::removeAll(nullptr)`.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, michaelh
Cc: #frameworks, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, 
alexeymin


D11828: Simplify orPostingIterator and make it faster

2018-04-08 Thread Michael Heidelbach
michaelh added a comment.


  I'm not very familiar with the concept of iterators (yet). To me it looks 
like `auto i = new OrPostingIterator(iters); i->docId();` will return 0 and 
`i->next()` returns a valid docId. After that `i->docId();` is also valid.
  Is this how iterators work? Naively I would expect `i->docId();` to be valid 
after construction or both `i->docId();` and `i->next()` to be invalid.
  
  I have to admit, it's very difficult to understand (and I don't, really) what 
baloo is doing here with all this iterating over vectors of inherited iterator 
classes, pointer-to-pointer stuff of which some might not even exist. My brain 
hurts, terribly! All I can presently do is ask some simple questions. Sorry, 
I'm of no help here. Please take my comments as a device for better 
understanding and I hope it's not to much of a bother.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, michaelh
Cc: #frameworks, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, 
alexeymin


D11828: Simplify orPostingIterator and make it faster

2018-04-08 Thread Michael Heidelbach
michaelh added a comment.


  While reading in IDE realized `(*it)` resolves to `PostingIterator**` I got a 
little dizzy at first, then started to play with this code and came up with 
this.
  In Constructor:
  
for (PostingIterator* it : m_iterators) {
  if (it == nullptr) {
m_iterators.erase();
   } else {
 auto docId = it->next();
 if (docId && (docId < m_nextId || m_nextId == 0)) {
   m_nextId = docId;
 }
  }
}
  
  In `OrPostingIterator::next()`
  
for (PostingIterator* it : m_iterators) {
auto docId = it->docId();
if (docId <= m_docId) {
docId = it->next();
if (docId == 0) {
m_iterators.erase();
continue;
}
}
m_nextId = m_nextId ? qMin(docId , m_nextId) : docId;
}
  
  I'm not 100% sure if this does exactly the same. But if it does, for readers 
on my skill level this code will be easier to understand, I guess. 
  BTW, `` also makes me dizzy :-)

INLINE COMMENTS

> orpostingiterator.cpp:28
>  , m_docId(0)
> +, m_nextId(0)
>  {

`, m_nextId(ULONG_LONG_MAX)` ?

> orpostingiterator.cpp:44
> +m_nextId = docId;
> +}
> +

`m_nextId = (docId > 0) ? qMin(m_nextId, docId) : m_nextId;`  ?

> orpostingiterator.cpp:68
>  }
>  
> +// advance all iterators which point to the lowest docId

Remove the above  and instead ?

  if (m_nextId == ULONG_LONG_MAX) {
m_docId = 0;
return 0;
  } 
  m_docId = m_nextId;

?

> orpostingiterator.cpp:71
> +for (auto it = m_iterators.begin(); it != m_iterators.end(); ) {
>  PostingIterator* iter = *it;
>  

Newbie question: Does std::unique_ptr make sense here?

> orpostingiterator.cpp:88
> +m_nextId = docId;
>  }
>  

`m_nextId = qMin(docId, m_nextId);` ?

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, michaelh
Cc: #frameworks, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, 
alexeymin


D11828: Simplify orPostingIterator and make it faster

2018-04-07 Thread Stefan Brüns
bruns updated this revision to Diff 31623.
bruns added a reviewer: michaelh.
bruns added a comment.


  cleanup

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D11828?vs=31008=31623

BRANCH
  speedup_orpostingoperator

REVISION DETAIL
  https://phabricator.kde.org/D11828

AFFECTED FILES
  src/engine/orpostingiterator.cpp
  src/engine/orpostingiterator.h

To: bruns, #baloo, michaelh
Cc: #frameworks, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, 
alexeymin


D11828: Simplify orPostingIterator and make it faster

2018-04-06 Thread Stefan Brüns
bruns added inline comments.

INLINE COMMENTS

> orpostingiterator.cpp:44
>  {
> -m_docId = 0;
> -if (m_iterators.isEmpty()) {
> -return 0;
> +// first call
> +if (m_nextId == 0) {

Move this to the constructor?

> orpostingiterator.cpp:47
> +for (auto it = m_iterators.cbegin(), end = m_iterators.cend(); it != 
> end; it++) {
> +auto docId = (*it)->next();
> +// find smallest docId

iterator may contain nullptr's

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo
Cc: #frameworks, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, 
alexeymin


D11828: Simplify orPostingIterator and make it faster

2018-03-30 Thread Stefan Brüns
bruns created this revision.
bruns added a reviewer: Baloo.
Restricted Application added projects: Frameworks, Baloo.
Restricted Application added a subscriber: Frameworks.
bruns requested review of this revision.

REVISION SUMMARY
  Trivial searches (e.g. baloosearch foo) are expanded to a large list
  of ORed small results sets (e.g. 80 terms with 2...5 entries), thus
  speeding it up is quite beneficial.
  
  Currently, iterators which no longer return any entries are deleted
  and replaced with nullptrs, thus the value has to be checked on each
  iteration. The saved instructions are sufficient to more than amortize
  the cost of moving the remaining elements in the vector.
  
  The or operator has to return the smallest ID of the combined sets.
  Instead of doing a traversal on each next() call, determine the smallest
  ID on the first call and update it when checking if the iterators have
  to be advanced.
  
  Keep the docId in a local variable, as the virtual function call to
  (PostingIterator*)->docId() is somewhat expensive.
  
  According to valgrind, typical execution cost of Baloo::Query::exec()
  is reduced by 25% to 40%.
  
  Signed-off-by: Stefan Brüns 

TEST PLAN
  valgrind --tool=callgrind baloosearch foo OR bar

REPOSITORY
  R293 Baloo

BRANCH
  speedup_orpostingoperator

REVISION DETAIL
  https://phabricator.kde.org/D11828

AFFECTED FILES
  src/engine/orpostingiterator.cpp
  src/engine/orpostingiterator.h

To: bruns, #baloo
Cc: #frameworks, ashaposhnikov, michaelh, astippich, spoorun, ngraham, alexeymin