D12787: Ignore more types of source files

2018-05-15 Thread Nathaniel Graham
This revision was automatically updated to reflect the committed changes.
Closed by commit R293:7529727e4624: Ignore more types of source files (authored 
by ngraham).

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D12787?vs=34155&id=34241

REVISION DETAIL
  https://phabricator.kde.org/D12787

AFFECTED FILES
  src/file/fileexcludefilters.cpp

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-15 Thread Stefan Brüns
bruns accepted this revision.
bruns added a comment.
This revision is now accepted and ready to land.


  Not tested by me, but looks good in general.

REPOSITORY
  R293 Baloo

BRANCH
  more-excluded-source-files (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D12787

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-14 Thread Nathaniel Graham
ngraham updated this revision to Diff 34155.
ngraham added a comment.


  Omit all .map files, and also .ini files

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D12787?vs=34151&id=34155

BRANCH
  more-excluded-source-files (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D12787

AFFECTED FILES
  src/file/fileexcludefilters.cpp

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-14 Thread Nathaniel Graham
ngraham marked an inline comment as done.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D12787

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-14 Thread Stefan Brüns
bruns added a comment.


  If you want to read more about text in SVG:
  http://tavmjong.free.fr/blog/
  
  To show a generalized XML extractor is sufficient for SVG:
  
  - Path data: ``
  - Single Line: `Single 
line`
  - Multiline: `This is some multiline 
Text`
  
  Non-text tags are empty (i.e., are defined by attributes only).
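
  A minimal sketch of that idea, in plain standard C++ rather than Baloo's or
  KFileMetaData's actual extractor code: keep only the character data between
  tags and drop the tags together with their attributes. The names
  `extractXmlText` and `checkExtractXmlText` are hypothetical, and the SVG
  snippets in the check are made up for illustration. The sketch does not
  handle comments, CDATA sections, entities, or a `>` inside an attribute value.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: collect character data from an XML document,
// skipping everything between '<' and '>' (tags and their attributes).
// Runs of whitespace and tag boundaries collapse to a single space.
std::string extractXmlText(const std::string &xml)
{
    std::string text;
    bool inTag = false;
    bool pendingSpace = false;

    for (char c : xml) {
        if (c == '<') {
            inTag = true;
            pendingSpace = true; // treat a tag boundary as a word break
        } else if (c == '>') {
            inTag = false;
        } else if (!inTag) {
            if (c == ' ' || c == '\n' || c == '\t' || c == '\r') {
                pendingSpace = true;
            } else {
                if (pendingSpace && !text.empty()) {
                    text += ' ';
                }
                pendingSpace = false;
                text += c;
            }
        }
    }
    return text;
}

// Usage example: path data lives in an attribute and is dropped,
// while the content of the text element is kept.
void checkExtractXmlText()
{
    const std::string svg =
        "<svg><path d=\"M0 0 L10 10\"/>"
        "<text x=\"5\" y=\"5\">Single line</text></svg>";
    assert(extractXmlText(svg) == "Single line");
}
```

  A real extractor would more likely sit on top of a proper XML parser; the
  point here is only that attribute-only elements contribute nothing to the
  extracted text.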

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D12787

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-14 Thread Nathaniel Graham
ngraham marked 3 inline comments as done.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D12787

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-14 Thread Nathaniel Graham
ngraham updated this revision to Diff 34151.
ngraham added a comment.


  Revert change to omit SVG files

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D12787?vs=34146&id=34151

BRANCH
  more-excluded-source-files (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D12787

AFFECTED FILES
  src/file/fileexcludefilters.cpp

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-14 Thread Stefan Brüns
bruns added inline comments.

INLINE COMMENTS

> ngraham wrote in fileexcludefilters.cpp:154
> My impression is that Baloo is really intended for user files; SVGs only get 
> their content indexed by accident, because they happen to be textual. I don't 
> think there's any textual content inside an SVG file that you'd actually want 
> to have indexed.

SVGs are user files, and anything inside `<text>` is textual content. You can 
have several paragraphs with text inside SVGs.
We index the RDF metadata (author, title, ...) for PDFs, EPUB, ... so we should 
for SVG.
Of course it is pointless to index e.g. the tags themselves, or the content of 
any non-textual tag; that's the reason I asked for an XML extractor.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D12787

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-14 Thread Nathaniel Graham
ngraham added inline comments.

INLINE COMMENTS

> bruns wrote in fileexcludefilters.cpp:154
> Hm, not too sure about this one - SVG typically has RDF metadata, and also 
> everything in `<text>` tags qualifies as "content".
> Do we have a generalized XML extractor?

My impression is that Baloo is really intended for user files; SVGs only get 
their content indexed by accident, because they happen to be textual. I don't 
think there's any textual content inside an SVG file that you'd actually want 
to have indexed.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D12787

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-14 Thread Stefan Brüns
bruns added inline comments.

INLINE COMMENTS

> fileexcludefilters.cpp:154
> +"image/svg+xml",
> +"image/svg+xml-compressed",
>  "application/x-awk",

Hm, not too sure about this one - SVG typically has RDF metadata, and also 
everything in `<text>` tags qualifies as "content".
Do we have a generalized XML extractor?

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D12787

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-14 Thread Nathaniel Graham
ngraham updated this revision to Diff 34146.
ngraham added a comment.


  Add some more

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D12787?vs=34145&id=34146

BRANCH
  more-excluded-source-files (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D12787

AFFECTED FILES
  src/file/fileexcludefilters.cpp

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-14 Thread Nathaniel Graham
ngraham marked an inline comment as done.
ngraham added a comment.


  How do people feel about adding `*.ini` to the exclusions list?

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D12787

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-14 Thread Nathaniel Graham
ngraham marked 5 inline comments as done.
ngraham added inline comments.

INLINE COMMENTS

> bruns wrote in fileexcludefilters.cpp:82
> That's not what I meant (I am not aware of anything generating a `Bytecode` 
> file literally).
> I meant changing the `// Compiled files` comment to `// Bytecode files`, 
> which all the ones below are.

Heh, oops.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D12787

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-14 Thread Nathaniel Graham
ngraham updated this revision to Diff 34145.
ngraham added a comment.


  Fix misinterpretation

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D12787?vs=34143&id=34145

BRANCH
  more-excluded-source-files (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D12787

AFFECTED FILES
  src/file/fileexcludefilters.cpp

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-14 Thread Stefan Brüns
bruns added inline comments.

INLINE COMMENTS

> fileexcludefilters.cpp:82
> +"*.jsc",   // Javascript
> +"Bytecode",
>  

That's not what I meant (I am not aware of anything generating a `Bytecode` file 
literally).
I meant changing the `// Compiled files` comment to `// Bytecode files`, which 
all the ones below are.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D12787

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-14 Thread Nathaniel Graham
ngraham updated this revision to Diff 34143.
ngraham added a comment.


  More buildy files

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D12787?vs=34141&id=34143

BRANCH
  more-excluded-source-files (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D12787

AFFECTED FILES
  src/file/fileexcludefilters.cpp

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-14 Thread Stefan Brüns
bruns added a comment.


  Does anyone know if there are any artifacts generated by the meson build 
system?

INLINE COMMENTS

> fileexcludefilters.cpp:74
>  
>  // Compiled files
>  "*.class", // Java

Probably `Bytecode` - we have `.o` above, which is also compiled

> fileexcludefilters.cpp:76
>  "*.class", // Java
>  "*.pyc",   // Python
>  "*.elc",   // Emacs Lisp

For Python 2, there is also `.pyo` (Python 3 is covered by the `__pycache__` 
directory filter)

> ngraham wrote in fileexcludefilters.cpp:69
> As far as I can tell, we do not, and they have to be manually listed. I've 
> added `qmlc` and `jsc`. Any more you can think of?

Static library - `.a`
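
The suggestions above all boil down to simple file name patterns. As a rough
illustration of how such patterns could be checked, here is a standalone
sketch in standard C++. It is not Baloo's actual matching code: `globMatch`,
`isExcluded`, and `checkIsExcluded` are hypothetical names, and the pattern
list is only a small subset taken from this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: match `name` against a shell-style pattern that
// supports '*' (any run of characters) and '?' (any single character).
bool globMatch(const std::string &pattern, const std::string &name)
{
    std::size_t p = 0, n = 0;
    std::size_t starP = std::string::npos; // position of the last '*' seen
    std::size_t starN = 0;                 // name position that '*' is matched up to

    while (n < name.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            starP = p++;
            starN = n;
        } else if (p < pattern.size()
                   && (pattern[p] == '?' || pattern[p] == name[n])) {
            ++p;
            ++n;
        } else if (starP != std::string::npos) {
            // Mismatch: let the last '*' swallow one more character and retry.
            p = starP + 1;
            n = ++starN;
        } else {
            return false;
        }
    }
    // Only trailing '*' may remain in the pattern.
    while (p < pattern.size() && pattern[p] == '*') {
        ++p;
    }
    return p == pattern.size();
}

// Hypothetical check against an illustrative subset of exclusion patterns.
bool isExcluded(const std::string &fileName)
{
    static const std::vector<std::string> patterns = {
        "*.o", "*.a", "*.class", "*.pyc", "*.pyo", "__pycache__",
    };
    for (const auto &pattern : patterns) {
        if (globMatch(pattern, fileName)) {
            return true;
        }
    }
    return false;
}

// Usage example.
void checkIsExcluded()
{
    assert(isExcluded("module.pyo"));
    assert(isExcluded("libfoo.a"));
    assert(!isExcluded("notes.txt"));
}
```

Note that `*.o` does not match `a.out` here, because the pattern has to cover
the whole file name, not just a prefix of it.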

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D12787

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-14 Thread Nathaniel Graham
ngraham marked an inline comment as done.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D12787

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-14 Thread Nathaniel Graham
ngraham marked 2 inline comments as done.
ngraham added inline comments.

INLINE COMMENTS

> broulik wrote in fileexcludefilters.cpp:69
> Don't we ignore blobs already? If not, we should also add stuff like `qmlc` 
> and `jsc`

As far as I can tell, we do not, and they have to be manually listed. I've 
added `qmlc` and `jsc`. Any more you can think of?

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D12787

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-14 Thread Nathaniel Graham
ngraham updated this revision to Diff 34141.
ngraham added a comment.


  Add more blobs

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D12787?vs=33920&id=34141

BRANCH
  more-excluded-source-files (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D12787

AFFECTED FILES
  src/file/fileexcludefilters.cpp

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-14 Thread Kai Uwe Broulik
broulik added inline comments.

INLINE COMMENTS

> fileexcludefilters.cpp:69
> +"*.css.map,"
> +"*.so",
> +"*.db",

Don't we ignore blobs already? If not, we should also add stuff like `qmlc` and 
`jsc`

> fileexcludefilters.cpp:77
>  "*.elc",   // Emacs Lisp
> +"*.qrc",   // QML
>  

`qrc` is a Qt resource file, not a QML file

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D12787

To: ngraham, michaelh, bruns
Cc: broulik, cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, 
astippich, spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-13 Thread Nathaniel Graham
ngraham added reviewers: michaelh, bruns.
ngraham added a comment.


  Friendly ping!

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D12787

To: ngraham, michaelh, bruns
Cc: cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, astippich, 
spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-09 Thread Nathaniel Graham
ngraham updated this revision to Diff 33920.
ngraham added a comment.


  Also omit node_packages folders

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D12787?vs=33919&id=33920

BRANCH
  more-excluded-source-files (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D12787

AFFECTED FILES
  src/file/fileexcludefilters.cpp

To: ngraham
Cc: cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, astippich, 
spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-09 Thread Nathaniel Graham
ngraham edited the summary of this revision.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D12787

To: ngraham
Cc: cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, astippich, 
spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-09 Thread Nathaniel Graham
ngraham edited the summary of this revision.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D12787

To: ngraham
Cc: cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, astippich, 
spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-09 Thread Nathaniel Graham
ngraham updated this revision to Diff 33919.
ngraham added a comment.


  Add more to also fix 39093

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D12787?vs=33915&id=33919

BRANCH
  more-excluded-source-files (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D12787

AFFECTED FILES
  src/file/fileexcludefilters.cpp

To: ngraham
Cc: cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, astippich, 
spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-09 Thread Nathaniel Graham
ngraham updated this revision to Diff 33915.
ngraham added a comment.


  Revert unintentional change

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D12787?vs=33912&id=33915

BRANCH
  more-excluded-source-files (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D12787

AFFECTED FILES
  src/file/fileexcludefilters.cpp

To: ngraham
Cc: cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, astippich, 
spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-09 Thread Nathaniel Graham
ngraham marked an inline comment as done.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D12787

To: ngraham
Cc: cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, astippich, 
spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-09 Thread Nathaniel Graham
ngraham updated this revision to Diff 33912.
ngraham added a comment.


  add missing comma

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D12787?vs=33903&id=33912

BRANCH
  more-excluded-source-files (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D12787

AFFECTED FILES
  CMakeLists.txt
  src/file/fileexcludefilters.cpp

To: ngraham
Cc: cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, astippich, 
spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-09 Thread Christoph Feck
cfeck added inline comments.

INLINE COMMENTS

> fileexcludefilters.cpp:142
> +"text/csx",
> +"text/vnd.trolltech.linguist"
>  "application/x-awk",

,
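
For context on why the comma matters: in C and C++, adjacent string literals
are concatenated at compile time, so a missing comma in an array initializer
still compiles but silently merges two entries into one. A small standalone
sketch (the array and function names are hypothetical, not taken from
`fileexcludefilters.cpp`):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Missing comma after the second literal: the second and third literals
// are merged into one string, so this array has only two entries.
const char *const brokenList[] = {
    "text/csx",
    "text/vnd.trolltech.linguist"
    "application/x-awk",
};

// With the comma in place, the array has the intended three entries.
const char *const fixedList[] = {
    "text/csx",
    "text/vnd.trolltech.linguist",
    "application/x-awk",
};

constexpr std::size_t brokenCount = sizeof(brokenList) / sizeof(brokenList[0]);
constexpr std::size_t fixedCount = sizeof(fixedList) / sizeof(fixedList[0]);

static_assert(brokenCount == 2, "two literals were merged into one entry");
static_assert(fixedCount == 3, "all three entries are present");

// Usage example: the merged entry matches neither intended MIME type.
void checkLists()
{
    assert(std::strcmp(brokenList[1],
                       "text/vnd.trolltech.linguistapplication/x-awk") == 0);
    assert(std::strcmp(fixedList[1], "text/vnd.trolltech.linguist") == 0);
}
```

In the broken variant neither of the two MIME types would ever match, which is
easy to miss because nothing fails at build time.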

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D12787

To: ngraham
Cc: cfeck, kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, astippich, 
spoorun, ngraham, bruns


D12787: Ignore more types of source files

2018-05-09 Thread Nathaniel Graham
ngraham edited the summary of this revision.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D12787

To: ngraham
Cc: kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, astippich, spoorun, 
ngraham, bruns


D12787: Ignore more types of source files

2018-05-09 Thread Nathaniel Graham
ngraham created this revision.
Restricted Application added projects: Frameworks, Baloo.
Restricted Application added subscribers: Baloo, kde-frameworks-devel.
ngraham requested review of this revision.

REVISION SUMMARY
  Add more types of development-related files to the exclusion lists. These 
files aren't useful to index, and having them there can bog down Baloo.
  BUG: 394002
  FIXED-IN 5.47

TEST PLAN
  Created a bunch of files of the newly excluded types. Baloo didn't index them.

REPOSITORY
  R293 Baloo

BRANCH
  more-excluded-source-files (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D12787

AFFECTED FILES
  CMakeLists.txt
  src/file/fileexcludefilters.cpp

To: ngraham
Cc: kde-frameworks-devel, #baloo, ashaposhnikov, michaelh, astippich, spoorun, 
ngraham, bruns