D10694: epubextractor: Handle multiple subjects better

2018-12-05 Thread Stefan Brüns
bruns commandeered this revision.
bruns edited reviewers, added: michaelh; removed: bruns.
bruns added a comment.
This revision is now accepted and ready to land.


  Picking this up ...

REPOSITORY
  R286 KFileMetaData

BRANCH
  multi-subject

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: bruns, mgallien, dfaure, michaelh
Cc: kde-frameworks-devel, #baloo, bruns, astippich, alexde, sourabhboss, 
feverfew, ashaposhnikov, michaelh, spoorun, navarromorales, firef, ngraham, 
andrebarros, emmanuelp, mikesomov, abrahams


D10694: epubextractor: Handle multiple subjects better

2018-05-09 Thread Stefan Brüns
bruns requested changes to this revision.
bruns added a comment.
This revision now requires changes to proceed.
Restricted Application edited subscribers, added: Baloo, kde-frameworks-devel; 
removed: Frameworks.


  Discussion regarding subject vs keywords is still pending, i.e. no conclusion.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure, bruns
Cc: kde-frameworks-devel, #baloo, bruns, astippich, ashaposhnikov, michaelh, 
spoorun, navarromorales, isidorov, firef, ngraham, andrebarros, emmanuelp, 
#frameworks


D10694: epubextractor: Handle multiple subjects better

2018-04-19 Thread Stefan Brüns
bruns added inline comments.

INLINE COMMENTS

> michaelh wrote in epubextractor.cpp:85
> I think we should port away from libepub. Multiple titles result in one 
> ';'-joined string.
> Also it seems to be unmaintained.

The joined titles is the fault of this epubextractor AFAICS - see 
fetchMetadataString

> michaelh wrote in epubextractor.cpp:97
> Right, this inconsistency is intentional, and it needs discussion. That's why 
> I added a comment in D12197  which was 
> probably overlooked.
> DC and IDPF aren't very clear on how to use `dc:subject`. Calibre interprets 
> it as tags, My impression is, that most provider also do. Hence I prefer to 
> use `Property::Keywords` only because it comes closest imo. That change would 
> not really be breaking as currently `Property::Subject` is one large string 
> joined with ';'.

Distinction between Subject and Keywords typically is keywords are just a bunch 
of words without further specification, while subject, as specified by DC, and 
as used by e.g. libraries, are taken from filed specific catalogs.

Baloos properties documentation specifically mentions dc:subject for 
Properties:Subject.

One of the file formats which has both, keywords and subject, is ODF, which 
uses dc:subject and meta:keywords.

DC specifies for //any// property:

> Recommendation 5. Multiple property values should be encoded by repeating the 
> XML element for that property.

My opinion is to **always** use Properties::Subject for dc:subject (as 
documented for baloo), and add each property instance individually. If 
properties are already messed up in the originating document, there is nothing 
we can do, but we should not make things worse.

REPOSITORY
  R286 KFileMetaData

BRANCH
  multi-subject

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: bruns, astippich, #frameworks, ashaposhnikov, michaelh, spoorun, 
navarromorales, isidorov, firef, andrebarros, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-04-19 Thread Michael Heidelbach
michaelh added inline comments.

INLINE COMMENTS

> bruns wrote in epubextractor.cpp:85
> I think we should add each title individually (there may be one per language).
> 
> Dito for all other properties, see below.

I think we should port away from libepub. Multiple titles result in one 
';'-joined string.
Also it seems to be unmaintained.

> bruns wrote in epubextractor.cpp:97
> This looks a little bit inconsistent - if we have only one value, we add the 
> value to two different Properties, otherwise only Keywords is used.
> 
> DublinCore uses subject and keywords synonymously. It //recommends// to use 
> one property entry per keyword.
> 
> See Recommendation 5 of http://dublincore.org/documents/dc-xml-guidelines/
> and e.g. 
> http://dublincore.org/groups/collections/collection-application-profile/#coldcsubject
> 
> I think we should stick with either Subject or Keywords, but not both.

Right, this inconsistency is intentional, and it needs discussion. That's why I 
added a comment in D12197  which was 
probably overlooked.
DC and IDPF aren't very clear on how to use `dc:subject`. Calibre interprets it 
as tags, My impression is, that most provider also do. Hence I prefer to use 
`Property::Keywords` only because it comes closest imo. That change would not 
really be breaking as currently `Property::Subject` is one large string joined 
with ';'.

REPOSITORY
  R286 KFileMetaData

BRANCH
  multi-subject

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: bruns, astippich, #frameworks, ashaposhnikov, michaelh, spoorun, 
navarromorales, isidorov, firef, andrebarros, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-04-19 Thread Stefan Brüns
bruns added inline comments.

INLINE COMMENTS

> epubextractor.cpp:85
>  
> -QString value = fetchMetadata(ePubDoc, EPUB_TITLE);
> +QString value = fetchMetadataString(ePubDoc, EPUB_TITLE);
>  if (!value.isEmpty()) {

I think we should add each title individually (there may be one per language).

Dito for all other properties, see below.

> epubextractor.cpp:97
> +result->add(Property::Keywords, values[0]);
> +result->add(Property::Subject, values[0]);
>  }

This looks a little bit inconsistent - if we have only one value, we add the 
value to two different Properties, otherwise only Keywords is used.

DublinCore uses subject and keywords synonymously. It //recommends// to use one 
property entry per keyword.

See Recommendation 5 of http://dublincore.org/documents/dc-xml-guidelines/
and e.g. 
http://dublincore.org/groups/collections/collection-application-profile/#coldcsubject

I think we should stick with either Subject or Keywords, but not both.

REPOSITORY
  R286 KFileMetaData

BRANCH
  multi-subject

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: bruns, astippich, #frameworks, ashaposhnikov, michaelh, spoorun, 
navarromorales, isidorov, firef, andrebarros, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-04-18 Thread Matthieu Gallien
mgallien accepted this revision.
mgallien added a comment.
This revision is now accepted and ready to land.


  Thanks

REPOSITORY
  R286 KFileMetaData

BRANCH
  multi-subject

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: bruns, astippich, #frameworks, ashaposhnikov, michaelh, spoorun, 
navarromorales, isidorov, firef, andrebarros, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-04-14 Thread Stefan Brüns
bruns added a comment.


  Sorry to join late here ...
  
  A QMap can store multiple values for one key, and a client reading the Map 
can use QMap::values() to get a list of all matching properties. If a client 
naively uses value() instead, it just gets the first value for the key, but so 
be it.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: bruns, astippich, #frameworks, ashaposhnikov, michaelh, spoorun, 
navarromorales, isidorov, firef, ngraham, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-04-14 Thread Michael Heidelbach
michaelh updated this revision to Diff 32124.
michaelh edited the summary of this revision.
michaelh added a comment.


  Remove tests from epubextractortest use multivaluetest instead
  Depends on D12197 

REPOSITORY
  R286 KFileMetaData

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D10694?vs=30972=32124

BRANCH
  multi-subject

REVISION DETAIL
  https://phabricator.kde.org/D10694

AFFECTED FILES
  autotests/multivaluetest.cpp
  src/extractors/epubextractor.cpp

To: michaelh, mgallien, dfaure
Cc: bruns, astippich, #frameworks, ashaposhnikov, michaelh, spoorun, 
navarromorales, isidorov, firef, ngraham, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-04-14 Thread Michael Heidelbach
michaelh edited the summary of this revision.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: bruns, astippich, #frameworks, ashaposhnikov, michaelh, spoorun, 
navarromorales, isidorov, firef, ngraham, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-04-14 Thread Michael Heidelbach
michaelh added a dependency: D12197: autotests: Test for multiple values.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: bruns, astippich, #frameworks, ashaposhnikov, michaelh, spoorun, 
navarromorales, isidorov, firef, ngraham, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-03-30 Thread Michael Heidelbach
michaelh added a dependent revision: D11820: Handle properties with multiple 
values.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: astippich, #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, 
isidorov, nicolasfella, firef, ngraham, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-03-30 Thread Michael Heidelbach
michaelh updated this revision to Diff 30972.
michaelh added a comment.


  - Additionally use keywords property
  
  This way the value-type of subject can be preserved

REPOSITORY
  R286 KFileMetaData

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D10694?vs=30858=30972

BRANCH
  multi-subject (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D10694

AFFECTED FILES
  autotests/epubextractortest.cpp
  autotests/samplefiles/test.epub
  src/extractors/epubextractor.cpp

To: michaelh, mgallien, dfaure
Cc: astippich, #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, 
isidorov, nicolasfella, firef, ngraham, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-03-30 Thread Michael Heidelbach
michaelh added a comment.


  The semicolons problem is fixed now. See T8196 


REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: astippich, #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, 
isidorov, nicolasfella, firef, ngraham, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-03-29 Thread Michael Heidelbach
michaelh added a comment.


  Finally I understood how those multiple `add()`s were meant and it makes more 
sense now.
  We're still not good to go because of this:
  
$ balooctl index test.epub
Indexing ~/frameworks/kfilemetadata/autotests/samplefiles/test.epub
File(s) indexed

$ balooshow test.epub
177442497910278149 2053 41314051 
~/frameworks/kfilemetadata/autotests/samplefiles/test.epub
Author: Happy Man
Title: The Big Brown Bear
Subject: Baloo;KFile;Meta;Data
Publisher: Happy Publisher
Creation Date: 2014-01-01T01:01:01Z

$ baloo_filemetadata_temp_extractor test.epub
QMap(("author", QVariant(QString, "Happy Man"))("creationDate", 
QVariant(QDateTime, QDateTime(2014-01-01 01:01:01.000 UTC 
Qt::TimeSpec(UTC("publisher", QVariant(QString, "Happy 
Publisher"))("subject", QVariant(QString, "Baloo;KFile;Meta;Data"))("title", 
QVariant(QString, "The Big Brown Bear")))
BQ ...
  
  Those semicolons in `Baloo;KFile;Meta;Data` should not be there and I have no 
idea where they come from.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: astippich, #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, 
isidorov, nicolasfella, firef, ngraham, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-03-29 Thread Michael Heidelbach
michaelh updated this revision to Diff 30858.
michaelh edited the summary of this revision.
michaelh added a comment.


  - Add multiple properies to map

REPOSITORY
  R286 KFileMetaData

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D10694?vs=27632=30858

BRANCH
  multi-subject (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D10694

AFFECTED FILES
  autotests/epubextractortest.cpp
  autotests/samplefiles/test.epub
  src/extractors/epubextractor.cpp

To: michaelh, mgallien, dfaure
Cc: astippich, #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, 
isidorov, nicolasfella, firef, ngraham, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-03-13 Thread Michael Heidelbach
michaelh added a comment.


  A Either I'm completely off-track here or `kfilemetadata` is not 
doing this correctly. 
  I see a lot of statements like `artist += ', ' + value` -> no list!
  `result->add` calls `QMap->addMulti().` This forces the client to iterate 
over the map, which is not necessary.
  In contrary
  
kfilemetadata/src/propertyinfo.cpp: 52
 case Property::AlbumArtist:
d->name = QStringLiteral("albumArtist");
d->displayName = i18nc("@label", "Album Artist");
d->valueType = QVariant::StringList;
break;
  
  The client can call `toVariantMap()` on the extraction result to get a 
`QVariantMap` but not a `QVariant::StringList`.
  I really, really, really do not understand why `kfilemetadata` is not 
delivering a string list. It is much more natural and consistent with the 
announced Property info.
  Also:
  
kfilemetadata/src/properties.h: 282
typedef QMap PropertyMap;
//In taglibextractor never a list --^`
  
  I'm confused.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: astippich, #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, 
isidorov, nicolasfella, firef, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-03-13 Thread Matthieu Gallien
mgallien added a comment.


  In D10694#224465 , @michaelh wrote:
  
  > In D10694#224440 , @mgallien 
wrote:
  >
  > > Could you please update your diff and we can land it ? This is a useful 
improvement.
  >
  >
  > 1. We can't land it yet. baloo searching breaks with this patch. baloo has 
be adapted first.
  
  
  I trust you on this one but on some property the taglib extractor is already 
doing multiple adds of the same property. It means Baloo is already storing 
lists.
  I can help you but have not much time as usual.
  
  > 1. I don't know what to update. Please tell me once more what you want me 
to change.
  
  You can have a look at the taglib extractor. There are multiple examples of 
for loops adding multiple times the same property.
  
  > For this change concerted actions are needed. Let's discuss general 
questions here:  T8196 
  
  Thanks, this is a good idea.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: astippich, #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, 
isidorov, nicolasfella, firef, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-03-13 Thread Michael Heidelbach
michaelh added a comment.


  In D10694#224440 , @mgallien wrote:
  
  > Could you please update your diff and we can land it ? This is a useful 
improvement.
  
  
  1. We can't land it yet. baloo searching breaks with this patch. baloo has be 
adapted first.
  2. I don't know what to update. Please tell me once more what you want me to 
change.
  
  For this change concerted actions are needed. Let's discuss general questions 
here:  T8196 

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: astippich, #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, 
isidorov, nicolasfella, firef, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-03-13 Thread Matthieu Gallien
mgallien added a comment.


  In D10694#222674 , @michaelh wrote:
  
  > @mgallien: I think we have a beautiful misunderstanding here :-)
  >
  > In D10694#221719 , @michaelh 
wrote:
  >
  > > This is bad! ...
  >
  >
  > I was referring to baloo inablility to  handle string lists and not to this 
diff.
  >
  > In D10694#221734 , @mgallien 
wrote:
  >
  > > I believe this is quite the opposite. ...
  >
  >
  > I took your answer the wrong way completely. Let's rewind our conversation 
a little
  
  
  No problem from side. In fact this is quite the opposite given you made me 
better understand all this code. I have now to fix Elisa code to handle lists 
where I was expecting a single string.
  
  Could you please update your diff and we can land it ? This is a useful 
improvement.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: astippich, #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, 
isidorov, nicolasfella, firef, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-03-10 Thread Michael Heidelbach
michaelh added a comment.


  @mgallien: I think we have a beautiful misunderstanding here :-)
  
  In D10694#221719 , @michaelh wrote:
  
  > This is bad! ...
  
  
  I was referring to baloo inablility to  handle string lists and not to this 
diff.
  
  In D10694#221734 , @mgallien wrote:
  
  > I believe this is quite the opposite. ...
  
  
  I took your answer the wrong way completely. Let's rewind our conversation a 
little

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: astippich, #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, 
isidorov, nicolasfella, firef, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-03-09 Thread Alexander Stippich
astippich added a comment.


  In D10694#221734 , @mgallien wrote:
  
  > In D10694#221719 , @michaelh 
wrote:
  >
  > > This is bad! I have learned baloo itself doesn't handle stringlists. 
Which in my view would be the natural way to handle token-like items like tags, 
keywords and subject(s). Until this is changed in baloo I'm putting this diff 
on hold and fro the time being work around these long subject-strings within 
baloo-widgets.
  >
  >
  > I believe this is quite the opposite. I am already getting strings list for 
some audio metadata. I would like to get your patch in given you make it works 
like already existing code and apart from the baloo-widgets code everybody 
should be fine with your modifications.
  >
  > > @mgallien: Did I understand you correctly, the refactoring helped you to 
understand it better? That would be an incentive for me to do more refactoring.
  >
  > This is this patch that made me read the code in baloo that store the 
properties returned by the extractors. This code is handling strings list quite 
fine and automatically when a property is added several times.
  
  
  Could you please point me to the code location? I would like to investigate 
how the code is used for D10803 .
  Also, since this discussion also applies for the taglibextractor: Is a string 
list preferred for new properties in KFileMetadata when multiple entries are 
possible? I think right now most of the properties are using a single 
concatenated string.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: astippich, #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, 
isidorov, nicolasfella, firef, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-03-09 Thread Michael Heidelbach
michaelh added a comment.


  In D10694#221734 , @mgallien wrote:
  
  > I believe this is quite the opposite. I am already getting strings list for 
some audio metadata.
  
  
  That would be great. Please point me to the respective code in elisa. I want 
to know which properties and how baloo is involved.
  Maybe I gave up too soon. With the old kfilemetadata `baloosearch 
subject:Dummy1` has a result. When kfilemetadata delivers `subject` as string 
list is does not. I will reinvestigate.
  
  > I would like to get your patch in given you make it works like already 
existing code and apart from the baloo-widgets code everybody should be fine 
with your modifications.
  
  Please rephrase. It looks as if some words were deleted. Anyway, 
baloo-widgets is no problem at all. It's easy to adapt and to debug. It is 
baloo that's giving me headaches.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, isidorov, 
nicolasfella, firef, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-03-09 Thread Matthieu Gallien
mgallien added a comment.


  In D10694#221719 , @michaelh wrote:
  
  > This is bad! I have learned baloo itself doesn't handle stringlists. Which 
in my view would be the natural way to handle token-like items like tags, 
keywords and subject(s). Until this is changed in baloo I'm putting this diff 
on hold and fro the time being work around these long subject-strings within 
baloo-widgets.
  
  
  I believe this is quite the opposite. I am already getting strings list for 
some audio metadata. I would like to get your patch in given you make it works 
like already existing code and apart from the baloo-widgets code everybody 
should be fine with your modifications.
  
  > @mgallien: Did I understand you correctly, the refactoring helped you to 
understand it better? That would be an incentive for me to do more refactoring.
  
  This is this patch that made me read the code in baloo that store the 
properties returned by the extractors. This code is handling strings list quite 
fine and automatically when a property is added several times.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, isidorov, 
nicolasfella, firef, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-03-08 Thread Michael Heidelbach
michaelh added a comment.


  This is bad! I have learned baloo itself doesn't handle stringlists. Which in 
my view would be the natural way to handle token-like items like tags, keywords 
and subject(s). Until this is changed in baloo I'm putting this diff on hold 
and fro the time being work around these long subject-strings within 
baloo-widgets.
  
  @mgallien: Did I understand you correctly, the refactoring helped you to 
understand it better? That would be an incentive for me to do more refactoring.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, isidorov, 
nicolasfella, firef, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-03-08 Thread Matthieu Gallien
mgallien added a comment.


  Sorry for being so late.
  
  In D10694#215663 , @michaelh wrote:
  
  > @mgallien : taglibextractor.cpp is very hard to read. Hopefully it is no 
longer.

INLINE COMMENTS

> michaelh wrote in epubextractor.cpp:94
> Did you mean something like this?:
> 
>   const auto& values = fetchMetadata(ePubDoc, EPUB_SUBJECT);
>   for (auto& value : values) {
>   result->add(Property::Subject, value);
>}
> 
> It fails the test (of course). It it unclear to me how to handle 
> `result.properties().value(Property::Subject)` and there is no example in the 
> tests. They all compare against `QVariant::Type::String`. Do you have an 
> example from elisa?

Elisa code is not properly handling this case. When writing it, I had not 
understood that. Thanks to your diff, I have now properly read the code and I 
should go modify Elisa code.

In general, you can use canConvert() from QVariant. This should allow 
to handle both cases.
You can probably always convert to a string list QVariant::toStringList. It 
should be enough for properties with one string or a list of strings.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, isidorov, 
nicolasfella, firef, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-02-27 Thread Matthieu Gallien
mgallien requested changes to this revision.
mgallien added a comment.
This revision now requires changes to proceed.


  In D10694#214996 , @michaelh wrote:
  
  > In D10694#213712 , @mgallien 
wrote:
  >
  > > To me, it is necessary to have a test that ensures that possible existing 
clients are not affected by your change. Could you add it ?
  >
  >
  > The new behaviour will break clients. No test needed for this. E.g. 
baloo-widgets will show the label 'Subject' but no content.
  >  To make it clear: I do not want break this for clients apart from 
baloo-widgets. The question is: Who are the clients? and How to find them 
except for searching https://lxr.kde.org/? Please help me with this.
  
  
  Yesterday, I tried looking for such use and did not found them. I did try to 
use web search engine with no success. I am afraid I cannot help you but would 
be happy to hear any advice.
  
  > The current behaviour is already breaking: I have some epub files with a 
lot of `dc:subject` tags. The semicolon-concatenated string is longer than my 
monitor is wide. As a result in dolphin the tooltip does not display at all. My 
rationale was to tackle the problem at the root rather than to work around it 
by splitting the string within baloo-widgets. With regard to T8079 
 it will be the same. If data is taken from 
baloo's database that old long string will be displayed
  
  I am sure that your work is really trying to fix real problem that are 
seeing. You have my full support on your aim.
  
  I should apologize given how bad my knowledge of KFileMetaData is. It is 
already silently managing single and multiple entries. In fact, yes, the epub 
extractor is doing it wrong.
  I have also checked how baloo is handling multiple entries and it works in 
the same way than pure kfilemetadata.
  
  In the taglib extractors multiple entries are added in several call to add 
method. This is nicely handled by KFileMetaData::SimpleExtractionResult::add . 
It would be nice if you modify your patch to follow the same approach. In 
Baloo, things are a little bit different and a list is automatically created 
when multiple add are called with the same property but the result is similar 
(in baloo Result::add ).
  
  You can forget my initial objection, I was plain wrong.
  
  We may have bugs in user of the API not expecting a list when they should 
have. In my opinion, this should not block your diff. Let's fix them when we 
discover them.
  
  > According to the standard an epub can have multiple `dc:subject` entries. 
So it is reasonable to expect a string list from KFileMetaData. Plus the 
possibilities are grand with an KFileMetaData writer for epubs one could use 
the tag widget to change them. One could unify xattr tags and epub subjects in 
tags:// protocol and mayby more.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, isidorov, 
nicolasfella, firef, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-02-27 Thread Michael Heidelbach
michaelh added a comment.


  In D10694#213712 , @mgallien wrote:
  
  > To me, it is necessary to have a test that ensures that possible existing 
clients are not affected by your change. Could you add it ?
  
  
  The new behaviour will break clients. No test needed for this. E.g. 
baloo-widgets will show the label 'Subject' but no content.
  To make it clear: I do not want break this for clients apart from 
baloo-widgets. The question is: Who are the clients? and How to find them 
except for searching https://lxr.kde.org/? Please help me with this.
  
  The current behaviour is already breaking: I have some epub files with a lot 
of `dc:subject` tags. The semicolon-concatenated string is longer than my 
monitor is wide. As a result in dolphin the tooltip does not display at all. My 
rationale was to tackle the problem at the root rather than to work around it 
by splitting the string within baloo-widgets. With regard to T8079 
 it will be the same. If data is taken from 
baloo's database that old long string will be displayed
  
  According to the standard an epub can have multiple `dc:subject` entries. So 
it is reasonable to expect a string list from KFileMetaData. Plus the 
possibilities are grand with an KFileMetaData writer for epubs one could use 
the tag widget to change them. One could unify xattr tags and epub subjects in 
tags:// protocol and mayby more.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, isidorov, 
nicolasfella, firef, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-02-26 Thread Matthieu Gallien
mgallien added a comment.


  I have created T8079  to work on baloo 
database update when extractors are modified and returned different data.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, isidorov, 
nicolasfella, firef, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-02-25 Thread Matthieu Gallien
mgallien added a comment.


  In D10694#210509 , @michaelh wrote:
  
  > The only component I could find to be affected by this change is 
`baloo-widgets`. I have already adapted it to this change. And yes, it will 
handle both.
  >  It will take a some time to publish it because some other stuff has to get 
reviewed first.
  >  I you know of any component or application using the epub-extractor of 
KFileMetadata apart from baloo-widgets please let me know.
  
  
  I do understand that you have searched inside KDE hosted code. Still, you are 
modifying the behavior. To me, it is necessary to have a test that ensures that 
possible existing clients are not affected by your change. Could you add it ?

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: #frameworks, ashaposhnikov, michaelh, kmorwinski, spoorun, navarromorales, 
isidorov, nicolasfella, firef, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-02-20 Thread Michael Heidelbach
michaelh added a comment.


  The only component I could find to be affected by this change is 
`baloo-widgets`. I have already adapted it to this change. And yes, it will 
handle both.
  It will take a some time to publish it because some other stuff has to get 
reviewed first.
  I you know of any component or application using the epub-extractor of 
KFileMetadata apart from baloo-widgets please let me know.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, isidorov, 
nicolasfella, firef, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-02-20 Thread Matthieu Gallien
mgallien added a comment.


  In D10694#210409 , @michaelh wrote:
  
  > @mgallien Then they won't benefit from this until a file is reindexed. 
  >  Users with baloo disabled will benefit, because in that case 
`baloo_filemetadata_temp_extractor` will extract the data on-the-fly.
  >  Reindexing just the ebooks can be done with:
  >
  >   find ~/EBooks/ -type f -name "*.epub" | { mapfile -t; balooctl clear 
"${MAPFILE[@]}";  balooctl index "${MAPFILE[@]}"; }
  >  
  >
  
  
  Could you also add a test for the compatibility with current unmodified code 
? I would like to be sure that we do not introduce any incompatibility.
  
  I am still not convinced we should change the extractors without an automatic 
solution for the reindexing case. You will have to ensure everybody can use the 
property both as a string and a stringlist forever.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, isidorov, 
nicolasfella, firef, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-02-20 Thread Michael Heidelbach
michaelh added a comment.


  @mgallien Then they won't benefit from this until a file is reindexed. 
  Users with baloo disabled will benefit, because in that case 
`baloo_filemetadata_temp_extractor` will extract the data on-the-fly.
  Reindexing just the ebooks can be done with:
  
find ~/EBooks/ -type f -name "*.epub" | { mapfile -t; balooctl clear 
"${MAPFILE[@]}";  balooctl index "${MAPFILE[@]}"; }

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, isidorov, 
nicolasfella, firef, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-02-20 Thread Matthieu Gallien
mgallien added a comment.


  The problem with any changes to the extractors its that the baloo database of 
users will not be refreshed.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D10694

To: michaelh, mgallien, dfaure
Cc: #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, isidorov, 
nicolasfella, firef, andrebarros, alexeymin, emmanuelp


D10694: epubextractor: Handle multiple subjects better

2018-02-20 Thread Michael Heidelbach
michaelh created this revision.
michaelh added reviewers: mgallien, dfaure.
michaelh added projects: Baloo, Dolphin, Frameworks.
Restricted Application added a subscriber: Frameworks.
michaelh requested review of this revision.

REVISION SUMMARY
  Instead of returing one long word of ';'-concatenated subjects,
  which is unwrappable return a list of strings and let the consumer handle it.
  
  This will facilitate e.g. dolphin's tooltips display of multiple subjects.
  
  - Add more subjects to test.epub

TEST PLAN
  unit test
  visual inspection in dolphin with adapted baloo-widgets

REPOSITORY
  R286 KFileMetaData

BRANCH
  multi-subject (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D10694

AFFECTED FILES
  autotests/epubextractortest.cpp
  autotests/samplefiles/test.epub
  src/extractors/epubextractor.cpp

To: michaelh, mgallien, dfaure
Cc: #frameworks, ashaposhnikov, michaelh, spoorun, navarromorales, isidorov, 
nicolasfella, firef, andrebarros, alexeymin, emmanuelp