Re: [Tracker] Reviving the project, a first attempt

Philip Van Hoof Thu, 22 Nov 2012 03:22:13 -0800

On Thu, 2012-11-22 at 10:52 +0000, Martyn Russell wrote:
> On 11/22/2012 09:01 AM, Philip Van Hoof wrote:


> Hello Philip,

Hey Martyn!

> > Buried under its own weight of complexity, the project is stifled. Why do
> > I think this?
> 
> The project isn't dead, I should point that out. It has slowed down, 
> clearly due to the change in funding.

Right, it's not dead; it has slowed down. I agree. You made several fine
releases and patches are being reviewed. Great job on that.

[CUT]

> > I always wondered, Jürg, where the name vstore came from. But it was a
> > fantastic branch and piece of work that you did. It clearly steered the
> > project in the direction of SPARQL and Nepomuk as ontology. Thanks.
> 
> I don't recall what the branch was about actually. At the height of our 
> development, I was merging ca. 6 branches a week into master. Hard to 
> keep up with all of them ;)

That branch was the SPARQL and Nepomuk work, the big redesign point
where the indexer got separated from the SPARQL endpoint. The branch was
apparently removed, but I recall that vstore was its name.

[CUT]

> > This is not the case anymore. And I heard from developers of a new phone
> > OS being developed that Tracker is again used, that it was again a hard
> 
> Which one?

They have not officially announced this and I don't want the developers
who discussed it with me in private to feel sorry that they did. You'll
figure it out, Martyn ;-)

[CUT]

> > I would also like to thank our top contributors and the people who
> > worked on Qt based libraries built on top of libtracker-sparql for
> > spreading the truth about our team and Tracker. You guys know who you
> > are, I don't have to name you ;-)
> 
> Don't forget your input. You made quite a sizeable contribution and made 
> quite some difference. ;)

> > And oh my God I'm writing so much text just to make a simple point ..
> 
> It's definitely not an imposter writing this then :P

Thanks for the nice compliments!

> > libtracker-extract's tracker_extract_client_get_metadata API is not
> > public enough, because Tracker relies too heavily on the file
> > system miner. Now is the time to change this.
> 
> I agree that it's too heavily relying on the miner.

> There is a good reason for this. The filesystem information, name, size, 
> mtime, etc. is all handled by the miner-fs. You could likely solve this 
> issue by "chaining" extractors and have a basic file extractor which 
> gets this information so the miner isn't doing it.

Yes, I agree.
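A minimal sketch of the "basic file extractor" idea, in C. The name basic_file_extract and its output format are hypothetical, not part of libtracker-extract. It gathers only the name, size and mtime via stat(2) and emits them as triples using the NFO properties nfo:fileName, nfo:fileSize and nfo:fileLastModified. A format-specific extractor chained after it would then no longer need the miner to supply these.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical "basic file extractor": the first link in an extractor
 * chain. It writes triples about `path`, using `subject_iri` as the
 * subject, into `buf`. Returns the number of bytes written, or -1 if
 * stat() fails or the output does not fit. Assumes neither argument
 * contains characters that would need escaping in a SPARQL literal. */
int basic_file_extract(const char *path, const char *subject_iri,
                       char *buf, size_t buflen)
{
    struct stat st;
    char mtime[32];
    const char *name;
    int n;

    assert(path && subject_iri && buf);
    if (stat(path, &st) != 0)
        return -1;

    /* ISO 8601 UTC timestamp, the lexical form of an xsd:dateTime. */
    strftime(mtime, sizeof mtime, "%Y-%m-%dT%H:%M:%SZ", gmtime(&st.st_mtime));

    /* Base name of the path. */
    name = strrchr(path, '/');
    name = name ? name + 1 : path;

    n = snprintf(buf, buflen,
                 "<%s> nfo:fileName \"%s\" ; "
                 "nfo:fileSize %lld ; "
                 "nfo:fileLastModified \"%s\" .",
                 subject_iri, name, (long long) st.st_size, mtime);
    return (n < 0 || (size_t) n >= buflen) ? -1 : n;
}
```

For example, `basic_file_extract("/home/user/a.jpg", "file:///home/user/a.jpg", buf, sizeof buf)` fills `buf` with the three file-system triples for that file.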

> This is the reason why miner-fs is injecting SPARQL, because it 
> concatenates extractor specific SPARQL with file system general SPARQL.

We always wanted to redesign this, as it was an extra IPC callback that
we could avoid. So improving this would also benefit Tracker's FS miner,
in the form of less IPC overhead. It's win-win.
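The concatenation Martyn describes could instead happen once, inside an extractor chain, with no IPC round trip. Roughly, it might look like the sketch below. Everything in it is hypothetical: the extractor_fn signature, the two stub extractors and run_chain are made up for illustration. The bare nfo:/nmm: prefixes assume the store resolves them without PREFIX declarations.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical extractor signature: write triples for `uri` into `out`.
 * Returns 0 on success, -1 on failure or truncation. */
typedef int (*extractor_fn)(const char *uri, char *out, size_t outlen);

/* Stub standing in for a basic file-system extractor. */
static int fs_extractor(const char *uri, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "<%s> a nfo:FileDataObject . ", uri);
    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}

/* Stub standing in for a format-specific (image) extractor. */
static int image_extractor(const char *uri, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "<%s> a nmm:Photo . ", uri);
    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}

/* Runs each extractor in order and wraps all fragments in a single
 * INSERT DATA { ... }, so the store receives one update per file. */
int run_chain(const extractor_fn *chain, size_t len, const char *uri,
              char *sparql, size_t sparqllen)
{
    char frag[1024];
    size_t used;
    int n;

    assert(chain && uri && sparql);
    n = snprintf(sparql, sparqllen, "INSERT DATA { ");
    if (n < 0 || (size_t) n >= sparqllen)
        return -1;
    used = (size_t) n;

    for (size_t i = 0; i < len; i++) {
        if (chain[i](uri, frag, sizeof frag) != 0)
            return -1;
        n = snprintf(sparql + used, sparqllen - used, "%s", frag);
        if (n < 0 || (size_t) n >= sparqllen - used)
            return -1;
        used += (size_t) n;
    }
    n = snprintf(sparql + used, sparqllen - used, "}");
    return (n < 0 || (size_t) n >= sparqllen - used) ? -1 : 0;
}
```

The design choice here is simply that the file-system part becomes one more link in the chain rather than something the miner adds afterwards.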

> > Phone builders want to rid themselves of file system mining. Instead
> > they want to let MTP daemons, which deal with incoming files, do the
> > processing and extraction of file metadata. They don't want to
> > configure DConf or a GKeyFile to point to the directory the MTP
> > daemon writes files to; they want no such configuration at all.
> 
> Right, there are different ways in which data can come in and we shouldn't 
> restrict ourselves to the filesystem. That makes sense. But we don't. It 
> just so happens the miner-fs is the main way people get data into 
> tracker-store.

Right now, yes. We can change this, but I think the right developers need
to be reactivated, or at least play a supportive role for a contributor
when or if somebody new starts working on this.

> > Instead they want their MTP daemon to use a simple API that will trigger
> > tracker-extract into extracting the file and then writing the SPARQL
> > INSERT to the SPARQL endpoint.

> One of the things I would love to see happen (or add) is a command line 
> option or way to inject SPARQL from tracker-extract into the store. We 
> have a hack for this right now with tracker-control -f $FILE and a D-Bus 
> API. The main problem with this is that the filesystem data is not there 
> for files (which are the main use case in tracker right now).

Yes, this could certainly be part of such a redesign or even an
intermediate step towards it.

[CUT]

> > Tomorrow's phone builders might not even use a file system. Why would
> > inter app data sharing then necessarily depend on file system indexing?!
> 
> You know that the miner-fs doesn't have to be a daemon and can index on 
> demand (instead of by inotify) right now in stable releases right? The 
> miner-fs is also configurable to not be built --disable-miner-fs (I think).

I know, but I don't think this is sufficient. An MTP daemon doesn't want
to call system(). It needs deeper and better-defined integration.

> > File system indexing is of course important, but only for users who need
> > it. Like a desktop. A desktop needs it. A phone might not need it. And
> > if it does, they understandably want to limit its use.
> >
> > I would like to propose that we start by fully documenting
> > libtracker-extract and by changing tracker_extract_client_get_metadata's
> > API so that it is obvious to a platform builder, integrator or app
> > developer (of, for example, an MTP daemon) how to call it in order to
> > get the file's metadata inserted into tracker-store before the MTP
> > daemon has written out the file itself.
> 
> I was under the impression that it was already. If someone is paying for 
> this or wants patch review, I am happy to step up.

Awesome
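One way to make such an API obvious for an MTP daemon is to let the caller describe the file by where it will end up, rather than where its bytes currently are. The sketch below is hypothetical: pending_file and build_insert_for_pending are not existing Tracker API, and the exact classes and properties a real Tracker store expects may differ. It reads the size from the .tmp file and takes the name and URL from the final URI. This relies on the fact that rename(2) changes neither a file's size nor its modification time, so only the name and URL need to be known up front.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical request an MTP daemon could fill in while a transfer is
 * still being written to a temporary file. */
typedef struct {
    const char *tmp_path;  /* where the bytes are now, e.g. .../.tmp-XYZ */
    const char *final_uri; /* where the file will end up after rename(2) */
    const char *mime_type; /* known to the MTP daemon from the transfer */
} pending_file;

/* Builds an INSERT DATA statement describing the file at its final
 * location. Returns 0 on success, -1 on stat failure or truncation.
 * Assumes no field needs escaping inside a SPARQL literal. */
int build_insert_for_pending(const pending_file *pf, char *out, size_t outlen)
{
    struct stat st;
    const char *name;
    int n;

    assert(pf && pf->tmp_path && pf->final_uri && pf->mime_type && out);

    /* Size comes from the temporary file; rename(2) will not change it. */
    if (stat(pf->tmp_path, &st) != 0)
        return -1;

    /* The file name comes from the final URI, never from the .tmp path. */
    name = strrchr(pf->final_uri, '/');
    name = name ? name + 1 : pf->final_uri;

    n = snprintf(out, outlen,
                 "INSERT DATA { <%s> a nfo:FileDataObject ; "
                 "nie:url \"%s\" ; nfo:fileName \"%s\" ; "
                 "nie:mimeType \"%s\" ; nfo:fileSize %lld . }",
                 pf->final_uri, pf->final_uri, name, pf->mime_type,
                 (long long) st.st_size);
    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```

If the transfer is aborted and the rename never happens, the daemon would have to delete the resource again. That is the trade-off Martyn raises next in this thread.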

> > To make it possible to call this on a .tmp-XYZ file for a file that will
> > later be renamed to Girlfriend.JPEG in the DCIM folder of the phone.

> Well, this isn't actually easy to solve even if you move away from 
> miner-fs. If you're returning the full SPARQL including things like the 
> file name, size, mtime, etc. then these details change. You either 
> change the SPARQL and wait before injecting it to the store, or post 
> process by updating the store details when it changes.

As a team we did a lot of things that were not easy to solve ;-)

> You can't have it both ways. You either want the data early and have to 
> cope with changes like the name changing OR you wait and have the data 
> in its final (albeit maybe only briefly) state.

Yep
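The other option is to post-process once the rename has happened. That amounts to a single SPARQL update that moves the resource's URL and file name. A hypothetical helper is sketched below. It uses the standard SPARQL 1.1 DELETE/INSERT/WHERE form with the nie:url and nfo:fileName properties; whether a given Tracker release accepts exactly this form is an assumption. The helper does no literal escaping and instead rejects URLs containing `"` or `\`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds one SPARQL 1.1 update that re-points the
 * resource stored at old_url to new_url and updates its file name.
 * Returns 0 on success, -1 on truncation or if a URL contains '"' or '\'
 * (no escaping is attempted). */
int build_rename_update(const char *old_url, const char *new_url,
                        char *out, size_t outlen)
{
    const char *name;
    int n;

    assert(old_url && new_url && out);
    if (strpbrk(old_url, "\"\\") || strpbrk(new_url, "\"\\"))
        return -1;

    /* The new file name is the last path component of the new URL. */
    name = strrchr(new_url, '/');
    name = name ? name + 1 : new_url;

    /* The WHERE clause is matched once, before both DELETE and INSERT
     * are applied, so ?f still binds via the old URL. */
    n = snprintf(out, outlen,
                 "DELETE { ?f nie:url \"%s\" ; nfo:fileName ?name } "
                 "INSERT { ?f nie:url \"%s\" ; nfo:fileName \"%s\" } "
                 "WHERE { ?f nie:url \"%s\" ; nfo:fileName ?name }",
                 old_url, new_url, name, old_url);
    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```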

> > Right now this ain't possible, because libtracker-extract is too focused
> > on being "just a tool library for the filesystem miner".
> 
> Well, I would say it's more that the miner-fs is _THE_ only one using 
> it, so it's not so bad given that.

Agree

> If you mean to suggest we separate this into a new project, I think that 
> might be a good idea. Same for the miner-fs. Possible for 
> libtracker-sparql too? Some investigation would be needed, there are 
> core libraries that we depend on in all cases and might cause problems...

I don't think that separating or splitting the subprojects of Tracker is
needed, or a good idea, right now. Long term it probably is.

> One of the recent issues I've had with Tracker is, I can't find it on 
> Google - I think Rob mentioned this way back at some GUADEC. The name is 
> quite generic. I have been asked several times why we have so many 
> things in the tree and if we can disable or split out things. I think 
> Red Hat recently asked if we could do this, I am sure Debian maintainers 
> have too.

Yes. Back then my opinion was that a rename was not needed and would, at
that point in time, hurt the team's cohesion.

Today the situation is different and if all former team members and / or
a new group of contributors taking a lead role in the project agree,
then I think a rename (long term goal) would not be a bad idea.

Sadly, the name "Tracker" has been given a bad reputation for false
reasons. I think the Intel MeeGo attempt, for example, wrongly accused
Tracker of being a reason why Harmattan MeeGo didn't succeed.

Start of thread here:
http://lists.meego.com/pipermail/meego-architecture/2011-March/000081.html

This was my response:
http://lists.meego.com/pipermail/meego-architecture/2011-March/000113.html
https://mail.gnome.org/archives/tracker-list/2011-March/msg00033.html

A rename might undo that. Still, I'm not much of a fan of yielding to
reputational pressure from clueless people who, without doing much
investigation (as we did), make false statements.

> It's not really the Linux way IMO to have everything in one monolithic 
> module. So I wouldn't mind splitting things out.

I agree.

> > To make language bindings for it like for JS, Dalvik, MonoTouch, Qt.
> 
> That would be good. The API is quite small too, shouldn't take much effort.

Right

> > It ought to be a library for all application developers, just like how
> > libtracker-sparql is such a library: obvious in API, well documented,
> > suitable for wrapping it with for example a Qt layer and all that stuff.

> :) interesting. There is a reason why it's not a library. We often have 
> crashes for whatever reason.

Yes, I don't think tracker-extract should cease being a process. A
library that does IPC to tracker-extract is probably the right solution.
That or a strong warning that a no-extract-process libtracker-extract
can crash as it relies on a wide variety of libraries having to cope
with a wide variety of file formats.

A libtracker-extract could also be done the way libstreamanalyzer was
done, but I consider integrating, adapting and/or merging libstreamanalyzer
with what is now the Tracker project a long-term goal.

I'm adding Jos in CC. Hey Jos, start of thread here:
https://mail.gnome.org/archives/tracker-list/2012-November/msg00009.html

> Sometimes, it's just that the system library was updated and now our
> extractor crashes. Sometimes, it's problematic files which cause crashes.
> That's why we use a daemon/program to do extraction, because the people
> using the extractor don't die. I think making this into a library
> presents some interesting situations we would need to consider like that.

I fully agree.
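The crash-isolation argument, together with Philip's "library that does IPC to tracker-extract", can be illustrated with plain POSIX primitives. Run the extractor in a forked child, collect its SPARQL over a pipe, and let the parent notice a crash instead of dying with it. This is a hypothetical sketch, not how tracker-extract is implemented. A pipe stands in for whatever IPC (D-Bus, say) a real client library would use, and extract_isolated and the two stub extractors are made-up names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical extractor signature: write SPARQL for `path` to `fd`. */
typedef void (*extract_fn)(const char *path, int fd);

/* Demo stub: a well-behaved extractor. */
static void good_extractor(const char *path, int fd)
{
    char buf[256];
    int n = snprintf(buf, sizeof buf, "<file://%s> a nfo:FileDataObject .", path);
    if (n > 0 && (size_t) n < sizeof buf && write(fd, buf, (size_t) n) < 0)
        perror("write");
}

/* Demo stub: crashes the way a buggy decoder on a malformed file might. */
static void crashing_extractor(const char *path, int fd)
{
    (void) path;
    (void) fd;
    raise(SIGSEGV);
}

/* Runs `fn` in a forked child and collects its output through a pipe.
 * Returns 0 on success, 1 if the child was killed by a signal (a crash),
 * -1 on any other failure. Either way the calling process survives. */
int extract_isolated(extract_fn fn, const char *path, char *out, size_t outlen)
{
    int fds[2], status;
    char scratch[256];
    size_t used = 0;
    ssize_t r;
    pid_t pid;

    assert(fn && out && outlen > 0);
    if (pipe(fds) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                 /* child: do the risky work */
        close(fds[0]);
        fn(path, fds[1]);
        _exit(0);
    }

    close(fds[1]);                  /* parent: read until EOF */
    for (;;) {
        int room = used < outlen - 1;
        /* Once `out` is full, keep draining into `scratch` so the child
         * never blocks on a full pipe while we wait for it. */
        r = read(fds[0], room ? out + used : scratch,
                 room ? outlen - 1 - used : sizeof scratch);
        if (r < 0 && errno == EINTR)
            continue;
        if (r <= 0)
            break;
        if (room)
            used += (size_t) r;
    }
    out[used] = '\0';
    close(fds[0]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return 1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A real library would probably also want a timeout, since a decoder can hang on a problematic file as well as crash.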

> > I think whoever starts with improving libtracker-extract in this
> > direction, perhaps by renaming, copying or refactoring to a new library
> > the API tracker_extract_client_get_metadata, will revive the project to
> > its original glory.
> 
> I don't really view the project as "losing" its glory. It's just 
> slowed down, matured even you could say.

Yes ok. Still, it had more glory a few years ago. I think :-)

Kind regards,

Philip

-- 


Philip Van Hoof
Software developer
Codeminded BVBA - http://codeminded.be

_______________________________________________
tracker-list mailing list
[email protected]
https://mail.gnome.org/mailman/listinfo/tracker-list
