Hi Aldo,
On Mar/30/10 1:46 am, Aldo Bucchi wrote:
> Hi David,
> I love it and I NEED it ;)
> Awesome work, really.
> I heard it will be open source, so I will probably be able to extend it
> myself,
Yup, it'll be open source. Clean data sets are all clean the same way,
but each dirty data set is dirty in its own way. Which is why Gridworks
needs all the open source contributions in order to cover as many
different kinds of data dirtiness as possible. :-)
> but here are some ideas for (missing?) features:
> * Importing custom lookups/dictionaries (to go from text to IDs or
> the other way around). Maybe this is possible using a different hook
> for the reconciliation mechanism.
> * Related: Plug in other reconciliation services (not sure how this
> stands up to Freebase biz alignment)
Definitely. Right now Gridworks is hooked up to 2 services: the Freebase
text search service (called "relevance") and the experimental proper
reconciliation service. It makes sense to be able to plug in other
services as well.
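As a rough illustration only (this is not Gridworks code, and every name
below is hypothetical), pluggable reconciliation could be sketched as a
common interface, with one implementation backed by a custom lookup
dictionary that goes from text to IDs:

```python
from typing import Dict, Iterable, List, Optional, Protocol


class Reconciler(Protocol):
    """Hypothetical interface: map a cell's text to an identifier, or None."""

    def reconcile(self, text: str) -> Optional[str]: ...


class DictionaryReconciler:
    """Hypothetical reconciler backed by a user-supplied lookup table."""

    def __init__(self, lookup: Dict[str, str]) -> None:
        # Normalize keys so matching ignores case and surrounding whitespace.
        self._lookup = {k.strip().lower(): v for k, v in lookup.items()}

    def reconcile(self, text: str) -> Optional[str]:
        return self._lookup.get(text.strip().lower())


def reconcile_column(
    cells: Iterable[str], services: List[Reconciler]
) -> List[Optional[str]]:
    """Try each service in order; keep the first identifier found per cell."""
    results: List[Optional[str]] = []
    for cell in cells:
        match = None
        for service in services:
            match = service.reconcile(cell)
            if match is not None:
                break
        results.append(match)
    return results
```

A remote service would then just be another class with the same
`reconcile` method, tried before or after the dictionary one.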
> * Command line engine. To add a GW project as a step in a traditional
> transformation job and execute steps sequentially.
We've thought of that, too, but haven't implemented it. That shouldn't
be too hard.
> * Expose gazetteers (dictionaries) generated within the tool (when
> equating facets)
That makes sense. I'll think more about how to support that.
David