Re: Output Connector - Apache Marmotta

Karl Wright Wed, 09 Sep 2015 10:54:08 -0700

Hi Joshua,

"My question is; why would I need to setup different transform modules?
Since there is no real config to do in the transform connector (all the
good stuff seems to be under Task config) I'm not sure why I would need to
make more than one instead of reusing a single one and changing the
transform parameters under Task?"


While the Metadata Adjuster transformer has no connection-level
configuration, the model that MCF uses for transformers is just like the
model it uses for other kinds of connectors.  Pretend for a moment that you
needed to call an external system to do content extraction; that system's
details would belong to the connection, and then you will see the point.
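
To make that concrete, here is a minimal sketch of the split between
connection-level and job-level settings.  It is not MCF's API: the class
names, the example service URLs, and the prefixing rule (borrowed from your
city:New_York example) are all made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: these classes are NOT ManifoldCF's API.
public class TransformModelSketch {

    // Connection-level settings: defined once, reusable by many jobs.
    static final class TransformConnection {
        final String name;
        final String serviceUrl; // assumed endpoint of an external extraction service

        TransformConnection(String name, String serviceUrl) {
            this.name = name;
            this.serviceUrl = serviceUrl;
        }
    }

    // Job-level settings: what one particular job asks the transformer to do.
    static final class JobTransformSpec {
        final TransformConnection connection;
        final String separator;

        JobTransformSpec(TransformConnection connection, String separator) {
            this.connection = connection;
            this.separator = separator;
        }

        // Prefix each metadata value with its field name and replace spaces
        // with underscores, e.g. city="New York" becomes "city:New_York".
        Map<String, String> apply(Map<String, String> metadata) {
            Map<String, String> out = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : metadata.entrySet()) {
                String value = e.getValue().replace(' ', '_');
                out.put(e.getKey(), e.getKey() + separator + value);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // Two connections are only needed when the external systems differ.
        TransformConnection extractorA =
            new TransformConnection("extractor-a", "http://extractor-a.example/extract");
        TransformConnection extractorB =
            new TransformConnection("extractor-b", "http://extractor-b.example/extract");

        // Two jobs can share one connection and differ only in their job-level spec.
        JobTransformSpec addresses = new JobTransformSpec(extractorA, ":");
        JobTransformSpec contacts = new JobTransformSpec(extractorA, "=");
        JobTransformSpec archive = new JobTransformSpec(extractorB, ":");

        Map<String, String> row = new LinkedHashMap<>();
        row.put("city", "New York");

        System.out.println(addresses.apply(row)); // prints {city=city:New_York}
        System.out.println(contacts.apply(row));
        System.out.println(archive.connection.serviceUrl);
    }
}
```

With a transformer that needs no connection settings, one connection is
enough and everything interesting lives in the job-level spec, which matches
what you are seeing with the Metadata Adjuster.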

Thanks,
Karl


On Wed, Sep 9, 2015 at 12:55 PM, Joshua Dunham <[email protected]>
wrote:

> Hi Karl, Rafa,
>
>   I finally had some time to work on this and I have a scheme which
> (largely) works very well, but I have one question, one stumbling block,
> and one comment.
>
> First, my environment consists of ManifoldCF v2.1, MariaDB (into which I
> imported a small CSV for testing), and Marmotta 3.3.
>
> The really interesting bits are in specifying the Task. I have the MySQL
> input -> metadata adjuster -> filesystem output. MySQL is set up, the
> connection shows as OK, and on starting the job it does write files to the
> output folder.
>
> Getting the list of IDs works well, no issue there, and I'm not using
> versioning or access tokens yet. The stumbling block has to do with setting
> up the Data Query and the best use of the $URL and $DATA variables. First:
> I've hijacked the $URL into ~ CONCAT("addresses/", id) AS $(URLCOLUMN)
> which has the effect of creating a folder called addresses in the root of
> the output folder. Inside of the addresses folder it makes numbered files
> corresponding to the rowID. I can point the root folder path at the
> Marmotta import directory and even use the context templating feature
> (setting 'addresses' into the real context name). That's really slick for
> an out-of-the-box hack at integration.
>
> What is not apparent is how to use the metadata adjuster to interact with
> the variables in the Data Query. I've followed the guide and made a simple
> hello, False, ${city} statement, but the only bits that are written into
> the file are the contents of the $DATACOLUMN variable. So, given a simple
> address book in a database with the columns id, street, city, region,
> country, post code, latitude, longitude ... how should I approach making
> such a data query? My real use cases will be much, much more complicated,
> so I'm wondering if you have some explanation of how I should use that
> field, and maybe a small SQL snippet example with those columns? :) My end
> goal is to have a column called out and then use the metadata adjuster to
> simply prepend a string to each column's value. So if the city is 'New
> York' it would write out city:New_York or the like.
>
> =====
>
> The comment is in regard to a bit of sample data which could ship with
> the source. It would be very educational if there were a complex but real
> configuration of ManifoldCF that links to a sqlite3 file as input and maybe
> the same database, but a different table, as output?
>
> =====
>
> My question is: why would I need to set up different transform modules?
> Since there is no real config to do in the transform connector (all the
> good stuff seems to be under Task config), I'm not sure why I would need to
> make more than one instead of reusing a single one and changing the
> transform parameters under Task?
>
>
> Thank you!
>
> J
>
>
> > On 5 July 2015 at 17:27, Karl Wright <[email protected]> wrote:
> > Hi Joshua,
> >
> > My take:
> >
> > --> (A) How I define the data to grab, whether some SQL statement or the
> > like. <--
> >
> > Have a look at the user documentation here:
> >
> > https://manifoldcf.apache.org/release/release-1.9/en_US/end-user-documentation.html#jdbcrepository
> >
> > It should be pretty clear how you define what you are looking for.
> >
> > --> (B) How to use this data as individual variables which I can arrange
> > into a linked data relationship (ManifoldCF mapping module?) <--
> >
> > Rafa's previous reply about the RepositoryDocument is appropriate.
> > Basically, an output connector will be handed one of those objects for
> > every MCF "document".  The javadoc for it is here:
> >
> >
> > https://manifoldcf.apache.org/release/trunk/api/framework/org/apache/manifoldcf/agents/interfaces/RepositoryDocument.html
> >
> > --> (C) How difficult would it be to connect to Marmotta's webservice(s).
> > I'm not familiar with the exact mechanism, but I saw ManifoldCF has
> > support for elasticsearch so maybe I could put something together that
> > talks to Marmotta..<--
> >
> > You can readily write your own output connector.  There's a book, in
> > fact, describing how to do that.  See:
> >
> > https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs
> >
> > ... and read Chapter 9.
> >
> > Thanks,
> > Karl
> >
> >
> > On Sun, Jul 5, 2015 at 11:53 AM, Joshua Dunham <[email protected]>
> > wrote:
> >>
> >> That sounds promising. Would you recommend ManifoldCF for this? If so,
> >> do you know of any resources which I can use to get up to speed with
> >> using it in this way?
> >>
> >> -J
> >>
> >>> On 4 July 2015 at 21:48,  <[email protected]> wrote:
> >>> Hi Joshua,
> >>>
> >>> The basic unit in ManifoldCF, in terms of indexing, is the Repository
> >>> Document which, simplifying a lot, models a document composed of
> >>> content plus metadata (key-value). It should be relatively easy to
> >>> triplify that structure and push it to Marmotta using SPARQL update
> >>> queries or Marmotta’s Java client for adding resources.
> >>> The Generic Database connector uses a set of queries for crawling the
> >>> database. You would use those queries to get your data. I’m not
> >>> completely sure whether each result record is converted directly to a
> >>> Repository Document; that is something that I would need to check.
> >>>
> >>> Hope that helps,
> >>> Cheers, Rafa
> >>>
> >>>
> >>>
> >>>
> >>> On Sun, Jul 5, 2015 at 2:56 AM, Joshua Dunham <[email protected]>
> >>> wrote:
> >>>>
> >>>> Hi ManifoldCF Users (and Devs)
> >>>>
> >>>> I'm wondering if ManifoldCF can work in my use case. I have some
> >>>> random MySQL and Oracle DBs that I would like to connect to,
> >>>> extract certain known bits of info, format each a certain way, and
> >>>> then store the info in Apache Marmotta [1]. Marmotta is an RDF triple
> >>>> store for linked data, so I would need to parse and store the MySQL
> >>>> and Oracle DBs' info in a linked format. Creating the relationships
> >>>> etc. is no problem for me; I just need something that would let me
> >>>> specifically do this.
> >>>>
> >>>> From what I've read, ManifoldCF can connect to mySQL and Oracle
> >>>> (via non-distributed libraries), and store the results out in several
> >>>> target data stores. What isn't clear is
> >>>> (A) How I define the data to grab, whether some SQL statement or the
> >>>> like.
> >>>> (B) How to use this data as individual variables which I can arrange
> >>>> into a linked data relationship (ManifoldCF mapping module?)
> >>>> (C) How difficult would it be to connect to Marmotta's webservice(s).
> >>>> I'm not familiar with the exact mechanism, but I saw ManifoldCF has
> >>>> support for elasticsearch so maybe I could put something together that
> >>>> talks to Marmotta..
> >>>>
> >>>> Would this be possible? If so, could someone point me in the right
> >>>> direction?
> >>>>
> >>>> Thanks!
> >>>> -Joshua
> >>>>
> >>>>
> >>>> [1] - http://marmotta.apache.org/index.html
>
>
