Re: Output Connector - Apache Marmotta

Karl Wright Wed, 09 Sep 2015 13:21:20 -0700

Hi Joshua,

"What is not apparent is how to use the metadata adjuster to interact with
the variables in the Data query. I've followed the guide and made a simple
hello, False, ${city} statement but the only bits that are written into the
file are the contents of the $DATACOLUMN variable. So, given a simple
address book in a database with columns, id, street, city, region, country,
post code, latitude, longitude ... how should I approach making such a data
query? "


(1) In the JDBC connector, every column you include in your data query
that isn't one of the reserved ones, such as $DATACOLUMN, is treated as a
metadata value, with the name of the metadata value being the name of the
column.  Behavior differs somewhat between JDBC drivers; for example, some
JDBC drivers map the column names to upper case.  See:
https://manifoldcf.apache.org/release/release-2.1/en_US/end-user-documentation.html#jdbcrepository
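
To illustrate point (1), here is a minimal Python sketch, not ManifoldCF
code, that imitates the split between reserved columns and metadata columns
using an in-memory SQLite table. The names in RESERVED, the helper
functions, and the sample row are all made up for illustration, and the
substitution step is only an assumption about how the $(...) placeholders
get filled in. The point is the shape of the data query: the address-book
columns are simply listed next to the reserved ones.

```python
import sqlite3

# Hypothetical stand-ins for whatever column names the connector substitutes
# for the $(...) placeholders; the real values are internal to the connector.
RESERVED = {"IDCOLUMN": "doc_id", "URLCOLUMN": "doc_url", "DATACOLUMN": "doc_data"}

# Data query template: reserved columns plus extra columns (city, region,
# country) that would be picked up as metadata. SQLite uses || to concatenate.
DATA_QUERY = (
    "SELECT id AS $(IDCOLUMN), 'addresses/' || id AS $(URLCOLUMN), "
    "street AS $(DATACOLUMN), city, region, country "
    "FROM addresses WHERE id IN $(IDLIST)"
)

def expand(template, ids):
    """Fill in the placeholders so the template becomes runnable SQL."""
    sql = template
    for name, column in RESERVED.items():
        sql = sql.replace("$(%s)" % name, column)
    placeholders = ",".join("?" for _ in ids)
    return sql.replace("$(IDLIST)", "(" + placeholders + ")")

def fetch_documents(conn, ids):
    """Run the query and separate reserved columns from metadata columns."""
    cur = conn.execute(expand(DATA_QUERY, ids), ids)
    names = [d[0] for d in cur.description]
    docs = []
    for row in cur:
        record = dict(zip(names, row))
        reserved = {key: record.pop(col) for key, col in RESERVED.items()}
        # Whatever is left over is treated as metadata, keyed by column name.
        docs.append({"reserved": reserved, "metadata": record})
    return docs

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE addresses "
    "(id INTEGER, street TEXT, city TEXT, region TEXT, country TEXT)"
)
conn.execute("INSERT INTO addresses VALUES (1, '1 Main St', 'New York', 'NY', 'US')")
docs = fetch_documents(conn, [1])
print(docs[0]["metadata"])
```

With a real driver the metadata names may come back upper-cased, as noted
above, so any later rule that refers to "city" may need to say "CITY".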

(2) The Metadata Adjuster takes what the JDBC connector outputs, and allows
you to manipulate the metadata values according to certain rules.  The
documentation is here:
https://manifoldcf.apache.org/release/release-2.1/en_US/end-user-documentation.html#metadataadjuster
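
As a rough illustration of point (2), the following Python sketch mimics
what a rule that sets "city" to the expression city:${city} might do to the
metadata coming out of the JDBC connector. It is a simulation only: the
function names are hypothetical, values are treated as single strings even
though real metadata values may be multi-valued, and whether a literal
prefix can be mixed with a ${...} reference is an assumption to check
against the documentation linked above.

```python
import re

# Matches ${fieldname} references inside an expression.
FIELD_REF = re.compile(r"\$\{([^}]+)\}")

def expand_expression(expression, metadata):
    """Replace each ${name} with that metadata value (empty if missing)."""
    return FIELD_REF.sub(lambda m: metadata.get(m.group(1), ""), expression)

def adjust(metadata, rules):
    """Apply (target_name, expression) rules against the incoming metadata."""
    result = dict(metadata)
    for target, expression in rules:
        # Expressions are evaluated against the original, unadjusted values.
        result[target] = expand_expression(expression, metadata)
    return result

incoming = {"city": "New York", "country": "US"}
adjusted = adjust(incoming, [("city", "city:${city}"), ("hello", "world")])
print(adjusted)
```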

If you don't seem to be making any progress, please provide the exact data
query you are using, and a screenshot or paste of the job view that
includes the metadata adjuster specification, so I can see what you are
trying to do more precisely.

Karl


On Wed, Sep 9, 2015 at 3:48 PM, Joshua Dunham <[email protected]>
wrote:

> Could you shed any light on the middle part,
>
> =====
>
> What is not apparent is how to use the metadata adjuster to interact with
> the variables in the Data query. I've followed the guide and made a simple
> hello, False, ${city} statement but the only bits that are written into the
> file are the contents of the $DATACOLUMN variable. So, given a simple
> address book in a database with columns, id, street, city, region, country,
> post code, latitude, longitude ... how should I approach making such a data
> query?
>
> My real use cases will be much much more complicated so I'm wondering if
> you have some explanation of how I should want to use that field and maybe
> a small SQL snippet example with those columns? :) My end goal is to have a
> column called out and then use the metadata adjuster to simply prepend each
> column's value with a string. So if the city is 'New York' it would write
> out city:New_York or the like.
>
> =====
> Thx in advance!
>
> -J
>
> On Sep 9, 2015, at 1:53 PM, Karl Wright <[email protected]> wrote:
>
> Hi Joshua,
>
> "My question is; why would I need to setup different transform modules?
> Since there is no real config to do in the transform connector (all the
> good stuff seems to be under Task config) I'm not sure why I would need to
> make more than one and keep reusing it by changing the transform params
> under task?"
>
> While the Metadata Adjuster transformer has no configuration, the model
> that MCF uses for transformers is just like the model it uses for other
> kinds of connectors.  Pretend for a moment that you needed to call an
> external system to do content extraction, and you will see the point.
>
> Thanks,
> Karl
>
>
> On Wed, Sep 9, 2015 at 12:55 PM, Joshua Dunham <[email protected]>
> wrote:
>
>> Hi Karl, Rafa,
>>
>>   I finally had some time to work on this and I have a scheme which
>> (largely) works very well but I have one question, one stumbling block,
>> and one comment.
>>
>> First, my environment consists of, Manifold v 2.1, MariaDB which I
>> imported a small CSV into for testing, and Marmotta 3.3.
>>
>> The real interesting bits are in specifying the Task. I have the mySQL
>> input -> metadata adjuster -> filesystem output. mySQL is setup and the
>> connection shows as OK and on starting the job, it does write files to the
>> output folder.
>>
>> Getting the list of ID's works well no issue there, and I'm not using
>> versioning or access tokens yet. The stumbling block has to do with setting
>> up the Data Query and the best use of the $URL and $DATA variables. First:
>> I've hijacked the $URL into ~ CONCAT("addresses/", id) AS $(URLCOLUMN)
>> which has the effect of creating a folder called addresses in the root of
>> the output folder. Inside of the addresses folder it makes numbered files
>> corresponding to the rowID. I can point the root folder path at the
>> marmotta import directory and even use the context templating feature
>> (setting 'addresses' into the real context name). That's really slick for
>> an out-of-the-box hack at integration.
>>
>> What is not apparent is how to use the metadata adjuster to interact with
>> the variables in the Data query. I've followed the guide and made a simple
>> hello, False, ${city} statement but the only bits that are written into the
>> file are the contents of the $DATACOLUMN variable. So, given a simple
>> address book in a database with columns, id, street, city, region, country,
>> post code, latitude, longitude ... how should I approach making such a data
>> query? My real use cases will be much much more complicated so I'm
>> wondering if you have some explanation of how I should want to use that
>> field and maybe a small SQL snippet example with those columns? :) My end
>> goal is to have a column called out and then use the metadata adjuster to
>> simply prepend each column's value with a string. So if the city is 'New
>> York' it would write out city:New_York or the like.
>>
>> =====
>>
>> The comment was in regards to a bit of sample data which could ship with
>> the source. It would be very educational if there was a complex but real
>> configuration of ManifoldCF that links to a sqlite3 file as input and maybe
>> the same one input db but a different table as output?
>>
>> =====
>>
>> My question is; why would I need to setup different transform modules?
>> Since there is no real config to do in the transform connector (all the
>> good stuff seems to be under Task config) I'm not sure why I would need to
>> make more than one and keep reusing it by changing the transform params
>> under task?
>>
>>
>> Thank you!
>>
>> J
>>
>>
>> > On 5 July 2015 at 17:27, Karl Wright <[email protected]> wrote:
>> > Hi Joshua,
>> >
>> > My take:
>> >
>> > --> (A) How I define the data to grab, whether some SQL statement or the
>> > like. <--
>> >
>> > Have a look at the user documentation here:
>> >
>> https://manifoldcf.apache.org/release/release-1.9/en_US/end-user-documentation.html#jdbcrepository
>> >
>> > It should be pretty clear how you define what you are looking for.
>> >
>> > --> (B) How to use this data as individual variables which I can arrange
>> > into a linked data relationship (ManifoldCF mapping module?) <--
>> >
>> > Rafa's previous reply about the RepositoryDocument is appropriate.
>> > Basically, an output connector will be handed one of those objects for
>> > every MCF "document".  The javadoc for it is here:
>> >
>> >
>> https://manifoldcf.apache.org/release/trunk/api/framework/org/apache/manifoldcf/agents/interfaces/RepositoryDocument.html
>> >
>> > --> (C) How difficult would it be to connect to Marmotta's
>> > webservice(s). I'm not familiar with the exact mechanism, but I saw
>> > ManifoldCF has support for elasticsearch so maybe I could put something
>> > together that talks to Marmotta..<--
>> >
>> > You can readily write your own output connector.  There's a book, in
>> > fact, describing how to do that.  See:
>> >
>> > https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs
>> >
>> > ... and read Chapter 9.
>> >
>> > Thanks,
>> > Karl
>> >
>> >
>> > On Sun, Jul 5, 2015 at 11:53 AM, Joshua Dunham <[email protected]>
>> > wrote:
>> >>
>> >> That sounds promising. Would you recommend ManifoldCF for this? If so,
>> >> do you know of any resources which I can use to get up to speed with
>> >> using it in this way?
>> >>
>> >> -J
>> >>
>> >>> On 4 July 2015 at 21:48,  <[email protected]> wrote:
>> >>> Hi Joshua,
>> >>>
>> >>> The unit of indexing in ManifoldCF is the Repository Document which,
>> >>> simplifying a lot, models a document composed of content plus
>> >>> metadata (key-value). It should be relatively easy to triplify that
>> >>> structure and push it to Marmotta using SPARQL update queries or
>> >>> Marmotta's Java client for adding resources.
>> >>> The Generic Database connector uses a set of queries for crawling the
>> >>> database. You would have to use those queries to get your data. I'm
>> >>> not completely sure if each record result is converted directly to a
>> >>> Repository Document; that is something that I would need to check.
>> >>>
>> >>> Hope that helps,
>> >>> Cheers, Rafa
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Sun, Jul 5, 2015 at 2:56 AM, Joshua Dunham <[email protected]>
>> >>> wrote:
>> >>>>
>> >>>> Hi ManifoldCF Users (and Devs)
>> >>>>
>> >>>> I'm wondering if ManifoldCF can work in my use case. I have some
>> >>>> random mySQL and Oracle DB's that I would like to connect to and
>> >>>> extract certain known bits of info, format them each a certain way
>> >>>> and then store the info in Apache Marmotta [1]. Marmotta is an RDF
>> >>>> triple store for linked data so I would need to parse and store the
>> >>>> mySQL and Oracle DB's info into a linked format, which is no problem
>> >>>> for me to create the relationships etc, I just need something that
>> >>>> would let me specifically do this.
>> >>>>
>> >>>> From what I've read, ManifoldCF can connect to mySQL and Oracle
>> >>>> (via non-distributed libraries), and store the results out in several
>> >>>> target data stores. What isn't clear is
>> >>>> (A) How I define the data to grab, whether some SQL statement or the
>> >>>> like.
>> >>>> (B) How to use this data as individual variables which I can arrange
>> >>>> into a linked data relationship (ManifoldCF mapping module?)
>> >>>> (C) How difficult would it be to connect to Marmotta's webservice(s).
>> >>>> I'm not familiar with the exact mechanism, but I saw ManifoldCF has
>> >>>> support for elasticsearch so maybe I could put something together
>> >>>> that talks to Marmotta..
>> >>>>
>> >>>> Would this be possible? If so, could someone point me in the right
>> >>>> direction?
>> >>>>
>> >>>> Thanks!
>> >>>> -Joshua
>> >>>>
>> >>>>
>> >>>> [1] - http://marmotta.apache.org/index.html
>>
>>
>
