Bill,

If the information in the triples from the DB is enough to generate the
Turtle file (plus static boilerplate like the @prefix ex: declaration),
then ReplaceText should do the trick. If you need additional
information supplied by the user and/or some third-party connection,
then the scripting processors (ExecuteScript [1],
InvokeScriptedProcessor [2]) are probably the way to go, especially
since you're already familiar with the Python scripts that were doing
the job, and because your label2uri function is helpful and reusable :)
Rough sketches of both approaches are below. Please reach out if you
have any issues or questions while setting this up.
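
For the ReplaceText route: assuming you've already used
EvaluateJsonPath to pull the column values into attributes named
provider_id, gender, and birth_date (placeholder names matching your
example below), you could set ReplaceText's Replacement Strategy to
"Always Replace" and use a Replacement Value along these lines
(untested sketch):

ex:provider_${provider_id} rdf:type ex:Provider .
ex:provider_${provider_id} ex:has_gender ex:${gender} .
ex:provider_${provider_id} ex:has_birth_date "${birth_date}" .

Note this emits ex:F rather than ex:Female for the gender; mapping
codes to labels is the kind of thing the scripting route handles more
gracefully.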

Regards,
Matt

[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.script.ExecuteScript/index.html
[2] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.script.InvokeScriptedProcessor/index.html
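
To make the scripting option concrete, here is a rough, untested
Jython sketch for ExecuteScript [1]. It assumes one JSON record per
flowfile (i.e. downstream of SplitJson), the column names from your
example, and a toy label-to-URI map standing in for however you
actually build yours; the session and REL_SUCCESS variables are
provided to the script by the processor:

import json
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

# Toy stand-in for your label2uri lookup
LABEL_TO_URI = {'Jane Doe': 'ex:provider_1001'}

def label2uri(label):
    return LABEL_TO_URI.get(label)

class ToTurtle(StreamCallback):
    def process(self, inputStream, outputStream):
        # One JSON object per flowfile (after SplitJson)
        record = json.loads(IOUtils.toString(inputStream, StandardCharsets.UTF_8))
        uri = 'ex:provider_%s' % record['provider_id']
        gender = 'Female' if record['gender'] == 'F' else 'Male'
        turtle = '\n'.join([
            '%s rdf:type ex:Provider .' % uri,
            '%s ex:has_gender ex:%s .' % (uri, gender),
            '%s ex:has_birth_date "%s" .' % (uri, record['birth_date'])])
        outputStream.write(bytearray(turtle.encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, ToTurtle())
    session.transfer(flowFile, REL_SUCCESS)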

P.S. Regarding your "having problems getting nifi to start": by
default NiFi listens on port 8080. Do you have something else running
on that port (e.g. anything with default Jetty/Tomcat settings will
start on the same port)? If so, you can change the
"nifi.web.http.port" property in conf/nifi.properties to some unused
port, and use http://localhost:<that_port>/nifi to connect.
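
For example, in conf/nifi.properties (8081 here is just an arbitrary
free port):

nifi.web.http.port=8081

and then browse to http://localhost:8081/nifi after restarting.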

P.P.S.  "nifi.sh dump" will take a thread dump of the Java Virtual
Machine running NiFi; this is often useful when your instance(s) of
NiFi are hanging or otherwise misbehaving (but not crashing :)
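
For example:

bin/nifi.sh dump thread-dump.txt

writes the dump to the given file; if I remember correctly, omitting
the filename sends the dump to the bootstrap log instead.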

On Thu, Apr 13, 2017 at 10:58 PM, Bill Duncan <[email protected]> wrote:
> Hi Matt,
>
> Thanks for your reply. You definitely suggested a lot of options to try. To
> tell you about my project: I'm running MySQL community version 5.6 (I
> think). I'm running the software on a 2015 Macbook Pro, OS X 10.12.4
> (Sierra).
>
> The database contains electronic dental records about procedures, patients,
> providers, etc. For example, I have a provider table that contains (among
> other things) the provider's id, gender, birth date (let's focus on just
> these attributes). When translating this data into RDF, I take records out
> of the table and transform them into RDF triples having a subject, predicate,
> and object.
>
> Suppose I have the following record in my provider table:
>
> provider_id  gender  birth_date
> 1001         F       01/01/1980
>
> This record would (or could) translate into the following three triples:
>
> @prefix ex: <http://example.com/> .               # ex: is shorthand for http://example.com/
>
> ex:provider_1001 rdf:type ex:Provider .           # provider 1001 is a type/instance of provider
> ex:provider_1001 ex:has_gender ex:Female .        # provider 1001 has female gender
> ex:provider_1001 ex:has_birth_date "01/01/1980" . # birth date of provider 1001
>
>
> Different translations are, of course, possible depending on how you define
> your data. After data in the provider table has been translated, I can then load
> the output into a triple store and query the information using SPARQL.
>
> Right now I'm only looking into doing this translation once and then loading
> it into the triple store, but exploring incremental translations would be
> worthwhile too.
>
> In RDF, you often associate the identifier of an entity (e.g.,
> ex:provider_1001) with a human readable label, e.g.:
>
> ex:provider_1001 rdfs:label 'Jane Doe' .
>
>
> If the labels are unique, I find it helpful to define a function (in Python)
> that returns the identifier (called a URI or IRI) associated with the label. For
> example:
>
> label2uri('Jane Doe') -> returns ex:provider_1001
>
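> Under the hood it's just a dictionary lookup over the unique labels,
> something like:
>
> # uri_by_label maps label -> URI, built from the table,
> # e.g. {'Jane Doe': 'ex:provider_1001'}
> def label2uri(label):
>     return uri_by_label[label]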
>
> Is it possible to incorporate such functions into the workflow?
>
> Also, I am having problems getting nifi to start. According to the README
> file, I start nifi using:
>
> nifi.sh start
>
>
> But after I do this, localhost/nifi doesn't connect to the server. Am I
> supposed to be running another command? For instance:
>
> nifi.sh run
> nifi.sh install
>
> And what does nifi.sh dump do?
>
> Thanks for your help. I really appreciate it!!
>
> Bill
>
>
>
>
> On Thu, Apr 13, 2017 at 7:29 PM, Matt Burgess <[email protected]> wrote:
>>
>> Bill,
>>
>> Can you share a little bit more detail as to your database setup?
>> What kind of database is it (e.g. MySQL, Oracle, Postgres), and what
>> does your table look like?  Are you looking to do this once, or
>> periodically, or incrementally as new rows are added? If
>> incrementally, is there a column that is always being increased (like
>> a primary key / ID column or timestamp)?
>>
>> In general you'll want to set up a DBCPConnectionPool controller
>> service, which gives processors connections to the database.  Then you
>> could use ExecuteSQL, QueryDatabaseTable, or perhaps another
>> SQL-related processor to fetch the data. The aforementioned processors
>> will require a reference to the DBCPConnectionPool you set up, then
>> they can be configured to execute a SQL statement (in the case of
>> ExecuteSQL) or can incrementally fetch "new" rows from a specified
>> table (with QueryDatabaseTable).  These processors output the rows as
>> an Avro-formatted file. To manipulate the contents you'd often want
>> to convert the file to JSON using the ConvertAvroToJSON processor,
>> and since you usually want to deal with one row/record at a time, you
>> can then use SplitJson (alternatively, after the SQL processor you can
>> use SplitAvro then ConvertAvroToJSON). Depending on how many fields are in each row
>> (I'm going to assume 3 for Turtle), you can use EvaluateJsonPath to
>> get each field/column value into an attribute, then ReplaceText to set
>> the values in Turtle format.
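>>
>> For example, EvaluateJsonPath (with Destination set to
>> "flowfile-attribute") could have one user-defined property per
>> column, where the property name becomes the attribute name:
>>
>> provider_id  =>  $.provider_id
>> gender       =>  $.gender
>> birth_date   =>  $.birth_date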
>>
>> There is also an ExecuteStreamCommand processor where you could shell
>> out to your Python script, or if it is a pure Python script, you could
>> paste it into an ExecuteScript processor (using "python" as the engine
>> which is actually Jython). Not sure if any of these approaches will
>> give you better performance except that you could perform some of
>> these operations concurrently (or in parallel if using a NiFi
>> cluster).
>>
>> Regards,
>> Matt
>>
>> On Thu, Apr 13, 2017 at 11:55 AM, Bill Duncan <[email protected]> wrote:
>> > Sorry if this is a duplicate message ...
>> >
>> > I am quite interested in the Nifi software and I've been watching the
>> > videos. However, I can't seem to connect to a database and extract
>> > records.
>> > My main goal is to be able to take records from a database and
>> > convert
>> > them into RDF (Turtle). I already do this using Python, but I was hoping
>> > that Nifi could speed up the translation process. But, before I start
>> > translating records, I'm just trying to connect and extract.
>> >
>> > Any help would be much appreciated!
>> >
>> > Thanks,
>> > Bill
>> >
>
>
