Hi Matt,

Thanks for your reply. You definitely suggested a lot of options to try. To tell you a little about my project: I'm running MySQL Community Edition 5.6 (I think) on a 2015 MacBook Pro with OS X 10.12.4 (Sierra).
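(To pin down the exact version, I can check it from Python; the connection details below are just placeholders for my setup:)

    import mysql.connector  # pip install mysql-connector-python

    # Placeholder credentials -- substitute the real host/user/database.
    conn = mysql.connector.connect(host="localhost", user="bill",
                                   password="...", database="dental")
    cur = conn.cursor()
    cur.execute("SELECT VERSION()")
    print(cur.fetchone()[0])  # e.g. '5.6.35'
    conn.close()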
The database contains electronic dental records about procedures, patients, providers, etc. For example, I have a provider table that contains (among other things) the provider's id, gender, and birth date (let's focus on just these attributes). When translating this data into RDF, I take records out of a table and transform them into RDF triples, each having a subject, predicate, and object. Suppose I have the following record in my provider table:

    provider_id  gender  birth_date
    1001         F       01/01/1980

This record would (or could) translate into the following three triples:

    @prefix ex: <http://example.com/> .                # ex: is shorthand for http://example.com/
    ex:provider_1001 rdf:type ex:Provider .            # provider 1001 is a type/instance of Provider
    ex:provider_1001 ex:has_gender ex:Female .         # provider 1001 has female gender
    ex:provider_1001 ex:has_birth_date "01/01/1980" .  # birth date of provider 1001

Different translations are, of course, possible depending on how you define your data. After the data in the provider table has been translated, I can load the output into a triple store and query it using SPARQL. Right now I'm only looking into doing this translation once and then loading it into the triple store, but exploring incremental translations would be worthwhile too.

In RDF, you often associate the identifier of an entity (e.g., ex:provider_1001) with a human-readable label, e.g.:

    ex:provider_1001 rdfs:label "Jane Doe" .

If the labels are unique, I find it helpful to define a function (in Python) that, given a label, returns the identifier (called a URI or IRI) associated with it. For example:

    label2uri('Jane Doe')  # returns ex:provider_1001

Is it possible to incorporate such functions into the workflow?
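To make that concrete, here is a stripped-down sketch of the kind of translation code I have in mind (the helper names and column layout are just illustrative):

    # Sketch only: each provider row becomes a handful of Turtle triples,
    # and a dictionary maps unique labels back to their URIs.
    label_to_uri = {}

    def row_to_turtle(provider_id, gender, birth_date, label):
        uri = "ex:provider_%s" % provider_id
        label_to_uri[label] = uri
        gender_uri = "ex:Female" if gender == "F" else "ex:Male"
        return "\n".join([
            "%s rdf:type ex:Provider ." % uri,
            "%s ex:has_gender %s ." % (uri, gender_uri),
            '%s ex:has_birth_date "%s" .' % (uri, birth_date),
            '%s rdfs:label "%s" .' % (uri, label),
        ])

    def label2uri(label):
        return label_to_uri[label]  # assumes labels are unique

    print(row_to_turtle(1001, "F", "01/01/1980", "Jane Doe"))
    print(label2uri("Jane Doe"))  # -> ex:provider_1001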
Also, I am having problems getting NiFi to start. According to the README file, I start NiFi using:

    nifi.sh start

But after I do this, localhost/nifi doesn't connect to the server. Am I supposed to be running another command? For instance:

    nifi.sh run
    nifi.sh install

And what does nifi.sh dump do?

Thanks for your help. I really appreciate it!!

Bill

On Thu, Apr 13, 2017 at 7:29 PM, Matt Burgess <[email protected]> wrote:
> Bill,
>
> Can you share a little more detail about your database setup? What
> kind of database is it (MySQL, Oracle, Postgres, etc.), and what does
> your table look like? Are you looking to do this once, periodically,
> or incrementally as new rows are added? If incrementally, is there a
> column that is always increasing (like a primary key / ID column or a
> timestamp)?
>
> In general you'll want to set up a DBCPConnectionPool controller
> service, which gives processors connections to the database. Then you
> could use ExecuteSQL, QueryDatabaseTable, or perhaps another
> SQL-related processor to fetch the data. These processors require a
> reference to the DBCPConnectionPool you set up; they can then be
> configured to execute a SQL statement (in the case of ExecuteSQL) or
> to incrementally fetch "new" rows from a specified table (with
> QueryDatabaseTable). These processors output the rows as an
> Avro-formatted file. To manipulate the contents, you'd often want to
> convert the file to JSON using the ConvertAvroToJSON processor; then,
> since you often want to deal with one row/record at a time, you can
> use SplitJson (alternatively, after the SQL processor you can use
> SplitAvro and then ConvertAvroToJSON). Depending on how many fields
> are in each row (I'm going to assume 3 for Turtle), you can use
> EvaluateJsonPath to get each field/column value into an attribute,
> then ReplaceText to set the values in Turtle format.
>
> There is also an ExecuteStreamCommand processor where you could shell
> out to your Python script, or, if it is a pure Python script, you
> could paste it into an ExecuteScript processor (using "python" as the
> engine, which is actually Jython). I'm not sure whether any of these
> approaches will give you better performance, except that you could
> perform some of these operations concurrently (or in parallel if
> using a NiFi cluster).
>
> Regards,
> Matt
>
> On Thu, Apr 13, 2017 at 11:55 AM, Bill Duncan <[email protected]> wrote:
> > Sorry if this is a duplicate message ...
> >
> > I am quite interested in the NiFi software and I've been watching the
> > videos. However, I can't seem to connect to a database and extract
> > records. My main goal is to be able to take records from a database
> > and convert them into RDF (Turtle). I already do this using Python,
> > but I was hoping that NiFi could speed up the translation process.
> > But before I start translating records, I'm just trying to connect
> > and extract.
> >
> > Any help would be much appreciated!
> >
> > Thanks,
> > Bill
