Bill,

Can you share a little more detail about your database setup? What kind of database is it (e.g. MySQL, Oracle, Postgres), and what does your table look like? Are you looking to do this once, periodically, or incrementally as new rows are added? If incrementally, is there a column whose value always increases (like a primary key / ID column or a timestamp)?

In general you'll want to set up a DBCPConnectionPool controller service, which gives processors connections to the database. Then you can use ExecuteSQL, QueryDatabaseTable, or perhaps another SQL-related processor to fetch the data. Each of these processors requires a reference to the DBCPConnectionPool you set up; it can then be configured to execute a SQL statement (in the case of ExecuteSQL) or to incrementally fetch "new" rows from a specified table (with QueryDatabaseTable).

These processors output the rows as an Avro-formatted file. To manipulate the contents you'd often convert the file to JSON using the ConvertAvroToJSON processor. Since you usually want to deal with one row/record at a time, you can then use SplitJson (alternatively, after the SQL processor you can use SplitAvro and then ConvertAvroToJSON). Depending on how many fields are in each row (I'm going to assume 3 for Turtle), you can use EvaluateJsonPath to get each field/column value into an attribute, then ReplaceText to set the values in Turtle format.

There is also an ExecuteStreamCommand processor with which you could shell out to your Python script, or, if it is a pure Python script, you could paste it into an ExecuteScript processor (using "python" as the engine, which is actually Jython).

I'm not sure any of these approaches will give you better performance, except that you could perform some of these operations concurrently (or in parallel if using a NiFi cluster).

Regards,
Matt

On Thu, Apr 13, 2017 at 11:55 AM, Bill Duncan <[email protected]> wrote:
> Sorry if this is a duplicate message ...
>
> I am quite interested in the Nifi software and I've been watching the
> videos. However, I can't seem to connect to a database and extract records.
> My main goal is to be able to take records from a database and convert
> them into RDF (Turtle). I already do this using Python, but I was hoping
> that Nifi could speed up the translation process. But, before I start
> translating records, I'm just trying to connect and extract.
>
> Any help would be much appreciated!
>
> Thanks,
> Bill
>
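
P.S. For reference, the row-to-Turtle step (one JSON record per row, three fields) could look roughly like the plain-Python sketch below. It is only an illustration under assumptions: the field names "id", "property" and "value", the BASE namespace, and the function names are all hypothetical, and the id/property values are assumed to be safe to place inside an IRI. It does not show how to wire the function into ExecuteScript or ExecuteStreamCommand.

```python
import json

# Hypothetical namespace; replace with the vocabulary your data actually uses.
BASE = "http://example.org/"


def escape_literal(value):
    """Escape a value for use inside a double-quoted Turtle string literal."""
    text = str(value)
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def row_to_turtle(row_json):
    """Turn one JSON row into a single Turtle triple.

    Assumes the row has exactly the (hypothetical) keys
    'id', 'property' and 'value'.
    """
    row = json.loads(row_json)
    subject = "<%s%s>" % (BASE, row["id"])
    predicate = "<%s%s>" % (BASE, row["property"])
    obj = '"%s"' % escape_literal(row["value"])
    return "%s %s %s ." % (subject, predicate, obj)


if __name__ == "__main__":
    # One record, shaped like what SplitJson would emit per row.
    print(row_to_turtle('{"id": 1, "property": "name", "value": "Bill"}'))
```

The same function could sit in a standalone script that reads one JSON record on stdin and writes the triple to stdout, which is the shape ExecuteStreamCommand expects from an external command.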
