Re: Maximum-value Columns on QueryDatabaseTable

2018-01-23 Thread Alberto Bengoa
Hello Koji,

That's nice! Thanks for your contribution! I'm looking forward to having this
patch included in the next stable version.

I've "workarounded" it renaming the odd column within a view on the source
table.
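
For reference, the view is essentially this (a sketch; the table and column
names are from my earlier messages, and the exact OpenEdge SQL syntax may
differ):

CREATE VIEW pub.man_fabrica_cdc_v AS
SELECT "_Time-Stamp"      AS time_stamp,       -- renamed to Avro-legal names
       "_Change-Sequence" AS change_sequence,
       "_Operation"       AS operation,
       cdn_fabrica
FROM pub."man_fabrica-cdc";

QueryDatabaseTable then reads from the view, and the renamed column works
fine as the Maximum-value Column.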

Cheers!



--
Alberto Bengoa
Phones: +55 51 3024-3568 | +55 11 4063-8864 | +55 92 3090-0115 | +1 212 202-1454
Propus - IT aligned with business
Service | Telecom | Tech | Data Science
www.propus.com.br

On Tue, Jan 23, 2018 at 5:58 AM, Koji Kawamura <ijokaruma...@gmail.com>
wrote:

> Hi Alberto,
>
> Thanks for reporting the issue; I was able to reproduce the behavior
> you described.
> Although it was filed for Microsoft SQL Server, there is an existing
> JIRA for the same issue, NIFI-4393:
> https://issues.apache.org/jira/browse/NIFI-4393
>
> I've created a pull request that handles MS SQL square brackets and
> MySQL back-ticks, as well as generic escaping with double quotes.
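> For example, the max-value WHERE clause should end up quoting the column
> per dialect, roughly like this (a sketch of the intended behavior, using
> the table/column names and state value from this thread):
>
>   -- generic fallback (double quotes)
>   SELECT * FROM "man_fabrica-cdc" WHERE "_Time-Stamp" > '2018-01-04 15:58:07.877'
>   -- MS SQL Server (square brackets)
>   SELECT * FROM [man_fabrica-cdc] WHERE [_Time-Stamp] > '2018-01-04 15:58:07.877'
>   -- MySQL (back-ticks)
>   SELECT * FROM `man_fabrica-cdc` WHERE `_Time-Stamp` > '2018-01-04 15:58:07.877'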
>
> I hope the fix will be merged and available soon.
>
> Thanks,
> Koji
>
> On Fri, Jan 5, 2018 at 5:58 AM, Alberto Bengoa <albe...@propus.com.br>
> wrote:
> > Hello Folks,
> >
> > Not sure if I'm running into a bug, but I'm facing a situation when I
> > try to use a non-compliant column name as my Maximum-value Column.
> >
> > First, I've tried to use a column named _Time-Stamp (underscore at the
> > beginning + hyphen in the middle). This column creates a state like this:
> >
> > "man_fabrica-cdc"@!@_time-stamp 2018-01-04 15:58:07.877 Cluster
> >
> > I wondered whether QueryDatabaseTable might not work with Timestamp
> > fields as the Maximum-value Column, so I tried another column
> > (_Change-Sequence) and got this state:
> >
> > "man_fabrica-cdc"@!@_change-sequence 252254 Cluster
> >
> > With NiFi debug logging enabled, I see that no WHERE clause is
> > generated when the Maximum-value Column is wrapped in quotes
> > ("_my-strange-column-name"). On the other hand, if I do not wrap the
> > odd column name in quotes, I get an error like this from the JDBC
> > driver:
> >
> > nifi-app_2017-12-12_11.0.log:Caused by: java.sql.SQLException:
> > [DataDirect][OpenEdge JDBC Driver][OpenEdge] Syntax error in SQL
> > statement at or about "_Time-Stamp FROM "man_fabrica-cdc" WHERE" (10713)
> >
> > I'm using "Normalize Table/Column Names" as suggested here [1].
> >
> > [1] -
> > http://apache-nifi-users-list.2361937.n4.nabble.com/Hyphenated-Tables-and-Columns-names-td3650.html#a3655
> >
> > Thanks!
> >
> > Alberto
>


Possible bug - ConvertJSONToSQL

2018-01-08 Thread Alberto Bengoa
Hey guys,

I suspect I've found a bug in ConvertJSONToSQL (for DELETE statements only).

I was getting incomplete values for attributes like this:

sql.args.10.type 91
sql.args.10.value 2016

This field is a date type (JDBC type 91 is java.sql.Types.DATE), so the
attributes should look like this:

sql.args.10.type 91
sql.args.10.value 2016-01-18

I haven't tested other data types/values because I was in a rush.

I've changed the fieldValue variable to use the createSqlStringValue method
in ConvertJSONToSQL.java (as is already done for INSERT and UPDATE
statements), rebuilt the NAR, and it seems to be working now.

The small patch is attached to this email.

I'm not posting to the dev list because I'm not sure what the impact of
this change is, nor the correct process for submitting bugs.

Thanks,
Alberto
--- /root/ConvertJSONToSQL-20170105-before-delete-patch.java  2018-01-05 21:20:24.031309162 -0200
+++ nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ConvertJSONToSQL.java  2018-01-05 21:53:08.658887605 -0200
@@ -784,11 +784,8 @@
 
                 final Integer colSize = desc.getColumnSize();
                 final JsonNode fieldNode = rootNode.get(fieldName);
-                if (!fieldNode.isNull()) {
-                    String fieldValue = fieldNode.asText();
-                    if (colSize != null && fieldValue.length() > colSize) {
-                        fieldValue = fieldValue.substring(0, colSize);
-                    }
+                if (!fieldNode.isNull()) {
+                    String fieldValue = createSqlStringValue(fieldNode, colSize, sqlType);
                     attributes.put(attributePrefix + ".args." + fieldCount + ".value", fieldValue);
                 }
             }


Maximum-value Columns on QueryDatabaseTable

2018-01-04 Thread Alberto Bengoa
Hello Folks,

Not sure if I'm running into a bug, but I'm facing a situation when I try
to use a non-compliant column name as my Maximum-value Column.

First, I've tried to use a column named _Time-Stamp (underscore at the
beginning + hyphen in the middle). This column creates a state like this:

"man_fabrica-cdc"@!@_time-stamp 2018-01-04 15:58:07.877 Cluster

I wondered whether QueryDatabaseTable might not work with Timestamp fields
as the Maximum-value Column, so I tried another column (_Change-Sequence)
and got this state:

"man_fabrica-cdc"@!@_change-sequence 252254 Cluster

With NiFi debug logging enabled, I see that no WHERE clause is generated
when the Maximum-value Column is wrapped in quotes
("_my-strange-column-name"). On the other hand, if I do not wrap the odd
column name in quotes, I get an error like this from the JDBC driver:

nifi-app_2017-12-12_11.0.log:Caused by: java.sql.SQLException:
[DataDirect][OpenEdge JDBC Driver][OpenEdge] Syntax error in SQL statement
at or about "_Time-Stamp FROM "man_fabrica-cdc" WHERE" (10713)

I'm using "Normalize Table/Column Names" as suggested here [1].

[1] -
http://apache-nifi-users-list.2361937.n4.nabble.com/Hyphenated-Tables-and-Columns-names-td3650.html#a3655

Thanks!

Alberto


Re: ConvertJSONToSQL empty VALUES fields

2017-12-09 Thread Alberto Bengoa
Hello Matt,

Hmmm. I was expecting the entire SQL statement on the SQL relationship.
Actually, I was planning to keep this flow "flowing" and use ReplaceText
to adjust the SQL statement so I could persist the data to Phoenix/HBase
too. Bad news for me. :-(

Regarding NIFI-4071, I had seen that issue before; it doesn't seem to
affect me. I'm not sure whether attaching files is allowed here, but if
you don't mind I could send screenshots of my configuration to your email.

I have around 100 columns in each table, so NIFI-4684 will come in handy.
I'd be glad to test your patch. Can you point me to how I could build a
new version of this processor?

Thanks!
Alberto

On Fri, Dec 8, 2017 at 11:56 PM, Matt Burgess <mattyb...@apache.org> wrote:

> Alberto,
>
> This came up the other day as well: the generated SQL is a prepared
> statement, which allows the code to use the same statement but then
> just set different values based on "parameters". In this case the
> values for the parameters are stored in "positional" flow file
> attributes for the statement, so for INSERT something like
> "sql.args.1.value = 7501". If you use PutSQL, it will be looking for
> these attributes and things should work fine. However PutSQL doesn't
> support Hive AFAIK, which is why there's a PutHiveQL. Unfortunately,
> PutHiveQL is expecting those attributes in the form
> "hiveql.args.1.value", with a "hiveql" prefix instead of "sql".
>
> I'm curious as to what DBCPConnectionPool you are using to configure
> your ConvertJSONToSQL processor, given that your target database is
> Hive. It used to be that using Hive as the target would give an error
> (NIFI-4071 [1]). If this is no longer the case somehow, we should
> update that Jira.
>
> One option (if there are a small number of known parameters) is to use
> UpdateAttribute to store the sql.args.*.* attributes into
> hiveql.args.*.* attributes. You can also use ExecuteScript to
> accomplish this for arbitrary numbers of attributes/parameters.  I
> have just written NIFI-4684 [2] to cover the addition of a property to
> ConvertJSONToSQL that will let you specify the attribute prefix. It
> would presumably default to "sql" to maintain current behavior but
> could be changed by the user to "hiveql" if desired.
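> For instance, for three known parameters, the UpdateAttribute option
> above could be configured with dynamic properties like these (a sketch;
> the pattern just repeats per parameter index):
>
>   hiveql.args.1.type  = ${sql.args.1.type}
>   hiveql.args.1.value = ${sql.args.1.value}
>   hiveql.args.2.type  = ${sql.args.2.type}
>   hiveql.args.2.value = ${sql.args.2.value}
>   hiveql.args.3.type  = ${sql.args.3.type}
>   hiveql.args.3.value = ${sql.args.3.value}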
>
> Regards,
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-4071
> [2] https://issues.apache.org/jira/browse/NIFI-4684
>
> On Fri, Dec 8, 2017 at 4:45 PM, Alberto Bengoa <albe...@propus.com.br>
> wrote:
> > Hey Folks,
> >
> > I'm having some problems with the ConvertJSONToSQL processor.
> >
> > I'm ingesting a JSON like this:
> >
> > {
> >   "_Time_Stamp" : 1512146156211,
> >   "_Operation" : 4,
> >   "cdn_fabrica" : 7501,
> >   "char_1" : "Value 1",
> >   "char_2" : null
> > }
> >
> > On the SQL relationship I got a query like this:
> >
> > UPDATE progress_cad2esp.man_fabrica SET char_1 = ?, char_2 = ? WHERE
> > cdn_fabrica = ?
> >
> > Even when trying an INSERT query I got something like this:
> >
> > INSERT INTO progress_cad2esp.man_fabrica (cdn_fabrica, char_1, char_2)
> > VALUES (?, ?, ?)
> >
> > My current flow is: QueryDatabaseTable -> ConvertAvroToJson ->
> > SplitJson -> ExtractText -> RouteOnAttribute -> ConvertJSONToSQL
> >
> > My target database is Hive.
> >
> > I've read a lot on Google about problems with this processor and Hive,
> > but I'm not sure whether they are solved in Hive 1.2.0.
> >
> > Any idea?
> >
> > Tks,
> > Alberto
>


ConvertJSONToSQL empty VALUES fields

2017-12-08 Thread Alberto Bengoa
Hey Folks,

I'm having some problems with the ConvertJSONToSQL processor.

I'm ingesting a JSON like this:

{
  "_Time_Stamp" : 1512146156211,
  "_Operation" : 4,
  "cdn_fabrica" : 7501,
  "char_1" : "Value 1",
  "char_2" : null
}

On the SQL relationship I got a query like this:

UPDATE progress_cad2esp.man_fabrica SET char_1 = ?, char_2 = ? WHERE
cdn_fabrica = ?

Even when trying an INSERT query I got something like this:

INSERT INTO progress_cad2esp.man_fabrica (cdn_fabrica, char_1, char_2)
VALUES (?, ?, ?)

My current flow is: QueryDatabaseTable -> ConvertAvroToJson -> SplitJson ->
ExtractText -> RouteOnAttribute -> ConvertJSONToSQL

My target database is Hive.

I've read a lot on Google about problems with this processor and Hive, but
I'm not sure whether they are solved in Hive 1.2.0.

Any idea?

Tks,
Alberto


Re: Hyphenated Tables and Columns names

2017-12-06 Thread Alberto Bengoa
Matt,

Not sure if it's related.

I'm trying to use a timestamp column as the Maximum-value Column, but it
keeps looping (fetching the same rows again and again).

I have set Use Avro Logical Types = true on my QueryDatabaseTable processor.

The original column values look like this:

_Time-Stamp
--
2017-12-01 14:35:56:204 - 02:00
2017-12-01 14:35:56:211 - 02:00
2017-12-01 15:25:35:945 - 02:00
2017-12-01 15:25:35:945 - 02:00
2017-12-01 15:28:23:046 - 02:00

So I'm converting to timestamp millis using CAST("_Time-Stamp" AS
TIMESTAMP) "_Time-Stamp", i.e. the cast aliased back to the original
quoted column name (full query sketched below):

_Time-Stamp
-
2017-12-01 14:35:56.204
2017-12-01 14:35:56.211
2017-12-01 15:25:35.945
2017-12-01 15:25:35.945
2017-12-01 15:28:23.046
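
In full, the custom query looks roughly like this (a sketch; the table name
comes from my state output, and the other selected columns are just
illustrative):

SELECT CAST("_Time-Stamp" AS TIMESTAMP) "_Time-Stamp",
       "_Change-Sequence",
       "_Operation"
FROM pub."man_fabrica-cdc"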

The state seems right:

Key                                  Value                    Scope
pub."man_fabrica-cdc"@!@_time-stamp  2017-12-04 15:33:23.995  Cluster

Any clue?

Thank you!

Alberto

On Wed, Dec 6, 2017 at 4:06 PM, Alberto Bengoa <albe...@propus.com.br>
wrote:

> Matt,
>
> Perfect! Enabled and working now.
>
> Thank you!
>
> Cheers,
> Alberto
>
>
>
> On Wed, Dec 6, 2017 at 3:54 PM, Matt Burgess <mattyb...@apache.org> wrote:
>
>> Alberto,
>>
>> What version of NiFi are you using? As of version 1.1.0,
>> QueryDatabaseTable has a "Normalize Table/Column Names" property that
>> you can set to true, and it will replace all Avro-illegal characters
>> with underscores.
>>
>> Regards,
>> Matt
>>
>>
>> On Wed, Dec 6, 2017 at 12:06 PM, Alberto Bengoa <albe...@propus.com.br>
>> wrote:
>> > Hey Folks,
>> >
>> > I'm facing an odd situation with NiFi and tables / columns that have
>> > hyphens in their names (stack trace below).
>> >
>> > I found in the Avro spec [1] that hyphens are not allowed, which
>> > explains this error.
>> >
>> > Is there any way to deal with this situation in NiFi other than
>> > changing table/column names or creating views to rename the
>> > hyphenated columns?
>> >
>> > I'm getting this error on the first processor (QueryDatabaseTable) of my
>> > flow.
>> >
>> > Thanks!
>> >
>> > Alberto Bengoa
>> >
>> > [1] - https://avro.apache.org/docs/1.7.7/spec.html#Names
>> >
>> >
>> > 2017-12-06 14:37:25,809 ERROR [Timer-Driven Process Thread-2]
>> > o.a.n.p.standard.QueryDatabaseTable
>> > QueryDatabaseTable[id=9557387b-bbd6-1b2f-b68b-5a4458986794] Unable to
>> > execute SQL select query SELECT "_Change-Sequence" FROM PUB.man_factory_cdc
>> > due to org.apache.nifi.processor.exception.ProcessException: Error during
>> > database query or conversion of records to Avro.: {}
>> > org.apache.nifi.processor.exception.ProcessException: Error during database
>> > query or conversion of records to Avro.
>> > at org.apache.nifi.processors.standard.QueryDatabaseTable.lambda$onTrigger$0(QueryDatabaseTable.java:289)
>> > at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2526)
>> > at org.apache.nifi.processors.standard.QueryDatabaseTable.onTrigger(QueryDatabaseTable.java:283)
>> > at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1118)
>> > at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
>> > at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
>> > at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
>> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>> > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>> > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> > at java.lang.Thread.run(Thread.java:745)
>> > Caused by: org.apache.avro.SchemaParseException: Illegal character in:
>> > _Change-Sequence
>> >
>>
>
>


Re: [EXT] CDC like updates on Nifi

2017-12-06 Thread Alberto Bengoa
On Tue, Dec 5, 2017 at 11:55 PM, Peter Wicks (pwicks) <pwi...@micron.com>
wrote:

> Alberto,
>
Hello Peter,

Thanks for your answer.

>
>
> Since it sounds like you have control over the structure of the tables,
> this should be doable.
>
>
>
> If you have a changelog table for each table this will probably be easier,
> and in your changelog table you’ll need to make sure you have a good
> transaction timestamp column and a change type column (I/U/D). Then use
> QueryDatabaseTable to tail your change log table, one copy of
> QueryDatabaseTable for each change table.
>

Yes, this is the way I'm trying to do it. I have the TimeStamp and
Operation type columns as "metadata columns", plus all the other "data
columns" of each table.

>
>
> Now your changes are in easy-to-ingest Avro files. For Hive I’d probably
> use an external table with the Avro schema; this makes it easy to use
> PutHDFS to load the file and make it accessible from Hive. I haven’t used
> Phoenix, sorry.
>

Hmm. Sounds interesting.

I was planning to use ORC because it allows transactions (to make updates
/ deletes). Avro does not allow transactions, but changing data through
HDFS instead of HiveQL could be an option.

Would it be possible to update fields of specific records using PutHDFS?

In my changelog table I do not have the entire row data when a change is
triggered by an update; I only have the values of the changed fields
(unchanged fields have empty values in the changelog tables).

_TimeStamp                       _Operation  Column_A  Column_B  Column_C
2017-12-01 14:35:56:204 - 02:00  3           7501
2017-12-01 14:35:56:211 - 02:00  4           7501      1234
2017-12-01 15:25:35:945 - 02:00  3           7503
2017-12-01 15:25:35:945 - 02:00  4           7503      5678

In the example above, we had two update operations (_Operation = 4).
Column_B was changed and Column_C was not, so the changelog does not tell
us Column_C's value; it could hold any prior value in the source table.


> If you have a single change table for all tables, then you can still use
> the above pattern, but you’ll need a middle step where you extract and
> rebuild the changes. Maybe if you store the changes in JSON you could
> extract them using one of the Record parsers and then rebuild the data row.
> Much harder though.
>

I have one changelog table for each table.

Considering that I would use HiveQL to update tables in the data lake,
could I use a RouteOnContent processor to create SQL queries according to
the _Operation type?
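
To illustrate, for the update rows (_Operation = 4) in my example table I
would generate something like this (a sketch; the target table name is
hypothetical, and Hive would need an ACID/ORC table for these to work):

UPDATE datalake.man_fabrica SET column_b = 1234 WHERE column_a = 7501;

DELETE FROM datalake.man_fabrica WHERE column_a = 7501;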

> Thanks,
>
>   Peter

Thank you!

Alberto


> *From:* Alberto Bengoa [mailto:albe...@propus.com.br]
> *Sent:* Wednesday, December 06, 2017 06:24
> *To:* users@nifi.apache.org
> *Subject:* [EXT] CDC like updates on Nifi
>
>
>
> Hey folks,
>
>
>
> I read about the NiFi CDC processor for MySQL and other CDC "solutions"
> with NiFi found on Google, like these:
>
>
>
> https://community.hortonworks.com/idea/53420/apache-nifi-processor-to-address-cdc-use-cases-for.html
>
> https://community.hortonworks.com/questions/88686/change-data-capture-using-nifi-1.html
>
> https://community.hortonworks.com/articles/113941/change-data-capture-cdc-with-apache-nifi-version-1-1.html
>
>
>
> I'm trying a different approach to acquire fresh information from tables,
> using triggers on the source database's tables to write changes to a
> "changelog table".
>
>
>
> This is done, but my questions are:
>
>
>
> Would NiFi be capable of reading these tables and transforming the data
> to generate an equivalent SQL query (insert/update/delete) to send to
> Hive and/or Phoenix with the currently available processors?
>
>
>
> Which would be the best / suggested flow?
>
>
>
> The objective is to keep tables on the Data Lake as up-to-date as
> possible for real-time analysis.
>
>
>
> Cheers,
>
> Alberto
>


Re: Hyphenated Tables and Columns names

2017-12-06 Thread Alberto Bengoa
Matt,

Perfect! Enabled and working now.

Thank you!

Cheers,
Alberto



On Wed, Dec 6, 2017 at 3:54 PM, Matt Burgess <mattyb...@apache.org> wrote:

> Alberto,
>
> What version of NiFi are you using? As of version 1.1.0,
> QueryDatabaseTable has a "Normalize Table/Column Names" property that
> you can set to true, and it will replace all Avro-illegal characters
> with underscores.
>
> Regards,
> Matt
>
>
> On Wed, Dec 6, 2017 at 12:06 PM, Alberto Bengoa <albe...@propus.com.br>
> wrote:
> > Hey Folks,
> >
> > I'm facing an odd situation with NiFi and tables / columns that have
> > hyphens in their names (stack trace below).
> >
> > I found in the Avro spec [1] that hyphens are not allowed, which
> > explains this error.
> >
> > Is there any way to deal with this situation in NiFi other than
> > changing table/column names or creating views to rename the hyphenated
> > columns?
> >
> > I'm getting this error on the first processor (QueryDatabaseTable) of my
> > flow.
> >
> > Thanks!
> >
> > Alberto Bengoa
> >
> > [1] - https://avro.apache.org/docs/1.7.7/spec.html#Names
> >
> >
> > 2017-12-06 14:37:25,809 ERROR [Timer-Driven Process Thread-2]
> > o.a.n.p.standard.QueryDatabaseTable
> > QueryDatabaseTable[id=9557387b-bbd6-1b2f-b68b-5a4458986794] Unable to
> > execute SQL select query SELECT "_Change-Sequence" FROM PUB.man_factory_cdc
> > due to org.apache.nifi.processor.exception.ProcessException: Error during
> > database query or conversion of records to Avro.: {}
> > org.apache.nifi.processor.exception.ProcessException: Error during database
> > query or conversion of records to Avro.
> > at org.apache.nifi.processors.standard.QueryDatabaseTable.lambda$onTrigger$0(QueryDatabaseTable.java:289)
> > at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2526)
> > at org.apache.nifi.processors.standard.QueryDatabaseTable.onTrigger(QueryDatabaseTable.java:283)
> > at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1118)
> > at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
> > at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
> > at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > at java.lang.Thread.run(Thread.java:745)
> > Caused by: org.apache.avro.SchemaParseException: Illegal character in:
> > _Change-Sequence
> >
>


Hyphenated Tables and Columns names

2017-12-06 Thread Alberto Bengoa
Hey Folks,

I'm facing an odd situation with NiFi and tables / columns that have
hyphens in their names (stack trace below).

I found in the Avro spec [1] that hyphens are not allowed, which explains
this error.

Is there any way to deal with this situation in NiFi other than changing
table/column names or creating views to rename the hyphenated columns?

I'm getting this error on the first processor (QueryDatabaseTable) of my
flow.

Thanks!

Alberto Bengoa

[1] - https://avro.apache.org/docs/1.7.7/spec.html#Names


2017-12-06 14:37:25,809 ERROR [Timer-Driven Process Thread-2]
o.a.n.p.standard.QueryDatabaseTable
QueryDatabaseTable[id=9557387b-bbd6-1b2f-b68b-5a4458986794] Unable to
execute SQL select query SELECT "_Change-Sequence" FROM PUB.man_factory_cdc
due to org.apache.nifi.processor.exception.ProcessException: Error during
database query or conversion of records to Avro.: {}
org.apache.nifi.processor.exception.ProcessException: Error during database
query or conversion of records to Avro.
    at org.apache.nifi.processors.standard.QueryDatabaseTable.lambda$onTrigger$0(QueryDatabaseTable.java:289)
    at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2526)
    at org.apache.nifi.processors.standard.QueryDatabaseTable.onTrigger(QueryDatabaseTable.java:283)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1118)
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.avro.SchemaParseException: Illegal character in:
_Change-Sequence


CDC like updates on Nifi

2017-12-05 Thread Alberto Bengoa
Hey folks,

I read about the NiFi CDC processor for MySQL and other CDC "solutions"
with NiFi found on Google, like these:

https://community.hortonworks.com/idea/53420/apache-nifi-processor-to-address-cdc-use-cases-for.html
https://community.hortonworks.com/questions/88686/change-data-capture-using-nifi-1.html
https://community.hortonworks.com/articles/113941/change-data-capture-cdc-with-apache-nifi-version-1-1.html

I'm trying a different approach to acquire fresh information from tables,
using triggers on the source database's tables to write changes to a
"changelog table".

This is done, but my questions are:

Would NiFi be capable of reading these tables and transforming the data to
generate an equivalent SQL query (insert/update/delete) to send to Hive
and/or Phoenix with the currently available processors?

Which would be the best / suggested flow?

The objective is to keep tables on the Data Lake as up-to-date as possible
for real-time analysis.

Cheers,
Alberto