Re: Re: Minifi could not start CaptureChangeMySQL processor

2019-07-15 Thread wangl...@geekplus.com.cn

I finally got it running by adding some steps.


Add these jars to MiNiFi's lib directory:
  mysql-binlog-connector-java-0.11.0.jar
  mysql-connector-java-5.1.47.jar
  nifi-cdc-api-1.9.2.jar
  nifi-cdc-mysql-processors-1.9.2.jar
  nifi-distributed-cache-client-service-1.9.2.jar
  nifi-distributed-cache-client-service-api-1.9.2.jar
  nifi-distributed-cache-protocol-1.9.2.jar
  nifi-distributed-cache-server-1.9.2.jar
  nifi-ssl-context-service-api-1.9.2.jar
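
For example, roughly how they can be copied in (source paths are illustrative; the jars can come from Maven Central or from a NiFi 1.9.2 install):

# illustrative only: copy the jars listed above into MiNiFi's lib directory
MINIFI_HOME=/opt/minifi-0.5.0          # adjust to your install location
cp mysql-binlog-connector-java-0.11.0.jar \
   mysql-connector-java-5.1.47.jar \
   nifi-cdc-*-1.9.2.jar \
   nifi-distributed-cache-*-1.9.2.jar \
   nifi-ssl-context-service-api-1.9.2.jar \
   "$MINIFI_HOME/lib/"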

Manually add a Map Cache Server entry under Controller Services in config.yml if you use MapCacheClientService:

Controller Services:
- id: b1b0141c-016b-1000-5efe-7e9494b76948
  name: Map Cache Server
  type: org.apache.nifi.distributed.cache.server.map.DistributedMapCacheServer
  Properties:
    Port: 4557
    Maximum Cache Entries: 1
    Eviction Strategy: "Least Frequently Used"
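
If the MapCacheClientService came through from your template, its entry in the same Controller Services list just needs to point at this server. Roughly (the id is whatever your template generated; hostname and timeout here are illustrative):

- id: <client-service-id-from-your-flow>
  name: MapCacheClientService
  type: org.apache.nifi.distributed.cache.client.DistributedMapCacheClientService
  Properties:
    Server Hostname: localhost
    Server Port: 4557
    Communications Timeout: 30 secs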




wangl...@geekplus.com.cn
 
From: Aldrin Piri
Date: 2019-07-12 20:37
To: users
Subject: Re: Minifi could not start CaptureChangeMySQL processor
Hi there,

You will need to provide the CDC NAR as part of your MiNiFi installation as it 
is not bundled with the default distribution.

With all the great work that went into supporting extensions in the Registry, I 
hope to have some additional tooling to make this process much easier.

On Thu, Jul 11, 2019 at 11:52 PM wangl...@geekplus.com.cn wrote:

I built a simple data flow using NiFi, created a template as XML, and transformed the XML to config.yml using minifi-toolkit-0.5.0/bin/config.sh.
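
The transform step was roughly this (the template file name here is just an example):

./minifi-toolkit-0.5.0/bin/config.sh transform flow_template.xml config.yml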
But there is an error after starting MiNiFi:

ERROR [main] o.apache.nifi.controller.FlowController Could not create Processor of type org.apache.nifi.cdc.mysql.processors.CaptureChangeMySQL for ID b6936ab5-07c5-3670--; creating "Ghost" implementation
org.apache.nifi.controller.exception.ProcessorInstantiationException: Unable to find bundle for coordinate default:unknown:unversioned
at org.apache.nifi.controller.FlowController.instantiateProcessor(FlowController.java:1271)
at org.apache.nifi.controller.FlowController.createProcessor(FlowController.java:1188)
at org.apache.nifi.controller.FlowController.createProcessor(FlowController.java:1157)
at org.apache.nifi.controller.StandardFlowSynchronizer.addProcessGroup(StandardFlowSynchronizer.java:1214)
at org.apache.nifi.controller.StandardFlowSynchronizer.sync(StandardFlowSynchronizer.java:359)
at org.apache.nifi.controller.FlowController.synchronize(FlowController.java:1697)
at org.apache.nifi.persistence.StandardXMLFlowConfigurationDAO.load(StandardXMLFlowConfigurationDAO.java:84)
at org.apache.nifi.controller.StandardFlowService.loadFromBytes(StandardFlowService.java:723)
at org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:534)
at org.apache.nifi.minifi.MiNiFiServer.start(MiNiFiServer.java:122)
at org.apache.nifi.minifi.MiNiFi.<init>(MiNiFi.java:148)
at org.apache.nifi.minifi.MiNiFi.main(MiNiFi.java:247)

o.apache.nifi.controller.FlowController Unable to start GhostProcessor[id=b6936ab5-07c5-3670--] due to java.lang.IllegalStateException: Processor CaptureChangeMySQLLocal is not in a valid state due to ['Missing Processor' validated against 'Any Property' is invalid because Processor is of type org.apache.nifi.cdc.mysql.processors.CaptureChangeMySQL, but this is not a valid Processor type]


I just downloaded minifi-0.5.0-bin.tar.gz, ran tar -xzvf, and ran minifi.sh start.
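Roughly (the extracted directory name is assumed from the archive name):

tar -xzvf minifi-0.5.0-bin.tar.gz
cd minifi-0.5.0
./bin/minifi.sh start
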
I think I must have missed something. Is there some jar that is needed?






wangl...@geekplus.com.cn


NiFi PutBigQueryBatch and AVRO logical types

2019-07-15 Thread John W. Phillips


I'm using NiFi 1.9.2 QueryDatabaseTable->PutBigQueryBatch to attempt to
replicate a MySQL table to BigQuery.  In QueryDatabaseTable I've configured
'Use Avro Logical Types=true', so I have a MySQL DATETIME which is encoded
in Avro as a Long with logical type timestamp-millis.  The PutBigQueryBatch
does not support the BigQuery use_avro_logical_types option, so I have an
explicit schema where I cast the Long to a BigQuery TIMESTAMP.
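
The explicit schema given to PutBigQueryBatch is roughly of this shape (a standard BigQuery JSON schema; the column names are only an example and the exact property format may differ):

[
  {"name": "id",         "type": "INTEGER",   "mode": "REQUIRED"},
  {"name": "updated_at", "type": "TIMESTAMP", "mode": "NULLABLE"}
]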

The issue, though, is that the Long is being interpreted by BigQuery as a timestamp in
microseconds, so the resulting TIMESTAMP is off by a factor of 1,000.  Does
anyone have a suggestion for a workaround?  I've tried an UpdateRecord
processor with :multiply(1000), but it throws a type conversion exception when
writing the new Avro file.
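
For reference, the UpdateRecord attempt looked roughly like this (the column name is only an example):

Record Reader              : AvroReader (embedded schema)
Record Writer              : AvroRecordSetWriter
Replacement Value Strategy : Literal Value
/updated_at                : ${field.value:multiply(1000)}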



--
Sent from: http://apache-nifi-users-list.2361937.n4.nabble.com/


Re: Site to Site Compression

2019-07-15 Thread Joe Witt
Noe

Just activate compression on the S2S port and the client will honor it if
able.  I don't believe the protocol has changed in quite a while, so you
should be fine with the versions noted.

Thanks
Joe

On Mon, Jul 15, 2019 at 9:08 AM Noe Detore  wrote:

> Hello,
>
> What is the best way to configure compression using site to site when
> sending data from one data center to another? I notice there is the ability
> to configure compression in a queue. What considerations need to be taken
> into account for different versions? DC1 Nifi 1.5 and DC2 Nifi 1.9.
>
> Thank you
> Noe
>


Site to Site Compression

2019-07-15 Thread Noe Detore
Hello,

What is the best way to configure compression using site to site when
sending data from one data center to another? I notice there is the ability
to configure compression in a queue. What considerations need to be taken
into account for different versions? DC1 Nifi 1.5 and DC2 Nifi 1.9.

Thank you
Noe


RE: Kafka to parquet to s3

2019-07-15 Thread Williams, Jim
Dweep,

 

The data I am moving into S3 already comes as fairly large sets of files, since 
they are a bulk export from a SaaS application.  Thus, the number of files 
being PUT to S3 was not a huge consideration.  However, since the 
Parquet files are to be consumed by Redshift Spectrum, I had an interest in 
consolidating flow files containing like objects into a single flow file prior 
to Parquet conversion.  I used the MergeRecord processor [1] to do this.

 

So, to amplify on the flow, it really looks more like this:

 

(Get stuff in JSON format) --> ConvertRecord --> MergeRecord --> ConvertAvroToParquet --> PutS3

 

 

This is not really a “real-time streaming” flow; it’s more batch-oriented.  
There is a delay in the flow (which is acceptable to us) while the MergeRecord 
processor collects and merges possibly several flow files into a bigger flow 
file.
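
For anyone curious, the knobs that drive that size/latency trade-off are on MergeRecord; the values below are illustrative rather than our exact settings:

Merge Strategy            : Bin-Packing Algorithm
Minimum Number of Records : 10000     # avoid emitting tiny Parquet files
Maximum Number of Records : 100000
Minimum Bin Size          : 64 MB     # aim for Spectrum-friendly file sizes
Maximum Bin Size          : 256 MB
Max Bin Age               : 10 min    # upper bound on the added latency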

 

 

[1] - 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.9.2/org.apache.nifi.processors.standard.MergeRecord/index.html

 

 

 

Warm regards,

Jim Williams | Principal Database Developer
O: +1 713.341.7812 | C: +1 919.523.8767 | jwilli...@alertlogic.com | alertlogic.com


From: Dweep Sharma  
Sent: Sunday, July 14, 2019 1:07 AM
To: users@nifi.apache.org
Subject: Re: Kafka to parquet to s3

 

Thanks, Jim, for the insights on the advantages; this worked for me as well. 

 

Any thoughts on partitioning and file size so the S3 PUT costs are not too high?

 

I do not see options on ConvertAvroToParquet for this.

 

-Dweep

 

On Mon, Jul 8, 2019 at 6:09 PM Williams, Jim <jwilli...@alertlogic.com> wrote:

Dweep,

 

I have been working on a project where Parquet files are being written to S3.  
I’ve had the liberty to use the most up-to-date version of Nifi, so I have 
implemented this on 1.9.2.

 

The approach I have taken is something like this:

 

(Get stuff in JSON format) --> ConvertRecord --> ConvertAvroToParquet --> PutS3

 

The ConvertRecord [1] processor changes the flow files from JSON to Avro.  
Although it is possible to use schema inference with this processor, it is 
something we have not leveraged yet.  The ConvertAvroToParquet [2] converts the 
flow file, but does not write it out to a local or HDFS file system like the 
PutParquet [3] processor would.
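
For what it's worth, the ConvertRecord setup is roughly this (reader/writer choices assumed from the JSON-to-Avro description above):

Record Reader : JsonTreeReader        (schema supplied explicitly, not inferred)
Record Writer : AvroRecordSetWriter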

 

Implementing the flow in this way gives a couple advantages:

 

1.  We do not need to use the PutParquet processor

a.  Extra configuration on cluster nodes is avoided for writing directly to 
S3 with this processor
b.  Writing to a local or HDFS filesystem and then copying to S3 is avoided

2.  We can use the native authentication methods which come with the S3 
processor

a.  Roles associated with EC2 instances are leveraged, which makes cluster 
deployment much simpler

 

We have been happy using this pattern for the past couple of months.  I am 
watching with interest for progress on NIFI-6089 [4] for a Parquet Record 
Reader/Writer.

 

 

 

[1] - 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.9.2/org.apache.nifi.processors.standard.ConvertRecord/index.html
 

 

[2] - 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-parquet-nar/1.9.2/org.apache.nifi.processors.parquet.ConvertAvroToParquet/index.html
 

 

[3] - 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-parquet-nar/1.9.2/org.apache.nifi.processors.parquet.PutParquet/index.html
 

 

[4] - https://issues.apache.org/jira/browse/NIFI-6089