NiFi doesn't solve your algorithmic or business problem for you. What NiFi
does provide is all the plumbing and connections for your data flow.

Typically, developers start by writing a manual ingress script to feed data
into their algorithm, and then an egress script to send the results where
they need to go. For example, maybe they write a script that uses curl to
pull data into a local directory. From there, they launch a program (or use
cron) to read the files in that directory, transform them, and write the
results to an output directory. And then maybe a third process comes along
and moves the data from the output directory into a database.

While NiFi does have some ETL-related transformation processors, its real
sweet spot is the framework itself, which lets you build your own
algorithms and then use standard tooling to get the data into and out of
your system. NiFi's ingress/egress/endpoint processors cover a multitude of
choices: HTTP, file-based, databases, message queues, and so on. It also
has some basic tooling for relatively simple transformations, but often you
need to write your own NiFi component to transform your business document.

In your case, for example, you could use NiFi to read in your data and then
pass it to your Python script (using ExecuteProcess or ExecuteScript) that
fixes your JSON. Then you'd use NiFi to send the result on to its final
destination, such as a message queue or database. The algorithm is "yours";
you plug it into the framework for execution.
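
As a rough sketch of what such a script might look like: the version below
assumes one record per line, arriving on stdin from a processor that pipes
FlowFile content to the command (ExecuteStreamCommand works that way), with
the result going back out on stdout. The names quote_keys, flatten, and main
are made up for illustration, and the regex is deliberately naive, so treat
it as a starting point rather than a finished fix.

```python
import json
import re
import sys

# Matches a bare key right after '{' or ',' and before ':' (e.g. {a: 1, b_c: 2}).
_BARE_KEY = re.compile(r'([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*:)')


def quote_keys(text):
    """Add double quotes around unquoted object keys.

    Naive on purpose: a string value that itself contains text such as
    ", x:" would also be rewritten.
    """
    return _BARE_KEY.sub(r'\1"\2"\3', text)


def flatten(value, prefix=""):
    """Flatten nested dicts into {name: float}; strings become one-hot keys."""
    out = {}
    if isinstance(value, dict):
        for key, inner in value.items():
            name = f"{prefix}.{key}" if prefix else key
            out.update(flatten(inner, name))
    elif isinstance(value, str):
        # "color": "red" becomes the feature "color=red" with value 1.0
        out[f"{prefix}={value}"] = 1.0
    elif isinstance(value, (int, float)):
        out[prefix] = float(value)
    # Anything else (null, lists) is skipped in this sketch.
    return out


def main(stream_in=sys.stdin, stream_out=sys.stdout):
    """Read one malformed JSON record per line, write one flat record per line."""
    for line in stream_in:
        line = line.strip()
        if not line:
            continue
        record = json.loads(quote_keys(line))
        stream_out.write(json.dumps(flatten(record), sort_keys=True) + "\n")
```

To run it as a command from NiFi you would add a final line that calls
main(). Everything on either side of the script (fetching the dataset,
routing failures, delivering the output) stays in the NiFi flow.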

Hope that helps.

On Mon, Nov 29, 2021 at 11:14 PM Hao Wang <[email protected]> wrote:

> I gave NiFi a try because I ran into the following ETL problem: I need to
> transform a dataset into an algorithm-friendly version. The dataset contains
> columns of nested JSON where the inner JSON is malformed (it doesn't
> have quotes on keys). In addition, some of the columns mix JSON, strings,
> and floats all together. I felt it would be impossible to
> transform the dataset into one-hot format, in which JSON keys and column
> string values become keys of the dataset so that the dataset values contain
> only float numbers.
>
> I did some vanilla Python coding to solve this issue, but I wish there
> were a convenient ETL tool to solve this problem much more easily. Is NiFi
> suitable for this task?
> ------------------------------
> *From:* Adam Taft <[email protected]>
> *Sent:* Tuesday, November 30, 2021 12:09 AM
> *To:* [email protected] <[email protected]>
> *Subject:* Re: How to fix this problem on Windows ?
>
> Try Java 11. It is the latest version of Java to work with NiFi. Java 11
> reports those kinds of package access restrictions as warnings, not as errors.
>
> On Mon, Nov 29, 2021 at 5:46 PM Hao Wang <[email protected]> wrote:
>
> Adam,
>
> Yes, I'm using Java 17. Looks like I need to downgrade Java to a
> previous version. Do you have any suggestions?
>
> Bravo!
> Hao Wang
> ------------------------------
> *From:* Adam Taft <[email protected]>
> *Sent:* Monday, November 29, 2021 1:42 PM
> *To:* [email protected] <[email protected]>
> *Subject:* Re: How to fix this problem on Windows ?
>
> Hao Wang,
>
> Are you using Java 17 by chance? This error looks suspiciously like you're
> running with Java 17, which NiFi doesn't (yet) support.
>
> Adam
>
> On Sat, Nov 27, 2021 at 9:36 PM Hao Wang <[email protected]> wrote:
>
> Dear devs :
>
> I came across the following error (as shown in nifi-app.log) during my
> usage of NiFi on Windows :
>
> 2021-11-28 12:20:35,059 ERROR [main] org.apache.nifi.NiFi Failure to
> launch NiFi due to org.xerial.snappy.SnappyError:
> [FAILED_TO_LOAD_NATIVE_LIBRARY] Unable to make protected final
> java.lang.Class
> java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
> throws java.lang.ClassFormatError accessible: module java.base does not
> "opens java.lang" to unnamed module @19e86461
> org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] Unable to
> make protected final java.lang.Class
> java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
> throws java.lang.ClassFormatError accessible: module java.base does not
> "opens java.lang" to unnamed module @19e86461
> at org.xerial.snappy.SnappyLoader.injectSnappyNativeLoader(SnappyLoader.java:297)
> at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:227)
> at org.xerial.snappy.Snappy.<clinit>(Snappy.java:48)
> at org.apache.nifi.processors.hive.PutHiveStreaming.<clinit>(PutHiveStreaming.java:158)
> at java.base/java.lang.Class.forName0(Native Method)
> at java.base/java.lang.Class.forName(Class.java:467)
> at org.apache.nifi.nar.StandardExtensionDiscoveringManager.getClass(StandardExtensionDiscoveringManager.java:330)
> at org.apache.nifi.documentation.DocGenerator.documentConfigurableComponent(DocGenerator.java:100)
> at org.apache.nifi.documentation.DocGenerator.generate(DocGenerator.java:65)
> at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:1139)
> at org.apache.nifi.NiFi.<init>(NiFi.java:170)
> at org.apache.nifi.NiFi.<init>(NiFi.java:82)
> at org.apache.nifi.NiFi.main(NiFi.java:331)
> 2021-11-28 12:20:35,060 INFO [Thread-0] org.apache.nifi.NiFi Initiating
> shutdown of Jetty web server...
> 2021-11-28 12:20:35,060 INFO [Thread-0] org.apache.nifi.NiFi Jetty web
> server shutdown completed (nicely or otherwise).
>
> Please let me know how to fix the error.
>
> Bravo !
>
> Hao Wang
>
>
