Yes, this definitely helps. My current task made me rethink the significance 
of data modeling (data lakes, etc.). I'll think more about the problem and see 
if I can come up with some innovative technical or commercial ideas on that.

Thanks for your help,
Hao Wang
________________________________
From: Adam Taft <[email protected]>
Sent: Tuesday, November 30, 2021 12:51 AM
To: [email protected] <[email protected]>
Subject: Re: How to fix this problem on Windows ?

NiFi doesn't solve your algorithmic or business challenge. Instead, what NiFi 
does is provide all the plumbing and connections to your data flow.

Typically, most developers will start by writing a manual ingress script to 
feed data into their algorithm. And then they will write the egress script to 
send the data to where it needs to go. For example, maybe they write a script 
that uses curl to transfer data into a local directory. From there, they launch 
a program (or use cron) to read the directory, transform it, and then write it 
to an output directory. And then maybe they have a third process that comes 
along and moves data from the outbound directory to a database.
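
As a rough illustration of that hand-rolled middle step, here is a minimal 
sketch in Python. The function name process_directory and the inbound/outbound 
directory layout are hypothetical, chosen only for this example; nothing here 
comes from NiFi itself.

```python
from pathlib import Path
from typing import Callable


def process_directory(inbound: Path, outbound: Path,
                      transform: Callable[[str], str]) -> int:
    """Transform every file in `inbound` and write the result to `outbound`.

    Returns the number of files processed. This is the kind of glue code a
    cron job would run on a schedule.
    """
    outbound.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in sorted(inbound.iterdir()):
        if not src.is_file():
            continue
        result = transform(src.read_text(encoding="utf-8"))
        (outbound / src.name).write_text(result, encoding="utf-8")
        src.unlink()  # remove the input so the next run does not reprocess it
        count += 1
    return count
```

Everything around the transform callable (polling, retries, cleanup, delivery) 
is plumbing that has to be written and maintained by hand in this approach.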

While yes, NiFi has some ETL-related transformation processors, its real sweet 
spot is the NiFi framework itself, which allows you to build your own 
algorithms and then use standard tooling to get the data into and out of your 
system. NiFi ingress/egress/endpoint processors include a multitude of choices 
for HTTP, file-based sources, databases, message queues, etc. Everything you 
can imagine. It also has some basic tooling for relatively simple 
transformations, but often you need to write your own NiFi component to 
transform your business document.

In your case, for example, you could use NiFi to read in your data and then 
pass it to your python script (using ExecuteProcess or ExecuteScript) that 
fixes your JSON. Then you'd use NiFi to send it back out to the final 
destination like a message queue or database. The algorithm is "yours" and you 
plug it into the framework for execution.
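
A minimal sketch of what such a script could look like, assuming the only 
defect is unquoted object keys and that the keys are simple identifiers. The 
names fix_unquoted_keys and run are hypothetical. The script reads from a 
stream and writes to a stream, which suits a processor such as 
ExecuteStreamCommand that pipes FlowFile content to the command's stdin and 
captures its stdout. The regex is deliberately naive: a string value that 
itself contains text like ", foo:" would be altered too.

```python
import json
import re
from typing import TextIO

# A bare identifier that follows '{' or ',' and is followed by ':'.
_BARE_KEY = re.compile(r'([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*:)')


def fix_unquoted_keys(text: str) -> str:
    """Wrap bare object keys in double quotes; quoted keys are left alone."""
    return _BARE_KEY.sub(r'\1"\2"\3', text)


def run(in_stream: TextIO, out_stream: TextIO) -> None:
    """Read malformed JSON, repair the keys, and write the result."""
    fixed = fix_unquoted_keys(in_stream.read())
    json.loads(fixed)  # raises ValueError if the repaired text is still invalid
    out_stream.write(fixed)
```

Calling run(sys.stdin, sys.stdout) from the script's entry point would make it 
usable as an external command.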

Hope that helps.

On Mon, Nov 29, 2021 at 11:14 PM Hao Wang <[email protected]> wrote:
I gave NiFi a try because I ran into the following ETL problem: I need to 
transform a dataset into an algorithm-friendly version. The dataset contains 
columns of nested JSON where the inner JSON is malformed (it doesn't have 
quotes on keys). In addition, some of the columns mix JSON, strings, and floats 
together. I felt it would be impossible to transform the dataset into a one-hot 
format, in which JSON keys and column string values become the keys of the 
dataset so that its values contain only floats.

I did some vanilla Python coding to solve this issue, but I wish there were a 
convenient ETL tool that solved this problem much more easily. Is NiFi suitable 
for this task?
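
For illustration only, one possible reading of the target transformation, 
assuming the JSON has already been parsed into Python dicts. The functions 
flatten_row and one_hot, and the "column=value" and "column.key" naming 
scheme, are hypothetical choices made for this sketch.

```python
from typing import Any


def flatten_row(row: dict[str, Any], prefix: str = "") -> dict[str, float]:
    """Flatten one record into float-valued features.

    Nested dicts contribute "outer.inner" keys, numbers are kept as floats,
    and any other value becomes a one-hot feature named "column=value".
    """
    out: dict[str, float] = {}
    for key, value in row.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten_row(value, prefix=f"{name}."))
        elif isinstance(value, (int, float)):
            out[name] = float(value)
        else:
            out[f"{name}={value}"] = 1.0
    return out


def one_hot(rows: list[dict[str, Any]]) -> tuple[list[str], list[list[float]]]:
    """Return sorted column names and a matrix with 0.0 for absent features."""
    flat = [flatten_row(r) for r in rows]
    columns = sorted({c for f in flat for c in f})
    return columns, [[f.get(c, 0.0) for c in columns] for f in flat]
```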
________________________________
From: Adam Taft <[email protected]>
Sent: Tuesday, November 30, 2021 12:09 AM
To: [email protected] <[email protected]>
Subject: Re: How to fix this problem on Windows ?

Try Java 11. It is the latest version of Java to work with NiFi. Java 11 has 
those types of package access restrictions as warnings, not as errors.

On Mon, Nov 29, 2021 at 5:46 PM Hao Wang <[email protected]> wrote:
Adam,

Yes, I'm using Java 17. It looks like I need to downgrade Java to an earlier 
version. Do you have any suggestions?

Bravo!
Hao Wang
________________________________
From: Adam Taft <[email protected]>
Sent: Monday, November 29, 2021 1:42 PM
To: [email protected] <[email protected]>
Subject: Re: How to fix this problem on Windows ?

Hao Wang,

Are you using Java 17 by chance? This error looks suspiciously like you're 
running with Java 17, which NiFi doesn't (yet) support.

Adam

On Sat, Nov 27, 2021 at 9:36 PM Hao Wang <[email protected]> wrote:
Dear devs,

I came across the following error (as shown in nifi-app.log) while using NiFi 
on Windows:

2021-11-28 12:20:35,059 ERROR [main] org.apache.nifi.NiFi Failure to launch NiFi due to org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] Unable to make protected final java.lang.Class java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain) throws java.lang.ClassFormatError accessible: module java.base does not "opens java.lang" to unnamed module @19e86461
org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] Unable to make protected final java.lang.Class java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain) throws java.lang.ClassFormatError accessible: module java.base does not "opens java.lang" to unnamed module @19e86461
    at org.xerial.snappy.SnappyLoader.injectSnappyNativeLoader(SnappyLoader.java:297)
    at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:227)
    at org.xerial.snappy.Snappy.<clinit>(Snappy.java:48)
    at org.apache.nifi.processors.hive.PutHiveStreaming.<clinit>(PutHiveStreaming.java:158)
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Class.java:467)
    at org.apache.nifi.nar.StandardExtensionDiscoveringManager.getClass(StandardExtensionDiscoveringManager.java:330)
    at org.apache.nifi.documentation.DocGenerator.documentConfigurableComponent(DocGenerator.java:100)
    at org.apache.nifi.documentation.DocGenerator.generate(DocGenerator.java:65)
    at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:1139)
    at org.apache.nifi.NiFi.<init>(NiFi.java:170)
    at org.apache.nifi.NiFi.<init>(NiFi.java:82)
    at org.apache.nifi.NiFi.main(NiFi.java:331)
2021-11-28 12:20:35,060 INFO [Thread-0] org.apache.nifi.NiFi Initiating shutdown of Jetty web server...
2021-11-28 12:20:35,060 INFO [Thread-0] org.apache.nifi.NiFi Jetty web server shutdown completed (nicely or otherwise).

Please let me know how to fix the error.

Bravo !

Hao Wang
