Thanks, Andy. This is a good suggestion, but in my case the workflow must split both ‘small’ and large JSON files, and I don’t know in advance which ones will cause this problem. I will give it some thought, though, because it does sound like a workable way around the problem.
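A first-stage splitter does not need to know the file size in advance. If it streams the input and hands off a chunk every N records, its memory use is bounded by N whether the file holds 3 records or 400,000. That property is what would let the stacked approach Andy suggests handle both small and large files with the same flow.

The following is a minimal plain-Java sketch of that chunking idea, not NiFi code and not a model of any specific NiFi processor. It assumes, hypothetically, that the input has one JSON object per line, which may not match Olav's file. `StreamingChunker`, `chunkLines` and the chunk size of 1,000 are illustrative names and values.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical first-stage splitter; ASSUMES one JSON object per line.
public class StreamingChunker {

    // Reads the input one line at a time and passes a chunk to `sink` each time
    // chunkSize records have accumulated, so at most one chunk is buffered here.
    // Returns the number of chunks emitted.
    static int chunkLines(BufferedReader in, int chunkSize, Consumer<List<String>> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> buffer = new ArrayList<>(chunkSize);
        int emitted = 0;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isBlank()) {
                    continue; // skip empty lines between records
                }
                buffer.add(line);
                if (buffer.size() == chunkSize) {
                    sink.accept(buffer);
                    buffer = new ArrayList<>(chunkSize);
                    emitted++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!buffer.isEmpty()) { // flush the final, possibly short, chunk
            sink.accept(buffer);
            emitted++;
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Same code path for a "small" and a "large" input; only the chunk count differs.
        for (int records : new int[] {3, 2_500}) {
            StringBuilder input = new StringBuilder();
            for (int i = 0; i < records; i++) {
                input.append("{\"id\":").append(i).append("}\n");
            }
            List<Integer> sizes = new ArrayList<>();
            int chunks = chunkLines(new BufferedReader(new StringReader(input.toString())),
                    1_000, chunk -> sizes.add(chunk.size()));
            // prints "3 records -> 1 chunk(s) [3]" then "2500 records -> 3 chunk(s) [1000, 1000, 500]"
            System.out.println(records + " records -> " + chunks + " chunk(s) " + sizes);
        }
    }
}
```

A small file simply yields a single chunk, so the extra stage costs little when it is not needed.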
Olav Jordens
Senior ETL Developer
Two Degrees Mobile Limited
===========================
(M) 022 620 2429
(P) 09 919 7000
www.2degreesmobile.co.nz
Two Degrees Mobile Limited | 47-49 George Street | Newmarket | Auckland | New Zealand | PO Box 8355 | Symonds Street | Auckland 1150 | New Zealand | Fax +64 9 919 7001

________________________________
Disclaimer
The e-mail and any files transmitted with it are confidential and may contain privileged or copyright information. If you are not the intended recipient you must not copy, distribute, or use this e-mail or the information contained in it for any purpose other than to notify us of the error. If you have received this message in error, please notify the sender immediately, by email or phone (+64 9 919 7000) and delete this email from your system. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Two Degrees Mobile Limited. We do not guarantee that this material is free from viruses or any other defects although due care has been taken to minimize the risk

From: Andy LoPresto [mailto:[email protected]]
Sent: Wednesday, 30 November 2016 2:48 p.m.
To: [email protected]
Subject: Re: Hanging on SplitJSON

Olav,

Have you tried “stacking” these processors so that the initial split breaks the complete input into smaller chunks, and then each of those is split again? This is a common pattern we recommend when splitting large files or merging into them.
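The point of the "stacking" pattern is that no single split step has to produce all 400,000 outputs at once. Stage 1 only cuts the input into chunks of at most N records. Stage 2 then splits each chunk independently.

The following is a minimal plain-Java illustration of that counting argument, not NiFi code. The class and method names (`StackedSplit`, `partition`) and the chunk size of 1,000 are hypothetical, and integers stand in for parsed JSON records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "stacked" splitting; not a NiFi API.
public class StackedSplit {

    // Cuts the input into consecutive chunks of at most chunkSize records.
    static <T> List<List<T>> partition(List<T> records, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < records.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, records.size());
            chunks.add(new ArrayList<>(records.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Stand-in for 400,000 parsed JSON records.
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 400_000; i++) {
            records.add(i);
        }

        // Stage 1: 400 chunks instead of 400,000 individual splits.
        List<List<Integer>> chunks = partition(records, 1_000);

        // Stage 2: each chunk is split on its own, so the largest
        // single split step produces at most 1,000 outputs.
        int largestStep = chunks.size();
        int total = 0;
        for (List<Integer> chunk : chunks) {
            List<List<Integer>> singles = partition(chunk, 1);
            largestStep = Math.max(largestStep, singles.size());
            total += singles.size();
        }
        // prints "chunks=400 largestStep=1000 total=400000"
        System.out.println("chunks=" + chunks.size() + " largestStep=" + largestStep + " total=" + total);
    }
}
```

The total number of records is unchanged. What shrinks is the number of outputs any one step must create before handing them on.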
I don’t know what the overall structure of your original file is, but you should be able to use the SplitContent processor to split on boundaries. For example, this works if you know each distinct JSON block starts with the same key; I know key order is not enforced, but you may have this scenario because all of the blocks are in the same file. Then route each resulting flowfile, containing 100-1000 JSON objects, to the SplitJSON processor.

Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

On Nov 29, 2016, at 5:34 PM, Olav Jordens <[email protected]> wrote:

Joe,

Thanks so much. Certainly, if it tries to process this job as a single batch, I will not have enough RAM on my small system. But if the processor understood that and pushed out splits in batches, it would work for me. I’ll log the JIRA.

Cheers,
Olav

From: Joe Witt [mailto:[email protected]]
Sent: Wednesday, 30 November 2016 2:22 p.m.
To: [email protected]
Subject: Re: Hanging on SplitJSON

Olav,

We want you to be able to split your 36MB file into 400,000 things and not have to stress about this. Do you mind filing a JIRA so this can be followed up on? We can definitely do better.

Thanks
Joe

On Tue, Nov 29, 2016 at 8:09 PM, Olav Jordens <[email protected]> wrote:

Hi,

My bad: the problem appears to be that the 36MB JSON file would be split into more than 400,000 individual records, each carrying a substantial load of attributes. This must be causing an out-of-memory condition, although I could not find such an error in the logs. Perhaps even the logs were no longer being written properly!

Thanks,
Olav

From: Olav Jordens [mailto:[email protected]]
Sent: Wednesday, 30 November 2016 1:25 p.m.
To: [email protected]
Subject: Hanging on SplitJSON

Hi,

I have a JSON file of about 36MB which is passed to a SplitJSON processor. This processor runs for a while and then my UI hangs. In the app-log the following ERRORs pop up:

    2016-11-30 13:03:30,999 ERROR [Site-to-Site Worker Thread-393] o.a.nifi.remote.SocketRemoteSiteListener Unable to communicate with remote instance Peer[url=nifi://localhost:42758] due to java.net.SocketTimeoutException: Timed out reading from socket; closing connection

However, I suspect that this has nothing to do with Site-to-Site (from my single NiFi instance to itself). There are no ERRORs before my flowfile hits the SplitJSON processor, and every time I re-run, it is at this point that it hangs. My Java heap settings are Xmx=1024m and Xms=1024m.

When I do a NiFi dump:

    bin/nifi.sh dump
    nifi.sh: JAVA_HOME not set; results may vary

    Java home:
    NiFi home: /app/HDF-2.0.1.0/nifi

    Bootstrap Config File: /app/HDF-2.0.1.0/nifi/conf/bootstrap.conf

    Exception in thread "main" java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:170)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.fill(BufferedReader.java:161)
        at java.io.BufferedReader.readLine(BufferedReader.java:324)
        at java.io.BufferedReader.readLine(BufferedReader.java:389)
        at org.apache.nifi.bootstrap.RunNiFi.dump(RunNiFi.java:695)
        at org.apache.nifi.bootstrap.RunNiFi.main(RunNiFi.java:225)

This again points at a socket issue, but my main confusion is this: why does the error occur every time the flowfile hits the SplitJSON processor? The status shows that NiFi is hanging and not responding to ping requests:

    service nifi status
    nifi.sh: JAVA_HOME not set; results may vary

    Java home:
    NiFi home: /app/HDF-2.0.1.0/nifi

    Bootstrap Config File: /app/HDF-2.0.1.0/nifi/conf/bootstrap.conf

    2016-11-30 13:23:31,786 INFO [main] org.apache.nifi.bootstrap.Command Apache NiFi is running at PID 23080 but is not responding to ping requests

Any ideas?

Thanks,
Olav
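Olav's out-of-memory diagnosis can be sanity-checked with rough arithmetic from the figures in this thread: a heap of Xmx=1024m, a 36MB file and more than 400,000 splits. This is a back-of-the-envelope estimate only. It assumes, as the thread suggests, that all splits are held in memory at once, and it takes 36MB as 36 × 10^6 bytes. It also ignores everything else NiFi keeps on the heap, so the real per-split budget is smaller still:

```latex
\frac{1024 \times 2^{20}\ \text{B}}{400\,000}
  = \frac{1\,073\,741\,824\ \text{B}}{400\,000}
  \approx 2\,684\ \text{B of heap per split},
\qquad
\frac{36 \times 10^{6}\ \text{B}}{400\,000} = 90\ \text{B of JSON per split}
```

NiFi keeps flowfile content in its content repository rather than on the heap. The pressure therefore comes mainly from per-flowfile bookkeeping such as the attribute map. If that averages more than roughly 2.7 KB per split, a 1 GB heap cannot hold all of them, which is consistent with each record carrying "a substantial load of attributes".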
