Thanks Andy – This is a good suggestion, but in my case the workflow must 
split both ‘small’ and large JSON files, and I don’t know in advance which 
ones will cause this problem. I will give it some thought though, because it 
does sound like a workable way around the problem.



Olav Jordens
Senior ETL Developer
Two Degrees Mobile Limited
===========================
(M) 022 620 2429
(P) 09 919 7000
www.2degreesmobile.co.nz
Two Degrees Mobile Limited | 47-49 George Street | Newmarket | Auckland | New 
Zealand |
PO Box 8355 | Symonds Street | Auckland 1150 | New Zealand | Fax +64 9 919 7001



________________________________

Disclaimer
The e-mail and any files transmitted with it are confidential and may contain 
privileged or copyright information. If you are not the intended recipient you 
must not copy, distribute, or use this e-mail or the information contained in 
it for any purpose other than to notify us of the error. If you have received 
this message in error, please notify the sender immediately, by email or phone 
(+64 9 919 7000) and delete this email from your system. Any views expressed in 
this message are those of the individual sender, except where the sender 
specifically states them to be the views of Two Degrees Mobile Limited. We do 
not guarantee that this material is free from viruses or any other defects 
although due care has been taken to minimize the risk


From: Andy LoPresto [mailto:[email protected]]
Sent: Wednesday, 30 November 2016 2:48 p.m.
To: [email protected]
Subject: Re: Hanging on SplitJSON

Olav,

Have you tried “stacking” these processors so the initial split breaks the 
complete input into smaller chunks and then each of those is split again? This 
is a common pattern we recommend when splitting or merging from/to large files. 
I don’t know what the overall structure of your original file is, but you 
should be able to use the SplitContent processor to split on boundaries. For 
example, each distinct JSON block may start with the same key (I know order is 
not enforced, but you may have this scenario because all of the blocks are in 
the same file). You can then take each flowfile containing 100-1000 JSON 
objects and route it to the SplitJSON processor.
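
The stacking pattern described above can be sketched outside NiFi in a few lines of Python. This is only an illustration of the idea, not NiFi code: the function names and the chunk size of 1000 are hypothetical, and it assumes the input has already been parsed into a list of records. The point is that only one coarse chunk's worth of individual splits is in flight at any time.

```python
import json
from typing import Iterator, List


def split_into_chunks(records: List[dict], chunk_size: int) -> Iterator[List[dict]]:
    """Stage 1: yield coarse chunks of at most chunk_size records."""
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def split_chunk(chunk: List[dict]) -> Iterator[str]:
    """Stage 2: yield each record of one chunk as its own JSON document."""
    for record in chunk:
        yield json.dumps(record)


def two_stage_split(records: List[dict], chunk_size: int = 1000) -> Iterator[str]:
    """Chain both stages lazily, one chunk at a time."""
    for chunk in split_into_chunks(records, chunk_size):
        yield from split_chunk(chunk)
```

Because both stages are generators, a consumer that handles each output as it arrives never holds all of the individual splits in memory at once.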

Andy LoPresto
[email protected]<mailto:[email protected]>
[email protected]<mailto:[email protected]>
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On Nov 29, 2016, at 5:34 PM, Olav Jordens <[email protected]> wrote:

Joe,

Thanks so much – certainly if it tries to batch this job, I will not have 
enough RAM on my small system, but if the processor would understand that and 
push out batches of splits at a time, then it would work for me. I’ll log the 
JIRA.
Cheers,
Olav


From: Joe Witt [mailto:[email protected]]
Sent: Wednesday, 30 November 2016 2:22 p.m.
To: [email protected]
Subject: Re: Hanging on SplitJSON

Olav

We want you to be able to split your 36MB file into 400,000 things and not have 
to stress about this.  Do you mind please filing a JIRA for this to be followed 
up on?  We can definitely do better.

Thanks
Joe

On Tue, Nov 29, 2016 at 8:09 PM, Olav Jordens <[email protected]> wrote:
Hi,

My bad – the problem appears to be that the 36MB JSON file would be split into 
more than 400,000 individual records, each carrying a substantial load of 
attributes. This must be causing an out-of-memory condition, although I could 
not find such an error in the logs – perhaps even the logs were no longer 
being written to properly!
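
A rough back-of-the-envelope check of this suspicion is sketched below. The 400,000 record count and the 1024 MB heap come from this thread; the 2 KB of attribute data per flowfile is purely an assumed, hypothetical figure for illustration, since the real per-flowfile overhead was not measured here.

```python
def estimated_attribute_heap_mb(flowfile_count: int, bytes_per_flowfile: int) -> float:
    """Return the approximate heap (in MB) needed to hold all flowfile attributes."""
    return flowfile_count * bytes_per_flowfile / (1024 * 1024)


HEAP_MB = 1024                      # Xmx=1024m, as stated in the thread
FLOWFILES = 400_000                 # approximate split count from the thread
ASSUMED_BYTES_PER_FLOWFILE = 2_048  # hypothetical: not measured anywhere in this thread

estimate = estimated_attribute_heap_mb(FLOWFILES, ASSUMED_BYTES_PER_FLOWFILE)
print(f"{estimate:.0f} MB of {HEAP_MB} MB heap")  # prints "781 MB of 1024 MB heap"
```

Under that assumption the attributes alone would take up most of the heap, which would be consistent with the hang, but the real figure depends on the actual attribute sizes.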

Thanks,
Olav


From: Olav Jordens [mailto:[email protected]]
Sent: Wednesday, 30 November 2016 1:25 p.m.
To: [email protected]
Subject: Hanging on SplitJSON

Hi,

I have a JSON file of about 36MB which is passed to a SplitJSON processor. This 
processor runs for a while and then my UI hangs. In the app-log the following 
ERRORs pop up:

2016-11-30 13:03:30,999 ERROR [Site-to-Site Worker Thread-393] o.a.nifi.remote.SocketRemoteSiteListener Unable to communicate with remote instance Peer[url=nifi://localhost:42758] due to java.net.SocketTimeoutException: Timed out reading from socket; closing connection

However, I suspect that this has nothing to do with Site-to-Site (from my 
single NiFi instance to itself), as there are no ERRORs prior to my flowfile 
hitting the SplitJSON processor, and every time I re-run, it is at this point 
that it hangs. My Java heap settings are Xmx=1024m and Xms=1024m. When I do a 
NiFi dump:

bin/nifi.sh dump
nifi.sh: JAVA_HOME not set; results may vary

Java home:
NiFi home: /app/HDF-2.0.1.0/nifi

Bootstrap Config File: /app/HDF-2.0.1.0/nifi/conf/bootstrap.conf

Exception in thread "main" java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:170)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.fill(BufferedReader.java:161)
        at java.io.BufferedReader.readLine(BufferedReader.java:324)
        at java.io.BufferedReader.readLine(BufferedReader.java:389)
        at org.apache.nifi.bootstrap.RunNiFi.dump(RunNiFi.java:695)
        at org.apache.nifi.bootstrap.RunNiFi.main(RunNiFi.java:225)

This again points at a socket issue, but my main confusion is why this error 
occurs every time the flowfile hits the SplitJSON processor?

The status indicates that it is hanging and not responding to ping requests:

service nifi status
nifi.sh: JAVA_HOME not set; results may vary

Java home:
NiFi home: /app/HDF-2.0.1.0/nifi

Bootstrap Config File: /app/HDF-2.0.1.0/nifi/conf/bootstrap.conf

2016-11-30 13:23:31,786 INFO [main] org.apache.nifi.bootstrap.Command Apache NiFi is running at PID 23080 but is not responding to ping requests

Any ideas?

Thanks,
Olav

