[GitHub] MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader.

2019-02-19 Thread GitBox
URL: https://github.com/apache/nifi/pull/3222#issuecomment-465244820
 
 
   @markap14 @bbende would you rather I refactor the JsonTreeReader to use the 
JsonSurfer library or keep this one separate?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader.

2018-12-17 Thread GitBox
URL: https://github.com/apache/nifi/pull/3222#issuecomment-447923122
 
 
   I don't think there is, because the Jayway JsonPath library reads the entire 
InputStream into memory before doing anything with it. JsonSurfer, by contrast, 
reads from the InputStream and scans it from start to finish.


[GitHub] MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader.

2018-12-17 Thread GitBox
URL: https://github.com/apache/nifi/pull/3222#issuecomment-447902019
 
 
   @markap14 I can rename it to `StreamingJsonPathReader`. Does that work?


[GitHub] MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader.

2018-12-17 Thread GitBox
URL: https://github.com/apache/nifi/pull/3222#issuecomment-447891669
 
 
   I was loosely tracking the SplitJson issue, so I knew about the problem 
before I even started on the work.


[GitHub] MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader.

2018-12-17 Thread GitBox
URL: https://github.com/apache/nifi/pull/3222#issuecomment-447889903
 
 
   @ottobackwards we definitely should refactor the readers at some point to 
use common code. The problem I ran into is that the existing ones use Jackson 
1.9.x, while JsonSurfer uses 2.9.x. Jackson 2 moved to new package names 
(`org.codehaus.jackson` became `com.fasterxml.jackson`), so sharing code would 
have meant a lot of renaming, which I couldn't do without scope creep for the 
ticket.


[GitHub] MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader.

2018-12-17 Thread GitBox
URL: https://github.com/apache/nifi/pull/3222#issuecomment-447879070
 
 
   > @MikeThomsen are you saying that your 25GB JSON file got an OOM (or some 
error) when using ConvertRecord with JsonTreeReader?
   
   Yeah. In fact, I got it with this processor when I used `$.massive_list` 
instead of `$.massive_list[*]`, hence the check to make sure that the result 
evaluated to an "object" node :-D
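A guard of that kind can be sketched as a validation step over JsonPath matches. This is a minimal Python illustration, not the PR's Java code: the helper name `records_from_matches` is hypothetical, and we don't know exactly where the PR performs its check. The idea is that only a JSON object maps cleanly onto a record; with `$.massive_list` the single match is the array itself, while `$.massive_list[*]` produces one match per element. Note that a check applied after a match has been decoded, as here, reports the misconfigured path but does not by itself stop a huge array from being materialized first.

```python
from typing import Any, Dict, Iterable, Iterator


def records_from_matches(matches: Iterable[Any]) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: turn JsonPath matches into records.

    Each match must be a JSON object (a Python dict). An array match usually
    means the path selected the list itself (e.g. `$.massive_list`) rather
    than its elements (`$.massive_list[*]`).
    """
    for index, node in enumerate(matches):
        if not isinstance(node, dict):
            raise ValueError(
                f"match #{index} is a {type(node).__name__}, not a JSON object; "
                "did you mean to append [*] to the path?"
            )
        yield node
```

Failing fast with a message that names the likely fix is friendlier than a generic type error deep inside a record writer.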


[GitHub] MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader.

2018-12-17 Thread GitBox
URL: https://github.com/apache/nifi/pull/3222#issuecomment-447875162
 
 
   > I don't see how you avoid reading that all into memory because you have to 
return a Record instance from the reader with all that data in it.
   
   Our use case has someone sending a 10GB file with that structure, so 
JsonTreeReader would need a schema that references the massive array and would 
load it all at once. What this reader does is use a streaming JsonPath parser to 
go directly to the array, pull out each element one by one, and expose them as 
records.
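To make the idea concrete, here is a minimal Python sketch (stdlib only; it is not the PR's Java code and not JsonSurfer's API) of going straight to a top-level array and handing out its elements one at a time, roughly what `$.massive_list[*]` selects. The names `stream_array_elements` and `_Buffer` are hypothetical, only a single top-level key is supported, and sibling values such as `"something"` are assumed small enough to decode whole and discard.

```python
import io
import json
from typing import Any, Iterator, TextIO

_DECODER = json.JSONDecoder()
_WS = " \t\n\r"          # JSON whitespace characters
_DELIMS = ",:]}" + _WS   # characters that may legally follow a complete value


class _Buffer:
    """Hypothetical helper: holds only the not-yet-consumed tail of the input."""

    def __init__(self, stream: TextIO, chunk_size: int) -> None:
        self.stream = stream
        self.chunk_size = chunk_size
        self.text = ""
        self.pos = 0
        self.eof = False

    def _fill(self) -> bool:
        chunk = self.stream.read(self.chunk_size)
        if not chunk:
            self.eof = True
            return False
        # Drop consumed text so memory stays near one element plus one chunk.
        self.text = self.text[self.pos:] + chunk
        self.pos = 0
        return True

    def peek(self) -> str:
        """Skip whitespace and return the next character ('' at EOF)."""
        while True:
            while self.pos < len(self.text) and self.text[self.pos] in _WS:
                self.pos += 1
            if self.pos < len(self.text):
                return self.text[self.pos]
            if not self._fill():
                return ""

    def expect(self, ch: str) -> None:
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r}")
        self.pos += 1

    def value(self) -> Any:
        """Decode one complete JSON value, reading more input as needed."""
        self.peek()
        while True:
            try:
                obj, end = _DECODER.raw_decode(self.text, self.pos)
            except json.JSONDecodeError:
                if self.eof:
                    raise
            else:
                # Accept only if a delimiter (or real EOF) follows; otherwise a
                # number may have been cut at a chunk edge, e.g. "2." | "75".
                at_delim = end < len(self.text) and self.text[end] in _DELIMS
                if at_delim or (end == len(self.text) and self.eof):
                    self.pos = end
                    return obj
                if self.eof:
                    raise ValueError("unexpected character after value")
            self._fill()


def stream_array_elements(
    stream: TextIO, key: str, chunk_size: int = 64 * 1024
) -> Iterator[Any]:
    """Yield the elements of the array under top-level `key`, like `$.key[*]`."""
    buf = _Buffer(stream, chunk_size)
    buf.expect("{")
    if buf.peek() == "}":
        return
    while True:
        name = buf.value()
        buf.expect(":")
        if name == key:
            buf.expect("[")
            if buf.peek() == "]":
                return
            while True:
                yield buf.value()  # only this element is held in memory
                if buf.peek() != ",":
                    buf.expect("]")
                    return  # stop here; the rest of the document is not read
                buf.pos += 1
        buf.value()  # skip a sibling value (assumed small) and discard it
        if buf.peek() != ",":
            buf.expect("}")
            return  # key not found
        buf.pos += 1


if __name__ == "__main__":
    sample = '{"something": "something", "massive_list": [{"a": 1}, {"a": 2}]}'
    for record in stream_array_elements(io.StringIO(sample), "massive_list", chunk_size=8):
        print(record)
```

Because `json.JSONDecoder.raw_decode` decodes one value and reports where it stopped, the buffer only needs to hold the unconsumed tail of the input plus the element currently being decoded; the delimiter check guards against a number being split at a chunk boundary.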
   
   As I understand it, JsonTreeReader can only handle such a large file if it 
is just a top-level array of elements or a series of objects concatenated one 
after another. If the NiFi user has to drill into the document at all, they're 
out of luck.
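For contrast, the shape that needs no drilling (objects concatenated one after another) can be streamed with a very small loop. The Python sketch below uses a hypothetical `iter_concatenated_objects` and the standard library only; it is not JsonTreeReader's actual Java implementation. It decodes one top-level object at a time with `json.JSONDecoder.raw_decode` and treats a decode error as "need more input" until EOF.

```python
import io
import json
from typing import Any, Dict, Iterator, TextIO

_WS = " \t\n\r"  # JSON whitespace only


def iter_concatenated_objects(
    stream: TextIO, chunk_size: int = 64 * 1024
) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: yield JSON objects written back to back."""
    decoder = json.JSONDecoder()
    buf = ""
    while True:
        chunk = stream.read(chunk_size)
        buf += chunk
        while True:
            buf = buf.lstrip(_WS)
            if not buf:
                break
            try:
                obj, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # incomplete object: read another chunk
            if not isinstance(obj, dict):
                raise ValueError("only concatenated JSON objects are supported")
            yield obj
            buf = buf[end:]  # drop what was just decoded
        if not chunk:  # EOF: anything left over is malformed or truncated
            if buf:
                raise ValueError("trailing data that is not a complete JSON object")
            return


if __name__ == "__main__":
    data = '{"id": 1}\n{"id": 2} {"id": 3}'
    for obj in iter_concatenated_objects(io.StringIO(data), chunk_size=4):
        print(obj)
```

Treating every `JSONDecodeError` as "read more" keeps the loop tiny, at the cost that a genuinely malformed object is only reported once the stream is exhausted.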


[GitHub] MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader.

2018-12-17 Thread GitBox
URL: https://github.com/apache/nifi/pull/3222#issuecomment-447833164
 
 
   @markap14 @mattyb149 @ijokarumawak @zenfenan @ottobackwards 
   
   We got a file format that looks roughly like this:
   
   ```
   {
     "something": "something",
     "massive_list": [ 9GB later... ]
   }
   ```
   
   Tested this on a 25GB JSON file, and ConvertRecord was able to stream 
through it and make a 16GB Avro file.