[GitHub] MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader.
MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader. URL: https://github.com/apache/nifi/pull/3222#issuecomment-465244820 @markap14 @bbende would you rather I refactor the JsonTreeReader to use the JsonSurfer library or keep this one separate? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader.
MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader. URL: https://github.com/apache/nifi/pull/3222#issuecomment-447923122 I don't think there is because the Jayway jsonpath library reads the entire InputStream into memory before doing anything with it. JsonSurfer uses an InputStream here and scans start to finish.
[GitHub] MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader.
MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader. URL: https://github.com/apache/nifi/pull/3222#issuecomment-447902019 @markap14 I can rename it to `StreamingJsonPathReader`. Does that work?
[GitHub] MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader.
MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader. URL: https://github.com/apache/nifi/pull/3222#issuecomment-447891669 I was vaguely tracking the SplitJson issue, so I knew about the issue before even starting on the work.
[GitHub] MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader.
MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader. URL: https://github.com/apache/nifi/pull/3222#issuecomment-447889903 @ottobackwards we definitely should refactor the readers at some point to use common code. Problem I ran into is that the existing ones are using Jackson 1.9.X and JsonSurfer uses 2.9.X. So it was a lot of renaming in Jackson that made it impossible for me to do that while avoiding scope creep for the ticket.
[GitHub] MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader.
MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader. URL: https://github.com/apache/nifi/pull/3222#issuecomment-447879070

> @MikeThomsen are you saying that your 25GB JSON file got an OOM (or some error) when using ConvertRecord with JsonTreeReader?

Yeah, in fact I got it with this processor when I used `$.massive_list` and not `$.massive_list[*]`, hence the check to make sure that the result evaluated to an "object" node :-D
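The difference between the two paths in the comment above can be sketched on a tiny in-memory document. This is only an illustration in Python, not the Java code from the pull request: `select_array`, `select_elements`, and `require_object` are hypothetical helpers standing in for the two JsonPath expressions and for the "object node" check, and the sample document is made up.

```python
import json

# Tiny stand-in for the real file; the actual list was many gigabytes.
doc = json.loads(
    '{"something": "something", "massive_list": [{"id": 1}, {"id": 2}]}'
)


def select_array(document):
    """Stand-in for `$.massive_list`: a single match, the whole array node."""
    return [document["massive_list"]]


def select_elements(document):
    """Stand-in for `$.massive_list[*]`: one match per array element."""
    return list(document["massive_list"])


def require_object(node):
    """Hypothetical guard: accept a match only if it is a JSON object."""
    if not isinstance(node, dict):
        raise TypeError("expected a JSON object, got " + type(node).__name__)
    return node


# Each element passes the guard and can become one record.
records = [require_object(node) for node in select_elements(doc)]

# The array path yields one match holding every element at once, which is
# the case the guard is meant to reject before it exhausts memory.
try:
    require_object(select_array(doc)[0])
    array_path_rejected = False
except TypeError:
    array_path_rejected = True
```

With the array path, the single match carries the entire list, so a guard of this kind fails fast instead of materializing the whole array as one record.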
[GitHub] MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader.
MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader. URL: https://github.com/apache/nifi/pull/3222#issuecomment-447875162

> I don't see how you avoid reading that all into memory because you have to return a Record instance from the reader with all that data in it.

Our use case has someone throwing a 10GB file with that structure, so JsonTreeReader would have to have a schema reference to the massive array and load it all at once. What this does is use a streaming JsonPath parser to go directly to the array, pull each element one by one, and expose them to the reader. As I understand it, JsonTreeReader can only handle such a large file if it contains only an array of elements or the elements stacked on each other. If the NiFi user has to drill into the document at all, they're out of luck.
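The idea of going directly to the array and pulling elements one at a time can be sketched with the Python standard library alone. This is a rough illustration under stated assumptions, not the pull request's implementation and not how JsonSurfer works internally: `iter_array_elements` is a hypothetical helper, it finds the array by looking for the first occurrence of `"key": [` rather than tracking nesting the way a real JsonPath parser would, and it assumes the content before the array is small.

```python
import io
import json
import re

# Characters that may legally follow a complete array element.
_DELIMITERS = " \t\r\n,]"


def iter_array_elements(stream, key, chunk_size=65536):
    """Yield the elements of the array stored under `key`, one at a time.

    Only the current chunk and the current element are held in memory.
    Assumption: the first textual occurrence of `"key": [` is the target.
    """
    decoder = json.JSONDecoder()
    start = re.compile(r'"%s"\s*:\s*\[' % re.escape(key))
    buf = ""
    eof = False

    def fill():
        nonlocal buf, eof
        chunk = stream.read(chunk_size)
        if chunk:
            buf += chunk
        else:
            eof = True

    # Phase 1: read until the opening bracket of the target array is seen.
    while True:
        match = start.search(buf)
        if match:
            buf = buf[match.end():]
            break
        if eof:
            raise ValueError("array %r not found" % key)
        fill()

    # Phase 2: decode one element at a time, reading more text on demand.
    while True:
        buf = buf.lstrip(" \t\r\n,")  # lenient about separators
        if not buf:
            if eof:
                raise ValueError("unterminated array")
            fill()
            continue
        if buf[0] == "]":
            return  # end of the array; the rest of the file is never read
        try:
            value, end = decoder.raw_decode(buf)
        except json.JSONDecodeError:
            end = None  # element is still incomplete (or malformed)
        # Accept a value only when a delimiter follows it, so a number
        # split across two chunks is not yielded prematurely.
        if end is not None and end < len(buf) and buf[end] in _DELIMITERS:
            yield value
            buf = buf[end:]
        elif eof:
            raise ValueError("truncated or malformed element")
        else:
            fill()


sample = io.StringIO(
    '{"something": "something", "massive_list": [{"id": 1}, {"id": 2}]}'
)
for record in iter_array_elements(sample, "massive_list"):
    print(record)
```

Because the function is a generator, a caller such as a record reader can consume one element, convert it, and discard it before the next one is parsed, which is what keeps memory use independent of the array's length.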
[GitHub] MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader.
MikeThomsen commented on issue #3222: NIFI-5900 Added StreamingJsonReader. URL: https://github.com/apache/nifi/pull/3222#issuecomment-447833164

@markap14 @mattyb149 @ijokarumawak @zenfenan @ottobackwards We got a file format that looks roughly like this:

```
{
  "something": "something",
  "massive_list": [
    9GB later...
  ]
}
```

Tested this on a 25GB JSON file, and ConvertRecord was able to stream through it and make a 16GB Avro file.