GetMongo is an ingest-only processor, so it cannot accept an input flow file. It also has only a success relationship.
A solution to this would be to use NiFi’s own deduplication. One flow would seed values in the distributed cache by using GetMongo to pull the ids and PutDistributedMapCache to store them in NiFi’s cache. The main ingest flow would then use UpdateAttribute to create a hash.value that matches the values inserted into the cache -> DetectDuplicate -> flow to PutMongo (use the upsert property) -success-> PutSolrContentStream

Simon

On Apr 28, 2016, at 5:19 PM, Pierre Villard <[email protected]> wrote:

Hi Susheel,

1. HandleHttpRequest
2. RouteOnAttribute + HandleHttpResponse in case of errors detected in headers
3. Depending on what you want, there are a lot of options to handle JSON data (EvaluateJsonPath will probably be useful)
4. GetMongo (I think it will route to success if there is an entry, and to failure if there is no record, but this has to be checked; otherwise an additional processor will do the job of checking the result of the request).
5. & 6. PutMongo + PutFile (if local folder) + PutSolr (if you want to do Solr by yourself).

Depending on the details, this could be slightly different, but I think it gives a good idea of the minimal set of processors you would need.

HTH,
Pierre

2016-04-28 16:54 GMT+02:00 Susheel Kumar <[email protected]>:

Hi,

After attending the meetup in NYC, I am realizing NiFi can be used for the data flow use case I have. Can someone please share the steps/processors necessary for the use case below.

1. Receive JSON on an HTTP REST endpoint
2. Parse the HTTP header and do validation. Return error code & messages as JSON in the response in case of validation failures
3. Parse the request JSON, perform various validations (missing data in fields), massage some data, add some data
4. Check if the request JSON unique ID is present in MongoDB and compare the timestamp to determine whether this is an update request or a new request
5. If new request, an entry is made in mongo and then JSON files are written to an output folder for another process to pick up and submit to Solr.
6. If update request, the mongo record is updated and JSON files are written to the output folder

I understand that something like the HandleHttpRequest processor can be used for receiving the HTTP request, and PutSolrContentStream for writing to Solr, but I am not clear on what processors will be used for validation etc. in steps 2 through 5 above.

Appreciate your input.

Thanks,
Susheel
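
For readers who want to see the cache-based deduplication from the reply at the top of this thread in concrete terms, here is a minimal sketch in plain Python. It does not call NiFi or MongoDB; it only imitates the logic. The function names (cache_key, seed_cache, route), the "id" field, and the use of an in-memory set in place of the distributed map cache are all assumptions made for illustration. The returned strings mirror DetectDuplicate's "duplicate" and "non-duplicate" relationships.

```python
def cache_key(record: dict) -> str:
    """Stand-in for the hash.value attribute set by UpdateAttribute.

    Assumption: the unique id lives in a field called "id" and is used
    as the cache key unchanged.
    """
    return str(record["id"])


def seed_cache(existing_ids) -> set:
    """Seeding flow: imitate GetMongo pulling ids and
    PutDistributedMapCache storing them (here, in a plain set)."""
    return {str(i) for i in existing_ids}


def route(record: dict, cache: set) -> str:
    """Main flow: imitate DetectDuplicate's routing decision."""
    key = cache_key(record)
    if key in cache:
        return "duplicate"       # id already known: an update request
    cache.add(key)               # remember it so later copies are detected
    return "non-duplicate"       # first sighting: a new request


if __name__ == "__main__":
    cache = seed_cache(["a1", "b2"])  # ids already stored in MongoDB
    for rec in [{"id": "a1"}, {"id": "c3"}, {"id": "c3"}]:
        print(rec["id"], "->", route(rec, cache))
```

In the flow described above, both outcomes can continue to PutMongo with upsert enabled, so the routing result mainly tells you whether the request was new or an update.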
