GetMongo is an ingest-only processor, so it cannot accept an input flow file.
It also only has a success relationship.

A solution to this would be to use NiFi’s own deduplication.

One flow would seed values in the distributed cache by using GetMongo to pull 
the ids and PutDistributedMapCache to store them in NiFi’s cache.

The main ingest flow would then use UpdateAttribute to create a hash.value 
attribute that matches the values inserted into the cache -> DetectDuplicate 
-> flow to PutMongo (use the upsert property) -success-> PutSolrContentStream
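
To make the idea concrete, here is a minimal sketch of the two flows in plain 
Python. It is a conceptual model only, not NiFi code: the function names, the 
set standing in for the distributed cache, and the dict standing in for a flow 
file are all assumptions made for illustration. It also assumes the duplicate 
check remembers each new id it sees.

```python
# Conceptual model only: plain Python standing in for NiFi processors.
# None of these function names are NiFi APIs.

def seed_cache(existing_ids):
    """Seeding flow: ids pulled from Mongo are stored in the cache."""
    return set(str(i) for i in existing_ids)


def detect_duplicate(cache, flowfile):
    """Route a flow file by whether its hash.value is already cached."""
    key = flowfile["attributes"]["hash.value"]
    if key in cache:
        return "duplicate"
    cache.add(key)  # remember the key so later copies are flagged
    return "non-duplicate"


def ingest(cache, document):
    """Main flow: set hash.value from the document id, then check the cache."""
    flowfile = {
        "attributes": {"hash.value": str(document["_id"])},
        "content": document,
    }
    return detect_duplicate(cache, flowfile)


cache = seed_cache(["a1", "b2"])
results = [ingest(cache, {"_id": i}) for i in ["a1", "c3", "c3"]]
print(results)  # ['duplicate', 'non-duplicate', 'duplicate']
```

In this model the returned route tells you whether the id was already known 
(an update) or is new, which is the distinction the Mongo lookup was meant to 
provide.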

Simon

On Apr 28, 2016, at 5:19 PM, Pierre Villard 
<[email protected]<mailto:[email protected]>> wrote:

Hi Susheel,

1. HandleHttpRequest
2. RouteOnAttribute + HandleHttpResponse in case of errors detected in headers
3. Depending on what you want, there are a lot of options to handle JSON data 
(EvaluateJsonPath will probably be useful)
4. GetMongo (I think it will route to success if there is an entry, and to 
failure if there is no record, but this has to be checked; otherwise an 
additional processor can check the result of the request).
5. & 6. PutMongo + PutFile (if local folder) + PutSolr (if you want to do Solr 
by yourself).

Depending on the details, this could be slightly different, but I think it 
gives a good idea of the minimal set of processors you would need.

HTH,
Pierre


2016-04-28 16:54 GMT+02:00 Susheel Kumar 
<[email protected]<mailto:[email protected]>>:
Hi,

After attending the meetup in NYC, I realized NiFi can be used for the data 
flow use case I have.  Can someone please share the steps/processors necessary 
for the use case below?


  1.  Receive JSON on an HTTP REST endpoint
  2.  Parse the HTTP headers and validate them. Return error codes & messages 
as JSON in the response in case of validation failures
  3.  Parse the request JSON, perform various validations (missing data in 
fields), massage some data, add some data
  4.  Check if the request JSON unique ID is present in MongoDB and compare 
timestamps to determine whether this is an update request or a new request
  5.  If new request, an entry is made in Mongo and then JSON files are written 
to an output folder for another process to pick up and submit to Solr.
  6.  If update request, the Mongo record is updated and JSON files are written 
to the output folder

I understand that something like the HandleHttpRequest processor can be used 
for receiving the HTTP request and PutSolrContentStream for writing to Solr, 
but I am not clear on which processors would be used for validation etc. in 
steps 2 through 5 above.

Appreciate your input.

Thanks,
Susheel





