Re: New Feature - Hot deployment of new processors

2016-03-22 Thread Joe Witt
Hello, it is certainly possible to support hot deployment for only new things, but this would likely leave a really rough user experience. I think you make a great point about it being easier for things not currently being used. I suspect, though, people will be upgrading/redeploying things that

Re: CSV/delimited to Parquet conversion via Nifi

2016-03-22 Thread Dmitry Goldenberg
Agreed, but probably not the case with XML to Avro. Perhaps ConvertFormat would be for a set of the more straightforward conversions. > On Mar 22, 2016, at 11:30 PM, Tony Kurc wrote: > On the intermediate representation: not necessarily needed, and likely a performance

Re: CSV/delimited to Parquet conversion via Nifi

2016-03-22 Thread Tony Kurc
On the intermediate representation: not necessarily needed, and likely a performance hindrance to do so. Consider converting from a CSV to a flat json object. This can be done by streaming through the values, and likely only needing a single input character in memory at a time. On Mar 22, 2016
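A minimal Python sketch of the streaming idea, assuming the CSV has a header row and that every value may be emitted as a JSON string. It works a row at a time (via the standard `csv` module) rather than a character at a time, but, like the approach described above, it never builds an intermediate representation of the whole file.

```python
import csv
import io
import json
from typing import Iterator, TextIO


def csv_to_json_lines(src: TextIO) -> Iterator[str]:
    """Stream CSV rows to flat JSON objects, holding one row in memory at a time.

    Assumes the first row is a header; all values are emitted as strings.
    """
    reader = csv.reader(src)
    header = next(reader)  # column names become the JSON keys
    for row in reader:
        # Pair each column name with its value; no whole-file structure is built.
        yield json.dumps(dict(zip(header, row)))


if __name__ == "__main__":
    data = io.StringIO("id,name\n1,alice\n2,bob\n")
    for line in csv_to_json_lines(data):
        print(line)  # {"id": "1", "name": "alice"} then {"id": "2", "name": "bob"}
```

Because the function is a generator, a caller can write each JSON line out as soon as it is produced, which keeps memory use flat regardless of input size.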

Re: CSV/delimited to Parquet conversion via Nifi

2016-03-22 Thread Dmitry Goldenberg
It seems to me that for starters it's great to have the processors which convert from various input formats to FlowFile, and from FlowFile to various output formats. That covers all the cases and it gives the users a chance to run some extra processors in between which is often handy, and

Re: Create row keys for HBase from Json messages

2016-03-22 Thread Hong Li
Hi Bryan, Thank you very much for the tips. I tested them during the day, and they are working now. Hong On Mon, Mar 21, 2016 at 8:33 PM, Bryan

Re: CSV/delimited to Parquet conversion via Nifi

2016-03-22 Thread Edmon Begoli
Good point. I just think that Parquet and ORC are important targets, just as relational/JDBC stores are. On Tuesday, March 22, 2016, Tony Kurc wrote: > Interesting question. A couple discussion points: If we start doing a processor for each of these conversions, it may

Re: CSV/delimited to Parquet conversion via Nifi

2016-03-22 Thread Tony Kurc
Interesting question. A couple of discussion points: If we start doing a processor for each of these conversions, it may become unwieldy (P(x,2) processors, where x is the number of data formats?). I'd say maybe a more general ConvertFormat processor may be appropriate, but then configuration and code
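To make the growth concrete: one dedicated processor per ordered (source, target) pair means P(x,2) = x(x-1) processors for x formats, whereas a general converter built from one reader and one writer per format (an assumption here, in the spirit of the intermediate-representation idea raised in this thread) needs only 2x components. The sketch below only does that arithmetic; `hub_components` is a hypothetical illustration, not an existing NiFi design.

```python
from math import perm  # Python 3.8+


def pairwise_processors(x: int) -> int:
    # One dedicated converter per ordered (source, target) pair: P(x, 2) = x * (x - 1)
    return perm(x, 2)


def hub_components(x: int) -> int:
    # Hypothetical general converter: one reader plus one writer per format
    return 2 * x


if __name__ == "__main__":
    for x in (3, 5, 10):
        # prints: 3 6 6, then 5 20 10, then 10 90 20
        print(x, pairwise_processors(x), hub_components(x))
```

The two approaches break even at three formats; beyond that, the pairwise count grows quadratically while the reader/writer count grows linearly, at the cost of the extra conversion step Tony Kurc notes may hurt performance.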

What is the ideal way of handling Provenance repository corruption?

2016-03-22 Thread Andre
Hi there, Quick question. I have noticed that when the disk fills up and you start getting errors like this: 2016-03-23 11:01:00,810 ERROR [Timer-Driven Process Thread-6] o.a.n.p.standard.RouteOnAttribute RouteOnAttribute[id=14b8bd3c-ca04-4687-ab72-a863e9370482] Failed to process session

PutKafka Processor Time-out Errors with Guarantee Replicated Delivery on NiFi 0.5.1 and Kafka 0.8.2

2016-03-22 Thread indus well
Hello NiFi Experts: I am getting time-out errors from the PutKafka processor when using the Guarantee Replicated Delivery option in the Delivery Guarantee property on NiFi 0.5.1 with a Kafka 0.8.2 cluster. However, everything works as normal when I switch to the Best Effort option. In

nifi.content.repository.archive.max.retention.period

2016-03-22 Thread Andre
Hi there, I have a testing instance of NiFi 0.4.2 running, and I've noticed a very strange behaviour around content archives. When I look at my settings I see: # Content Repository nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository

Re: Dataflow architecture for multiple sources

2016-03-22 Thread Andrew Grande
Aurélien, The choice of a multiplexing channel or multiple dedicated ones is really up to whatever constraints your environment may (or may not) have, e.g. whether or not you are able to expose every port required for a socket-based protocol. On the NiFi side, take a close look at Backpressure here, it will
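As a rough analogy only (not NiFi's implementation), back pressure can be pictured as a bounded queue between a fast source and a slower consumer: once the queue reaches its threshold, the source has to wait instead of flooding downstream. In NiFi the threshold is configured per connection and the upstream processor simply stops being scheduled; the Python sketch below instead refuses individual items, and the threshold of 3 is an arbitrary assumption.

```python
import queue
from typing import Iterable, List


def offer_all(q: queue.Queue, items: Iterable[int]) -> List[int]:
    """Try to enqueue each item without blocking; return the items refused.

    A refused item stands in for a source that must hold off because the
    downstream queue has reached its back-pressure threshold.
    """
    refused = []
    for item in items:
        try:
            q.put_nowait(item)
        except queue.Full:
            refused.append(item)
    return refused


if __name__ == "__main__":
    connection = queue.Queue(maxsize=3)  # hypothetical threshold of 3 objects
    print(offer_all(connection, range(5)))  # [3, 4]
    connection.get()  # the consumer drains one item, freeing capacity
    print(offer_all(connection, [3]))  # []
```

With sources ranging from 50 msg/s to hundreds of thousands, this is what keeps a burst on one source from exhausting resources shared with the others.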

Dataflow architecture for multiple sources

2016-03-22 Thread aurelien.de...@gmail.com
Hello. I have to design an architecture based on NiFi to collect and route data from sources to some Hadoop/ES clusters. Sources will have different constraints (from 50 msg/s to hundreds of thousands, different latency requirements, different protocols, etc.). I wonder if we should make a

New Feature - Hot deployment of new processors

2016-03-22 Thread N H
Hi, Complex "data flow systems" always need hot deployment. Is it possible to add "hot deployment for ONLY new processors"?! It might be too complex (or too easy, I do not know!) to allow "full support of hot deployment" for all processors (especially for those that are being used in

Re: Help on creating a flow that requires processing attributes in the flow content but needs to preserve the original flow content

2016-03-22 Thread Conrad Crampton
My 2p. If the kafka.key value is very simple JSON, you could use UpdateAttribute with some Expression Language, specifically the string manipulation functions, to extract the part you want. I like the power of ExecuteProcessor, by the way. And I agree, this community is phenomenally responsive
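For a key like `{"id":"abc-123",...}` (the field name `id` and the value are hypothetical), an UpdateAttribute property along the lines of `${kafka.key:substringAfter('"id":"'):substringBefore('"')}` chains two of those string functions. The Python sketch below approximates that chain so the logic can be checked outside NiFi; it assumes both functions return the whole subject when the token is absent, and it is not NiFi code.

```python
def substring_after(subject: str, token: str) -> str:
    # Text after the first occurrence of token; the whole subject if token is absent
    idx = subject.find(token)
    return subject if idx < 0 else subject[idx + len(token):]


def substring_before(subject: str, token: str) -> str:
    # Text before the first occurrence of token; the whole subject if token is absent
    idx = subject.find(token)
    return subject if idx < 0 else subject[:idx]


if __name__ == "__main__":
    key = '{"id":"abc-123","ts":1458700000}'  # hypothetical kafka.key value
    print(substring_before(substring_after(key, '"id":"'), '"'))  # abc-123
```

This only holds up for very simple, predictable JSON: extra whitespace around the colon or escaped quotes inside the value would break the string matching, at which point a real JSON parser is the safer choice.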