Re: Strict order of flow files in a cluster

2021-04-01 Thread Boris Tyukin
n failure instead of routing to a > failure relationship were added specifically for this use case of consuming > CDC events (typically from Kafka). But they were only recently added, in > either 1.12 or 1.13. That should make things simpler. > > Thanks > -Mark > > On Apr 1

Re: Strict order of flow files in a cluster

2021-04-01 Thread Boris Tyukin
sion. Now, > this handles the concern of ordering once the data is on the node, but the > data must also arrive in the correct order from Kafka. So, for this case, > you must also pin specific Kafka partitions to specific nifi nodes, which > can be done by adding user-defined properti

Re: Strict order of flow files in a cluster

2021-04-01 Thread Boris Tyukin
We ended up building a simple groovy processor that will use mysql db to queue up flowfiles. If a flowfile A fails, flowfile B would sit in a queue until we address an issue with flowfile A. We also used back pressure feature to slow down upstream Kafka consumers. After playing with wait/notify
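The blocking behaviour described in this snippet (flowfile B waits until flowfile A is resolved) can be sketched roughly as follows. This is only an in-memory stand-in, not the Groovy/MySQL processor from the thread; `drain_in_order` and the `process` callback are hypothetical names made up for illustration.

```python
from collections import deque
from typing import Callable, Deque, List


def drain_in_order(queue: Deque[str], process: Callable[[str], bool]) -> List[str]:
    """Process items head-first and stop at the first failure.

    Assumption: `process` returns True on success and False on failure.
    A failed head stays in the queue, so everything behind it waits.
    """
    done: List[str] = []
    while queue:
        item = queue[0]          # peek at the head without removing it
        if not process(item):    # head failed: keep it and stop draining
            break
        queue.popleft()
        done.append(item)
    return done


# Hypothetical run: "A" fails, so "B" and "C" stay queued behind it.
pending = deque(["A", "B", "C"])
processed = drain_in_order(pending, lambda name: name != "A")
print(processed, list(pending))
```

Once the problem with the head item is fixed, calling `drain_in_order` again resumes from the same position, which is what preserves the order.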

Re: Nifi 1.8 to 1.11.4 migration issue

2020-07-30 Thread Boris Tyukin
Sanjeet, just a heads up as we were in the same boat. We decided to hold off the upgrade because of a critical issue in 1.11.4. Not sure if it applies to your environment, but we decided to wait for 1.12. More details here: https://issues.apache.org/jira/browse/NIFI-7460 On Thu, Jul 30, 2020 at

Re: NiFi-light for analysts

2020-06-29 Thread Boris Tyukin
h > in terms of resource isolation. So, if resource isolation is important to > you, then using separate NiFi deployments is likely desirable. > > Hope this helps! > -Mark > > > [1] https://issues.apache.org/jira/browse/NIFI-7476 > [2] https://issues.apache.org/jira/brows

NiFi-light for analysts

2020-06-28 Thread Boris Tyukin
Hi guys, I am thinking to increase the footprint of NiFi in my org to extend it to less technical roles. I have a few questions: 1) is there any plans to support easy dependencies at some point? We are aware of all the current options (wait-notify, kafka, mergerecord/mergecontent etc.) and all

Re: Upgrade NiFi 1.11.4 Cluster External ZooKeeper 3.4.X

2020-06-12 Thread Boris Tyukin
>> recent version with other security related fixes. >> >> On Fri, Jun 12, 2020 at 9:43 AM Boris Tyukin >> wrote: >> >>> wow this is a major pain as we have to stay on CDH 6.1/6.2 for a long >>> time for some reasons. Do you know why Bryan they deci

Re: Upgrade NiFi 1.11.4 Cluster External ZooKeeper 3.4.X

2020-06-12 Thread Boris Tyukin
wow this is a major pain as we have to stay on CDH 6.1/6.2 for a long time for some reasons. Do you know why Bryan they decided to require 3.5.5 now? On Thu, Jun 4, 2020 at 8:42 PM Sri Harsha Chavali < sriharsha.chav...@outlook.com> wrote: > Thank you for the inputs Bryan! Not much we can do

Re: Batch Dependency in NIFI - GetFile

2020-06-02 Thread Boris Tyukin
Dependencies in NiFi is something I wish could work better. I see your pain for sure! I wish there was an easier way as we all have to do ETL batch type dependencies eventually. I also tried Wait/Notify but it was a very confusing setup and felt a bit overengineered for what I wanted to do. The

NiFi and real-time data lake

2020-05-12 Thread Boris Tyukin
Hi guys, wanted to share with you how we used Nifi to build near real-time data lake for a large healthcare system https://boristyukin.com/building-near-real-time-big-data-lake-part-2/ Our real-time infra was pretty crucial during Covid19 to support all kind of analytics. I truly appreciate

Re: nifi 1.9.2 upgrade and zookeeper

2020-04-22 Thread Boris Tyukin
great, thanks Drew! On Wed, Apr 22, 2020 at 1:37 PM Andrew Lim wrote: > Hi Boris, > > I think the information you are looking for is in the NiFi Migration > Guidance [1]. > > -Drew > > [1] https://cwiki.apache.org/confluence/display/NIFI/Migration+Guidance > > On A

nifi 1.9.2 upgrade and zookeeper

2020-04-22 Thread Boris Tyukin
Hi guys, we are on 1.9.2 right now and planning to upgrade to the most recent version of NiFi. We run ours in a clustered mode currently, using Zookeeper that came with CDH 5. I remember I heard that something was changed after 1.9.2 and now embedded zookeeper will have to be used since NiFi

Re: Dashboards for reporting ingest status to users

2020-03-27 Thread Boris Tyukin
I do not think provenance data alone will make any sense even to you, certainly not for your users. We followed advice from Pierre's blog https://pierrevillard.com/best-of-nifi/ and it is a combination of naming processors and having our own custom "AuditLog" processor that logs key events, record

Re: CRON issue in NIFI

2020-01-13 Thread Boris Tyukin
note this is quartz cron syntax which is different from standard cron. This is a handy quartz cron calculator https://www.freeformatter.com/cron-expression-generator-quartz.html On Mon, Jan 13, 2020 at 7:25 PM Shawn Weeks wrote: > I usually use one of the cron calculators online as it can be
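As a rough illustration of how the two syntaxes differ: Quartz expressions start with a seconds field and need `?` in either the day-of-month or the day-of-week position. The helper below is a minimal sketch for simple cases only; `cron_to_quartz` is a hypothetical name, and it deliberately refuses numeric day-of-week values because the two formats number weekdays differently.

```python
def cron_to_quartz(expr: str) -> str:
    """Convert a simple 5-field cron expression to a 6-field Quartz one.

    Sketch under stated assumptions: only handles cases where at most one
    of day-of-month / day-of-week is restricted, and only named weekdays.
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day-of-month month day-of-week")
    minute, hour, dom, month, dow = fields

    # Numeric weekdays are numbered differently in the two syntaxes, so bail out.
    if any(ch.isdigit() for ch in dow):
        raise ValueError("use weekday names such as MON-FRI instead of numbers")

    if dow == "*":
        dow = "?"      # no weekday restriction
    elif dom == "*":
        dom = "?"      # weekday restricted, so day-of-month becomes '?'
    else:
        raise ValueError("Quartz cannot restrict both day-of-month and day-of-week")

    # Quartz adds a leading seconds field; fire at second 0.
    return " ".join(["0", minute, hour, dom, month, dow])


print(cron_to_quartz("30 2 * * MON-FRI"))
```

For anything beyond these simple cases, a Quartz cron calculator like the one linked in the snippet is the safer route.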

Re: promote NiFi flows from DEV to PROD and controllers

2019-12-06 Thread Boris Tyukin
Bryan > > On Fri, Dec 6, 2019 at 1:16 PM Etienne Jouvin > wrote: > > > > Hello. > > > > Why don't you use NiFi Registry? > > Discovered it a couple of weeks ago, and it is really cool > > > > On Fri, Dec 6, 2019 at 19:14, Boris Tyukin wrote

promote NiFi flows from DEV to PROD and controllers

2019-12-06 Thread Boris Tyukin
Hi, We have a single NiFi Registry and DEV/PROD NiFi clusters. When we deploy changes to PROD NiFi (after initial commit), we often have to repoint controllers, especially on custom groovy processors as NiFi would not recognize them by name. It does not happen with standard

Re: PutKudu Processor - Be careful with upgrading to NiFi 1.10.0

2019-11-27 Thread Boris Tyukin
oh wow thanks for the heads up! good thing we decided to stay on 1.10 for some time, waiting for 1.10.2 :) Last upgrade did not go well with 1.9.0 On Wed, Nov 27, 2019 at 7:20 AM wrote: > Just a short information for other people who are using the *PutKudu* > processor and are planning to

Re: Content repository data filling up disk...

2019-06-19 Thread Boris Tyukin
we had the exact same issue and after the upgrade to 1.9.2 all is good On Wed, Jun 19, 2019 at 1:28 PM Erik Anderson wrote: > > Mark Payne > > Version 1.9.1 introduced a bug [1] that resulted We ran into the same > issues with NiFi filling up the disk with NiFi 1.9.1 > > 1.9.2 doesnt have the

Re: backpressure issue

2019-05-22 Thread Boris Tyukin
> go slightly beyond the set value. Otherwise, how would you be able to split > 1 unit of work? > > Andrew > > On Wed, May 22, 2019, 11:33 AM Boris Tyukin wrote: > >> Hi guys, >> >> we run on 1.9.2 and experience an interesting issue with one of the >>

backpressure issue

2019-05-22 Thread Boris Tyukin
Hi guys, we run on 1.9.2 and experience an interesting issue with one of the custom groovy processors. It batches 2000 flowfiles (session.get(2000)) and routes some to a waiting relationship for retries. That relationship has backpressure set to 5000 flowfiles. What happens is we can see often
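A small arithmetic sketch of why a queue can overshoot its backpressure threshold, using the numbers from this message. It assumes a simplified model in which backpressure is only checked before the feeding processor is scheduled, and that in the worst case the whole batch lands in the same connection; `queue_size_after_run` is a made-up helper, not a NiFi API.

```python
def queue_size_after_run(queued: int, threshold: int, batch: int) -> int:
    """Model one scheduling decision for the processor feeding a connection.

    Assumption: the threshold is checked once, before the run starts; a run
    that is allowed to start may transfer its entire batch.
    """
    if queued >= threshold:
        return queued          # backpressure engaged: processor is not scheduled
    return queued + batch      # worst case: every flowfile in the batch is routed here


# Threshold 5000, batch of 2000, queue just under the threshold.
print(queue_size_after_run(4999, 5000, 2000))
```

Under this model the threshold acts as a soft limit: the queue can exceed it by up to one batch, which matches the reply in the thread about going beyond the set value.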

Re: use attributed defined in the same UpdateAttribute processor

2019-05-16 Thread Boris Tyukin
0EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69 > > On May 16, 2019, at 2:08 PM, Boris Tyukin wrote: > > Hi guys, > > is it possible to reuse just defined attribute in another attribute in the > same instance of UpdateAttribute processor or I really have to use two > sepa

use attributed defined in the same UpdateAttribute processor

2019-05-16 Thread Boris Tyukin
Hi guys, is it possible to reuse just defined attribute in another attribute in the same instance of UpdateAttribute processor or I really have to use two separate UpdateAttribute processors? For example, I define name=boris and then I want to add another attribute greetings=Hi ${name} If I do

Re: set penalty duration on a flowfile

2019-03-13 Thread Boris Tyukin
ingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69 > > On Mar 12, 2019, at 9:52 AM, Boris Tyukin wrote: > > Hi guys, > > Is it possible to set penalty duration for a flowfile rather than a > processor? I have a custom groovy processor and I want to set penalty > dy

set penalty duration on a flowfile

2019-03-12 Thread Boris Tyukin
Hi guys, Is it possible to set penalty duration for a flowfile rather than a processor? I have a custom groovy processor and I want to set penalty dynamically based on some criteria. I was also looking at controlRate processor but would rather penalize flowfiles instead with a custom delay.

Re: QueryRecord and NULLs

2019-03-08 Thread Boris Tyukin
> null. > > SELECT > * > ,NULLIF(5, 5) as unit_cerner_alias > ,NULLIF(5, 5) as room_cerner_alias > ,NULLIF(5, 5) as bed_cerner_alias > FROM FLOWFILE > > On Fri, Mar 8, 2019 at 7:57 AM Boris Tyukin wrote: > > > > I am struggling for an hour now with a very sim

Re: ExecuteSQLRecord and timestamps

2019-03-08 Thread Boris Tyukin
ok so we are back to using ExecuteSQL processor...really do not like how the new one handles timestamps and there might be other differences I do not know about. I was hoping it would produce the same data as the older processor. On Thu, Mar 7, 2019 at 2:02 PM Boris Tyukin wrote: > Hi guys, >

QueryRecord and NULLs

2019-03-07 Thread Boris Tyukin
I am struggling for an hour now with a very simple thing. I need to add 3 new fields to a record and set them to NULL but it does not work. I tried null instead - same thing. I checked Calcite docs and I do not see anything special about NULL. And I know you can do it in SQL. This works:

ExecuteSQLRecord and timestamps

2019-03-07 Thread Boris Tyukin
Hi guys, we just upgraded to 1.9 and I was excited to start using new ExecuteSQLRecord processor. While I was migrating an older flow, that uses ExecuteSQL processor I've noticed that timestamp/date types are coming as integers not strings like before. Also AVRO schema inferred from a database

Re: join two datasets

2019-02-22 Thread Boris Tyukin
lookup >>> service. COuld be backed by a simple file, a database, Hive, whatever. >>> Then have the live flow run against that. >>> >>> This reminds me - we should make a Kudu based lookup service i think. >>> I'll chat with some of our new Kudu friends

Re: join two datasets

2019-02-22 Thread Boris Tyukin
Thanks Joe and Bryan. In this case I don't need to do it in real-time, probably once a day only. I am thinking to trigger both pulls by generateflow processor, then merge datasets somehow since flowfile id will be the same for both sets. And then need to join somehow. Would like to use nifi

join two datasets

2019-02-22 Thread Boris Tyukin
Hi guys, I pull two datasets from two different databases on schedule and need to join both on some ID and then publish combined dataset to Kafka. What is the best way to do this? Puzzled how I would synchronize two data pulls so data is joined for exact flowfiles I need, i.e. if there are

Re: 1.9 release date?

2019-02-16 Thread Boris Tyukin
wondering the same thing Boris On Sat, Feb 16, 2019 at 10:41 AM dan young wrote: > Heya folks, > > Any insight on 1.9 release date? Looks like a lot of goodies and fixes > included... > > Regards, > > Dano >

Re: Why RedisDistributedMapCacheClientService does not support clustered mode redis?

2019-02-14 Thread Boris Tyukin
I am not NiFi dev, but personally, after looking at DistributedCache processors we decided not to use them. There is nothing "distributed" about them. It will be a single point of failure in your flows, difficult to manage (you have to use NiFi to read/write/delete keys). And I also looked at

Re: flowfiles stuck in load balanced queue; nifi 1.8

2019-02-08 Thread Boris Tyukin
The new feature is indeed amazing but we are in the same boat and based on what we've heard we do not want to risk losing flowfiles and certainly want to avoid restarts of NiFi. We decided to wait for 1.9. I hope it will be out pretty soon based on the frequency of previous releases. On Fri, Feb 8, 2019 at

Re: NiFi consumers concurrency

2019-01-25 Thread Boris Tyukin
ifi thread will subscribe to different topics. On Thu, Jan 24, 2019 at 4:44 PM Boris Tyukin wrote: > thanks Mark, but it did not help. other 3 consumer IDs are still not > pulling messages from topics, only the very first one. > > But if I set up 9 different NiFi Kafka Consu

Re: NiFi consumers concurrency

2019-01-24 Thread Boris Tyukin
do the > trick. > > Thanks > -Mark > > On Jan 24, 2019, at 3:30 PM, Boris Tyukin wrote: > > any ideas? > > we've added another 7 topics per Kafka Consumer processor (so 9 topics > total) and with concurrency set to 4, it still pulls in one thread, using >

Kafka max topics

2019-01-22 Thread Boris Tyukin
Hi guys, does anyone know why a maximum of 100 topics is hardcoded in the ConsumeKafka processor? Is there a reason for that? https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-kafka-bundle/nifi-kafka-0-9-processors/src/main/java/org/apache/nifi/processors/kafka/pubsub/ConsumeKafka.java for

Re: NiFI as Data Pipeline Orchestration Tool?

2019-01-22 Thread Boris Tyukin
We've looked at both...Airflow might be a way better tool for coordination/scheduling. Why don't you take one of your pipelines and try to implement it in both tools? We really liked Airflow but unfortunately, Airflow was not a good fit for real-time processes - that's why we decided to go with

Re: CaptureChangeMySQL and Triggers

2019-01-18 Thread Boris Tyukin
Hi Vitaly, I do not work for a CDC vendor but there are many reasons why CDC tools exist and no NiFi processor will ever beat the benefits of these commercial tools. If timestamps are reliable, extracts are fast and you can handle inserts, updates, delete and primary keys updates without

Re: Ingesting golden gate messages to Hbase using Nifi

2019-01-18 Thread Boris Tyukin
>> order. I am not sure if Golden Gate can annotate them. I think there is >> a Kafka # that could help. >> >> >> On Mon, Nov 12, 2018 at 12:16 PM Boris Tyukin >> wrote: >> >>> Faisal, BTW I stumbled upon this doc, that explains how HBase Gold

Re: jdbc impala url

2019-01-09 Thread Boris Tyukin
ready looking at doing that, while testing against a > Kerberos-secured MariaDB instance. I think something similar applies > to SQL Server using Active Directory with Kerberos, and if so, I hope > to put forth a generic solution (or review a PR if someone else wants > to do it :) >

Re: jdbc impala url

2019-01-09 Thread Boris Tyukin
also check out my post - there were a few pitfalls https://boristyukin.com/how-to-connect-apache-nifi-to-apache-impala/ On Wed, Jan 9, 2019 at 1:58 PM Boris Tyukin wrote: > this is how our connection looks like on a kerberized CDH cluster > > jdbc:impala://_host_:21050/default;A

Re: jdbc impala url

2019-01-09 Thread Boris Tyukin
this is what our connection looks like on a kerberized CDH cluster jdbc:impala://_host_:21050/default;AuthMech=1;KrbRealm=blabla.domain.net;KrbHostFQDN=blabla.domain.net;KrbServiceName=impalaservicename Boris On Wed, Jan 9, 2019 at 10:49 AM Jeff wrote: > Hello, > > I'm working on some

Re: Beginner questions on NiFi

2019-01-06 Thread Boris Tyukin
Hi PasLe, we do that just fine with NiFi. As for your first question, check my blog post https://boristyukin.com/how-to-connect-apache-nifi-to-apache-impala/ - you can connect to Impala and use PutSQL and ExecuteSQL processors to execute Impala SQL. You can also use putHDFS processor or create

Re: flowfiles stuck in load balanced queue; nifi 1.8

2018-12-28 Thread Boris Tyukin
Mark, you are a troubleshooting master! thanks for chasing this down as this new feature is really awesome and we are about to start using it. Good to know there is a semi-safe workaround. Boris On Fri, Dec 28, 2018 at 10:43 AM Mark Payne wrote: > Dan, et al, > > Great news! I was able to

Re: ConsumeKafka demarcator

2018-12-27 Thread Boris Tyukin
I've looked into source code and I see that I do not have to use a single character but can use any string which then is encoded as a byte array by NiFi. I can generate UUID and use it as a safe demarcator. On Wed, Dec 26, 2018 at 4:47 PM Boris Tyukin wrote: > Hello, > > I would like
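The idea in this reply, using a generated UUID string as the demarcator so it cannot collide with newlines or other characters inside the messages, can be sketched like this. The function names are hypothetical and the code only illustrates the join/split round trip, not how ConsumeKafka itself is implemented.

```python
import uuid
from typing import List


def new_demarcator() -> bytes:
    """Generate a random demarcator; a UUID is very unlikely to occur in message data."""
    return uuid.uuid4().hex.encode("utf-8")


def join_messages(messages: List[bytes], demarcator: bytes) -> bytes:
    """Concatenate messages into one batch, separated by the demarcator."""
    return demarcator.join(messages)


def split_messages(batch: bytes, demarcator: bytes) -> List[bytes]:
    """Recover the individual messages from a batch."""
    return batch.split(demarcator)


# Messages containing both Windows and Unix newlines survive the round trip.
demarcator = new_demarcator()
originals = [b'{"note": "line1\r\nline2"}', b'{"note": "a\nb"}']
batch = join_messages(originals, demarcator)
print(split_messages(batch, demarcator) == originals)
```

In a flow, the same fixed demarcator value would have to be used on both the consuming side and wherever the batch is split again downstream.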

ConsumeKafka demarcator

2018-12-26 Thread Boris Tyukin
Hello, I would like to batch Kafka messages using demarcator property but I am not sure which one to use. In my case I get JSON messages and some message values can contain special characters and even newline delimiters in both Windows and Unix flavors (data is coming from GoldenGate, which

Re: flowfiles stuck in load balanced queue; nifi 1.8

2018-12-19 Thread Boris Tyukin
we were about to start using this feature but I guess we would have to wait since so many people are having issues with it and there are still no comments from NiFi developers who implemented it...Thanks for the heads up guys On Tue, Dec 18, 2018 at 11:27 PM dan young wrote: > We're seeing this

Re: DistributedMapCacheServer questions

2018-11-29 Thread Boris Tyukin
thanks guys here is new Jira as requested https://issues.apache.org/jira/browse/NIFI-5853 On Thu, Nov 29, 2018 at 2:06 PM Otto Fowler wrote: > Maybe you can open a jira for a ZK client like brian mentions? > > > On November 29, 2018 at 13:59:36, Boris Tyukin (bo...@boristyukin.

Re: DistributedMapCacheServer questions

2018-11-29 Thread Boris Tyukin
ow, then you can use this instead. You can look at > ProcessContext.getStateManager() > > On Thu, Nov 29, 2018 at 1:08 PM Boris Tyukin > wrote: > > > > thanks for the explanation, Bryan! it helps! > > > > Boris > > > > On Thu, Nov 29, 2018 at 12:26 PM Bryan Bende wrote:

Re: DistributedMapCacheServer questions

2018-11-29 Thread Boris Tyukin
handling concurrent connections, it should be stable. > > Thanks, > > Bryan > > On Thu, Nov 29, 2018 at 11:52 AM Boris Tyukin > wrote: > > > > Hi guys, > > > > I have a few questions about DistributedMapCacheServer. > > > > First que

DistributedMapCacheServer questions

2018-11-29 Thread Boris Tyukin
Hi guys, I have a few questions about DistributedMapCacheServer. First question, I am confused by "Distributed" part. If I get it, the server actually runs on a single node and if it fails, it is game over. Is that right? Why NiFi is not using ZK for that since ZK is already used by NiFi

Re: stop processing related flowfiles

2018-11-28 Thread Boris Tyukin
s. However, if you require a total order over records this can > be achieved with a topic that has only one partition, though this will mean > only one consumer process per consumer group." > > (From https://kafka.apache.org/documentation/) > > On Wed, 28 Nov 2018, 13

stop processing related flowfiles

2018-11-28 Thread Boris Tyukin
Hi guys, I am trying to come up with a good design for the following challenge: 1. ConsumeKafka processor consumes messages from 200 topics. 2. The next processor is a custom groovy processor that does some data transformation and also puts transformed data into target system.* It is crucial to

Re: setting topic name dynamically in PutKafka?

2018-11-20 Thread Boris Tyukin
I think they did it on purpose because it takes time (3-5 seconds) for a consumer group to connect to Kafka (joining consumer group) and whoever created the processor did not want to do it for each flowfile. On Tue, Nov 20, 2018 at 10:09 AM l vic wrote: > Hi, > I have to set topic name property of

Re: EnforceOrder processor

2018-11-13 Thread Boris Tyukin
d benefit an "additional details" > documentation page. If you're willing to file a JIRA and submit a PR > explaining the processor with your words, that would certainly be much > appreciated and I'm sure Koji will be happy to review it. > > Thanks, > Pierre > >

EnforceOrder processor

2018-11-12 Thread Boris Tyukin
I was really confused how EnforceOrder processor works and NiFi documentation made it even more confusing. After some time looking for an explanation, I found this gist which I think was created by a developer who created this

Re: Ingesting golden gate messages to Hbase using Nifi

2018-11-12 Thread Boris Tyukin
files together on a single node? ( I know how to >> distribute them i.e is by using S2S-RPG) >> >> I hope i have been able to explain my situation. Kindly let me know of >> your views on this. >> >> Regards, >> Faisal >> >> >> On Mon, Nov 5

Re: Ingesting golden gate messages to Hbase using Nifi

2018-11-07 Thread Boris Tyukin
( I know how to > distribute them i.e is by using S2S-RPG) > > I hope i have been able to explain my situation. Kindly let me know of > your views on this. > > Regards, > Faisal > > > On Mon, Nov 5, 2018 at 11:18 PM Boris Tyukin > wrote: > >> Hi Faisal, I

Re: Ingesting golden gate messages to Hbase using Nifi

2018-11-05 Thread Boris Tyukin
Hi Faisal, I am not Timothy, but you raise an interesting problem we might face soon as well. I did not expect the situation you described and I thought transaction time would be different. Our intent was to use op_ts to enforce order but another option is to use GG rbc value or oracle rowscn

Re: Expression Language

2018-10-31 Thread Boris Tyukin
looks right to me...Did you check flowFile attributes in provenance events to make sure your attributes are populated? also check exact spelling and casing. If still does not work, show us some screenshots of your flow and properties Boris On Wed, Oct 31, 2018 at 4:04 PM Jones, Patrick L. wrote:

Re: wrapping json around while keeping a single flowFile for Kafka

2018-09-21 Thread Boris Tyukin
and NiFi community are awesome! On Fri, Sep 21, 2018 at 4:29 PM Boris Tyukin wrote: > I understand but I need to transform json first as I described in my > example (wrapping it under payload dict and adding meta dict with > additional elements). So it is not a simple passthrough transf

Re: wrapping json around while keeping a single flowFile for Kafka

2018-09-21 Thread Boris Tyukin
. On Fri, Sep 21, 2018 at 4:21 PM Matt Burgess wrote: > With PublishKafkaRecord you don’t need to do the split, you can pass in > the whole array and it will send each record as a message. > > Regards, > Matt > > On Sep 21, 2018, at 4:09 PM, Boris Tyukin wrote: > >

Re: wrapping json around while keeping a single flowFile for Kafka

2018-09-21 Thread Boris Tyukin
ds, > Matt > > On Fri, Sep 21, 2018 at 3:25 PM Boris Tyukin > wrote: > > > > Hey guys, > > > > I have a flow returning thousands of records from RDBMS and I convert > returned AVRO to JSON and get something like below: > > > > [ > > {"

wrapping json around while keeping a single flowFile for Kafka

2018-09-21 Thread Boris Tyukin
Hey guys, I have a flow returning thousands of records from RDBMS and I convert returned AVRO to JSON and get something like below: [ {"col1":"value11", "col2":"value21", "col3":"value31"}, {"col1":"value12", "col2":"value22", "col3":"value32"}, ... ] So still a single flowFile. Now I need to
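A follow-up in this thread describes the transformation as wrapping each record under a payload dict and adding a meta dict with additional elements. A minimal sketch of that reshaping is below, keeping everything in one JSON array (one flowFile). The function name and the contents of `meta` are assumptions for illustration; the thread does not show the actual fields.

```python
import json
from typing import Any, Dict, List


def wrap_records(flowfile_content: str, meta: Dict[str, Any]) -> str:
    """Wrap every record of a JSON array under "payload" and attach a "meta" dict.

    Assumption: the input is a JSON array of flat objects, as in the example
    above; the output is still a single JSON array.
    """
    records: List[Dict[str, Any]] = json.loads(flowfile_content)
    wrapped = [{"meta": dict(meta), "payload": record} for record in records]
    return json.dumps(wrapped)


# Hypothetical meta fields; replace with whatever the target system expects.
content = '[{"col1": "value11", "col2": "value21"}, {"col1": "value12", "col2": "value22"}]'
result = json.loads(wrap_records(content, {"source": "example_table"}))
print(result[0])
```

Because the output is still an array of records, it could then be handed to a record-oriented publisher, as suggested later in the thread, without splitting it first.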

Re: Run another PG once first one is finished

2018-09-18 Thread Boris Tyukin
onship. > > That said, another option in the list of ways to accomplish this is > use of Wait/Notify processors. > > Thanks > Joe > On Mon, Sep 17, 2018 at 9:23 PM Boris Tyukin > wrote: > > > > thanks Ed! Totally forgot about S2S - we already use it to monitor err

Re: Run another PG once first one is finished

2018-09-17 Thread Boris Tyukin
le (Project B) > (signal files) instead of Kafka. > > hope that helps. > Ed. > > > On Mon, Sep 17, 2018 at 2:08 PM Boris Tyukin > wrote: > >> Hi, >> >> Let's say I have two totally independent projects / PGs - PG ProjectA and >> PG ProjectB. >>

Run another PG once first one is finished

2018-09-17 Thread Boris Tyukin
Hi, Let's say I have two totally independent projects / PGs - PG ProjectA and PG ProjectB. What would be the best way to kick off PG ProjectB when PG ProjectA is finished? I know I can use output/input ports but I do not want to wire them like that since they are independent projects. The only

Re: ExecuteSQL to support multiple statements

2018-08-20 Thread Boris Tyukin
, like a stripped-down built-in > SquirrelSQL/DBeaver :) > > Regards, > Matt > > On Thu, Aug 16, 2018 at 2:19 PM Boris Tyukin > wrote: > > > > Interesting, thanks for the explanation, Matt. I did not realize it was > so complicated. When we just started with

Re: ExecuteSQL to support multiple statements

2018-08-16 Thread Boris Tyukin
in we'd need a Hive database adapter and would > ask the specified adapter if batching is supported, then go along some > other logic route if not. It should be doable, but just hasn't been > finished yet. > > Regards, > Matt > > On Thu, Aug 16, 2018 at 12:14 PM Boris

Re: ExecuteSQL to support multiple statements

2018-08-16 Thread Boris Tyukin
ExecuteHiveQL. Both support multi-statements. > > Regards. > Ed. > > On Wed, Aug 15, 2018 at 11:11 AM Boris Tyukin > wrote: > >> Hi guys, >> >> I need to issue a query like below on Impala. it works fine from >> impala-shell but NiFi seems not to like mul

Re: List/Fetch pattern for QueryDatabaseTable

2018-08-16 Thread Boris Tyukin
I still think sqoop is the way to go to handle large volumes of data. I wish NiFi had a handy sqoop processor (like in Kylo) but it is easy to do it with groovy (I blogged about it). We just had issues with NiFi and very large flow files. Flows which are using sqoop are not limited by NiFi JVM

ExecuteSQL to support multiple statements

2018-08-15 Thread Boris Tyukin
Hi guys, I need to issue a query like below on Impala. it works fine from impala-shell but NiFi seems not to like multiple statements like that. set max_row_size=7mb; create table blabla as select blabla from blablabla; I thought it was addressed in 1.7 but I got it confused with Hive

Re: AVRO is the only output format with ExecuteSQL

2018-08-13 Thread Boris Tyukin
Just: > cd tmp > checkout-nifi-pr 2945 > > Maybe useful. > > > On August 13, 2018 at 08:36:04, Boris Tyukin (bo...@boristyukin.com) > wrote: > > Matt, you are awesome! 15 files changes and 3k lines of code - man, do not > tell me you did that in just a few days :) > > s

Re: AVRO is the only output format with ExecuteSQL

2018-08-13 Thread Boris Tyukin
b.com/apache/nifi/pull/2945 > On Tue, Aug 7, 2018 at 8:30 PM Boris Tyukin wrote: > > > > Matt, you rock!! thank you!! > > > > On Tue, Aug 7, 2018 at 5:16 PM Matt Burgess wrote: > >> > >> Sounds good, it makes the underlying code a bit more complicated

Re: Question about NiFi and bulk database inserts - is PutSQL the only out of the box option?

2018-08-09 Thread Boris Tyukin
the data, then doing a INSERT FROM SELECT at Hive to pull > the data from the raw external table and convert it to a different format > for the managed target table. > > I'm always interested in performance improvements we can make, especially > in the RDBMS world, so I'm all ears

Re: Question about NiFi and bulk database inserts - is PutSQL the only out of the box option?

2018-08-09 Thread Boris Tyukin
Matt, but it is still not using bulk load methods, right? Some databases have proprietary ways of doing that fast rather than running a bunch of insert statements. Bob, what I've done in the past is dumping data either to HDFS or local disk and then using efficient tools to do this job using bulk

Re: AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Boris Tyukin
t >> can then write the schema to >> >> the "avro.schema" attribute or you can choose "Do Not Write Schema". >> This would still allow the data >> >> to be written in JSON, CSV, etc. >> >> >> >> You could also have the Record Wr

Re: AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Boris Tyukin
n-stream processors read the > schema from this attribute. > This would allow you to use any record-oriented processors you'd like > without having to define the > schema yourself, if you don't want to. > > Thanks > -Mark > > > > On Aug 7, 2018, at 12:37 PM, Boris Tyuk

Re: AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Boris Tyukin
> > > > > On Tue, Aug 7, 2018, 8:45 AM Joe Witt wrote: > > >> > > >> i think we just need to make an ExecuteSqlRecord processor. > > >> > > >> thanks > > >> > > >> On Tue, Aug 7, 2018, 8:41 AM Mike Thomsen >

AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Boris Tyukin
I've been wondering since I started learning NiFi why ExecuteSQL processor only returns AVRO formatted data. All community examples I've seen then convert AVRO to json and pretty much all of them then split json to multiple flows. I found myself doing the same thing over and over and over again.

Re: Ingestion from databases: pure NiFi vs Kylo with Scoop

2018-08-04 Thread Boris Tyukin
Vitaly, The best way is to try it yourself and build a simple process to prove your case. I got excited first about Kylo, but quickly realized I could do everything I needed with NiFi. I did not really care about fancy UI with Kylo, but I did love a lot of things - integration with Spark and

Re: NiFi Data Usage via Rest API

2018-07-25 Thread Boris Tyukin
ents). I may just dig into the source if this email goes >> stale. >> >> -Ryan >> >> >> On Wed, Jul 25, 2018 at 9:17 AM, Boris Tyukin >> wrote: >> >>> Ryan, if you have not seen these posts from Pierre, I suggest >>> starting there. He

Re: NiFi Data Usage via Rest API

2018-07-25 Thread Boris Tyukin
Ryan, if you have not seen these posts from Pierre, I suggest starting there. He does a good job explaining different options https://pierrevillard.com/2017/05/11/monitoring-nifi-introduction/ I do agree that 5 minute thing is super confusing and pretty useless and you cannot change that

processing a bunch of rules in NiFi in real-time - do I need distributed cache?

2018-07-24 Thread Boris Tyukin
Hi guys, I could really use some advice. We need to replicate 300 tables from 3 Oracle DBs into Apache Kudu. I'm thinking about doing this: OracleDB1 --> OracleDB2 --> Oracle GoldenGate --> Kafka --> NiFi 3 node cluster --> Kudu OracleDB3 --> GoldenGate will stream changes in 300 tables in near

Re: NiFi ExecuteScript vs multiple processors vs custom processor

2018-07-10 Thread Boris Tyukin
ry if desired. > > On Tue, Jul 10, 2018 at 11:22 AM, Boris Tyukin > wrote: > > thanks Bryan. I saw your blog post on that. I think with registry 0.1 it > was > > not possible to version nested PGs within parent PGs so I could not have > > "templatized" PG wh

Re: NiFi ExecuteScript vs multiple processors vs custom processor

2018-07-10 Thread Boris Tyukin
since there wasn't a better solution at the time. > > Using NiFi Registry should now be the preferred solution, and you can > sync changes in-place. You can have many versioned process groups tied > to the same versioned flow in a registry and update them all. > > Thanks, > >

Re: NiFi ExecuteScript vs multiple processors vs custom processor

2018-07-10 Thread Boris Tyukin
I like Ed's recommendations and doing something similar. I use ISPs for some repetitive tasks, used in multiple places / flows. Unfortunately, NiFi templates are very limited in use for that purpose (you can only import/export them but cannot sync changes in them across flows). Wanted to use

Re: NiFi as RESTful server?

2018-06-27 Thread Boris Tyukin
Kelsey, Take it as an opinion from someone new to NiFi (using it for 1 year or so). I've looked into doing the same thing. The best examples I found were https://pierrevillard.com/2016/04/10/url-shortener-service-with-apache-nifi/comment-page-1/

Re: HandleHttpRequest and Allowed Paths

2018-06-14 Thread Boris Tyukin
? It looks like the path and port are out of place, > e.g.: > > > > localhost:8011/admin > > localhost:8011/info > > > > -Kevin > > > > *From: *Boris Tyukin > *Reply-To: * > *Date: *Thursday, June 14, 2018 at 16:26 > *To: * > *Subject:

HandleHttpRequest and Allowed Paths

2018-06-14 Thread Boris Tyukin
Hi, I am following the example here with a simple web service https://pierrevillard.com/2016/04/10/url-shortener-service-with-apache-nifi/ and want to make different endpoints like localhost/admin:8011 localhost/info:8011 etc. Looks like I need to set "Allowed Paths" property on

Re: Row counts - date lake

2018-06-12 Thread Boris Tyukin
I would be curious to hear how you end up doing it, Faisal. In my experience taking row count from HBase tables was painfully slow and this was one of the reasons we decided to move to Apache Kudu. We tried 5 different ways taking row counts with HBase and it was still painfully slow. Another

Re: Issues with ExecuteSQL and DBCPConnectionPool service in NiFi

2018-05-25 Thread Boris Tyukin
Hi Paul, have you tried to set validation query on the pool? something like SELECT 1 FROM dual I had a bunch of issues with our databases but after we started using validation query, all issues with disconnected sessions were gone. Give it a try if you have not done this yet On Fri, May 25,

Re: InvokeScriptedProcessor from a shared folder

2018-05-23 Thread Boris Tyukin
https://github.com/apache/nifi/pull/2734 > > On Tue, May 22, 2018 at 5:00 PM, Boris Tyukin <bo...@boristyukin.com> > wrote: > > Hi Bryan, > > > > yes, here is the trace: > > > > 2018-05-22 16:00:49,339 WARN [NiFi Web Server-805] > > o.a.n.controll

Re: InvokeScriptedProcessor from a shared folder

2018-05-22 Thread Boris Tyukin
, Bryan Bende <bbe...@gmail.com> wrote: > Is there a stacktrace in nifi-app.log at the time you got the validation > error? > > On Tue, May 22, 2018 at 4:35 PM, Boris Tyukin <bo...@boristyukin.com> > wrote: > > I tried both along with nifi restart. > >

Re: InvokeScriptedProcessor from a shared folder

2018-05-22 Thread Boris Tyukin
> wrote: > Were you disabling the processor or just stopping it? I've found with the > scripted processors they will get in an odd state and that's the only way > to reset it. > > > Shawn > ------ > *From:* Boris Tyukin <bo...@boristyukin

Re: InvokeScriptedProcessor from a shared folder

2018-05-22 Thread Boris Tyukin
well it works now...after I recreated processor 3 times and was changing parameters back and forth and now that error is gone. we did not change permissions on nfs or file so not sure why nifi did not like it at first. On Tue, May 22, 2018 at 3:32 PM, Boris Tyukin <bo...@boristyukin.com>

InvokeScriptedProcessor from a shared folder

2018-05-22 Thread Boris Tyukin
Hello, I created a custom groovy processor and saved it in a file. That file was placed on NFS share and I pointed NiFi's InvokeScriptedProcessor to it. When I started a processor, I got a weird error that it failed custom validation. If I copy that file out of NFS to a local directory, it

Re: Faster way to develop/test custom NiFi processors?

2018-05-15 Thread Boris Tyukin
ocs/nifi-docs/html/developer- > guide.html#testing > > [2] https://github.com/apache/nifi/tree/master/nifi-nar- > bundles/nifi-standard-bundle/nifi-standard-processors/src/ > test/java/org/apache/nifi/processors/standard > > > > On Tue, May 15, 2018 at 1:12 PM, Boris Tyuki

Faster way to develop/test custom NiFi processors?

2018-05-15 Thread Boris Tyukin
Hi there, I found a few tutorials on how to create custom NiFi processors. I wonder if there is a better way to test code during initial development other than building a nar file every time, copying to NiFi dir and restarting NiFi to pick up new changes. It does sound like a very long / painful
