NiFi 1.9.0 : Unable to change the state of a processor from DISABLED to ENABLED from rest api

2020-10-15 Thread Mohit Jain
Hi Team,

I have an implementation where, when an error bulletin is received in the
flow, the flow is STOPPED and DISABLED via the REST API. This has been
working fine for months.

A few days ago, I ran into a weird issue with the HDFS processors. I
received an error bulletin with the Kerberos login error:
*javax.security.auth.login.LoginException: connect timed out*

As per the implementation, the bulletin was received and the flow was
stopped and then disabled using the REST API.
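For context, the wrapper's stop/disable/enable calls boil down to something
like this (a minimal sketch: the node address is the one from the error
below, the processor id and revision handling are illustrative, and
authentication/TLS and error handling are omitted):

import json
import ssl
import urllib.request

BASE = "https://abc.co.in:8443/nifi-api"
PROC_ID = "f99d210c-b20e-3870-ae1b-e1ec4702394f"
CTX = ssl._create_unverified_context()  # sketch only; the real wrapper trusts the cluster cert

def set_state(state):
    # NiFi expects the latest revision with every update, so read it back first
    with urllib.request.urlopen(f"{BASE}/processors/{PROC_ID}", context=CTX) as resp:
        revision = json.load(resp)["revision"]
    body = json.dumps({"revision": revision,
                       "component": {"id": PROC_ID, "state": state}}).encode()
    req = urllib.request.Request(f"{BASE}/processors/{PROC_ID}", data=body, method="PUT",
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, context=CTX)

set_state("STOPPED")   # stop the processor after the bulletin
set_state("DISABLED")  # then disable it
# enabling later is set_state("STOPPED") again while the processor is DISABLED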

Later when I enable the flow using the rest api, I get this error in
response:
Node abc.co.in:8443 is unable to fulfill this request due to:
f99d210c-b20e-3870-ae1b-e1ec4702394f is not stopped.

When I look in the UI, it shows that the processor is disabled.

Once I enable it from the UI it starts working normally.

I'm stuck with this because every time the issue occurs, someone has to go
into the NiFi UI and enable that processor manually. This is not feasible
in the production environment.

Any help would be appreciated.

Regards,
Mohit


NiFi 1.9.0: ListFile throws NoSuchFileException

2020-10-05 Thread Mohit Jain
Hi Team,

I'm getting the following error in the ListFile processor:
ListFile[id=4ebbbe93-5e16-31d1-8793-63c704d4eae0]
ListFile[id=4ebbbe93-5e16-31d1-8793-63c704d4eae0] failed to process session
due to java.nio.file.NoSuchFileException: /data/test/testfile.CTL;
Processor Administratively Yielded for 1 sec: java.io.UncheckedIOException:
java.nio.file.NoSuchFileException: /data/test/testfile.CTL

The flow performs the following operations:
1. List files from the folder (other flows also read from the same
folder).
2. Fetch the file using FetchFile and move it from the source folder
to a processed folder.
3. Continue with the rest of the flow.

It's a 3 Node NiFi 1.9.0 cluster, and the flow is reading from the NFS
mount. ListFile is scheduled to run on the primary node.

Any help would be appreciated.

Thanks,
Mohit


Re: Nifi takes too long to start(~ 30 minutes)

2020-08-24 Thread Mohit Jain
Hi Andy,

Apologies for the late reply.
Yes, there are around 25,000 processors (1,000 flows with around 25
processors each; I mistyped earlier).

I started NiFi in debug mode to dig into it, but found nothing concrete.
The following log gets printed for every processor:

2020-08-24 22:56:04,416 INFO
org.apache.nifi.controller.StandardProcessorNode:
RouteOnAttribute[id=c4ea2bfc-bd88-3df8-aab2-4edc2b3a7678] disabled so
ScheduledState transitioned from STOPPED to DISABLED.

Also this for the HDFS processors:

2020-08-24 22:56:05,121 DEBUG org.apache.hadoop.util.NativeCodeLoader:
Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError:
Native Library
/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hadoop/lib/native/libhadoop.so.1.0.0
already loaded in another classloader

These are the only logs occurring repeatedly. Finally, after half an hour,
flow synchronization starts and NiFi is up within 2-3 minutes.

There are 5 custom processors being used in the flow.

Thanks,
Mohit





On Wed, Aug 19, 2020 at 5:07 AM Andy LoPresto  wrote:

> Mohit,
>
> I’m a little confused by some of the details you report. Initially you
> said there were ~100 flows with 25 processors each (2500); now there are
> 25,000. If you examine the logs, you note that “reaching the election
> process” takes the majority of the time — what messages are being printed
> in the log before the election process starts? Have you taken thread dumps
> at regular intervals during this time to see where the time is being spent?
>
> This could be caused by the cost of fingerprinting such a large flow
> during cluster startup. Are there any custom processors in your NiFi
> instance?
>
>
> Andy LoPresto
> alopre...@apache.org
> *alopresto.apa...@gmail.com *
> He/Him
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Aug 16, 2020, at 10:28 PM, Mohit Jain  wrote:
>
> Hi Pierre,
>
> No, the election process takes only a minute or two max, reaching the
> election process takes a lot of time.
> There are around 25k stopped/invalid components in the UI, I marked all of
> them as disabled hoping it would resolve the issue but no improvement in
> the startup time.
>
> On a similar note, NiFi Rest api response is also quite slow when there
> are a lot of components. Would it improve by disabling the components which
> are not in use? Or is there something else that could have been causing the
> issue?
>
> Thanks,
> Mohit
>
> On Thu, Aug 13, 2020 at 9:49 PM Pierre Villard <
> pierre.villard...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm surprised this is something you observe in the election process part.
>> I've constantly seen quick startup times even with thousands of
>> components in the flow.
>> I'd look into the logs and maybe turn on some debug logs to find out
>> what's going on.
>>
>> Pierre
>>
>> On Thu, 13 Aug 2020 at 16:33, Joe Witt  wrote:
>>
>>> Mohit,
>>>
>>> You almost certainly want to take that same flow and setup a cluster on
>>> a more recent version of NiFi to compare startup times.  For flows with
>>> thousands of components there are important improvements which have
>>> occurred in the past year and a half.
>>>
>>> Startup time, user perceived behavior in the UI on continuous
>>> operations, etc.. have been improved. Further you can now hot load new
>>> versions of nars which should reduce the need to restart.
>>>
>>> We also will have 1.12 out hopefully within days so that could be
>>> interesting for you as well.
>>>
>>> Thanks
>>>
>>> On Thu, Aug 13, 2020 at 7:18 AM Mohit Jain 
>>> wrote:
>>>
>>>> Hi Team,
>>>>
>>>> I am using a single node NiFi 1.9.0 cluster. It takes more than 30
>>>> minutes to start each time it is restarted. There are more than 100 flows
>>>> on the NiFi UI with an average of 25 processors per flow. It takes around
>>>> 25-30 minutes to reach the cluster election process after which it gets
>>>> started in a minute.
>>>>
>>>> Is this an expected behaviour that startup time is directly
>>>> proportional to the number of processors in the Canvas? Or is there a way
>>>> to reduce the NiFi startup time?
>>>>
>>>> Any leads would be appreciated.
>>>>
>>>> Thanks,
>>>> Mohit
>>>>
>>>
>


Re: Nifi takes too long to start(~ 30 minutes)

2020-08-16 Thread Mohit Jain
Hi Pierre,

No, the election process takes only a minute or two max, reaching the
election process takes a lot of time.
There are around 25k stopped/invalid components in the UI, I marked all of
them as disabled hoping it would resolve the issue but no improvement in
the startup time.

On a similar note, NiFi Rest api response is also quite slow when there are
a lot of components. Would it improve by disabling the components which are
not in use? Or is there something else that could have been causing the
issue?

Thanks,
Mohit

On Thu, Aug 13, 2020 at 9:49 PM Pierre Villard 
wrote:

> Hi,
>
> I'm surprised this is something you observe in the election process part.
> I've constantly seen quick startup times even with thousands of components
> in the flow.
> I'd look into the logs and maybe turn on some debug logs to find out
> what's going on.
>
> Pierre
>
> On Thu, 13 Aug 2020 at 16:33, Joe Witt  wrote:
>
>> Mohit,
>>
>> You almost certainly want to take that same flow and setup a cluster on a
>> more recent version of NiFi to compare startup times.  For flows with
>> thousands of components there are important improvements which have
>> occurred in the past year and a half.
>>
>> Startup time, user perceived behavior in the UI on continuous operations,
>> etc.. have been improved. Further you can now hot load new versions of nars
>> which should reduce the need to restart.
>>
>> We also will have 1.12 out hopefully within days so that could be
>> interesting for you as well.
>>
>> Thanks
>>
>> On Thu, Aug 13, 2020 at 7:18 AM Mohit Jain 
>> wrote:
>>
>>> Hi Team,
>>>
>>> I am using a single node NiFi 1.9.0 cluster. It takes more than 30
>>> minutes to start each time it is restarted. There are more than 100 flows
>>> on the NiFi UI with an average of 25 processors per flow. It takes around
>>> 25-30 minutes to reach the cluster election process after which it gets
>>> started in a minute.
>>>
>>> Is this an expected behaviour that startup time is directly proportional
>>> to the number of processors in the Canvas? Or is there a way to reduce the
>>> NiFi startup time?
>>>
>>> Any leads would be appreciated.
>>>
>>> Thanks,
>>> Mohit
>>>
>>


Re: Nifi takes too long to start(~ 30 minutes)

2020-08-16 Thread Mohit Jain
Hi Brandon,

There are no flow files in the queue.

Thanks,
Mohit

On Thu, Aug 13, 2020 at 9:56 PM Brandon DeVries 
wrote:

> Mohit,
>
> How many flowfiles are currently on the instance?  Sometimes a very large
> number of flowfiles can result in slower start times.
>
> Brandon
>
> --
> *From:* Pierre Villard 
> *Sent:* Thursday, August 13, 2020 12:10:51 PM
> *To:* users@nifi.apache.org 
> *Subject:* Re: Nifi takes too long to start(~ 30 minutes)
>
> Hi,
>
> I'm surprised this is something you observe in the election process part.
> I've constantly seen quick startup times even with thousands of components
> in the flow.
> I'd look into the logs and maybe turn on some debug logs to find out
> what's going on.
>
> Pierre
>
> On Thu, 13 Aug 2020 at 16:33, Joe Witt  wrote:
>
> Mohit,
>
> You almost certainly want to take that same flow and setup a cluster on a
> more recent version of NiFi to compare startup times.  For flows with
> thousands of components there are important improvements which have
> occurred in the past year and a half.
>
> Startup time, user perceived behavior in the UI on continuous operations,
> etc.. have been improved. Further you can now hot load new versions of nars
> which should reduce the need to restart.
>
> We also will have 1.12 out hopefully within days so that could be
> interesting for you as well.
>
> Thanks
>
> On Thu, Aug 13, 2020 at 7:18 AM Mohit Jain 
> wrote:
>
> Hi Team,
>
> I am using a single node NiFi 1.9.0 cluster. It takes more than 30 minutes
> to start each time it is restarted. There are more than 100 flows on the
> NiFi UI with an average of 25 processors per flow. It takes around 25-30
> minutes to reach the cluster election process after which it gets started
> in a minute.
>
> Is this an expected behaviour that startup time is directly proportional
> to the number of processors in the Canvas? Or is there a way to reduce the
> NiFi startup time?
>
> Any leads would be appreciated.
>
> Thanks,
> Mohit
>
>


Nifi takes too long to start(~ 30 minutes)

2020-08-13 Thread Mohit Jain
Hi Team,

I am using a single node NiFi 1.9.0 cluster. It takes more than 30 minutes
to start each time it is restarted. There are more than 100 flows on the
NiFi UI with an average of 25 processors per flow. It takes around 25-30
minutes to reach the cluster election process after which it gets started
in a minute.

Is this an expected behaviour that startup time is directly proportional to
the number of processors in the Canvas? Or is there a way to reduce the
NiFi startup time?

Any leads would be appreciated.

Thanks,
Mohit


Re: Updating the Concurrent Tasks of a NiFi processor using Rest API

2020-07-31 Thread Mohit Jain
Thanks Otto,

That helped.
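For anyone who finds this thread in the archives: the UI ends up doing a PUT
to /nifi-api/processors/<processor-id>, and the request body boils down to
roughly this (revision version and id are placeholders; other component and
config fields can be left out):

{
  "revision": { "version": 7 },
  "component": {
    "id": "<processor-id>",
    "config": { "concurrentlySchedulableTaskCount": 4 }
  }
}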


On Fri, Jul 31, 2020 at 4:54 PM Otto Fowler  wrote:

> If you do it from a browser, and are using the browser debugging tools,
> you will be able to catch the call.  Everything you do in the web can be
> done from rest somehow.
>
>
>
>
> On July 31, 2020 at 06:47:06, Mohit Jain (mo...@open-insights.com) wrote:
>
> Hi Team,
>
> Is there a Rest API to change the number of concurrent tasks? I wasn't
> able to find any in the doc.
>
> Any leads would be appreciated.
>
> Thanks,
> Mohit
>
>


Updating the Concurrent Tasks of a NiFi processor using Rest API

2020-07-31 Thread Mohit Jain
Hi Team,

Is there a Rest API to change the number of concurrent tasks? I wasn't able
to find any in the doc.

Any leads would be appreciated.

Thanks,
Mohit


NiFi 1.9.0 MoveHDFS error: Failed to rename on HDFS

2020-07-26 Thread Mohit Jain
Hi Team,

I'm trying to move a file to another location using the MoveHDFS
processor, with *Conflict Resolution Strategy* = *replace*. It throws an
error when a file already exists in the target location.

Error log:
MoveHDFS[id=]
Failed to rename on HDFS due to could not move file '/path/to/source' to
its final filename.

Any help would be appreciated.

Regards,
Mohit


Re: Urgent: HDFS processors throwing OOM - Compressed class space exception

2020-07-22 Thread Mohit Jain
Those are small files, not more than 100 MB each.
NiFi is configured with 16g.
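For completeness, the JVM settings live in conf/bootstrap.conf as java.arg
lines, roughly like the sketch below. The values are illustrative, and the
two -XX flags at the end are the standard HotSpot knobs for the compressed
class space, shown for reference rather than as settings we already have:

# conf/bootstrap.conf (illustrative values)
java.arg.2=-Xms16g
java.arg.3=-Xmx16g
java.arg.20=-XX:MaxMetaspaceSize=2g
java.arg.21=-XX:CompressedClassSpaceSize=1g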

Thanks

Get Outlook for iOS<https://aka.ms/o0ukef>

From: Jorge Machado 
Sent: Wednesday, July 22, 2020 2:55:26 PM
To: users@nifi.apache.org 
Subject: Re: Urgent: HDFS processors throwing OOM - Compressed class space 
exception

How big are the files that you are trying to store? How much memory did you 
configure nifi ?

> On 22. Jul 2020, at 06:13, Mohit Jain  wrote:
>
> Hi team,
>
> I’ve been facing the issue while using any HDFS processor, e.g. - PutHDFS 
> throws the error -
> Failed to write to HDFS due to compressed class space: 
> java.lang.OutOfMemoryError
>
> Eventually the node gets disconnected.
>
> Any help would be appreciated.
>
> Regards,
> Mohit
> 



Re: Urgent: HDFS processors throwing OOM - Compressed class space exception

2020-07-21 Thread Mohit Jain
Apologies.
I intended to send it to the user list but mistakenly sent it to the dev
list; that's why I sent it here again.

Thanks,
Mohit

Get Outlook for iOS<https://aka.ms/o0ukef>

From: Joe Witt 
Sent: Wednesday, July 22, 2020 10:05:16 AM
To: users@nifi.apache.org 
Subject: Re: Urgent: HDFS processors throwing OOM - Compressed class space 
exception

you had already sent this urgent note to dev.  now cross posted 40 minutes 
later.  please dont

On Tue, Jul 21, 2020 at 9:14 PM Mohit <mo...@open-insights.com> wrote:
Hi team,

I’ve been facing the issue while using any HDFS processor, e.g. - PutHDFS 
throws the error -
Failed to write to HDFS due to compressed class space: 
java.lang.OutOfMemoryError

Eventually the node gets disconnected.

Any help would be appreciated.

Regards,
Mohit


Re: Urgent: Facing issue while trying to open a NiFi processor

2020-07-11 Thread Mohit Jain
Hi Team,

It seems like the nifi-flow-audit.h2.db file is corrupted. I'm not able to
execute the same query on the database directly.
Is there anything I can do to fix this?
And in what scenarios might this happen again?

Regards,
Mohit


On Sat, Jul 11, 2020 at 2:48 PM Mohit Jain  wrote:

> Hi Team,
>
> I'm facing a weird issue while trying to configure any NiFi processor:
>
> org.apache.nifi.admin.dao.DataAccessException:
> org.h2.jdbc.JdbcSQLException: Timeout trying to lock table
> "CONFIGURE_DETAILS"; SQL statement:
> SELECT DISTINCT CD.NAME FROM CONFIGURE_DETAILS CD INNER JOIN ACTION A ON
> CD.ACTION_ID = A.ID WHERE A.SOURCE_ID = ? [50200-176]
>  at
> org.apache.nifi.admin.service.impl.StandardAuditService.getPreviousValues(StandardAuditService.java:102)
> at
> org.apache.nifi.web.StandardNiFiServiceFacade.getComponentHistory(StandardNiFiServiceFacade.java:4537)
> at
> org.apache.nifi.web.StandardNiFiServiceFacade$$FastClassBySpringCGLIB$$358780e0.invoke()
>
> Complete nifi-user.log is attached in the log.
>
> Any input would be appreciated.
>
> Regards,
> Mohit
>


Urgent: Facing issue while trying to open a NiFi processor

2020-07-11 Thread Mohit Jain
Hi Team,

I'm facing a weird issue while trying to configure any NiFi processor:

org.apache.nifi.admin.dao.DataAccessException: org.h2.jdbc.JdbcSQLException: Timeout trying to lock table "CONFIGURE_DETAILS"; SQL statement:
SELECT DISTINCT CD.NAME FROM CONFIGURE_DETAILS CD INNER JOIN ACTION A ON CD.ACTION_ID = A.ID WHERE A.SOURCE_ID = ? [50200-176]
    at org.apache.nifi.admin.service.impl.StandardAuditService.getPreviousValues(StandardAuditService.java:102)
    at org.apache.nifi.web.StandardNiFiServiceFacade.getComponentHistory(StandardNiFiServiceFacade.java:4537)
    at org.apache.nifi.web.StandardNiFiServiceFacade$$FastClassBySpringCGLIB$$358780e0.invoke()

Complete nifi-user.log is attached in the log.

Any input would be appreciated.

Regards,
Mohit
2020-07-11 09:06:53,063 ERROR [NiFi Web Server-28918] o.a.n.w.a.c.AdministrationExceptionMapper org.apache.nifi.admin.service.AdministrationException: org.apache.nifi.admin.dao.DataAccessException: org.h2.jdbc.JdbcSQLException: Timeout trying to lock table "CONFIGURE_DETAILS"; SQL statement:
SELECT DISTINCT CD.NAME FROM CONFIGURE_DETAILS CD INNER JOIN ACTION A ON CD.ACTION_ID = A.ID WHERE A.SOURCE_ID = ? [50200-176]. Returning Internal Server Error response.
org.apache.nifi.admin.service.AdministrationException: org.apache.nifi.admin.dao.DataAccessException: org.h2.jdbc.JdbcSQLException: Timeout trying to lock table "CONFIGURE_DETAILS"; SQL statement:
SELECT DISTINCT CD.NAME FROM CONFIGURE_DETAILS CD INNER JOIN ACTION A ON CD.ACTION_ID = A.ID WHERE A.SOURCE_ID = ? [50200-176]
    at org.apache.nifi.admin.service.impl.StandardAuditService.getPreviousValues(StandardAuditService.java:102)
    at org.apache.nifi.web.StandardNiFiServiceFacade.getComponentHistory(StandardNiFiServiceFacade.java:4537)
    at org.apache.nifi.web.StandardNiFiServiceFacade$$FastClassBySpringCGLIB$$358780e0.invoke()
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:736)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
    at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:84)
    at org.apache.nifi.web.NiFiServiceFacadeLock.proceedWithReadLock(NiFiServiceFacadeLock.java:155)
    at org.apache.nifi.web.NiFiServiceFacadeLock.getLock(NiFiServiceFacadeLock.java:120)
    at sun.reflect.GeneratedMethodAccessor504.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:627)
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:616)
    at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:70)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:671)
    at org.apache.nifi.web.StandardNiFiServiceFacade$$EnhancerBySpringCGLIB$$c0d134ca.getComponentHistory()
    at org.apache.nifi.web.api.FlowResource.getComponentHistory(FlowResource.java:2548)
    at sun.reflect.GeneratedMethodAccessor1088.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493)
    at org.glassfish.jersey.server.mod

HiveConnectionPool doesn't get enabled with the Nifi-Api in NiFi 1.9.2

2019-09-09 Thread Mohit Jain
Hi Team,

I have recently upgraded from NiFi 1.6.0 to NiFi 1.9.2. I have written a
wrapper to start and stop the flow using the NiFi API. While starting the
controller services, the HiveConnectionPool does not get enabled. All other
controller services (e.g., the database controller service, AvroReader,
AvroWriter, etc.) get enabled.

I'm not able to figure out the reason behind it. One thing I noticed is
that PutHiveQL is the last processor in the flow; it seems like the last
controller service doesn't get enabled.

Kindly Help.

Regards,
Mohit


Unable to login to Nifi UI with ranger

2019-08-13 Thread Mohit Jain
Hi,

I've integrated NiFi 1.9.2 with Ranger. When I log in to the UI, the
following error is shown:

No applicable policies could be found. Contact the system administrator.

I'm not able to figure out what I'm missing. Kindly help.

Regards,
Mohit


Re: GenerateTableFetch throws error with MS SQL 2012+

2019-08-09 Thread Mohit Jain
Hi Matt,
Does it mean that all the other DBs (Oracle, MySQL, etc.) might fetch
duplicate rows with GenerateTableFetch?

Thanks,
Mohit

On Thu, Aug 8, 2019 at 8:16 PM Matt Burgess  wrote:

> Mohit,
>
> MSSQL seems to have the only parser (of the more popular DBs) that
> complains about an empty ORDER BY clause when doing paging (with
> OFFSET / FETCH NEXT), so the default behavior is to throw an
> exception. This happens when you don't supply a Max Value Column,
> meaning you want to fetch all rows but haven't specified how to order
> them, to ensure that each page gets unique rows and all rows are
> returned. Otherwise each fetch could grab duplicate rows and some rows
> may never be fetched, as the ordering is arbitrary.  For some reason
> other DBs (PostgreSQL, Oracle) lets you try it with the documented
> caveat that the ordering is arbitrary.
>
> So an ordering must be applied, and to that end we added a Custom
> ORDER BY Clause property [1] such that the user can provide their own
> clause if no Max Value Column is specified. This property will be
> available in the upcoming NiFi 1.10.0 release. The only workarounds I
> know of are to use QueryDatabaseTable instead of GenerateTableFetch,
> or to supply a Max Value Column property value in GenerateTableFetch.
>
> Regards,
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-6348
>
> On Thu, Aug 8, 2019 at 6:18 AM Mohit Jain  wrote:
> >
> > Hi Team,
> >
> > I'm facing the following issue while using MS SQL 2012+. in Nifi 1.6.0 -
> >
> > GenerateTableFetch[id=87d43fae-5c02-1f3f-a58f-9fafeac5a640]
> GenerateTableFetch[id=87d43fae-5c02-1f3f-a58f-9fafeac5a640] failed to
> process session due to Order by clause cannot be null or empty when using
> row paging; Processor Administratively Yielded for 1 sec:
> java.lang.IllegalArgumentException: Order by clause cannot be null or empty
> when using row paging
> >
> > It seems like a bug to me, is it resolved in the latest versions of Nifi?
> >
> > Thanks,
> > Mohit
> >
> >
> >
> >
> >
>


GenerateTableFetch throws error with MS SQL 2012+

2019-08-08 Thread Mohit Jain
Hi Team,

I'm facing the following issue while using MS SQL 2012+ in NiFi 1.6.0:

*GenerateTableFetch[id=87d43fae-5c02-1f3f-a58f-9fafeac5a640]
GenerateTableFetch[id=87d43fae-5c02-1f3f-a58f-9fafeac5a640] failed to
process session due to Order by clause cannot be null or empty when using
row paging; Processor Administratively Yielded for 1 sec:
java.lang.IllegalArgumentException: Order by clause cannot be null or empty
when using row paging*

It seems like a bug to me, is it resolved in the latest versions of Nifi?

Thanks,
Mohit


Re: Unable to login to secured nifi cluster via UI

2019-08-08 Thread Mohit Jain
Thanks Pierre.

On Wed, Aug 7, 2019 at 5:51 PM Pierre Villard 
wrote:

> Hi Mohit,
>
> The initial admin you configured is the user you should use for the first
> connection in order to grant authorizations to additional users/groups. The
> initial admin should have been automatically added in the
> authorizations.xml file created during the first start of NiFi.
>
> Hope this helps,
> Pierre
>
> On Wed, 7 Aug 2019 at 14:12, Mohit Jain  wrote:
>
>> Hi team,
>>
>> I'm not able to open the nifi canvas in the secured NiFi. It shows the
>> following error message once I provides the credentials -
>> *No applicable policies could be found. Contact the system administrator.*
>>
>> [image: image.png]
>> Kindly help.
>>
>> Thanks,
>> Mohit
>>
>>


Unable to login to secured nifi cluster via UI

2019-08-07 Thread Mohit Jain
Hi team,

I'm not able to open the NiFi canvas in the secured NiFi. It shows the
following error message once I provide the credentials:
*No applicable policies could be found. Contact the system administrator.*

[image: image.png]
Kindly help.

Thanks,
Mohit


RE: Unable to see Nifi data lineage in Atlas

2018-07-26 Thread Mohit
Hi, 

 

While looking at the logs, I found out that ReportLineageToAtlas is not
able to construct the KafkaProducer.

It throws the following logs - 

 

org.apache.kafka.common.KafkaException: Failed to construct kafka producer
    at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:335)
    at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:188)
    at org.apache.atlas.kafka.KafkaNotification.createProducer(KafkaNotification.java:286)
    at org.apache.atlas.kafka.KafkaNotification.sendInternal(KafkaNotification.java:207)
    at org.apache.atlas.notification.AbstractNotification.send(AbstractNotification.java:84)
    at org.apache.atlas.hook.AtlasHook.notifyEntitiesInternal(AtlasHook.java:133)
    at org.apache.atlas.hook.AtlasHook.notifyEntities(AtlasHook.java:118)
    at org.apache.atlas.hook.AtlasHook.notifyEntities(AtlasHook.java:171)
    at org.apache.nifi.atlas.NiFiAtlasHook.commitMessages(NiFiAtlasHook.java:150)
    at org.apache.nifi.atlas.reporting.ReportLineageToAtlas.lambda$consumeNiFiProvenanceEvents$6(ReportLineageToAtlas.java:721)
    at org.apache.nifi.reporting.util.provenance.ProvenanceEventConsumer.consumeEvents(ProvenanceEventConsumer.java:204)
    at org.apache.nifi.atlas.reporting.ReportLineageToAtlas.consumeNiFiProvenanceEvents(ReportLineageToAtlas.java:712)
    at org.apache.nifi.atlas.reporting.ReportLineageToAtlas.onTrigger(ReportLineageToAtlas.java:664)
    at org.apache.nifi.controller.tasks.ReportingTaskWrapper.run(ReportingTaskWrapper.java:41)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.kafka.common.protocol.SecurityProtocol.PLAINTEXTSASL
    at java.lang.Enum.valueOf(Enum.java:238)
    at org.apache.kafka.common.protocol.SecurityProtocol.valueOf(SecurityProtocol.java:28)
    at org.apache.kafka.common.protocol.SecurityProtocol.forName(SecurityProtocol.java:89)
    at org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:79)
    at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:277)
    ... 20 common frames omitted

 

Thanks,

Mohit

 

From: Mohit  
Sent: 25 July 2018 17:46
To: users@nifi.apache.org
Subject: Unable to see Nifi data lineage in Atlas

 

Hi all,

 

I have configured the ReportLineageToAtlas reporting task to send NiFi flow
information to Atlas. Nifi is integrated with Ranger.

I am able to see all the information in the Atlas except the lineage. When I
search for hdfs_path or hive_table, I can only see the hive side
information. I can't figure out anything wrong in the configuration.

Is there something in the Ranger configuration that I'm missing?

 

Regards,

Mohit

 

 



Unable to see Nifi data lineage in Atlas

2018-07-25 Thread Mohit
Hi all,

 

I have configured the ReportLineageToAtlas reporting task to send NiFi flow
information to Atlas. Nifi is integrated with Ranger.

I am able to see all the information in the Atlas except the lineage. When I
search for hdfs_path or hive_table, I can only see the hive side
information. I can't figure out anything wrong in the configuration.

Is there something in the Ranger configuration that I'm missing?

 

Regards,

Mohit

 



GenerateTableFetch -> RPG -> ExecuteSQL fetching duplicate records from Netezza but the count is same.

2018-07-20 Thread Mohit
Hi all,

I am fetching data from Netezza using GenerateTableFetch -> RPG ->
ExecuteSQL -> PutHDFS. It works fine most of the time, but for some tables
with more than a million rows it fetches duplicate rows.

 

Partition Size varies from 3 million to 30 million depending on the table
size; for a table with ~300 million rows, the partition size is 30 million,
and so on.
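If it helps, when no Max-Value Columns are set, the statements
GenerateTableFetch emits are roughly of this shape (illustrative, not a
paste of the real ones) - plain LIMIT/OFFSET paging with no ORDER BY:

SELECT * FROM abc LIMIT 30000000 OFFSET 60000000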

 

For Example -

 

Table : abc

Netezza count -  3265421

Hive Count - 3265421

Duplicate rows in Hive -  97070

 

Is this the expected behaviour while fetching from Netezza?

 

Regards,

Mohit



RE: SelectHiveQl gets stuck when query table containing 12 Billion rows

2018-06-27 Thread Mohit
Thanks Shawn,

I followed a similar approach.
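For the archives, the per-partition statements generated at the end of that
chain (show partitions -> SplitRecord -> EvaluateJsonPath -> ReplaceText)
look roughly like this; the table and partition column are made up:

-- one statement per partition, built by ReplaceText from the partition_name attribute
SELECT * FROM big_table WHERE event_dt = '2018-06-01'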

 

Regards,

Mohit

 

From: Shawn Weeks  
Sent: 27 June 2018 19:22
To: users@nifi.apache.org
Subject: Re: SelectHiveQl gets stuck when query table containing 12 Billion
rows

 

Well to get the partitions you can execute a 'show partitions table_name',
then you can use the SplitRecord with an AvroReader and JSON Writer to
generate a flow file per partition. That flow file can then be read with
EvaluateJsonPath to pull the partition_name into an attribute on the flow
file. Then finally a ReplaceText to actual write out the select statement
substituting the partition variable.

 

Thanks

Shawn

  _  

From: Mohit <mohit.j...@open-insights.co.in>
Sent: Wednesday, June 27, 2018 8:40:20 AM
To: users@nifi.apache.org
Subject: RE: SelectHiveQl gets stuck when query table containing 12 Billion
rows

 

Hi,

 

Yes, I tried to fetch around 40 million rows; it took time but it did
execute. I'll try the Avro approach.

 

How do I break the select into multiple parts? Can you explain briefly the
partition flow to start with?

 

Thanks,

Mohit

 

From: Shawn Weeks <swe...@weeksconsulting.us>
Sent: 27 June 2018 18:51
To: users@nifi.apache.org
Subject: Re: SelectHiveQl gets stuck when query table containing 12 Billion
rows

 

It's probably not stuck doing nothing, using a JDBC connection to fetch 12
Billion rows is going to be painful no matter what you do. At those kind of
sizes you're probably better off having Hive create a temporary table in
Avro format and then consuming the Avro files from HDFS into NiFi. The
largest number of rows I've pulled into NiFi via JDBC in a single query is
around 10-20 Million and that took a long time. You can also try breaking
the select into multiple parts and running them simultaneously. I've done
something similar where I first ran a query to get all of the partitions and
then I executed a select for each partition in parallel.

 

Thanks

Shawn

  _  

From: Mohit <mohit.j...@open-insights.co.in>
Sent: Wednesday, June 27, 2018 8:14:25 AM
To: users@nifi.apache.org
Subject: SelectHiveQl gets stuck when query table containing 12 Billion
rows

 

Hi all,

 

I'm trying to fetch data from hive using SelectHiveQL. It works fine for
small to medium sized tables, but when I try to fetch data from large table
with around 12 billion rows it gets stuck for hours but do nothing.  I have
set the Max Row per flowfile property to 10 million.

We have a 4 node NiFi cluster with 150GB RAM memory each. 

Is there any configuration which is to be manipulated to make this work?

 

Regards,

Mohit



RE: SelectHiveQl gets stuck when query table containing 12 Billion rows

2018-06-27 Thread Mohit
Hi,

 

Yes, I tried to fetch around 40 million rows; it took time but it did
execute. I'll try the Avro approach.

 

How do I break the select into multiple parts? Can you explain briefly the
partition flow to start with?

 

Thanks,

Mohit

 

From: Shawn Weeks  
Sent: 27 June 2018 18:51
To: users@nifi.apache.org
Subject: Re: SelectHiveQl gets stuck when query table containing 12 Billion
rows

 

It's probably not stuck doing nothing, using a JDBC connection to fetch 12
Billion rows is going to be painful no matter what you do. At those kind of
sizes you're probably better off having Hive create a temporary table in
Avro format and then consuming the Avro files from HDFS into NiFi. The
largest number of rows I've pulled into NiFi via JDBC in a single query is
around 10-20 Million and that took a long time. You can also try breaking
the select into multiple parts and running them simultaneously. I've done
something similar where I first ran a query to get all of the partitions and
then I executed a select for each partition in parallel.

 

Thanks

Shawn

  _  

From: Mohit <mohit.j...@open-insights.co.in>
Sent: Wednesday, June 27, 2018 8:14:25 AM
To: users@nifi.apache.org
Subject: SelectHiveQl gets stuck when query table containing 12 Billion
rows

 

Hi all,

 

I'm trying to fetch data from hive using SelectHiveQL. It works fine for
small to medium sized tables, but when I try to fetch data from large table
with around 12 billion rows it gets stuck for hours but do nothing.  I have
set the Max Row per flowfile property to 10 million.

We have a 4 node NiFi cluster with 150GB RAM memory each. 

Is there any configuration which is to be manipulated to make this work?

 

Regards,

Mohit



SelectHiveQl gets stuck when query table containing 12 Billion rows

2018-06-27 Thread Mohit
Hi all,

 

I'm trying to fetch data from hive using SelectHiveQL. It works fine for
small to medium sized tables, but when I try to fetch data from large table
with around 12 billion rows it gets stuck for hours but do nothing.  I have
set the Max Row per flowfile property to 10 million.

We have a 4 node NiFi cluster with 150GB RAM memory each. 

Is there any configuration which is to be manipulated to make this work?

 

Regards,

Mohit



RE: Unable to read the Parquet file written by NiFi through Spark when Logical Data Type is set to true.

2018-05-31 Thread Mohit
Hi,

Spark 2.3 has avro-1.7.7.jar. But hive 1.2 also uses avro-1.7.5.jar and I’m 
able to read the parquet file from hive.

Not sure if this is the reason.

 

Thanks,

Mohit

 

From: Mike Thomsen  
Sent: 31 May 2018 14:15
To: users@nifi.apache.org
Subject: Re: Unable to read the Parquet file written by NiFi through Spark when 
Logical Data Type is set to true.

 

Maybe check to see which version of Avro is bundled with your deployment of 
Spark?

 

On Thu, May 31, 2018 at 3:26 AM Mohit <mohit.j...@open-insights.co.in> wrote:

Hi Mike,

 

I have created the hive external table on the top of parquet and able to read 
it from hive. 

 

While querying hive from spark these are the errors – 

 

For the decimal type, I get the following error – (In hive, data type is 
decimal(12,5))

Caused by: org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file hdfs://ip-10-0-0-216.ap-south-1.compute.internal:8020/user/hermes/nifi_test1/test_pqt2_dcm/4963966040134. Column: [dc_type], Expected: DecimalType(12,5), Found: BINARY
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:192)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)

 

For the time(which was converted into TIME_MILLIS) –  (In Hive, the data type 
is int)

Caused by: org.apache.spark.sql.AnalysisException: Parquet type not yet supported: INT32 (TIME_MILLIS);
    at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.typeNotImplemented$1(ParquetSchemaConverter.scala:105)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:141)

  

Thanks,

Mohit

 

From: Mike Thomsen <mikerthom...@gmail.com>
Sent: 30 May 2018 17:28
To: users@nifi.apache.org
Subject: Re: Unable to read the Parquet file written by NiFi through Spark when 
Logical Data Type is set to true.

 

What's the error from Spark? Logical data types are just a variant on existing 
data types in Avro 1.8.

 

On Wed, May 30, 2018 at 7:54 AM Mohit <mohit.j...@open-insights.co.in> wrote:

Hi all,

 

I’m fetching the data from RDBMS and writing it to parquet using PutParquet 
processor. I’m not able to read the data from Spark when Logical Data Type is 
true. I’m able to read it from Hive.

Do I have to set some specific properties in the PutParquet processor to make 
it readable from spark as well?

 

Regards,

Mohit



RE: Unable to read the Parquet file written by NiFi through Spark when Logical Data Type is set to true.

2018-05-31 Thread Mohit
Hi Mike,

 

I have created the hive external table on the top of parquet and able to read 
it from hive. 

 

While querying hive from spark these are the errors – 

 

For the decimal type, I get the following error – (In hive, data type is 
decimal(12,5))

Caused by: org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file hdfs://ip-10-0-0-216.ap-south-1.compute.internal:8020/user/hermes/nifi_test1/test_pqt2_dcm/4963966040134. Column: [dc_type], Expected: DecimalType(12,5), Found: BINARY
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:192)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)

 

For the time(which was converted into TIME_MILLIS) –  (In Hive, the data type 
is int)

Caused by: org.apache.spark.sql.AnalysisException: Parquet type not yet supported: INT32 (TIME_MILLIS);
    at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.typeNotImplemented$1(ParquetSchemaConverter.scala:105)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:141)

  

Thanks,

Mohit

 

From: Mike Thomsen  
Sent: 30 May 2018 17:28
To: users@nifi.apache.org
Subject: Re: Unable to read the Parquet file written by NiFi through Spark when 
Logical Data Type is set to true.

 

What's the error from Spark? Logical data types are just a variant on existing 
data types in Avro 1.8.

 

On Wed, May 30, 2018 at 7:54 AM Mohit <mohit.j...@open-insights.co.in> wrote:

Hi all,

 

I’m fetching the data from RDBMS and writing it to parquet using PutParquet 
processor. I’m not able to read the data from Spark when Logical Data Type is 
true. I’m able to read it from Hive.

Do I have to set some specific properties in the PutParquet processor to make 
it readable from spark as well?

 

Regards,

Mohit



Unable to read the Parquet file written by NiFi through Spark when Logical Data Type is set to true.

2018-05-30 Thread Mohit
Hi all,

 

I'm fetching the data from RDBMS and writing it to parquet using PutParquet
processor. I'm not able to read the data from Spark when Logical Data Type
is true. I'm able to read it from Hive.

Do I have to set some specific properties in the PutParquet processor to
make it readable from spark as well?

 

Regards,

Mohit



RE: PutDatabaseRecord Error while ingesting to Netezza

2018-05-28 Thread Mohit
Hi,

describe hive table – I have attached the output of DESCRIBE on the Hive table.
schema – I'm using the SelectHiveQL processor with the CSV header set to
true, and then a CSVReader with Schema Access Strategy – Use String
Fields from Header.
Fields from Header.

 

PS - I tried to write to MySQL and it is working fine with both CSVReader and 
AvroReader. For Netezza it is not able to write.

 

Thanks, 

Mohit

 

From: Pierre Villard <pierre.villard...@gmail.com> 
Sent: 24 May 2018 13:49
To: users@nifi.apache.org
Subject: Re: PutDatabaseRecord Error while ingesting to Netezza

 

Hi,

 

Can we have more details: describe of the Hive table and schema you're using in 
the processor?

 

Pierre

 

2018-05-24 8:26 GMT+02:00 Mohit <mohit.j...@open-insights.co.in>:

Hi all,

I’m facing the error while writing the data to Netezza using put database 
records. I get the following error – 

 

PutDatabaseRecord[id=90aad845-0163-1000--28414a59] Failed to process 
StandardFlowFileRecord[uuid=f06b1510-9f66-42c1-a0f8-29ffc7a4d30f,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1527141341281-340, container=default, 
section=340], offset=572216, 
length=52],offset=0,name=8068481294062298.0.csv,size=35] due to None of the 
fields in the record map to the columns defined by the test_data table:

 

I’m reading the data from hive and writing to the flowfile in the csv format.

 

Please let me know if I’m doing anything wrong.

 

Thanks,

Mohit

 

cal_dt  string
cal_tm  string
event_typ_cdstring
tndr_typ_cd string
lgl_entity_nbr   int
lgl_entity_sub_idint
ntwk_id  int
site_id  int
lane_nbr   int
chanl_cd   string
id_nbrstring
id_sys_nbrint
id_sys_issuer_nbr int
rd_amt string
tt_store_sys_ord_amt   string
per_id int
tran_nbrbigint
tran_nbrbigint
term_id int
lat_coord string
long_coordstring
device_present_indstring
cnsmr_id_cntint
loy_cnsmr_id_cntint
ndc_ord_qty int
ndc_ord_amt string
que_trade_item_qty   int
trade_item_purch_amtstring
pos_trade_item_purch_amtstring
trade_item_purch_qtyint
gtz_trade_item_qty  int
gtz_trade_item_amt  string
vent_tms   string
data_srcstring
data_qual_cdstring
mngng_cd  string
batch_nbrint

Detailed Table Information
Database:   test
Owner:  hdp-df
CreateTime: Fri May 25 09:16:35 EDT 2018
LastAccessTime: UNKNOWN
Protect Mode:   None
Retention:  0
Location:   hdfs://
Table Type: EXTERNAL_TABLE
Table Parameters:
   EXTERNALTRUE
   numFiles1
   totalSize   266050048
   transient_lastDdlTime   1527254195

# Storage Information
SerDe Library:  
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat:   
org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Compressed: No
Num Buckets:-1
Bucket Columns: []
Sort Columns:   []
Storage Desc Params:
   serialization.format1

PutDatabaseRecord Error while ingesting to Netezza

2018-05-24 Thread Mohit
Hi all,

I'm facing an error while writing data to Netezza using PutDatabaseRecord.
I get the following error:

 

PutDatabaseRecord[id=90aad845-0163-1000--28414a59] Failed to process StandardFlowFileRecord[uuid=f06b1510-9f66-42c1-a0f8-29ffc7a4d30f,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1527141341281-340, container=default, section=340], offset=572216, length=52],offset=0,name=8068481294062298.0.csv,size=35] due to None of the fields in the record map to the columns defined by the test_data table:

 

I'm reading the data from hive and writing to the flowfile in the csv
format.

 

Please let me know if I'm doing anything wrong.

 

Thanks,

Mohit



RE: Nifi Remote Process Group FlowFile Distribution among nodes

2018-05-10 Thread Mohit
Thanks Koji,
It was helpful.

Mohit.

-Original Message-
From: Koji Kawamura <ijokaruma...@gmail.com> 
Sent: 09 May 2018 05:53
To: users@nifi.apache.org
Subject: Re: Nifi Remote Process Group FlowFile Distribution among nodes

Hi,

You can see the description for default behavior by hovering your mouse on the 
question mark icon (?) next to the "Batch Settings".
In short, for sending (to a remote input port), 500 milliseconds batch duration.
For pulling (from a remote output port), 5 seconds batch duration.

Thanks,
Koji

On Tue, May 8, 2018 at 4:56 PM, Mohit <mohit.j...@open-insights.co.in> wrote:
> Hi,
>
> What are the configurations in default batch settings?
>
> Thanks,
> Mohit
>
> -Original Message-
> From: Koji Kawamura <ijokaruma...@gmail.com>
> Sent: 08 May 2018 13:19
> To: users@nifi.apache.org
> Subject: Re: Nifi Remote Process Group FlowFile Distribution among 
> nodes
>
> Hi Mohit,
>
> NiFi RPG batches multiple FlowFiles into the same Site-to-Site transaction, 
> and the default batch settings are configured for higher throughput.
> If you prefer more granular distribution, you can lower the batch 
> configurations from "Manage Remote Ports" context menu of a 
> RemoteProcessGroup.
> The batch size configuration from UI is added since NiFi 1.2.0, and the JIRA 
> can be a reference.
> https://issues.apache.org/jira/browse/NIFI-1202
>
> Thanks,
> Koji
>
> On Tue, May 8, 2018 at 2:24 PM, Mohit <mohit.j...@open-insights.co.in> wrote:
>> Hi,
>>
>>
>>
>> I need some clarity on how flowfile is distributed among different 
>> nodes in a Nifi cluster.
>>
>>
>>
>> I have a flow where I’m using GenerateTableFetch  to fetch the data 
>> from database. Source table has around 40 million records. I tried 
>> with different partition size which led to create different number of 
>> flowfiles.
>>
>> When there are less number of flowfiles(~40), RPG sends it to only 
>> one node(in a 4 node cluster) but when there are large number of 
>> flowfiles(~400), it distribute the flowfile among all the nodes.
>>
>> Are there some rules or best practices to fine tune the flow, so that 
>> the flowfiles are evenly distributed across the nodes even if there 
>> are less number of flowfiles.
>>
>>
>>
>> Regards,
>>
>> Mohit
>



RE: Nifi Remote Process Group FlowFile Distribution among nodes

2018-05-08 Thread Mohit
Hi,

What are the configurations in default batch settings?

Thanks,
Mohit

-Original Message-
From: Koji Kawamura <ijokaruma...@gmail.com> 
Sent: 08 May 2018 13:19
To: users@nifi.apache.org
Subject: Re: Nifi Remote Process Group FlowFile Distribution among nodes

Hi Mohit,

NiFi RPG batches multiple FlowFiles into the same Site-to-Site transaction, and 
the default batch settings are configured for higher throughput.
If you prefer more granular distribution, you can lower the batch 
configurations from "Manage Remote Ports" context menu of a RemoteProcessGroup.
The batch size configuration from UI is added since NiFi 1.2.0, and the JIRA 
can be a reference.
https://issues.apache.org/jira/browse/NIFI-1202

Thanks,
Koji

On Tue, May 8, 2018 at 2:24 PM, Mohit <mohit.j...@open-insights.co.in> wrote:
> Hi,
>
>
>
> I need some clarity on how flowfile is distributed among different 
> nodes in a Nifi cluster.
>
>
>
> I have a flow where I’m using GenerateTableFetch  to fetch the data 
> from database. Source table has around 40 million records. I tried 
> with different partition size which led to create different number of 
> flowfiles.
>
> When there are less number of flowfiles(~40), RPG sends it to only one 
> node(in a 4 node cluster) but when there are large number of 
> flowfiles(~400), it distribute the flowfile among all the nodes.
>
> Are there some rules or best practices to fine tune the flow, so that 
> the flowfiles are evenly distributed across the nodes even if there 
> are less number of flowfiles.
>
>
>
> Regards,
>
> Mohit



RE: Validate Record issue

2018-05-02 Thread Mohit
Mark,

 

Thanks for the input. I’ll implement the same.
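Concretely, following your suggestion, I'll change the two failing fields in
the schema to also accept integer values, along these lines (the rest of the
schema stays unchanged):

{"name":"hs_kbps","type":["null","long","double"],"default":null},
{"name":"site_id","type":["null","long","double"],"default":null}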

 

Regards,

Mohit

 

From: Mark Payne <marka...@hotmail.com> 
Sent: 02 May 2018 23:13
To: users@nifi.apache.org
Subject: Re: Validate Record issue

 

Mohit, 

 

Correct - I was saying that it *should* allow that but due to a bug (NIFI-5141) 
it currently does not. So in the meantime,

you would have to update your schema to allow for either double or int (or 
long, if you prefer) types.

 

Thanks

-Mark

 





On May 2, 2018, at 1:38 PM, Mohit <mohit.j...@open-insights.co.in> wrote:

 

Hi Mark,

 

I set Strict Type Checking to false; it still isn't allowed.

 

Thanks,

Mohit

 

From: Mark Payne <marka...@hotmail.com>
Sent: 02 May 2018 23:00
To: users@nifi.apache.org
Subject: Re: Validate Record issue

 

Mohit,

 

If you look at the Provenance events that are emitted by the processor, they 
show the reason that the records

are considered invalid. Specifically, for this use case, it shows: The 
following 2 fields had values whose type did not match the schema: [/hs_kbps, 
/site_id]

It appears that in your incoming data, the values are integers, instead of 
doubles. If you have the "Strict Type Checking"

value set to "false" in the processor, then it should allow this. 
Unfortunately, though, it appears that there is a bug

that causes integer values not to be considered validate when the schema says 
that it is a double. I created a JIRA [1] for this.

 

In the meantime, if you update your schema to allow for those fields to be 
["null", "long", "double"] then you should be good.

 

Thanks

-Mark

 

[1] https://issues.apache.org/jira/browse/NIFI-5141

 






On May 2, 2018, at 11:57 AM, Mohit <mohit.j...@open-insights.co.in> wrote:

 

Hi,

I’m using ValidateRecord processor to validate CSV and writing to avro. 

For a file it is transferring all the record to an invalid relationship. It is 
working fine with ConvertCsvToAvro processor.

 

Avro Schema - 
{"type":"record","name":"cell_kpi_dump_geo","namespace":"cell_kpi_dump_geo","fields":[{"name":"month","type":["null","string"],"default":null},

{"name":"cell","type":["null","string"],"default":null},{"name":"availability","type":["int","null"],"default":0},{"name":"cssr_speech","type":["int","null"],"default":0},

{"name":"dcr_speech","type":["int","null"],"default":0},{"name":"hs_kbps","type":["double","null"],"default":0.0},{"name":"eul_kbps","type":["int","null"],"default":0},

{"name":"tech","type":["null","string"],"default":null},{"name":"site_id","type":["double","null"],"default":0.0},{"name":"longitude","type":["double","null"],"default":0.0},{"name":"latitude","type":["double","null"],"default":0.0}]}

 

 

Sample record – 

 

May-16,KA4371D,95,100,0,151,,2G,4371,-1.606926,6.67223

 

Is there something I’m doing wrong?

 

 

Regards,

Mohit

 



RE: Validate Record issue

2018-05-02 Thread Mohit
Hi Mark,

 

I set Strict Type Checking to false; it still isn't allowed.

 

Thanks,

Mohit

 

From: Mark Payne <marka...@hotmail.com> 
Sent: 02 May 2018 23:00
To: users@nifi.apache.org
Subject: Re: Validate Record issue

 

Mohit,

 

If you look at the Provenance events that are emitted by the processor, they 
show the reason that the records

are considered invalid. Specifically, for this use case, it shows: The 
following 2 fields had values whose type did not match the schema: [/hs_kbps, 
/site_id]

It appears that in your incoming data, the values are integers, instead of 
doubles. If you have the "Strict Type Checking"

value set to "false" in the processor, then it should allow this. 
Unfortunately, though, it appears that there is a bug

that causes integer values not to be considered validate when the schema says 
that it is a double. I created a JIRA [1] for this.

 

In the meantime, if you update your schema to allow for those fields to be 
["null", "long", "double"] then you should be good.

 

Thanks

-Mark

 

[1] https://issues.apache.org/jira/browse/NIFI-5141

 





On May 2, 2018, at 11:57 AM, Mohit <mohit.j...@open-insights.co.in> wrote:

 

Hi,

I’m using ValidateRecord processor to validate CSV and writing to avro. 

For a file it is transferring all the record to an invalid relationship. It is 
working fine with ConvertCsvToAvro processor.

 

Avro Schema - 
{"type":"record","name":"cell_kpi_dump_geo","namespace":"cell_kpi_dump_geo","fields":[{"name":"month","type":["null","string"],"default":null},

{"name":"cell","type":["null","string"],"default":null},{"name":"availability","type":["int","null"],"default":0},{"name":"cssr_speech","type":["int","null"],"default":0},

{"name":"dcr_speech","type":["int","null"],"default":0},{"name":"hs_kbps","type":["double","null"],"default":0.0},{"name":"eul_kbps","type":["int","null"],"default":0},

{"name":"tech","type":["null","string"],"default":null},{"name":"site_id","type":["double","null"],"default":0.0},{"name":"longitude","type":["double","null"],"default":0.0},{"name":"latitude","type":["double","null"],"default":0.0}]}

 

 

Sample record – 

 

May-16,KA4371D,95,100,0,151,,2G,4371,-1.606926,6.67223

 

Is there something I’m doing wrong?

 

 

Regards,

Mohit

 



Validate Record issue

2018-05-02 Thread Mohit
Hi,

I'm using the ValidateRecord processor to validate CSV and write it out as
Avro.

For one file it is transferring all the records to the invalid relationship.
It works fine with the ConvertCsvToAvro processor.

 

Avro Schema -
{"type":"record","name":"cell_kpi_dump_geo","namespace":"cell_kpi_dump_geo",
"fields":[{"name":"month","type":["null","string"],"default":null},

{"name":"cell","type":["null","string"],"default":null},{"name":"availabilit
y","type":["int","null"],"default":0},{"name":"cssr_speech","type":["int","n
ull"],"default":0},

{"name":"dcr_speech","type":["int","null"],"default":0},{"name":"hs_kbps","t
ype":["double","null"],"default":0.0},{"name":"eul_kbps","type":["int","null
"],"default":0},

{"name":"tech","type":["null","string"],"default":null},{"name":"site_id","t
ype":["double","null"],"default":0.0},{"name":"longitude","type":["double","
null"],"default":0.0},{"name":"latitude","type":["double","null"],"default":
0.0}]}

 

 

Sample record - 

 

May-16,KA4371D,95,100,0,151,,2G,4371,-1.606926,6.67223

 

Is there something I'm doing wrong?

 

 

Regards,

Mohit



RE: Nifi 1.6.0 ValidateRecord Processor- AvroRecordSetWriter issue

2018-04-24 Thread Mohit
Matt,

I have used all default compression settings as of now, i.e., it is set to
none for AvroRecordSetWriter, PutHDFS, and ConvertAvroToORC.

Regards,
Mohit

-Original Message-
From: Matt Burgess <mattyb...@apache.org> 
Sent: 24 April 2018 20:58
To: users@nifi.apache.org
Subject: Re: Nifi 1.6.0 ValidateRecord Processor- AvroRecordSetWriter issue

Mohit,

What are you using for compression settings for ConvertCSVToAvro, 
ConvertAvroToORC, AvroRecordSetWriter, PutHDFS, etc.?  The default for 
ConvertCSVToAvro is Snappy where the rest default to None (although IIRC 
ConvertAvroToORC will retain an incoming Snappy codec in the outgoing ORC).  If 
you used all defaults, I'm surprised the exact opposite isn't the case, as 
we've seen issues with processors like CompressContent using Snappy not working 
with Hive tables.

Also, are you using the same schema in ConvertCSVToAvro as you are in 
AvroRecordSetWriter?  If you view the Avro flow files (in the UI, listing the 
queue then viewing a flow file in the queue) before ConvertAvroToORC, do they 
appear to look the same in terms of the data and the data types?

Regards,
Matt



On Tue, Apr 24, 2018 at 9:29 AM, Mohit <mohit.j...@open-insights.co.in> wrote:
> Hi,
>
> I'm using the hive.ddl attribute formed by ConvertAvroToORC for the DDL 
> statement, and passing it to custom processor(ReplaceText route the file to 
> failure if maximum buffer size is more than 1 mb size. For large data it is 
> not a good option.) which modify the content of the flowfile to the hive.ddl 
> statement + location of external table. I don’t think this is causing any 
> issue.
>
> I tried to create the table on the top of avro file as well, it also displays 
> null.
>
> In the AvroRecordSetWriter doc, it says " Writes the contents of a RecordSet 
> in Binary Avro format. ". Is this format different than what ConvertCsvToAvro 
> writes? Because same flow is working with ValidateRecord(write it to csv)  
> +ConvertCsvToAvro processor.
>
> Thanks,
> Mohit
>
> -Original Message-
> From: Matt Burgess <mattyb...@apache.org>
> Sent: 24 April 2018 18:43
> To: users@nifi.apache.org
> Subject: Re: Nifi 1.6.0 ValidateRecord Processor- AvroRecordSetWriter 
> issue
>
> Mohit,
>
> Can you share the config for your ConvertAvroToORC processor? Also, by 
> "CreateHiveTable", do you mean ReplaceText (to set the content to the 
> hive.ddl attribute formed by ConvertAvroToORC) -> PutHiveQL (to execute the 
> DDL)?  If not, are you using a custom processor or ExecuteStreamCommand or 
> something else?
>
> If you are not using the generated DDL to create the table, can you share 
> your CREATE TABLE statement for the target table? I'm guessing there's a 
> mismatch somewhere between the data and the table definition.
>
> Regards,
> Matt
>
>
> On Tue, Apr 24, 2018 at 9:09 AM, Mohit <mohit.j...@open-insights.co.in> wrote:
>> Hi all,
>>
>>
>>
>> I’m using ValidateRecord processor to validate the csv and convert it 
>> into Avro. Later,  I convert this avro to orc using ConvertAvroToORC 
>> processor and write it to hdfs and create a hive table on the top of it.
>>
>> When I query the table, it displays null, though the record count is 
>> matching.
>>
>>
>>
>> Flow -  ValidateRecord -> ConvertAvroToORC -> PutHDFS -> 
>> CreateHiveTable
>>
>>
>>
>>
>>
>> To debug, I also tried to write the avro data to hdfs and created the 
>> hive table on the top of it. It is also displaying null results.
>>
>>
>>
>> Flow -  ValidateRecord -> ConvertCSVToAvro -> PutHDFS   I manually created
>> hive table with avro format.
>>
>>
>>
>>
>>
>> When I use ValidateRecord + ConvertCSVToAvro, it is working fine.
>>
>> Flow -  ValidateRecord -> ConvertCSVToAvro  ->  ConvertAvroToORC -> 
>> PutHDFS
>> -> CreateHiveTable
>>
>>
>>
>>
>>
>> Is there anything I’m doing wrong?
>>
>>
>>
>> Thanks,
>>
>> Mohit
>>
>>
>>
>>
>



RE: Nifi 1.6.0 ValidateRecord Processor- AvroRecordSetWriter issue

2018-04-24 Thread Mohit
Hi,

I'm using the hive.ddl attribute formed by ConvertAvroToORC for the DDL statement 
and passing it to a custom processor, which modifies the content of the flowfile 
to the hive.ddl statement plus the location of the external table. (ReplaceText 
routes the file to failure when the content exceeds its maximum buffer size of 
1 MB, so it is not a good option for large data.) I don't think this is causing 
any issue.
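
For reference, the assembled flowfile content (hive.ddl plus the location) ends 
up looking roughly like this; the table, columns, and path below are just 
made-up placeholders:

CREATE EXTERNAL TABLE IF NOT EXISTS my_table (id INT, name STRING)
STORED AS ORC
LOCATION '/data/orc/my_table'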

I tried creating the table on top of the Avro file as well; it also displays 
null.

The AvroRecordSetWriter doc says "Writes the contents of a RecordSet in Binary 
Avro format." Is this format different from what ConvertCSVToAvro writes? The 
same flow works when I use ValidateRecord (writing to CSV) followed by the 
ConvertCSVToAvro processor.

Thanks,
Mohit

-Original Message-
From: Matt Burgess <mattyb...@apache.org> 
Sent: 24 April 2018 18:43
To: users@nifi.apache.org
Subject: Re: Nifi 1.6.0 ValidateRecord Processor- AvroRecordSetWriter issue

Mohit,

Can you share the config for your ConvertAvroToORC processor? Also, by 
"CreateHiveTable", do you mean ReplaceText (to set the content to the hive.ddl 
attribute formed by ConvertAvroToORC) -> PutHiveQL (to execute the DDL)?  If 
not, are you using a custom processor or ExecuteStreamCommand or something else?

If you are not using the generated DDL to create the table, can you share your 
CREATE TABLE statement for the target table? I'm guessing there's a mismatch 
somewhere between the data and the table definition.

Regards,
Matt


On Tue, Apr 24, 2018 at 9:09 AM, Mohit <mohit.j...@open-insights.co.in> wrote:
> Hi all,
>
>
>
> I’m using ValidateRecord processor to validate the csv and convert it 
> into Avro. Later,  I convert this avro to orc using ConvertAvroToORC 
> processor and write it to hdfs and create a hive table on the top of it.
>
> When I query the table, it displays null, though the record count is 
> matching.
>
>
>
> Flow -  ValidateRecord -> ConvertAvroToORC -> PutHDFS -> 
> CreateHiveTable
>
>
>
>
>
> To debug, I also tried to write the avro data to hdfs and created the 
> hive table on the top of it. It is also displaying null results.
>
>
>
> Flow -  ValidateRecord -> ConvertCSVToAvro -> PutHDFS   I manually created
> hive table with avro format.
>
>
>
>
>
> When I use ValidateRecord + ConvertCSVToAvro, it is working fine.
>
> Flow -  ValidateRecord -> ConvertCSVToAvro  ->  ConvertAvroToORC -> 
> PutHDFS
> -> CreateHiveTable
>
>
>
>
>
> Is there anything I’m doing wrong?
>
>
>
> Thanks,
>
> Mohit
>
>
>
>



Nifi 1.6.0 ValidateRecord Processor- AvroRecordSetWriter issue

2018-04-24 Thread Mohit
Hi all,

 

I'm using the ValidateRecord processor to validate the CSV and convert it into
Avro. Later, I convert this Avro to ORC using the ConvertAvroToORC processor,
write it to HDFS, and create a Hive table on top of it.

When I query the table, it displays null values, though the record count
matches.

 

Flow -  ValidateRecord -> ConvertAvroToORC -> PutHDFS -> CreateHiveTable

 

 

To debug, I also tried writing the Avro data to HDFS and creating the Hive
table on top of it. It also displays null results.

Flow -  ValidateRecord -> ConvertCSVToAvro -> PutHDFS   (I manually created the
Hive table with Avro format.)

 

 

When I use ValidateRecord + ConvertCSVToAvro, it is working fine.

Flow -  ValidateRecord -> ConvertCSVToAvro  ->  ConvertAvroToORC -> PutHDFS
-> CreateHiveTable

 

 

Is there anything I'm doing wrong?

 

Thanks,

Mohit

 

 



RE: Generate a surrogate key/sequence for each avro record

2018-04-23 Thread Mohit
I'm able to achieve this using the QueryRecord processor.
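
Roughly, the query is along these lines (just a sketch, not my exact property 
value; it assumes the Calcite SQL engine behind QueryRecord supports 
ROW_NUMBER(), and "some_column" is a placeholder for one of the actual columns):

SELECT
  FLOWFILE.*,
  ROW_NUMBER() OVER (ORDER BY some_column) AS surrogate_key
FROM FLOWFILE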

 

Thanks,

Mohit

 

From: Mohit <mohit.j...@open-insights.co.in> 
Sent: 23 April 2018 15:59
To: users@nifi.apache.org
Subject: Generate a surrogate key/sequence for each avro record

 

Hi, 

 

I'm trying to add some extra information, like a surrogate key/sequence number,
to the data fetched from an RDBMS.

Is there a way to generate a surrogate key for each Avro record in NiFi?

 

Regards,
Mohit



Generate a surrogate key/sequence for each avro record

2018-04-23 Thread Mohit
Hi, 

 

I'm trying to add some extra information, like a surrogate key/sequence number,
to the data fetched from an RDBMS.

Is there a way to generate a surrogate key for each Avro record in NiFi?

 

Regards,
Mohit



Nifi Custom processor code coverage

2018-04-10 Thread Mohit
Hi,

 

I'm testing a custom processor. I've overridden the customValidate(ValidationContext
context) method to validate the properties, but when I run coverage, that piece
of code is not covered. I've used assertNotValid(), but it doesn't increase
my code coverage.
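
For context, this is roughly what the test looks like (a minimal sketch; 
MyProcessor and MY_PROPERTY are placeholders for my actual processor class and 
property descriptor):

import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;
import org.junit.Test;

public class MyProcessorValidationTest {

    @Test
    public void testInvalidConfiguration() {
        // MyProcessor stands in for my actual custom processor class.
        final TestRunner runner = TestRunners.newTestRunner(MyProcessor.class);

        // A property value that customValidate() is supposed to reject.
        runner.setProperty(MyProcessor.MY_PROPERTY, "value-that-should-fail-validation");

        // Ask the mock framework to validate the processor; this is where I
        // expect customValidate(ValidationContext) to be exercised.
        runner.assertNotValid();
    }
}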

Is there any way to test that part? My code coverage is affected by it.

 

It would also be helpful to know whether there is any documentation available for nifi-mock.

 

Regards,

Mohit



RE: NiFi 1.6

2018-04-06 Thread Mohit
Joe, 

What is the expected date?

 

From: Joe Witt  
Sent: 06 April 2018 02:20
To: users@nifi.apache.org
Subject: Re: NiFi 1.6

 

dan

 

It is presently working through the release candidate vote process.  As it 
stands now it could be out tomorrow.

 

Please do help by reviewing the rc if you have time.  If you have questions on 
how to do it just let us know and we can help.

 

thanks

joe

 

On Thu, Apr 5, 2018, 1:46 PM dan young  > wrote:

any updates on when 1.6 is going to drop?

 

dano

 



RE: ConvertCSVToAvro taking a lot of time when passing single record as an input.

2018-04-03 Thread Mohit
Hi Mike,

 

I intentionally did this, just to check how the processor handles an invalid 
record.

 

Thanks,

Mohit

From: Mike Thomsen <mikerthom...@gmail.com> 
Sent: 03 April 2018 18:17
To: users@nifi.apache.org
Subject: Re: ConvertCSVToAvro taking a lot of time when passing single record 
as an input.

 

Mohit,

 

Looking at your schema: 

 

{

   "type": "record",

   "name": "test",

   "namespace": "test",

   "fields": [{

  "name": "name",

  "type": ["null", "int"],

  "default": null

   }, {

  "name": "age",

  "type": ["null", "string"],

      "default": null

   }]

}

 

It looks like you have your fields' types backwards. (Name should be string, 
age should be int)
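
In other words, something along these lines (only the two types swapped; not 
tested against your data):

{
  "type": "record",
  "name": "test",
  "namespace": "test",
  "fields": [
    {"name": "name", "type": ["null", "string"], "default": null},
    {"name": "age", "type": ["null", "int"], "default": null}
  ]
}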

 

On Tue, Apr 3, 2018 at 8:41 AM, Mohit <mohit.j...@open-insights.co.in 
<mailto:mohit.j...@open-insights.co.in> > wrote:

Pierre,

Thanks for the information. This would be really helpful if Nifi 1.6.0 releases 
this week. There is a lot of pending tasks dependent on it. 

 

Mohit

 

From: Pierre Villard <pierre.villard...@gmail.com 
<mailto:pierre.villard...@gmail.com> > 
Sent: 03 April 2018 18:03


To: users@nifi.apache.org <mailto:users@nifi.apache.org> 
Subject: Re: ConvertCSVToAvro taking a lot of time when passing single record 
as an input.

 

Hi Mohit,

This has been fixed with https://issues.apache.org/jira/browse/NIFI-4955.

Besides, what Marks suggested with NIFI-4883 
<https://issues.apache.org/jira/browse/NIFI-4883>  is now merged in master.

Both will be in NiFi 1.6.0 to be released (hopefully this week).

Pierre

 

 

2018-04-03 12:37 GMT+02:00 Mohit <mohit.j...@open-insights.co.in 
<mailto:mohit.j...@open-insights.co.in> >:

Hi,

I am using the ValidateRecord processor, but seems like it change the order of 
the data. When I set the property Include Header Line to true, I found that the 
record isn’t corrupted but the order is varied.

 

For example- 

 

Actual order –

bsc,cell_name,site_id,site_name,longitude,latitude,status,region,districts_216,area,town_city,cgi,cell_id,new_id,azimuth,cell_type,territory_name

ATEBSC2,AC0139B,0139,0139LA_PALM,-0.14072,5.56353,Operational,Greater Accra 
Region,LA DADE-KOTOPON 
MUNICIPAL,LA_PALM,LABADI,62001-152-1392,1392,2332401392,60,2G,ACCRA METRO MAIN

 

Order after converting using CSVRecordSetWriter- 

cgi,latitude,territory_name,azimuth,cell_type,cell_id,longitude,cell_name,area,new_id,districts_216,site_name,town_city,bsc,site_id,region,status

62001-152-1392,5.56353,ACCRA METRO 
MAIN,60,2G,1392,-0.14072,AC0139B,LA_PALM,2332401392,LA DADE-KOTOPON 
MUNICIPAL,0139LA_PALM,LABADI,ATEBSC2,0139,Greater Accra Region,Operational

 

Is there any way to maintain the order of the record? 

 

Thanks,

Mohit

 

From: Mark Payne <marka...@hotmail.com <mailto:marka...@hotmail.com> > 
Sent: 02 April 2018 20:23


To: users@nifi.apache.org <mailto:users@nifi.apache.org> 
Subject: Re: ConvertCSVToAvro taking a lot of time when passing single record 
as an input.

 

Mohit,

 

I see. I think this is an issue because the Avro Writer expects that the data 
must be in the proper schema,

or else it will throw an Exception when trying to write the data. To address 
this, we should update ValidateRecord

to support a different Record Writer to use for valid data vs. invalid data. 
There already is a JIRA [1] for this improvement.

 

In the meantime, it probably makes sense to use a CSV Reader and a CSV Writer 
for the Validate Record processor,

then use ConvertRecord only for the valid records. Or, since you're running 
into this issue it may make sense for your

use case to continue on with the ConvertCSVToAvro processor for now. But 
splitting the records up to run against that

Processor may result in lower performance, as you've noted.

 

Thanks

-Mark

 

[1] https://issues.apache.org/jira/browse/NIFI-4883

 

 

On Apr 2, 2018, at 10:26 AM, Mohit <mohit.j...@open-insights.co.in 
<mailto:mohit.j...@open-insights.co.in> > wrote:

 

Mark,

 

Error:- 

ValidateRecord[id=5a9c3616-ab7c-17c1--e6c2fc5d] 
ValidateRecord[id=5a9c3616-ab7c-17c1--e6c2fc5d] failed to process due 
to org.apache.nifi.serialization.record.util.IllegalTypeConversionException: 
Cannot convert value mohit of type class java.lang.String because no compatible 
types exist in the UNION for field name; rolling back session: Cannot convert 
value mohit of type class java.lang.String because no compatible types exist in 
the UNION for field name

 

I have a file with only one record :-  m

RE: ConvertCSVToAvro taking a lot of time when passing single record as an input.

2018-04-03 Thread Mohit
Pierre,

Thanks for the information. It would be really helpful if NiFi 1.6.0 is released 
this week; there are a lot of pending tasks dependent on it.

 

Mohit

 

From: Pierre Villard <pierre.villard...@gmail.com> 
Sent: 03 April 2018 18:03
To: users@nifi.apache.org
Subject: Re: ConvertCSVToAvro taking a lot of time when passing single record 
as an input.

 

Hi Mohit,

This has been fixed with https://issues.apache.org/jira/browse/NIFI-4955.

Besides, what Marks suggested with NIFI-4883 
<https://issues.apache.org/jira/browse/NIFI-4883>  is now merged in master.

Both will be in NiFi 1.6.0 to be released (hopefully this week).

Pierre

 

 

2018-04-03 12:37 GMT+02:00 Mohit <mohit.j...@open-insights.co.in 
<mailto:mohit.j...@open-insights.co.in> >:

Hi,

I am using the ValidateRecord processor, but seems like it change the order of 
the data. When I set the property Include Header Line to true, I found that the 
record isn’t corrupted but the order is varied.

 

For example- 

 

Actual order –

bsc,cell_name,site_id,site_name,longitude,latitude,status,region,districts_216,area,town_city,cgi,cell_id,new_id,azimuth,cell_type,territory_name

ATEBSC2,AC0139B,0139,0139LA_PALM,-0.14072,5.56353,Operational,Greater Accra 
Region,LA DADE-KOTOPON 
MUNICIPAL,LA_PALM,LABADI,62001-152-1392,1392,2332401392,60,2G,ACCRA METRO MAIN

 

Order after converting using CSVRecordSetWriter- 

cgi,latitude,territory_name,azimuth,cell_type,cell_id,longitude,cell_name,area,new_id,districts_216,site_name,town_city,bsc,site_id,region,status

62001-152-1392,5.56353,ACCRA METRO 
MAIN,60,2G,1392,-0.14072,AC0139B,LA_PALM,2332401392,LA DADE-KOTOPON 
MUNICIPAL,0139LA_PALM,LABADI,ATEBSC2,0139,Greater Accra Region,Operational

 

Is there any way to maintain the order of the record? 

 

Thanks,

Mohit

 

From: Mark Payne <marka...@hotmail.com <mailto:marka...@hotmail.com> > 
Sent: 02 April 2018 20:23


To: users@nifi.apache.org <mailto:users@nifi.apache.org> 
Subject: Re: ConvertCSVToAvro taking a lot of time when passing single record 
as an input.

 

Mohit,

 

I see. I think this is an issue because the Avro Writer expects that the data 
must be in the proper schema,

or else it will throw an Exception when trying to write the data. To address 
this, we should update ValidateRecord

to support a different Record Writer to use for valid data vs. invalid data. 
There already is a JIRA [1] for this improvement.

 

In the meantime, it probably makes sense to use a CSV Reader and a CSV Writer 
for the Validate Record processor,

then use ConvertRecord only for the valid records. Or, since you're running 
into this issue it may make sense for your

use case to continue on with the ConvertCSVToAvro processor for now. But 
splitting the records up to run against that

Processor may result in lower performance, as you've noted.

 

Thanks

-Mark

 

[1] https://issues.apache.org/jira/browse/NIFI-4883

 

 

On Apr 2, 2018, at 10:26 AM, Mohit <mohit.j...@open-insights.co.in 
<mailto:mohit.j...@open-insights.co.in> > wrote:

 

Mark,

 

Error:- 

ValidateRecord[id=5a9c3616-ab7c-17c1--e6c2fc5d] 
ValidateRecord[id=5a9c3616-ab7c-17c1--e6c2fc5d] failed to process due 
to org.apache.nifi.serialization.record.util.IllegalTypeConversionException: 
Cannot convert value mohit of type class java.lang.String because no compatible 
types exist in the UNION for field name; rolling back session: Cannot convert 
value mohit of type class java.lang.String because no compatible types exist in 
the UNION for field name

 

I have a file with only one record :-  mohit,25

Just to check how it works, I’ve given incorrect schema: (int for string field)

{"type":"record","name":"test","namespace":"test","fields":[{"name":"name","type":["null","int"],"default":null},{"name":"age","type":["null","string"],"default":null}]}

 

It doesn’t pass the record to invalid relationship. But it keeps the file in 
the queue prior to validateRecord processor.

 

Mohit

 

 

From: Mark Payne <marka...@hotmail.com <mailto:marka...@hotmail.com> > 
Sent: 02 April 2018 19:53
To: users@nifi.apache.org <mailto:users@nifi.apache.org> 
Subject: Re: ConvertCSVToAvro taking a lot of time when passing single record 
as an input.

 

What is the error that you're seeing? 

 





On Apr 2, 2018, at 10:22 AM, Mohit < <mailto:mohit.j...@open-insights.co.in> 
mohit.j...@open-insights.co.in> wrote:

 

Hi Mark, 

 

I tried the ValidateRecord processor, it is converting the flowfile if it is 
valid. But If the records are not valid, it is passing to the invalid 
relationship. Instead it keeps on throwing bulletins keeping the flowfile in 
the queue.

 

Any suggestion?

 

Mohit

 

From: Mark Payne

RE: ConvertCSVToAvro taking a lot of time when passing single record as an input.

2018-04-03 Thread Mohit
Hi,

I am using the ValidateRecord processor, but it seems to change the order of 
the data. When I set the Include Header Line property to true, I found that the 
records aren't corrupted, but the field order varies.

 

For example- 

 

Actual order –

bsc,cell_name,site_id,site_name,longitude,latitude,status,region,districts_216,area,town_city,cgi,cell_id,new_id,azimuth,cell_type,territory_name

ATEBSC2,AC0139B,0139,0139LA_PALM,-0.14072,5.56353,Operational,Greater Accra 
Region,LA DADE-KOTOPON 
MUNICIPAL,LA_PALM,LABADI,62001-152-1392,1392,2332401392,60,2G,ACCRA METRO MAIN

 

Order after converting using CSVRecordSetWriter- 

cgi,latitude,territory_name,azimuth,cell_type,cell_id,longitude,cell_name,area,new_id,districts_216,site_name,town_city,bsc,site_id,region,status

62001-152-1392,5.56353,ACCRA METRO 
MAIN,60,2G,1392,-0.14072,AC0139B,LA_PALM,2332401392,LA DADE-KOTOPON 
MUNICIPAL,0139LA_PALM,LABADI,ATEBSC2,0139,Greater Accra Region,Operational

 

Is there any way to maintain the order of the records? 

 

Thanks,

Mohit

 

From: Mark Payne <marka...@hotmail.com> 
Sent: 02 April 2018 20:23
To: users@nifi.apache.org
Subject: Re: ConvertCSVToAvro taking a lot of time when passing single record 
as an input.

 

Mohit,

 

I see. I think this is an issue because the Avro Writer expects that the data 
must be in the proper schema,

or else it will throw an Exception when trying to write the data. To address 
this, we should update ValidateRecord

to support a different Record Writer to use for valid data vs. invalid data. 
There already is a JIRA [1] for this improvement.

 

In the meantime, it probably makes sense to use a CSV Reader and a CSV Writer 
for the Validate Record processor,

then use ConvertRecord only for the valid records. Or, since you're running 
into this issue it may make sense for your

use case to continue on with the ConvertCSVToAvro processor for now. But 
splitting the records up to run against that

Processor may result in lower performance, as you've noted.

 

Thanks

-Mark

 

[1] https://issues.apache.org/jira/browse/NIFI-4883

 





On Apr 2, 2018, at 10:26 AM, Mohit <mohit.j...@open-insights.co.in 
<mailto:mohit.j...@open-insights.co.in> > wrote:

 

Mark,

 

Error:- 

ValidateRecord[id=5a9c3616-ab7c-17c1--e6c2fc5d] 
ValidateRecord[id=5a9c3616-ab7c-17c1--e6c2fc5d] failed to process due 
to org.apache.nifi.serialization.record.util.IllegalTypeConversionException: 
Cannot convert value mohit of type class java.lang.String because no compatible 
types exist in the UNION for field name; rolling back session: Cannot convert 
value mohit of type class java.lang.String because no compatible types exist in 
the UNION for field name

 

I have a file with only one record :-  mohit,25

Just to check how it works, I’ve given incorrect schema: (int for string field)

{"type":"record","name":"test","namespace":"test","fields":[{"name":"name","type":["null","int"],"default":null},{"name":"age","type":["null","string"],"default":null}]}

 

It doesn’t pass the record to invalid relationship. But it keeps the file in 
the queue prior to validateRecord processor.

 

Mohit

 

 

From: Mark Payne <marka...@hotmail.com <mailto:marka...@hotmail.com> > 
Sent: 02 April 2018 19:53
To: users@nifi.apache.org <mailto:users@nifi.apache.org> 
Subject: Re: ConvertCSVToAvro taking a lot of time when passing single record 
as an input.

 

What is the error that you're seeing? 

 






On Apr 2, 2018, at 10:22 AM, Mohit < <mailto:mohit.j...@open-insights.co.in> 
mohit.j...@open-insights.co.in> wrote:

 

Hi Mark, 

 

I tried the ValidateRecord processor, it is converting the flowfile if it is 
valid. But If the records are not valid, it is passing to the invalid 
relationship. Instead it keeps on throwing bulletins keeping the flowfile in 
the queue.

 

Any suggestion?

 

Mohit

 

From: Mark Payne < <mailto:marka...@hotmail.com> marka...@hotmail.com> 
Sent: 02 April 2018 19:02
To:  <mailto:users@nifi.apache.org> users@nifi.apache.org
Subject: Re: ConvertCSVToAvro taking a lot of time when passing single record 
as an input.

 

Mohit,

 

You can certainly dial back that number of Concurrent Tasks. Setting that to 
something like

10 is a pretty big number. Setting it to a thousand means that you'll likely 
starve out other

processors that are waiting on a thread and will generally perform a lot worse 
because you have

1,000 different threads competing with each other to try to pull the next 
FlowFile.

 

You can use the ValidateRecord processor and configure a schema that indicates 
what you expect

the data to look like. Then you can route any invalid records to one route and 
valid records to another

route. This will ensure

RE: ConvertCSVToAvro taking a lot of time when passing single record as an input.

2018-04-02 Thread Mohit
Mark,

 

Error:- 

ValidateRecord[id=5a9c3616-ab7c-17c1--e6c2fc5d] 
ValidateRecord[id=5a9c3616-ab7c-17c1--e6c2fc5d] failed to process due 
to org.apache.nifi.serialization.record.util.IllegalTypeConversionException: 
Cannot convert value mohit of type class java.lang.String because no compatible 
types exist in the UNION for field name; rolling back session: Cannot convert 
value mohit of type class java.lang.String because no compatible types exist in 
the UNION for field name

 

I have a file with only one record :-  mohit,25

Just to check how it works, I've given an incorrect schema (int for the string field):

{"type":"record","name":"test","namespace":"test","fields":[{"name":"name","type":["null","int"],"default":null},{"name":"age","type":["null","string"],"default":null}]}

 

It doesn’t pass the record to invalid relationship. But it keeps the file in 
the queue prior to validateRecord processor.

 

Mohit

 

 

From: Mark Payne <marka...@hotmail.com> 
Sent: 02 April 2018 19:53
To: users@nifi.apache.org
Subject: Re: ConvertCSVToAvro taking a lot of time when passing single record 
as an input.

 

What is the error that you're seeing? 

 





On Apr 2, 2018, at 10:22 AM, Mohit <mohit.j...@open-insights.co.in 
<mailto:mohit.j...@open-insights.co.in> > wrote:

 

Hi Mark, 

 

I tried the ValidateRecord processor, it is converting the flowfile if it is 
valid. But If the records are not valid, it is passing to the invalid 
relationship. Instead it keeps on throwing bulletins keeping the flowfile in 
the queue.

 

Any suggestion?

 

Mohit

 

From: Mark Payne <marka...@hotmail.com <mailto:marka...@hotmail.com> > 
Sent: 02 April 2018 19:02
To: users@nifi.apache.org <mailto:users@nifi.apache.org> 
Subject: Re: ConvertCSVToAvro taking a lot of time when passing single record 
as an input.

 

Mohit,

 

You can certainly dial back that number of Concurrent Tasks. Setting that to 
something like

10 is a pretty big number. Setting it to a thousand means that you'll likely 
starve out other

processors that are waiting on a thread and will generally perform a lot worse 
because you have

1,000 different threads competing with each other to try to pull the next 
FlowFile.

 

You can use the ValidateRecord processor and configure a schema that indicates 
what you expect

the data to look like. Then you can route any invalid records to one route and 
valid records to another

route. This will ensure that all data that goes to the 'valid' relationship is 
routed one way and any other

data is routed to the 'invalid' relationship.

 

Thanks

-Mark

 

 






On Apr 2, 2018, at 9:22 AM, Mohit < <mailto:mohit.j...@open-insights.co.in> 
mohit.j...@open-insights.co.in> wrote:

 

Hi Mark,

 

The main intention to use such flow is to track bad records. The records which 
doesn’t get converted should be tracked somewhere. For that purpose I’m using 
Split-Merge approach.

 

Meanwhile, I’m able to improve the performance by increasing the ‘Concurrent 
Tasks’ to 1000.  Now ConvertCSVToAvro is able to convert 6-7k per second, which 
though not optimum but quite better than 45-50 records per seconds. 

 

Is there any other improvement I can do?

 

Mohit

 

From: Mark Payne < <mailto:marka...@hotmail.com> marka...@hotmail.com> 
Sent: 02 April 2018 18:30
To:  <mailto:users@nifi.apache.org> users@nifi.apache.org
Subject: Re: ConvertCSVToAvro taking a lot of time when passing single record 
as an input.

 

Mohit, 

 

I agree that 45-50 records per second is quite slow. I'm not very familiar with 
the implementation of

ConvertCSVToAvro, but it may well be that it must perform some sort of 
initialization for each FlowFile

that it receives, which would explain why it's fast for a single incoming 
FlowFile and slow for a large number.

 

Additionally, when you start splitting the data like that, you're generating a 
lot more FlowFiles, which means

a lot more updates to both the FlowFile Repository and the Provenance 
Repository. As a result, you're basically

taxing the NiFi framework far more than if you keep the data as a single 
FlowFile. On my laptop, though, I would

expect more than 45-50 FlowFiles per second through most processors, but I 
don't know what kind of hardware

you are running on.

 

In general, though, it is best to keep data together instead of splitting it 
apart. Since the ConvertCSVToAvro can

handle many CSV records, is there a reason to split the data to begin with? 
Also, I would recommend you look

at using the Record-based processors [1][2] such as ConvertRecord instead of 
the ConvertABCtoXYZ processors, as

those are older processors and often don't work as well and the Record-oriented 
processors often allow you 

RE: ConvertCSVToAvro taking a lot of time when passing single record as an input.

2018-04-02 Thread Mohit
Hi Mark, 

 

I tried the ValidateRecord processor; it converts the flowfile if it is valid. 
But if the records are not valid, instead of passing them to the invalid 
relationship, it keeps throwing bulletins and keeps the flowfile in the queue.

 

Any suggestion?

 

Mohit

 

From: Mark Payne <marka...@hotmail.com> 
Sent: 02 April 2018 19:02
To: users@nifi.apache.org
Subject: Re: ConvertCSVToAvro taking a lot of time when passing single record 
as an input.

 

Mohit,

 

You can certainly dial back that number of Concurrent Tasks. Setting that to 
something like

10 is a pretty big number. Setting it to a thousand means that you'll likely 
starve out other

processors that are waiting on a thread and will generally perform a lot worse 
because you have

1,000 different threads competing with each other to try to pull the next 
FlowFile.

 

You can use the ValidateRecord processor and configure a schema that indicates 
what you expect

the data to look like. Then you can route any invalid records to one route and 
valid records to another

route. This will ensure that all data that goes to the 'valid' relationship is 
routed one way and any other

data is routed to the 'invalid' relationship.

 

Thanks

-Mark

 

 





On Apr 2, 2018, at 9:22 AM, Mohit <mohit.j...@open-insights.co.in 
<mailto:mohit.j...@open-insights.co.in> > wrote:

 

Hi Mark,

 

The main intention to use such flow is to track bad records. The records which 
doesn’t get converted should be tracked somewhere. For that purpose I’m using 
Split-Merge approach.

 

Meanwhile, I’m able to improve the performance by increasing the ‘Concurrent 
Tasks’ to 1000.  Now ConvertCSVToAvro is able to convert 6-7k per second, which 
though not optimum but quite better than 45-50 records per seconds. 

 

Is there any other improvement I can do?

 

Mohit

 

From: Mark Payne <marka...@hotmail.com <mailto:marka...@hotmail.com> > 
Sent: 02 April 2018 18:30
To: users@nifi.apache.org <mailto:users@nifi.apache.org> 
Subject: Re: ConvertCSVToAvro taking a lot of time when passing single record 
as an input.

 

Mohit, 

 

I agree that 45-50 records per second is quite slow. I'm not very familiar with 
the implementation of

ConvertCSVToAvro, but it may well be that it must perform some sort of 
initialization for each FlowFile

that it receives, which would explain why it's fast for a single incoming 
FlowFile and slow for a large number.

 

Additionally, when you start splitting the data like that, you're generating a 
lot more FlowFiles, which means

a lot more updates to both the FlowFile Repository and the Provenance 
Repository. As a result, you're basically

taxing the NiFi framework far more than if you keep the data as a single 
FlowFile. On my laptop, though, I would

expect more than 45-50 FlowFiles per second through most processors, but I 
don't know what kind of hardware

you are running on.

 

In general, though, it is best to keep data together instead of splitting it 
apart. Since the ConvertCSVToAvro can

handle many CSV records, is there a reason to split the data to begin with? 
Also, I would recommend you look

at using the Record-based processors [1][2] such as ConvertRecord instead of 
the ConvertABCtoXYZ processors, as

those are older processors and often don't work as well and the Record-oriented 
processors often allow you to keep

data together as a single FlowFile throughout your entire flow, which makes the 
performance far better and makes the

flow much easier to design.

 

Thanks

-Mark

 

 

 

[1]  <https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi> 
https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi

[2]  
<https://bryanbende.com/development/2017/06/20/apache-nifi-records-and-schema-registries>
 
https://bryanbende.com/development/2017/06/20/apache-nifi-records-and-schema-registries

 






On Apr 2, 2018, at 8:49 AM, Mohit < <mailto:mohit.j...@open-insights.co.in> 
mohit.j...@open-insights.co.in> wrote:

 

Hi,

 

I’m trying to capture bad records from ConvertCSVToAvro processor. For that, 
I’m using two SplitText processors in a row to create chunks and then each 
record per flow file.

 

My flow is  - ListFile -> FetchFile -> SplitText(1 records) -> SplitText(1 
record) -> ConvertCSVToAvro -> *(futher processing)

 

I have a 10 MB file with 15 columns per row and 64000 records. Normal flow 
(without SplitText) completes in few seconds. But when I’m using the above 
flow, ConvertCSVToAvro processor works drastically slow(45-50 rec/sec).

I’m not able to conclude where I’m doing wrong in the flow. 

 

I’m using Nifi 1.5.0 .

 

Any quick input would be appreciated.

 

 

 

Thanks,

Mohit

 



RE: ConvertCSVToAvro taking a lot of time when passing single record as an input.

2018-04-02 Thread Mohit
Mark,

 

Agreed. I'll try the ValidateRecord processor and see if it serves the purpose.

 

Thanks,

Mohit

 

From: Mark Payne <marka...@hotmail.com> 
Sent: 02 April 2018 19:02
To: users@nifi.apache.org
Subject: Re: ConvertCSVToAvro taking a lot of time when passing single record 
as an input.

 

Mohit,

 

You can certainly dial back that number of Concurrent Tasks. Setting that to 
something like

10 is a pretty big number. Setting it to a thousand means that you'll likely 
starve out other

processors that are waiting on a thread and will generally perform a lot worse 
because you have

1,000 different threads competing with each other to try to pull the next 
FlowFile.

 

You can use the ValidateRecord processor and configure a schema that indicates 
what you expect

the data to look like. Then you can route any invalid records to one route and 
valid records to another

route. This will ensure that all data that goes to the 'valid' relationship is 
routed one way and any other

data is routed to the 'invalid' relationship.

 

Thanks

-Mark

 

 





On Apr 2, 2018, at 9:22 AM, Mohit <mohit.j...@open-insights.co.in 
<mailto:mohit.j...@open-insights.co.in> > wrote:

 

Hi Mark,

 

The main intention to use such flow is to track bad records. The records which 
doesn’t get converted should be tracked somewhere. For that purpose I’m using 
Split-Merge approach.

 

Meanwhile, I’m able to improve the performance by increasing the ‘Concurrent 
Tasks’ to 1000.  Now ConvertCSVToAvro is able to convert 6-7k per second, which 
though not optimum but quite better than 45-50 records per seconds. 

 

Is there any other improvement I can do?

 

Mohit

 

From: Mark Payne <marka...@hotmail.com <mailto:marka...@hotmail.com> > 
Sent: 02 April 2018 18:30
To: users@nifi.apache.org <mailto:users@nifi.apache.org> 
Subject: Re: ConvertCSVToAvro taking a lot of time when passing single record 
as an input.

 

Mohit, 

 

I agree that 45-50 records per second is quite slow. I'm not very familiar with 
the implementation of

ConvertCSVToAvro, but it may well be that it must perform some sort of 
initialization for each FlowFile

that it receives, which would explain why it's fast for a single incoming 
FlowFile and slow for a large number.

 

Additionally, when you start splitting the data like that, you're generating a 
lot more FlowFiles, which means

a lot more updates to both the FlowFile Repository and the Provenance 
Repository. As a result, you're basically

taxing the NiFi framework far more than if you keep the data as a single 
FlowFile. On my laptop, though, I would

expect more than 45-50 FlowFiles per second through most processors, but I 
don't know what kind of hardware

you are running on.

 

In general, though, it is best to keep data together instead of splitting it 
apart. Since the ConvertCSVToAvro can

handle many CSV records, is there a reason to split the data to begin with? 
Also, I would recommend you look

at using the Record-based processors [1][2] such as ConvertRecord instead of 
the ConvertABCtoXYZ processors, as

those are older processors and often don't work as well and the Record-oriented 
processors often allow you to keep

data together as a single FlowFile throughout your entire flow, which makes the 
performance far better and makes the

flow much easier to design.

 

Thanks

-Mark

 

 

 

[1]  <https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi> 
https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi

[2]  
<https://bryanbende.com/development/2017/06/20/apache-nifi-records-and-schema-registries>
 
https://bryanbende.com/development/2017/06/20/apache-nifi-records-and-schema-registries

 






On Apr 2, 2018, at 8:49 AM, Mohit < <mailto:mohit.j...@open-insights.co.in> 
mohit.j...@open-insights.co.in> wrote:

 

Hi,

 

I’m trying to capture bad records from ConvertCSVToAvro processor. For that, 
I’m using two SplitText processors in a row to create chunks and then each 
record per flow file.

 

My flow is  - ListFile -> FetchFile -> SplitText(1 records) -> SplitText(1 
record) -> ConvertCSVToAvro -> *(futher processing)

 

I have a 10 MB file with 15 columns per row and 64000 records. Normal flow 
(without SplitText) completes in few seconds. But when I’m using the above 
flow, ConvertCSVToAvro processor works drastically slow(45-50 rec/sec).

I’m not able to conclude where I’m doing wrong in the flow. 

 

I’m using Nifi 1.5.0 .

 

Any quick input would be appreciated.

 

 

 

Thanks,

Mohit

 



RE: ConvertCSVToAvro taking a lot of time when passing single record as an input.

2018-04-02 Thread Mohit
Hi Mark,

 

The main intention of this flow is to track bad records: the records that don't 
get converted should be tracked somewhere, and for that purpose I'm using a 
split-and-merge approach.

Meanwhile, I was able to improve the performance by increasing ‘Concurrent 
Tasks’ to 1000. Now ConvertCSVToAvro converts 6-7k records per second, which, 
though not optimal, is much better than 45-50 records per second.

Is there any other improvement I can make?

 

Mohit

 

From: Mark Payne <marka...@hotmail.com> 
Sent: 02 April 2018 18:30
To: users@nifi.apache.org
Subject: Re: ConvertCSVToAvro taking a lot of time when passing single record 
as an input.

 

Mohit, 

 

I agree that 45-50 records per second is quite slow. I'm not very familiar with 
the implementation of

ConvertCSVToAvro, but it may well be that it must perform some sort of 
initialization for each FlowFile

that it receives, which would explain why it's fast for a single incoming 
FlowFile and slow for a large number.

 

Additionally, when you start splitting the data like that, you're generating a 
lot more FlowFiles, which means

a lot more updates to both the FlowFile Repository and the Provenance 
Repository. As a result, you're basically

taxing the NiFi framework far more than if you keep the data as a single 
FlowFile. On my laptop, though, I would

expect more than 45-50 FlowFiles per second through most processors, but I 
don't know what kind of hardware

you are running on.

 

In general, though, it is best to keep data together instead of splitting it 
apart. Since the ConvertCSVToAvro can

handle many CSV records, is there a reason to split the data to begin with? 
Also, I would recommend you look

at using the Record-based processors [1][2] such as ConvertRecord instead of 
the ConvertABCtoXYZ processors, as

those are older processors and often don't work as well and the Record-oriented 
processors often allow you to keep

data together as a single FlowFile throughout your entire flow, which makes the 
performance far better and makes the

flow much easier to design.

 

Thanks

-Mark

 

 

 

[1] https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi

[2] 
https://bryanbende.com/development/2017/06/20/apache-nifi-records-and-schema-registries

 





On Apr 2, 2018, at 8:49 AM, Mohit <mohit.j...@open-insights.co.in 
<mailto:mohit.j...@open-insights.co.in> > wrote:

 

Hi,

 

I’m trying to capture bad records from ConvertCSVToAvro processor. For that, 
I’m using two SplitText processors in a row to create chunks and then each 
record per flow file.

 

My flow is  - ListFile -> FetchFile -> SplitText(1 records) -> SplitText(1 
record) -> ConvertCSVToAvro -> *(futher processing)

 

I have a 10 MB file with 15 columns per row and 64000 records. Normal flow 
(without SplitText) completes in few seconds. But when I’m using the above 
flow, ConvertCSVToAvro processor works drastically slow(45-50 rec/sec).

I’m not able to conclude where I’m doing wrong in the flow. 

 

I’m using Nifi 1.5.0 .

 

Any quick input would be appreciated.

 

 

 

Thanks,

Mohit

 



ConvertCSVToAvro taking a lot of time when passing single record as an input.

2018-04-02 Thread Mohit
Hi,

 

I'm trying to capture bad records from the ConvertCSVToAvro processor. To do that,
I'm using two SplitText processors in a row to create chunks and then one
record per flowfile.

My flow is  - ListFile -> FetchFile -> SplitText(1 records) ->
SplitText(1 record) -> ConvertCSVToAvro -> *(further processing)

I have a 10 MB file with 15 columns per row and 64000 records. The normal flow
(without SplitText) completes in a few seconds, but when I use the above flow
the ConvertCSVToAvro processor runs drastically slower (45-50 records/sec).
I'm not able to work out what I'm doing wrong in the flow.

I'm using NiFi 1.5.0.

 

Any quick input would be appreciated.

 

 

 

Thanks,

Mohit