Re: CSV record parsing with custom date formats

2018-02-13 Thread Bryan Bende
As a possible workaround, date functions were added to RecordPath
in 1.5.0, so if you had a schema that treated the field as a string, you
could reformat the column in place using UpdateRecord to get it into
whatever format it needs to be in.
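
To make the workaround concrete: in UpdateRecord this would presumably be a
RecordPath expression combining the toDate and format functions (check the
RecordPath guide for the exact syntax). The plain-Java sketch below only
illustrates the equivalent string-to-string reformatting. The class name, the
MM/dd/yyyy source pattern, and the yyyy-MM-dd target pattern are assumptions
for illustration, not NiFi code.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Illustrative sketch only; none of these names come from NiFi.
final class DateReformatSketch {
    // Assumed source pattern: the custom format used in the CSV column.
    private static final DateTimeFormatter SOURCE = DateTimeFormatter.ofPattern("MM/dd/yyyy");
    // Assumed target pattern: the format the downstream schema expects.
    private static final DateTimeFormatter TARGET = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Parse the string with the source pattern and render it with the target pattern.
    static String reformat(String value) {
        return LocalDate.parse(value, SOURCE).format(TARGET);
    }

    public static void main(String[] args) {
        System.out.println(reformat("02/13/2018")); // prints 2018-02-13
    }
}
```

The field stays a string throughout, which is why the schema would need to
declare it as a string for this approach to apply.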



Re: CSV record parsing with custom date formats

2018-02-13 Thread Koji Kawamura
Hi Derek,

From a brief look at the code, I guess you are using the ValidateRecord
processor with CSVReader and AvroWriter.
As you pointed out, it seems DataTypeUtils.isCompatibleDataType does
not use the user-defined date format configured on the CSVReader.

Is it possible for you to share the following so we can reproduce and
understand it better?
- Sample input CSV file
- NiFi flow template using CSVReader and AvroWriter

Thanks,
Koji



CSV record parsing with custom date formats

2018-02-13 Thread Derek Straka
I have a question about the expected behavior of convertSimpleIfPossible in
CSVRecordReader.java (NiFi 1.5.0).

I have a custom CSV file that I am taking to an avro schema using
ValidateRecord.  The schema contains a logical date type and the CSV has
the date in the format MM/DD/YYYY.  I expected to provide the date string
in the controller element for the CSV reader and have everything parse
happily, but it ends up throwing an exception when it tries to parse things
in the avro writer (String->Date).  I don't think I should be blaming the
avro writer because I expected the CSV reader to parse the date for me.

I did a little digging in the CSVRecordReader.java, and I see everything
flows through convertSimpleIfPossible when parsing the data, and each data
type is checked with DataTypeUtils.isCompatibleDataType prior to actually
trying to perform the conversion.

The date string doesn't use the user provided format in the call to
DataTypeUtils.isCompatibleDataType, but instead uses the default for date
types.  The validation ends up failing when it uses the default date string
(YYYY-MM-DD), so it won't use LAZY_DATE_FORMAT as I expected.  Am I totally
off base, or is this unexpected behavior?

Thanks.

-Derek
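
The behavior described above can be sketched in plain Java. This is not NiFi's
actual code: isCompatible is a hypothetical stand-in for a compatibility
check, and both patterns are assumed (yyyy-MM-dd as the default, MM/dd/yyyy as
the user-configured format). It shows why a check run against the default
pattern rejects a value that the configured pattern would parse.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Illustrative sketch only; it does not reproduce DataTypeUtils or CSVRecordReader.
final class DateCompatibilitySketch {
    // Assumed default date pattern used when no format is passed to the check.
    static final String DEFAULT_DATE_FORMAT = "yyyy-MM-dd";

    // Hypothetical compatibility check: true if the value parses with the given pattern.
    static boolean isCompatible(String value, String pattern) {
        try {
            LocalDate.parse(value, DateTimeFormatter.ofPattern(pattern));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String value = "02/13/2018";        // sample CSV value (assumed)
        String userFormat = "MM/dd/yyyy";   // format configured on the reader (assumed)

        // Checking against the default pattern rejects the value before any conversion.
        System.out.println(isCompatible(value, DEFAULT_DATE_FORMAT)); // prints false
        // Checking against the configured pattern would accept it.
        System.out.println(isCompatible(value, userFormat));          // prints true
    }
}
```

If the check ran with the configured pattern instead of the default, the value
would pass validation and reach the conversion step.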


Policies and access to core attributes

2018-02-13 Thread Mark Bean
In order to use Provenance to view details of a flowfile, a user must
belong to the 'view the data' policy for the component(s) along the
flow. For example, the lineage graph will show "UNKNOWN" for any component
for which the user does not possess 'view the data'. Not only can the user
not identify which processor this is, but the user also cannot view even
core attributes of the flowfile, such as the flowfile UUID.

We use a custom authorizer which may restrict a user from 'view the data'
based on certain flowfile attribute(s). This creates a situation where the
NiFi Admins can potentially lose insight into the flow of data through the
system. An Admin can see that a given flowfile traversed X-number of
components, but cannot identify which components they were or where the
flowfile ultimately was delivered.

It is necessary to maintain the ability to restrict even an Admin from
seeing flowfile content and user-defined attributes. However, it would be
highly desirable for the Admins to be able to view flowfile core attributes
throughout the flow. The information presented on the Details tab of a
Provenance event would suffice.

Can the information on this tab be separated from the 'view the data'
policy? Likely, this means creating a new policy type which does not
currently exist.

Comments/suggestions?

Thanks,
Mark


Re: Will you accept contributions in Scala?

2018-02-13 Thread Russell Bateman
I completely second Mike's sentiment here, as well as the cautionary
statements by other contributors over this thread's history (Matt
Burgess' post in particular). Besides fueling flame wars and religious
inquisition, you cannot open the door to Scala without opening it to
other languages (JVM languages at least), and there's no umbrella that
will shield you from the chaos doing so will unleash, rendering NiFi
difficult to develop for (and maybe even to deploy). As it is presently,
all JVM-language developers are able to consume NiFi easily for
development and to deal with their own deployment issues.


Russ


Re: Will you accept contributions in Scala?

2018-02-13 Thread Mike Thomsen
Milan,

I don't think you can do that without creating a lot of fuel for a flame
war. I have personally never met a Scala developer who was incapable of
writing decent Java. The same is true of Groovy. I think anyone who finds
the requirement to use Java over their language of choice to be a deal
breaker on contributing is probably someone unlikely to be more help than
trouble anyway.

Mike


Re: Will you accept contributions in Scala?

2018-02-13 Thread Milan Das
I think we should not blindly add any language, but we should be open to
adding a couple of languages like Scala.
In the big data world, Scala/Python/Java are widely accepted.

Thanks,
Milan Das
Interset 


Re: Will you accept contributions in Scala?

2018-02-13 Thread Weiss, Adam
I think it makes the most sense to me for us to publish a separate repo with a 
module and nar build for now and post when it's available in the users group.

Thanks for the discussion everyone, hopefully we can start making some helpful 
contributions soon.

-Adam


On 2018/02/10 23:43:31, Tony Kurc wrote:
> It is like Matt read my mind.
>
> On Sat, Feb 10, 2018 at 6:26 PM, Matt Burgess wrote:
>
> > I'm fine with a vote, but I'll be voting to keep Java as the single
> > language for the (non-test) code. I share the same concerns as many of
> > the other folks as far as accepting other languages, it's mainly the
> > "slippery slope" argument that I don't want to turn into a
> > JVM-language flame war.  If Scala, why not Groovy? Certainly the
> > syntax is closer to Java, and the community has accepted it as a valid
> > language for writing unit tests, although we stopped short for
> > allowing it for the deployable NiFi codebase, for the same reasons
> > IIRC.  If Scala and/or Groovy, why not Kotlin?  The same argument
> > (albeit more tenuous) goes for Clojure and just about every other JVM
> > language (although I don't expect a call for LuaJ processors lol).
> >
> > Whether we decide to support various languages ad-hoc or not, I would
> > strenuously object to multiple/hybrid build systems for the deployed
> > artifacts. If I could switch NiFi completely to Gradle I would, but I
> > realize there are good reasons for not doing so (yet?) in the Apache
> > NiFi community, and I would never want any hybrid Maven/Gradle build
> > for the deployable code, likewise for SBT, Leiningen, etc. With a
> > custom Mojo for Maven NAR builds, and the complexity for hybrid builds
> > in general, I think this would create a maintenance nightmare.
> >
> > The language thing is a tough decision though, it's not awesome that
> > specifying a single language can be a barrier to a more diverse
> > community, certainly Scala-based bundles would be more than welcome in
> > the overall NiFi ecosystem, I just think the cons outweigh the pros
> > for the baseline code. I've written Groovy processors/NARs using
> > Gradle as the build system, and I'm good with keeping them in my own
> > repo, especially when the Extension Registry becomes a thing. I can
> > see the Extension Registry perhaps making this a moot point, but
> > clearly we need to have this discussion in the meantime.
> >
> > Regards,
> > Matt
> >
> >
> > On Sat, Feb 10, 2018 at 5:23 PM, Andrew Grande wrote:
> > > Wasn't there a warning trigger about the NiFi distro size from Apache
> > > recently? IMO, before talking alternative languages, solve the modularity
> > > and NAR distribution problem. I think the implementation of a module won't
> > > matter much then, the point being not everything has to go in the core,
> > > base distribution, but can still be easily sourced from a known repo, for
> > > example.
> > >
> > > I have a feeling NiFi 1.6+ can be approaching 2GB distro size soon :)
> > >
> > > Andrew
> > >
> > > On Sat, Feb 10, 2018, 5:12 PM Joey Frazee wrote:
> > >
> > >> This probably necessitates a vote, yeah?
> > >>
> > >> Frankly, I’m usually happier writing Scala, and I’ve not encountered any
> > >> problems using processors written in Scala, but I think it’ll be important
> > >> to tread lightly.
> > >>
> > >> There’s a few things that pop into my head:
> > >>
> > >> - Maintainability and reviewability. A very very good Java developer need
> > >> not, by definition, either know how to write or identify good Scala or spot
> > >> problems and bugs.
> > >> - Every Scala processor would either end up with a 5MB scala-lang.jar
> > >> packaged into the .nar or we’d have to start including it in the core
> > >> somewhere, if it’s not. It’s possible it might have already gotten pulled
> > >> up from other dependencies.
> > >> - Style. There’s a tremendous amount of variation in Scala style because
> > >> of its type system, implicits, macros, and functional nature. There are
> > >> very good people out there that can write good Scala that isn’t readable by
> > >> the 99%.
> > >> - Binary compatibility. Scala tends to be a little more brazen about
> > >> breaking binary compatibility in major releases and those happen a bit more
> > >> often than with Java. That’s not a problem for any potential source code in
> > >> the project, but it could present some dependency issues someday.
> > >> - Testing. There’s N > 1 test frameworks and testing styles within those,
> > >> so there’s a lot of options for introducing more variability into the tests.
> > >> - NiFi uses a lot of statics in setting up properties and relationships
> > >> and the like, and