Re: Flume 1.7.0 release

2016-06-13 Thread Hari Shreedharan
Sure. I can help with the release. My time to work on this would be
limited, so it might take me a day or two between updates.

Re: Flume 1.7.0 release

2016-06-13 Thread Saikat Kanjilal
Works for me,  let me know how I can get involved/help.

Sent from my iPhone


Re: Flume 1.7.0 release

2016-06-13 Thread Lior Zeno
Saikat, I still think that we should discuss it here. We can talk in
private about specific issues if you'd like.

I'll open an umbrella issue for this release. It will include all necessary
steps, e.g. keys and docs, but also jira cleaning and reviewing all pending
patches we have. I know it's a lot of work, but it has to be done.

Hari, thank you. Would you please mentor?
On Jun 14, 2016 3:21 AM, "Hari Shreedharan"  wrote:

> Only committers can commit to the repo. In the past, when release managers
> were not committers, one committer mentored the release manager. For the
> release, patches would be posted as usual on a release and the committer
> would commit the patches. Basically follow all steps as is, and then just
> post patches to jiras. See the umbrella jira for Flume 1.6 release:
> https://issues.apache.org/jira/browse/FLUME-2674
>
> Lior - I have added you as a contributor on jira. This should allow you to
> create jira tickets and assign them to yourself.
>
> On Mon, Jun 13, 2016 at 4:31 PM Saikat Kanjilal 
> wrote:
>
> > Should we have a google hangout session to figure this out?
> >
> > > From: liorz...@gmail.com
> > > Date: Mon, 13 Jun 2016 22:44:29 +0300
> > > Subject: Re: Flume 1.7.0 release
> > > To: dev@flume.apache.org
> > >
> > > Guys, thank you for your support and motivation. There are still two
> big
> > > issues we need to figure out before we can proceed:
> > > (a) Who can commit to the repo?
> > > (b) Who has JIRA permissions?
> > >
> > > Hari, I'll be happy to run this release, if that's fine by everyone.
> > >
> > > On Mon, Jun 13, 2016 at 6:52 AM, Lior Zeno  wrote:
> > >
> > > > Hi Shiwei,
> > > >
> > > > Please see this:
> > > > https://cwiki.apache.org/confluence/display/FLUME/How+to+Contribute.
> > > > > In a nutshell, issues are managed on JIRA, and code contributions are done
> > > > > via patches (not pull requests).
> > > > > Regarding a release plan, that is true, we will need to discuss a roadmap
> > > > > and manage our release plans more carefully. However, since there has not
> > > > > been any new release in the past year, the next version of Flume will have
> > > > > plenty of new features and bug fixes :)
> > > >
> > > > On Mon, Jun 13, 2016 at 6:26 AM, shiwei qin 
> > wrote:
> > > >
> > > > >> I'd like to see the new version released as soon as possible. Also, I
> > > > >> have never seen a release plan. In fact, what I want most is the
> > > > >> flexibility for developers to migrate to GitHub. There are a lot of
> > > > >> people, like me, who do not really know how to contribute code to Flume.
> > > >>
> > > >> 2016-06-13 6:43 GMT+08:00 Attila Simon :
> > > >>
> > > >> > Hi All,
> > > >> >
> > > >> > I would love to hear more and get involved.
> > > >> >
> > > >> > Just another enthusiastic developer candidate,
> > > >> > Attila
> > > >> >
> > > >> >
> > > >> > On Sun, Jun 12, 2016 at 7:09 AM, Saikat Kanjilal <
> > sxk1...@hotmail.com>
> > > >> > wrote:
> > > >> >
> > > >> > > Sure will do, meanwhile should we get together over google
> > hangout to
> > > >> > > discuss the outstanding JIRA's.  I'm looking to get more
> actively
> > > >> > involved
> > > >> > > in the project.
> > > >> > >
> > > >> > > > Date: Sun, 12 Jun 2016 16:56:30 +0300
> > > >> > > > Subject: Re: Flume 1.7.0 release
> > > >> > > > From: liorz...@gmail.com
> > > >> > > > To: dev@flume.apache.org
> > > >> > > >
> > > >> > > > Please open a new thread on your proposal with motivation,
> > design
> > > >> and
> > > >> > an
> > > >> > > > example. If other shipping frameworks, such as fluentd or
> > logstash,
> > > >> > have
> > > >> > > a
> > > >> > > > similar feature then please add it as a reference.
> > > >> > > >
> > > >> > > > In this thread, we will continue discussing the next release
> and
> > > >> how we
> > > >> > > can
> > > >> > > > solve our problems in order to release more than once a year.
> > > >> > > > Actually I never even got to patch submit page, I just wanted
> > > >> feedback
> > > >> > > from
> > > >> > > > the community around interest for a graph sink to write or
> read
> > from
> > > >> > > > neo4j/orientdb, I've started dev efforts but am worried that
> no
> > one
> > > >> has
> > > >> > > > responded.   What are active next steps to get to a more
> vibrant
> > > >> > > community
> > > >> > > > and move this plugin along?  I am in the initial dev and
> design
> > > >> stage
> > > >> > for
> > > >> > > > the plugin.
> > > >> > > >
> > > >> > > > Sent from my iPhone
> > > >> > > >
> > > >> > > > > On Jun 12, 2016, at 2:03 AM, Lior Zeno 
> > > >> wrote:
> > > >> > > > >
> > > >> > > > > This is absolutely true. The availability of the project's
> > > >> committers
> > > >> > > is
> > > >> > > > > very low, which leads to cases like yours where people
> submit
> > > >> patches
> > > >> > > and
> > > >> > > > > never get a response, even after a year of waiting. I
> believe
> > > >> that we
> > > >> > > have
> > > >> > > > > to be more active and available, since low availability
> 

Re: Review Request 48161: FLUME-2918: TaildirSource is underperforming with huge parent directories

2016-06-13 Thread Mike Percy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48161/#review137440
---




flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
 (line 96)


s/consisting/consists/



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
 (line 168)


Would you mind renaming actualMTime to currentParentDirMTime, for 
consistency with lastSeenParentDirMTime?



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
 (line 218)


Why remove parentDir? Just wondering.



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
 (line 49)


I didn't mean to force you to add full Javadoc. You can add it if you want, 
but all I wanted was at least a quick comment summarizing what this helper 
function does.


- Mike Percy


On June 13, 2016, 2:14 p.m., Attila Simon wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48161/
> ---
> 
> (Updated June 13, 2016, 2:14 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2918
> https://issues.apache.org/jira/browse/FLUME-2918
> 
> 
> Repository: flume-git
> 
> 
> Description
> ---
> 
> The way the TailDir source checks which files should be tracked was improved. 
> The existing implementation caused unnecessarily high CPU usage for huge (50K+ 
> files) directories. This fix allows users to eliminate continuous listing of 
> the parent directory (on each Source.process invocation) and introduces a more 
> performant method for listing and matching files.
> 
> used java.nio.file.DirectoryStream to filter files
> made pattern match calculation optionally cached
> added junit tests
> added javadoc
> added license
> 
> 
> Diffs
> -
> 
>   
> flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/ReliableTaildirEventReader.java
>  5b6d465 
>   
> flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
>  PRE-CREATION 
>   
> flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSource.java
>  8816327 
>   
> flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSourceConfigurationConstants.java
>  6165276 
>   
> flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
>  PRE-CREATION 
>   
> flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java
>  f9e614c 
> 
> Diff: https://reviews.apache.org/r/48161/diff/
> 
> 
> Testing
> ---
> 
> mvn clean install -DskipTests -> built
> junit tests for flume-taildir-source module -> passed
> 
> 
> Thanks,
> 
> Attila Simon
> 
>
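
The description above mentions using java.nio.file.DirectoryStream to filter
files. As a rough illustration only (this is not the code under review), the
sketch below lists the regular files in a directory whose names match a regular
expression. The class name, the helper listMatchingFiles, and the sample file
names are all assumptions made for this example.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class DirectoryStreamFilterSketch {

  // Hypothetical helper, not taken from the patch: returns the regular files
  // in parentDir whose file names fully match fileNameRegex.
  static List<File> listMatchingFiles(Path parentDir, String fileNameRegex)
      throws IOException {
    final Pattern pattern = Pattern.compile(fileNameRegex);
    // The filter is applied while the directory is being iterated, so
    // non-matching entries never reach the result list.
    DirectoryStream.Filter<Path> filter = new DirectoryStream.Filter<Path>() {
      @Override
      public boolean accept(Path entry) {
        return pattern.matcher(entry.getFileName().toString()).matches()
            && Files.isRegularFile(entry);
      }
    };
    List<File> result = new ArrayList<File>();
    // try-with-resources closes the underlying directory handle.
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(parentDir, filter)) {
      for (Path entry : stream) {
        result.add(entry.toFile());
      }
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    // Made-up sample data: one matching and one non-matching file.
    Path dir = Files.createTempDirectory("taildir-sketch");
    Files.createFile(dir.resolve("app.log"));
    Files.createFile(dir.resolve("app.txt"));
    for (File f : listMatchingFiles(dir, ".*\\.log")) {
      System.out.println(f.getName());
    }
  }
}
```

The iteration order of a DirectoryStream is not specified, so a caller that
needs ordering still has to sort the result.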



Re: Review Request 48161: FLUME-2918: TaildirSource is underperforming with huge parent directories

2016-06-13 Thread Mike Percy


> On June 7, 2016, 1:44 a.m., Mike Percy wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java,
> >  line 161
> > 
> >
> > nit: spurious parenthesis before lastSeenParentDirMTime
> 
> Attila Simon wrote:
> the condition was described in the javadoc, unfortunately it is ugly but 
> needed

How about this?

  List getMatchingFiles() {
    long now = System.currentTimeMillis();
    long currentParentDirMTime = parentDir.lastModified();
    // Only check a maximum of once per second.
    if (!cachePatternMatching ||
        (currentParentDirMTime > lastSeenParentDirMTime &&
         TimeUnit.SECONDS.toMillis(TimeUnit.MILLISECONDS.toSeconds(now)) > lastCheckedTime)) {
      lastMatchedFiles = getMatchingFilesNoCache();
      Collections.sort(lastMatchedFiles, new TailFile.CompareByLastModifiedTime());
      lastSeenParentDirMTime = currentParentDirMTime;
      lastCheckedTime = TimeUnit.SECONDS.toMillis(TimeUnit.MILLISECONDS.toSeconds(now));
    }
    return lastMatchedFiles;
  }

Except that we should replace the sorting with a helper function that only runs 
stat() once per item.
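
The once-per-second check in the snippet above depends on rounding the current
time down to a whole second before comparing it with lastCheckedTime. A minimal
sketch of that rounding follows. The helper name truncateToSecond and the
timestamps 5250, 5999 and 6000 ms are made up for illustration and do not come
from the patch.

```java
import java.util.concurrent.TimeUnit;

public class SecondGranularitySketch {

  // Rounds a millisecond timestamp down to the start of its second,
  // using the same expression as the snippet above.
  static long truncateToSecond(long millis) {
    return TimeUnit.SECONDS.toMillis(TimeUnit.MILLISECONDS.toSeconds(millis));
  }

  public static void main(String[] args) {
    // Illustrative value: a check at 5250 ms is recorded as 5000 ms.
    long lastCheckedTime = truncateToSecond(5250L);

    // Still within the same second: truncated "now" equals lastCheckedTime,
    // so the "greater than" test fails and no new check happens.
    System.out.println(truncateToSecond(5999L) > lastCheckedTime); // prints false

    // First millisecond of the next second: the test passes.
    System.out.println(truncateToSecond(6000L) > lastCheckedTime); // prints true
  }
}
```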


- Mike


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48161/#review136086
---





Re: Review Request 48161: FLUME-2918: TaildirSource is underperforming with huge parent directories

2016-06-13 Thread Mike Percy


> On June 7, 2016, 1:44 a.m., Mike Percy wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java,
> >  line 164
> > 
> >
> > This sorting function seems problematic to me. It can call stat() up to 
> > n^2 times (assuming quicksort). Shouldn't we get the last modified time of 
> > each file in the list once and then sort it? Why are we sorting, anyway?
> > 
> > By the way, I just checked the OpenJDK source code and indeed every 
> > File.lastModified() maps to FileSystem.getLastModifiedTime(f) which maps to 
> > stat(2). See 
> > https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/solaris/native/java/io/UnixFileSystem_md.c#L205
> 
> Attila Simon wrote:
> Sorting was part of the original implementation (my change didn't make it 
> worse). I wanted to be non-intrusive.

If you want to maintain the sorting behavior (and I still don't know how it is 
used or why it is required) then please do the stat() call on each of the files 
in the list, and cache the mtime of each file, then sort the files based on 
those cached mtimes. It just doesn't make any sense to do IO to read the 
metadata O(n^2) times.
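
One way to read the suggestion above, sketched under assumptions: look up each
file's mtime once, keep the values in a map, and let the comparator consult only
the map. The class and method names below are hypothetical and are not part of
the patch or of Flume's API.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CachedMTimeSortSketch {

  // Hypothetical helper: returns a copy of files sorted by last-modified time,
  // oldest first, calling File.lastModified() once per distinct file.
  static List<File> sortByLastModified(List<File> files) {
    final Map<File, Long> mtimes = new HashMap<File, Long>();
    for (File f : files) {
      if (!mtimes.containsKey(f)) {
        mtimes.put(f, f.lastModified()); // the only metadata lookup for f
      }
    }
    List<File> sorted = new ArrayList<File>(files);
    Collections.sort(sorted, new Comparator<File>() {
      @Override
      public int compare(File a, File b) {
        // Compares cached values only; no file system access here.
        return Long.compare(mtimes.get(a), mtimes.get(b));
      }
    });
    return sorted;
  }

  public static void main(String[] args) throws Exception {
    // Made-up sample files with explicitly assigned modification times.
    File newer = File.createTempFile("newer", ".log");
    File older = File.createTempFile("older", ".log");
    newer.setLastModified(2000000L);
    older.setLastModified(1000000L);

    List<File> input = new ArrayList<File>();
    input.add(newer);
    input.add(older);
    for (File f : sortByLastModified(input)) {
      System.out.println(f.getName() + " " + f.lastModified());
    }
  }
}
```

The sort still performs O(n log n) comparisons, but the number of metadata
lookups drops to one per file.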


> On June 7, 2016, 1:44 a.m., Mike Percy wrote:
> > flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java,
> >  line 49
> > 
> >
> > style nit: leave spaces around your brackets. We use the java style 
> > guide in Flume (admittedly not as consistently as I wish we did, though)
> 
> Attila Simon wrote:
> Related to coding style nits: is there an importable Flume style 
> configuration for IntelliJ somewhere that I haven't found yet? If so, could 
> you please point me in its direction?

Nope, but a checkstyle file that enforces Java style with 2-space indent 
would be welcome.


- Mike


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48161/#review136086
---





Re: Review Request 48161: FLUME-2918: TaildirSource is underperforming with huge parent directories

2016-06-13 Thread Mike Percy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48161/#review137437
---




flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
 (line 109)


There is still too much copy/paste in your assertion messages, here and below 
in this file.


- Mike Percy





Re: Flume 1.7.0 release

2016-06-13 Thread Hari Shreedharan
Only committers can commit to the repo. In the past, when release managers
were not committers, one committer mentored the release manager. For the
release, patches would be posted as usual on a release and the committer
would commit the patches. Basically follow all steps as is, and then just
post patches to jiras. See the umbrella jira for Flume 1.6 release:
https://issues.apache.org/jira/browse/FLUME-2674

Lior - I have added you as a contributor on jira. This should allow you to
create jira tickets and assign them to yourself.

On Mon, Jun 13, 2016 at 4:31 PM Saikat Kanjilal  wrote:

> Should we have a google hangout session to figure this out?
>
> > From: liorz...@gmail.com
> > Date: Mon, 13 Jun 2016 22:44:29 +0300
> > Subject: Re: Flume 1.7.0 release
> > To: dev@flume.apache.org
> >
> > Guys, thank you for your support and motivation. There are still two big
> > issues we need to figure out before we can proceed:
> > (a) Who can commit to the repo?
> > (b) Who has JIRA permissions?
> >
> > Hari, I'll be happy to run this release, if that's fine by everyone.
> >
> > On Mon, Jun 13, 2016 at 6:52 AM, Lior Zeno  wrote:
> >
> > > Hi Shiwei,
> > >
> > > Please see this:
> > > https://cwiki.apache.org/confluence/display/FLUME/How+to+Contribute.
> > > > In a nutshell, issues are managed on JIRA, and code contributions are done
> > > > via patches (not pull requests).
> > > > Regarding a release plan, that is true, we will need to discuss a roadmap
> > > > and manage our release plans more carefully. However, since there has not
> > > > been any new release in the past year, the next version of Flume will have
> > > > plenty of new features and bug fixes :)
> > >
> > > On Mon, Jun 13, 2016 at 6:26 AM, shiwei qin 
> wrote:
> > >
> > > >> I'd like to see the new version released as soon as possible. Also, I
> > > >> have never seen a release plan. In fact, what I want most is the
> > > >> flexibility for developers to migrate to GitHub. There are a lot of
> > > >> people, like me, who do not really know how to contribute code to Flume.
> > >>
> > >> 2016-06-13 6:43 GMT+08:00 Attila Simon :
> > >>
> > >> > Hi All,
> > >> >
> > >> > I would love to hear more and get involved.
> > >> >
> > >> > Just another enthusiastic developer candidate,
> > >> > Attila
> > >> >
> > >> >
> > >> > On Sun, Jun 12, 2016 at 7:09 AM, Saikat Kanjilal <
> sxk1...@hotmail.com>
> > >> > wrote:
> > >> >
> > >> > > Sure will do, meanwhile should we get together over google
> hangout to
> > >> > > discuss the outstanding JIRA's.  I'm looking to get more actively
> > >> > involved
> > >> > > in the project.
> > >> > >
> > >> > > > Date: Sun, 12 Jun 2016 16:56:30 +0300
> > >> > > > Subject: Re: Flume 1.7.0 release
> > >> > > > From: liorz...@gmail.com
> > >> > > > To: dev@flume.apache.org
> > >> > > >
> > >> > > > Please open a new thread on your proposal with motivation,
> design
> > >> and
> > >> > an
> > >> > > > example. If other shipping frameworks, such as fluentd or
> logstash,
> > >> > have
> > >> > > a
> > >> > > > similar feature then please add it as a reference.
> > >> > > >
> > >> > > > In this thread, we will continue discussing the next release and
> > >> how we
> > >> > > can
> > >> > > > solve our problems in order to release more than once a year.
> > >> > > > Actually I never even got to patch submit page, I just wanted
> > >> feedback
> > >> > > from
> > >> > > > the community around interest for a graph sink to write or read
> from
> > >> > > > neo4j/orientdb, I've started dev efforts but am worried that no
> one
> > >> has
> > >> > > > responded.   What are active next steps to get to a more vibrant
> > >> > > community
> > >> > > > and move this plugin along?  I am in the initial dev and design
> > >> stage
> > >> > for
> > >> > > > the plugin.
> > >> > > >
> > >> > > > Sent from my iPhone
> > >> > > >
> > >> > > > > On Jun 12, 2016, at 2:03 AM, Lior Zeno 
> > >> wrote:
> > >> > > > >
> > >> > > > > This is absolutely true. The availability of the project's
> > >> committers
> > >> > > is
> > >> > > > > very low, which leads to cases like yours where people submit
> > >> patches
> > >> > > and
> > >> > > > > never get a response, even after a year of waiting. I believe
> > >> that we
> > >> > > have
> > >> > > > > to be more active and available, since low availability
> > >> discourages
> > >> > > > > contributors.
> > >> > > > > I think that such discussions should take place either here
> or on
> > >> > JIRA,
> > >> > > > > since it does not limit the discussion to a small group of
> people,
> > >> > but
> > >> > > > > instead allows the community to be a part of the project's
> future.
> > >> > > > >
> > >> > > > > On Sun, Jun 12, 2016 at 2:09 AM, Saikat Kanjilal <
> > >> > sxk1...@hotmail.com>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > >> I would like to help out managing Jira's but never heard back
> > >> from
> > >> > the
> > >> > > > >> community about the graph sink that I've been working on.
> Does it
> > >> > make
> > >> > > > >> s

RE: Flume 1.7.0 release

2016-06-13 Thread Saikat Kanjilal
Should we have a google hangout session to figure this out?

> From: liorz...@gmail.com
> Date: Mon, 13 Jun 2016 22:44:29 +0300
> Subject: Re: Flume 1.7.0 release
> To: dev@flume.apache.org
> 
> Guys, thank you for your support and motivation. There are still two big
> issues we need to figure out before we can proceed:
> (a) Who can commit to the repo?
> (b) Who has JIRA permissions?
> 
> Hari, I'll be happy to run this release, if that's fine by everyone.
> 
> On Mon, Jun 13, 2016 at 6:52 AM, Lior Zeno  wrote:
> 
> > Hi Shiwei,
> >
> > Please see this:
> > https://cwiki.apache.org/confluence/display/FLUME/How+to+Contribute.
> > In a nutshell, issues are managed on JIRA, and code contributions are done
> > via patches (not pull requests).
> > Regarding a release plan, that is true, we will need to discuss a roadmap
> > and manage more carefully our release plans. However, since there was not
> > any new release in the past year, the next version of Flume will have
> > plenty of new features and bug fixes :)
> >
> > On Mon, Jun 13, 2016 at 6:26 AM, shiwei qin  wrote:
> >
> >> I'd like to be able to release the new version as soon as possible. Also I
> >> never see release plan. In fact, I most want is the ability to flex
> >> developers migrate to github, There are a lot of people do not really know
> >> how to contribute code to flume, like me.
> >>
> >> 2016-06-13 6:43 GMT+08:00 Attila Simon :
> >>
> >> > Hi All,
> >> >
> >> > I would love to hear more and get involved.
> >> >
> >> > Just another enthusiastic developer candidate,
> >> > Attila
> >> >
> >> >
> >> > On Sun, Jun 12, 2016 at 7:09 AM, Saikat Kanjilal 
> >> > wrote:
> >> >
> >> > > Sure will do, meanwhile should we get together over google hangout to
> >> > > discuss the outstanding JIRA's.  I'm looking to get more actively
> >> > involved
> >> > > in the project.
> >> > >
> >> > > > Date: Sun, 12 Jun 2016 16:56:30 +0300
> >> > > > Subject: Re: Flume 1.7.0 release
> >> > > > From: liorz...@gmail.com
> >> > > > To: dev@flume.apache.org
> >> > > >
> >> > > > Please open a new thread on your proposal with motivation, design
> >> and
> >> > an
> >> > > > example. If other shipping frameworks, such as fluentd or logstash,
> >> > have
> >> > > a
> >> > > > similar feature then please add it as a reference.
> >> > > >
> >> > > > In this thread, we will continue discussing the next release and
> >> how we
> >> > > can
> >> > > > solve our problems in order to release more than once a year.
> >> > > > Actually I never even got to patch submit page, I just wanted
> >> feedback
> >> > > from
> >> > > > the community around interest for a graph sink to write or read from
> >> > > > neo4j/orientdb, I've started dev efforts but am worried that no one
> >> has
> >> > > > responded.   What are active next steps to get to a more vibrant
> >> > > community
> >> > > > and move this plugin along?  I am in the initial dev and design
> >> stage
> >> > for
> >> > > > the plugin.
> >> > > >
> >> > > > Sent from my iPhone
> >> > > >
> >> > > > > On Jun 12, 2016, at 2:03 AM, Lior Zeno 
> >> wrote:
> >> > > > >
> >> > > > > This is absolutely true. The availability of the project's
> >> committers
> >> > > is
> >> > > > > very low, which leads to cases like yours where people submit
> >> patches
> >> > > and
> >> > > > > never get a response, even after a year of waiting. I believe
> >> that we
> >> > > have
> >> > > > > to be more active and available, since low availability
> >> discourages
> >> > > > > contributors.
> >> > > > > I think that such discussions should take place either here or on
> >> > JIRA,
> >> > > > > since it does not limit the discussion to a small group of people,
> >> > but
> >> > > > > instead allows the community to be a part of the project's future.
> >> > > > >
> >> > > > > On Sun, Jun 12, 2016 at 2:09 AM, Saikat Kanjilal <
> >> > sxk1...@hotmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > >> I would like to help out managing Jira's but never heard back
> >> from
> >> > the
> >> > > > >> community about the graph sink that I've been working on. Does it
> >> > make
> >> > > > >> sense to do a google hangout to discuss roadmap/upcoming
> >> features?
> >> > > > >>
> >> > > > >> Sent from my iPhone
> >> > > > >>
> >> > > > >>> On Jun 11, 2016, at 10:19 AM, Lior Zeno 
> >> > wrote:
> >> > > > >>>
> >> > > > >>> Let's first examine our JIRA issues. We have fixed issues with
> >> an
> >> > > empty
> >> > > > >>> fixVersion. In addition, we still have unresolved issues with
> >> > > > >>> fixVersion=v1.7.0. Let's deal with these first. I would do it
> >> > myself,
> >> > > > >> but I
> >> > > > >>> don't have the appropriate permissions for that.
> >> > > > >>>
> >> > > > >>> On Sat, Jun 11, 2016 at 8:10 PM, Hari Shreedharan <
> >> > > > >> hshreedha...@apache.org>
> >> > > > >>> wrote:
> >> > > > >>>
> >> > > >  Sound good to me. If we have a volunteer to run the release I
> >> will
> >> > > > >> gladly
> >> > > >  help out.
> >> > > 

Re: Flume 1.7.0 release

2016-06-13 Thread Lior Zeno
Guys, thank you for your support and motivation. There are still two big
issues we need to figure out before we can proceed:
(a) Who can commit to the repo?
(b) Who has JIRA permissions?

Hari, I'll be happy to run this release, if that's fine by everyone.

On Mon, Jun 13, 2016 at 6:52 AM, Lior Zeno  wrote:

> Hi Shiwei,
>
> Please see this:
> https://cwiki.apache.org/confluence/display/FLUME/How+to+Contribute.
> In a nutshell, issues are managed on JIRA, and code contributions are done
> via patches (not pull requests).
> Regarding a release plan, that is true, we will need to discuss a roadmap
> and manage more carefully our release plans. However, since there was not
> any new release in the past year, the next version of Flume will have
> plenty of new features and bug fixes :)
>
> On Mon, Jun 13, 2016 at 6:26 AM, shiwei qin  wrote:
>
>> I'd like to be able to release the new version as soon as possible. Also I
>> never see release plan. In fact, I most want is the ability to flex
>> developers migrate to github, There are a lot of people do not really know
>> how to contribute code to flume, like me.
>>
>> 2016-06-13 6:43 GMT+08:00 Attila Simon :
>>
>> > Hi All,
>> >
>> > I would love to hear more and get involved.
>> >
>> > Just another enthusiastic developer candidate,
>> > Attila
>> >
>> >
>> > On Sun, Jun 12, 2016 at 7:09 AM, Saikat Kanjilal 
>> > wrote:
>> >
>> > > Sure will do, meanwhile should we get together over google hangout to
>> > > discuss the outstanding JIRA's.  I'm looking to get more actively
>> > involved
>> > > in the project.
>> > >
>> > > > Date: Sun, 12 Jun 2016 16:56:30 +0300
>> > > > Subject: Re: Flume 1.7.0 release
>> > > > From: liorz...@gmail.com
>> > > > To: dev@flume.apache.org
>> > > >
>> > > > Please open a new thread on your proposal with motivation, design
>> and
>> > an
>> > > > example. If other shipping frameworks, such as fluentd or logstash,
>> > have
>> > > a
>> > > > similar feature then please add it as a reference.
>> > > >
>> > > > In this thread, we will continue discussing the next release and
>> how we
>> > > can
>> > > > solve our problems in order to release more than once a year.
>> > > > Actually I never even got to patch submit page, I just wanted
>> feedback
>> > > from
>> > > > the community around interest for a graph sink to write or read from
>> > > > neo4j/orientdb, I've started dev efforts but am worried that no one
>> has
>> > > > responded.   What are active next steps to get to a more vibrant
>> > > community
>> > > > and move this plugin along?  I am in the initial dev and design
>> stage
>> > for
>> > > > the plugin.
>> > > >
>> > > > Sent from my iPhone
>> > > >
>> > > > > On Jun 12, 2016, at 2:03 AM, Lior Zeno 
>> wrote:
>> > > > >
>> > > > > This is absolutely true. The availability of the project's
>> committers
>> > > is
>> > > > > very low, which leads to cases like yours where people submit
>> patches
>> > > and
>> > > > > never get a response, even after a year of waiting. I believe
>> that we
>> > > have
>> > > > > to be more active and available, since low availability
>> discourages
>> > > > > contributors.
>> > > > > I think that such discussions should take place either here or on
>> > JIRA,
>> > > > > since it does not limit the discussion to a small group of people,
>> > but
>> > > > > instead allows the community to be a part of the project's future.
>> > > > >
>> > > > > On Sun, Jun 12, 2016 at 2:09 AM, Saikat Kanjilal <
>> > sxk1...@hotmail.com>
>> > > > > wrote:
>> > > > >
>> > > > >> I would like to help out managing Jira's but never heard back
>> from
>> > the
>> > > > >> community about the graph sink that I've been working on. Does it
>> > make
>> > > > >> sense to do a google hangout to discuss roadmap/upcoming
>> features?
>> > > > >>
>> > > > >> Sent from my iPhone
>> > > > >>
>> > > > >>> On Jun 11, 2016, at 10:19 AM, Lior Zeno 
>> > wrote:
>> > > > >>>
>> > > > >>> Let's first examine our JIRA issues. We have fixed issues with
>> an
>> > > empty
>> > > > >>> fixVersion. In addition, we still have unresolved issues with
>> > > > >>> fixVersion=v1.7.0. Let's deal with these first. I would do it
>> > myself,
>> > > > >> but I
>> > > > >>> don't have the appropriate permissions for that.
>> > > > >>>
>> > > > >>> On Sat, Jun 11, 2016 at 8:10 PM, Hari Shreedharan <
>> > > > >> hshreedha...@apache.org>
>> > > > >>> wrote:
>> > > > >>>
>> > > >  Sound good to me. If we have a volunteer to run the release I
>> will
>> > > > >> gladly
>> > > >  help out.
>> > > > > On Sat, Jun 11, 2016 at 6:54 AM Lior Zeno > >
>> > > wrote:
>> > > > >
>> > > > > Anybody?
>> > > > >
>> > > > >> On Thu, Jun 9, 2016 at 7:24 PM, Lior Zeno <
>> liorz...@gmail.com>
>> > > wrote:
>> > > > >>
>> > > > >> Hi guys,
>> > > > >>
>> > > > >> I think we should work together towards a new release. It has
>> > > been a
>> > > >  year
>> > > > >> since the last release, and there are many new features 

RE: [Discuss graph source/sink design proposal]

2016-06-13 Thread Saikat Kanjilal
Hari/MikeP, I've had this proposal open for many months now. Is there any way 
you guys can take a look at the JIRA and design proposal and provide feedback?  
Thanks

> From: liorz...@gmail.com
> Date: Mon, 13 Jun 2016 22:09:58 +0300
> Subject: Re: [Discuss graph source/sink design proposal]
> To: dev@flume.apache.org
> 
> Got it.
> 
> On Mon, Jun 13, 2016 at 10:05 PM, Saikat Kanjilal 
> wrote:
> 
> > That's a responsibility of the graph db not flume, flume is responsible
> > for delivering the events and has no understanding of connectivity of the
> > data.  The goal in using flume is to connect incoming data that is
> > heterogeneous and transform that data before dumping it into the graph db.
> >
> > Sent from my iPhone
> >
> > > On Jun 13, 2016, at 11:09 AM, Lior Zeno  wrote:
> > >
> > > I got this part. How events are linked together? Do you expect an
> > adjacency
> > > list incorporated in the header?
> > >
> > > On Mon, Jun 13, 2016 at 8:59 PM, Saikat Kanjilal 
> > > wrote:
> > >
> > >> The use case is a flume developer wanting to connect data coming into
> > and
> > >> out of flume sinks/sources to a graph database
> > >>
> > >> Sent from my iPhone
> > >>
> > >>> On Jun 13, 2016, at 10:55 AM, Lior Zeno  wrote:
> > >>>
> > >>> I'm not sure that I follow here. Can you please give a detailed
> > use-case?
> > >>>
> >  On Mon, Jun 13, 2016 at 7:20 AM, Lior Zeno 
> > wrote:
> > 
> >  Thanks. I'll review this and share my comments later on today.
> > > On Jun 13, 2016 2:30 AM, "Saikat Kanjilal" 
> > >> wrote:
> > >
> > > Motivation/Design: The graph/sink source plugin will be used to
> > > custom transformations to connected data and dynamically apply these
> > > transformations to send data to any sync, an example of a set of
> > > destination sinks include elasticsearch/relational databases/spark
> > rdd
> > > etc.   Note that this plugin will serve as a source and a sink
> > >> depending
> > > on the configurations.  For v1 I am targeting that we plug into neo4j
> > > database using the neo4j-jdbc interface (
> > > https://github.com/larusba/neo4j-jdbc)
> > > to build http payloads to talk to neo4j.  Once our neo4j interface
> > will
> > > allow us to build generic interfaces and plug in any graph store in
> > the
> > > future.
> > > The
> > > design will consist of a hybrid piece of infrastructure serving both
> > as
> > > a source and a sink connected to the current flume infrastructure
> > > (since all the current sinks and sources are living in their own
> > > directories I would suggest this live somewhere else in the flume
> > > directory structure.  Listed below is some classes I have partially
> > > configured to kick off this
> > > discussion
> > > NeoRestClient
> > > Roles and Responsibilities: Interface to neo4j, unpack and pack data
> > > structures to perform CRUD operation on a local or remote noe4j
> > >> instance
> > > APIS:
> > > //inputs flume event
> > > //outputs flume data structure identifying success metrics around the
> > > operation
> > > //description: transform the flume event into a graph node
> > > insertNode(NeoNode nodeToInsert)
> > > searchNode(NeoNode nodeToSearch,Algorithm useAStarOrDijkstra)
> > > deleteNode(NeoNode nodeToDelete)
> > >
> > >
> > > Note that I would also like to offer up the chance to present cipher
> > > queries (http://neo4j.com/developer/cypher-query-language/) to the
> > > source/sink infrastructure
> > >
> > > Neo4jDynamicSerializer
> > > Roles and responsibilities: serialize flume headers and body and use
> > >> the
> > > Neo4jRestClient to perform crud on neo4j
> > >
> > >
> > > Both the source and the sink infrastructure will use the same
> > > infrastructure above.
> > >
> > >
> > > That should be enough of a first cut for design/motivation and JIRA
> > > details, would love to kick off the discussion at this point.
> > > Thanks in advance
> > >
> > >
> > >
> > >
> > >
> > >> From: sxk1...@hotmail.com
> > >> To: dev@flume.apache.org
> > >> Subject: [Discuss graph source/sink design proposal]
> > >> Date: Sun, 12 Jun 2016 15:01:14 -0700
> > >>
> > >> Jira with details here:
> > > https://issues.apache.org/jira/browse/FLUME-2035
> > >>
> > >> Please respond with your questions.
> > >>
> >
  

Re: Enforce coding conventions at compilation time

2016-06-13 Thread Lior Zeno
We can use the same style Kafka is using:
https://github.com/apache/kafka/blob/trunk/checkstyle/checkstyle.xml

On Mon, Jun 13, 2016 at 10:14 PM, Hari Shreedharan 
wrote:

> I agree. Checkstyle is pretty useful - we should add it.
>
> On Sat, Jun 11, 2016 at 7:50 AM Lior Zeno  wrote:
>
> > Hi guys, we should make the reviewing process easier and more focused on
> > correctness rather than style issues. I suggest enforcing our code style
> (
> > https://cwiki.apache.org/confluence/display/FLUME/Code+Formatting) at
> > compile time using the maven checkstyle plugin (
> > https://maven.apache.org/plugins/maven-checkstyle-plugin/). This will
> make
> > the reviewing process easier, and will make sure that all committed code
> is
> > strictly following our code style.
> > What do you think?
> > Thanks
> >
>


Re: Enforce coding conventions at compilation time

2016-06-13 Thread Hari Shreedharan
I agree. Checkstyle is pretty useful - we should add it.

On Sat, Jun 11, 2016 at 7:50 AM Lior Zeno  wrote:

> Hi guys, we should make the reviewing process easier and more focused on
> correctness rather than style issues. I suggest enforcing our code style (
> https://cwiki.apache.org/confluence/display/FLUME/Code+Formatting) at
> compile time using the maven checkstyle plugin (
> https://maven.apache.org/plugins/maven-checkstyle-plugin/). This will make
> the reviewing process easier, and will make sure that all committed code is
> strictly following our code style.
> What do you think?
> Thanks
>


Re: [Discuss graph source/sink design proposal]

2016-06-13 Thread Lior Zeno
Got it.

On Mon, Jun 13, 2016 at 10:05 PM, Saikat Kanjilal 
wrote:

> That's a responsibility of the graph db not flume, flume is responsible
> for delivering the events and has no understanding of connectivity of the
> data.  The goal in using flume is to connect incoming data that is
> heterogeneous and transform that data before dumping it into the graph db.
>
> Sent from my iPhone
>
> > On Jun 13, 2016, at 11:09 AM, Lior Zeno  wrote:
> >
> > I got this part. How events are linked together? Do you expect an
> adjacency
> > list incorporated in the header?
> >
> > On Mon, Jun 13, 2016 at 8:59 PM, Saikat Kanjilal 
> > wrote:
> >
> >> The use case is a flume developer wanting to connect data coming into
> and
> >> out of flume sinks/sources to a graph database
> >>
> >> Sent from my iPhone
> >>
> >>> On Jun 13, 2016, at 10:55 AM, Lior Zeno  wrote:
> >>>
> >>> I'm not sure that I follow here. Can you please give a detailed
> use-case?
> >>>
>  On Mon, Jun 13, 2016 at 7:20 AM, Lior Zeno 
> wrote:
> 
>  Thanks. I'll review this and share my comments later on today.
> > On Jun 13, 2016 2:30 AM, "Saikat Kanjilal" 
> >> wrote:
> >
> > Motivation/Design: The graph/sink source plugin will be used to
> > custom transformations to connected data and dynamically apply these
> > transformations to send data to any sync, an example of a set of
> > destination sinks include elasticsearch/relational databases/spark
> rdd
> > etc.   Note that this plugin will serve as a source and a sink
> >> depending
> > on the configurations.  For v1 I am targeting that we plug into neo4j
> > database using the neo4j-jdbc interface (
> > https://github.com/larusba/neo4j-jdbc)
> > to build http payloads to talk to neo4j.  Once our neo4j interface
> will
> > allow us to build generic interfaces and plug in any graph store in
> the
> > future.
> > The
> > design will consist of a hybrid piece of infrastructure serving both
> as
> > a source and a sink connected to the current flume infrastructure
> > (since all the current sinks and sources are living in their own
> > directories I would suggest this live somewhere else in the flume
> > directory structure.  Listed below is some classes I have partially
> > configured to kick off this
> > discussion
> > NeoRestClient
> > Roles and Responsibilities: Interface to neo4j, unpack and pack data
> > structures to perform CRUD operation on a local or remote noe4j
> >> instance
> > APIS:
> > //inputs flume event
> > //outputs flume data structure identifying success metrics around the
> > operation
> > //description: transform the flume event into a graph node
> > insertNode(NeoNode nodeToInsert)
> > searchNode(NeoNode nodeToSearch,Algorithm useAStarOrDijkstra)
> > deleteNode(NeoNode nodeToDelete)
> >
> >
> > Note that I would also like to offer up the chance to present cipher
> > queries (http://neo4j.com/developer/cypher-query-language/) to the
> > source/sink infrastructure
> >
> > Neo4jDynamicSerializer
> > Roles and responsibilities: serialize flume headers and body and use
> >> the
> > Neo4jRestClient to perform crud on neo4j
> >
> >
> > Both the source and the sink infrastructure will use the same
> > infrastructure above.
> >
> >
> > That should be enough of a first cut for design/motivation and JIRA
> > details, would love to kick off the discussion at this point.
> > Thanks in advance
> >
> >
> >
> >
> >
> >> From: sxk1...@hotmail.com
> >> To: dev@flume.apache.org
> >> Subject: [Discuss graph source/sink design proposal]
> >> Date: Sun, 12 Jun 2016 15:01:14 -0700
> >>
> >> Jira with details here:
> > https://issues.apache.org/jira/browse/FLUME-2035
> >>
> >> Please respond with your questions.
> >>
>


Re: [Discuss graph source/sink design proposal]

2016-06-13 Thread Saikat Kanjilal
That's the responsibility of the graph DB, not Flume. Flume is responsible for 
delivering the events and has no understanding of the connectivity of the data.  
The goal in using Flume is to connect incoming data that is heterogeneous and 
transform that data before writing it into the graph DB.
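
To make the "transform before writing" step concrete, here is a minimal sketch of turning one event into the properties of one graph node. It is an illustration only and not code from the FLUME-2035 proposal: the class name EventToNodeSketch, the method toNodeProperties, and the "body" property key are all hypothetical. The only thing taken from Flume is the shape of an event (string headers plus a byte-array body); no Flume or neo4j classes are used, and no links between nodes are created.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not part of Flume or of the FLUME-2035 proposal. */
public class EventToNodeSketch {

    /**
     * Flattens an event's headers and body into one property map that a
     * graph client could store on a single node.
     */
    public static Map<String, Object> toNodeProperties(Map<String, String> headers, byte[] body) {
        // Copy the headers so the caller's map is never modified.
        Map<String, Object> props = new LinkedHashMap<>(headers);
        // Assumption: the body is UTF-8 text and is stored under the key "body".
        props.put("body", new String(body, StandardCharsets.UTF_8));
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("host", "web01");
        headers.put("type", "login");
        byte[] body = "user=alice".getBytes(StandardCharsets.UTF_8);
        System.out.println(toNodeProperties(headers, body));
    }
}
```

How such nodes get connected to each other is left to the graph DB side, as described above.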

Sent from my iPhone

> On Jun 13, 2016, at 11:09 AM, Lior Zeno  wrote:
> 
> I got this part. How events are linked together? Do you expect an adjacency
> list incorporated in the header?
> 
> On Mon, Jun 13, 2016 at 8:59 PM, Saikat Kanjilal 
> wrote:
> 
>> The use case is a flume developer wanting to connect data coming into and
>> out of flume sinks/sources to a graph database
>> 
>> Sent from my iPhone
>> 
>>> On Jun 13, 2016, at 10:55 AM, Lior Zeno  wrote:
>>> 
>>> I'm not sure that I follow here. Can you please give a detailed use-case?
>>> 
 On Mon, Jun 13, 2016 at 7:20 AM, Lior Zeno  wrote:
 
 Thanks. I'll review this and share my comments later on today.
> On Jun 13, 2016 2:30 AM, "Saikat Kanjilal" 
>> wrote:
> 
> Motivation/Design: The graph/sink source plugin will be used to
> custom transformations to connected data and dynamically apply these
> transformations to send data to any sync, an example of a set of
> destination sinks include elasticsearch/relational databases/spark rdd
> etc.   Note that this plugin will serve as a source and a sink
>> depending
> on the configurations.  For v1 I am targeting that we plug into neo4j
> database using the neo4j-jdbc interface (
> https://github.com/larusba/neo4j-jdbc)
> to build http payloads to talk to neo4j.  Once our neo4j interface will
> allow us to build generic interfaces and plug in any graph store in the
> future.
> The
> design will consist of a hybrid piece of infrastructure serving both as
> a source and a sink connected to the current flume infrastructure
> (since all the current sinks and sources are living in their own
> directories I would suggest this live somewhere else in the flume
> directory structure.  Listed below is some classes I have partially
> configured to kick off this
> discussion
> NeoRestClient
> Roles and Responsibilities: Interface to neo4j, unpack and pack data
> structures to perform CRUD operation on a local or remote noe4j
>> instance
> APIS:
> //inputs flume event
> //outputs flume data structure identifying success metrics around the
> operation
> //description: transform the flume event into a graph node
> insertNode(NeoNode nodeToInsert)
> searchNode(NeoNode nodeToSearch,Algorithm useAStarOrDijkstra)
> deleteNode(NeoNode nodeToDelete)
> 
> 
> Note that I would also like to offer up the chance to present cipher
> queries (http://neo4j.com/developer/cypher-query-language/) to the
> source/sink infrastructure
> 
> Neo4jDynamicSerializer
> Roles and responsibilities: serialize flume headers and body and use
>> the
> Neo4jRestClient to perform crud on neo4j
> 
> 
> Both the source and the sink infrastructure will use the same
> infrastructure above.
> 
> 
> That should be enough of a first cut for design/motivation and JIRA
> details, would love to kick off the discussion at this point.
> Thanks in advance
> 
> 
> 
> 
> 
>> From: sxk1...@hotmail.com
>> To: dev@flume.apache.org
>> Subject: [Discuss graph source/sink design proposal]
>> Date: Sun, 12 Jun 2016 15:01:14 -0700
>> 
>> Jira with details here:
> https://issues.apache.org/jira/browse/FLUME-2035
>> 
>> Please respond with your questions.
>> 


Re: [Discuss graph source/sink design proposal]

2016-06-13 Thread Lior Zeno
I got this part. How are events linked together? Do you expect an adjacency
list incorporated in the header?

On Mon, Jun 13, 2016 at 8:59 PM, Saikat Kanjilal 
wrote:

> The use case is a flume developer wanting to connect data coming into and
> out of flume sinks/sources to a graph database
>
> Sent from my iPhone
>
> > On Jun 13, 2016, at 10:55 AM, Lior Zeno  wrote:
> >
> > I'm not sure that I follow here. Can you please give a detailed use-case?
> >
> >> On Mon, Jun 13, 2016 at 7:20 AM, Lior Zeno  wrote:
> >>
> >> Thanks. I'll review this and share my comments later on today.
> >>> On Jun 13, 2016 2:30 AM, "Saikat Kanjilal" 
> wrote:
> >>>
> >>> Motivation/Design: The graph/sink source plugin will be used to
> >>> custom transformations to connected data and dynamically apply these
> >>> transformations to send data to any sync, an example of a set of
> >>> destination sinks include elasticsearch/relational databases/spark rdd
> >>> etc.   Note that this plugin will serve as a source and a sink
> depending
> >>> on the configurations.  For v1 I am targeting that we plug into neo4j
> >>> database using the neo4j-jdbc interface (
> >>> https://github.com/larusba/neo4j-jdbc)
> >>> to build http payloads to talk to neo4j.  Once our neo4j interface will
> >>> allow us to build generic interfaces and plug in any graph store in the
> >>> future.
> >>> The
> >>> design will consist of a hybrid piece of infrastructure serving both as
> >>> a source and a sink connected to the current flume infrastructure
> >>> (since all the current sinks and sources are living in their own
> >>> directories I would suggest this live somewhere else in the flume
> >>> directory structure.  Listed below is some classes I have partially
> >>> configured to kick off this
> >>> discussion
> >>> NeoRestClient
> >>> Roles and Responsibilities: Interface to neo4j, unpack and pack data
> >>> structures to perform CRUD operation on a local or remote noe4j
> instance
> >>> APIS:
> >>> //inputs flume event
> >>> //outputs flume data structure identifying success metrics around the
> >>> operation
> >>> //description: transform the flume event into a graph node
> >>> insertNode(NeoNode nodeToInsert)
> >>> searchNode(NeoNode nodeToSearch,Algorithm useAStarOrDijkstra)
> >>> deleteNode(NeoNode nodeToDelete)
> >>>
> >>>
> >>> Note that I would also like to offer up the chance to present cipher
> >>> queries (http://neo4j.com/developer/cypher-query-language/) to the
> >>> source/sink infrastructure
> >>>
> >>> Neo4jDynamicSerializer
> >>> Roles and responsibilities: serialize flume headers and body and use
> the
> >>> Neo4jRestClient to perform crud on neo4j
> >>>
> >>>
> >>> Both the source and the sink infrastructure will use the same
> >>> infrastructure above.
> >>>
> >>>
> >>> That should be enough of a first cut for design/motivation and JIRA
> >>> details, would love to kick off the discussion at this point.
> >>> Thanks in advance
> >>>
> >>>
> >>>
> >>>
> >>>
>  From: sxk1...@hotmail.com
>  To: dev@flume.apache.org
>  Subject: [Discuss graph source/sink design proposal]
>  Date: Sun, 12 Jun 2016 15:01:14 -0700
> 
>  Jira with details here:
> >>> https://issues.apache.org/jira/browse/FLUME-2035
> 
>  Please respond with your questions.
> >>
> >>
>


Re: [Discuss graph source/sink design proposal]

2016-06-13 Thread Saikat Kanjilal
The use case is a Flume developer wanting to connect data coming into and out 
of Flume sinks/sources to a graph database.

Sent from my iPhone

> On Jun 13, 2016, at 10:55 AM, Lior Zeno  wrote:
> 
> I'm not sure that I follow here. Can you please give a detailed use-case?
> 
>> On Mon, Jun 13, 2016 at 7:20 AM, Lior Zeno  wrote:
>> 
>> Thanks. I'll review this and share my comments later on today.
>>> On Jun 13, 2016 2:30 AM, "Saikat Kanjilal"  wrote:
>>> 
>>> Motivation/Design: The graph/sink source plugin will be used to
>>> custom transformations to connected data and dynamically apply these
>>> transformations to send data to any sync, an example of a set of
>>> destination sinks include elasticsearch/relational databases/spark rdd
>>> etc.   Note that this plugin will serve as a source and a sink depending
>>> on the configurations.  For v1 I am targeting that we plug into neo4j
>>> database using the neo4j-jdbc interface (
>>> https://github.com/larusba/neo4j-jdbc)
>>> to build http payloads to talk to neo4j.  Once our neo4j interface will
>>> allow us to build generic interfaces and plug in any graph store in the
>>> future.
>>> The
>>> design will consist of a hybrid piece of infrastructure serving both as
>>> a source and a sink connected to the current flume infrastructure
>>> (since all the current sinks and sources are living in their own
>>> directories I would suggest this live somewhere else in the flume
>>> directory structure.  Listed below is some classes I have partially
>>> configured to kick off this
>>> discussion
>>> NeoRestClient
>>> Roles and Responsibilities: Interface to neo4j, unpack and pack data
>>> structures to perform CRUD operation on a local or remote noe4j instance
>>> APIS:
>>> //inputs flume event
>>> //outputs flume data structure identifying success metrics around the
>>> operation
>>> //description: transform the flume event into a graph node
>>> insertNode(NeoNode nodeToInsert)
>>> searchNode(NeoNode nodeToSearch,Algorithm useAStarOrDijkstra)
>>> deleteNode(NeoNode nodeToDelete)
>>> 
>>> 
>>> Note that I would also like to offer up the chance to present cipher
>>> queries (http://neo4j.com/developer/cypher-query-language/) to the
>>> source/sink infrastructure
>>> 
>>> Neo4jDynamicSerializer
>>> Roles and responsibilities: serialize flume headers and body and use the
>>> Neo4jRestClient to perform crud on neo4j
>>> 
>>> 
>>> Both the source and the sink infrastructure will use the same
>>> infrastructure above.
>>> 
>>> 
>>> That should be enough of a first cut for design/motivation and JIRA
>>> details, would love to kick off the discussion at this point.
>>> Thanks in advance
>>> 
>>> 
>>> 
>>> 
>>> 
 From: sxk1...@hotmail.com
 To: dev@flume.apache.org
 Subject: [Discuss graph source/sink design proposal]
 Date: Sun, 12 Jun 2016 15:01:14 -0700
 
 Jira with details here:
>>> https://issues.apache.org/jira/browse/FLUME-2035
 
 Please respond with your questions.
>> 
>> 


Re: [Discuss graph source/sink design proposal]

2016-06-13 Thread Lior Zeno
I'm not sure that I follow here. Can you please give a detailed use-case?

On Mon, Jun 13, 2016 at 7:20 AM, Lior Zeno  wrote:

> Thanks. I'll review this and share my comments later on today.
> On Jun 13, 2016 2:30 AM, "Saikat Kanjilal"  wrote:
>
>> Motivation/Design: The graph/sink source plugin will be used to
>> custom transformations to connected data and dynamically apply these
>> transformations to send data to any sync, an example of a set of
>> destination sinks include elasticsearch/relational databases/spark rdd
>> etc.   Note that this plugin will serve as a source and a sink depending
>>  on the configurations.  For v1 I am targeting that we plug into neo4j
>> database using the neo4j-jdbc interface (
>> https://github.com/larusba/neo4j-jdbc)
>>  to build http payloads to talk to neo4j.  Once our neo4j interface will
>>  allow us to build generic interfaces and plug in any graph store in the
>>  future.
>> The
>>  design will consist of a hybrid piece of infrastructure serving both as
>>  a source and a sink connected to the current flume infrastructure
>> (since all the current sinks and sources are living in their own
>> directories I would suggest this live somewhere else in the flume
>> directory structure.  Listed below is some classes I have partially
>> configured to kick off this
>> discussion
>> NeoRestClient
>> Roles and Responsibilities: Interface to neo4j, unpack and pack data
>> structures to perform CRUD operation on a local or remote noe4j instance
>> APIS:
>> //inputs flume event
>> //outputs flume data structure identifying success metrics around the
>> operation
>> //description: transform the flume event into a graph node
>> insertNode(NeoNode nodeToInsert)
>> searchNode(NeoNode nodeToSearch,Algorithm useAStarOrDijkstra)
>> deleteNode(NeoNode nodeToDelete)
>>
>>
>> Note that I would also like to offer up the chance to present cipher
>> queries (http://neo4j.com/developer/cypher-query-language/) to the
>> source/sink infrastructure
>>
>> Neo4jDynamicSerializer
>> Roles and responsibilities: serialize flume headers and body and use the
>> Neo4jRestClient to perform crud on neo4j
>>
>>
>> Both the source and the sink infrastructure will use the same
>> infrastructure above.
>>
>>
>> That should be enough of a first cut for design/motivation and JIRA
>> details, would love to kick off the discussion at this point.
>> Thanks in advance
>>
>>
>>
>>
>>
>> > From: sxk1...@hotmail.com
>> > To: dev@flume.apache.org
>> > Subject: [Discuss graph source/sink design proposal]
>> > Date: Sun, 12 Jun 2016 15:01:14 -0700
>> >
>> > Jira with details here:
>> https://issues.apache.org/jira/browse/FLUME-2035
>> >
>> > Please respond with your questions.
>>
>
>


Re: Review Request 48161: FLUME-2918: TaildirSource is underperforming with huge parent directories

2016-06-13 Thread Attila Simon

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48161/
---

(Updated June 13, 2016, 2:14 p.m.)


Review request for Flume.


Changes
---

address first branch requests


Bugs: FLUME-2918
https://issues.apache.org/jira/browse/FLUME-2918


Repository: flume-git


Description
---

The way the Taildir source checks which files should be tracked was improved. 
The existing implementation caused unnecessarily high CPU usage for huge (50K+ 
files) directories. This fix allows users to eliminate continuous listing of 
the parent directory (on each Source.process invocation) and introduces a more 
performant method for listing and matching files.

used java.nio.file.DirectoryStream to filter files
made pattern match calculation optionally cached
added junit tests
added javadoc
added license
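
For readers following the description above, here is a minimal sketch of the
two ideas it names: filtering with java.nio.file.DirectoryStream and
optionally caching the match result. It is not the TaildirMatcher from the
patch. The class name CachedDirMatcher, its constructor arguments and the
simple "re-list only when the parent directory mtime changed" rule are
assumptions made for illustration; lastSeenParentDirMTime and
lastMatchedFiles are borrowed from the review comments.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch, not the class from the patch under review. */
class CachedDirMatcher {
  private final Path parentDir;
  private final PathMatcher matcher;
  private final boolean cachePatternMatching;

  // Milliseconds since the epoch; -1 means the directory was never listed.
  private long lastSeenParentDirMTime = -1;
  private List<Path> lastMatchedFiles = Collections.emptyList();

  CachedDirMatcher(Path parentDir, String fileNameRegex, boolean cachePatternMatching) {
    this.parentDir = parentDir;
    this.matcher = FileSystems.getDefault().getPathMatcher("regex:" + fileNameRegex);
    this.cachePatternMatching = cachePatternMatching;
  }

  /** Never returns null; returns the cached list if the directory looks unchanged. */
  List<Path> getMatchingFiles() throws IOException {
    long mtime = Files.getLastModifiedTime(parentDir).toMillis();
    if (!cachePatternMatching || mtime != lastSeenParentDirMTime) {
      lastMatchedFiles = listMatchingFiles();
      lastSeenParentDirMTime = mtime;
    }
    return lastMatchedFiles;
  }

  private List<Path> listMatchingFiles() throws IOException {
    // The filter is applied while the directory is being iterated.
    DirectoryStream.Filter<Path> filter =
        entry -> matcher.matches(entry.getFileName()) && Files.isRegularFile(entry);
    List<Path> result = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(parentDir, filter)) {
      for (Path entry : stream) {
        result.add(entry);
      }
    }
    return result;
  }
}
```

One caveat with this simplified rule: if a file is created within the same
timestamp tick as the previous listing, the directory mtime does not change
and the cached list would miss it. A production version needs a stricter
condition than the plain comparison used here.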


Diffs (updated)
-

  
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/ReliableTaildirEventReader.java 5b6d465
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java PRE-CREATION
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSource.java 8816327
  flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSourceConfigurationConstants.java 6165276
  flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java PRE-CREATION
  flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java f9e614c

Diff: https://reviews.apache.org/r/48161/diff/


Testing
---

mvn clean install -DskipTests -> built
junit tests for flume-taildir-source module -> passed


Thanks,

Attila Simon



Re: Review Request 48161: FLUME-2918: TaildirSource is underperforming with huge parent directories

2016-06-13 Thread Attila Simon


> On June 7, 2016, 1:44 a.m., Mike Percy wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java,
> >  line 113
> > 
> >
> > Why not set these default values in the class definition, i.e. 
> > private long lastSeenParentDirMTime = -1, rather than specifically 
> > initializing them in the constructor? It's less code and more obvious what 
> > is happening.

This value is not relevant to the instance since it will be replaced instantly 
by the first iteration, but I'm fine with changing it.
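
To illustrate the suggestion, a small sketch of a field initializer replacing
a constructor assignment. The class name MatcherState, the filePattern
argument and the getter are hypothetical; only lastSeenParentDirMTime comes
from the review comment.

```java
/** Hypothetical holder class used only to show the initializer style. */
class MatcherState {
  // Default set at the declaration, so the constructor does not repeat it.
  private long lastSeenParentDirMTime = -1;

  private final String filePattern;

  MatcherState(String filePattern) {
    // Only values that depend on constructor arguments are assigned here.
    this.filePattern = filePattern;
  }

  long getLastSeenParentDirMTime() {
    return lastSeenParentDirMTime;
  }

  String getFilePattern() {
    return filePattern;
  }
}
```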


> On June 7, 2016, 1:44 a.m., Mike Percy wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java,
> >  line 150
> > 
> >
> > Why are we providing a "sorted by last (cached) modification time" as a 
> > guarantee here? Are we using that guarantee somehow? See below on the cost 
> > of that guarantee.

This guarantee was part of the original implementation (my change didn't make 
it worse). I wanted to be non-intrusive.
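
If the ordering guarantee is kept, one way to limit its cost is to read each
file's modification time once before sorting, so the comparator never touches
the file system. The sketch below shows that idea only; it is not the code
from the patch, and the names MTimeSort and sortByLastModified are made up
for this example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: sort files by mtime with one lookup per file. */
class MTimeSort {
  static List<File> sortByLastModified(List<File> files) {
    // One File.lastModified() call per file, done before the sort starts.
    Map<File, Long> mtimes = new HashMap<>();
    for (File f : files) {
      mtimes.put(f, f.lastModified());
    }
    List<File> sorted = new ArrayList<>(files);
    // The comparator only reads the precomputed values.
    sorted.sort(Comparator.comparingLong((File f) -> mtimes.get(f)));
    return sorted;
  }
}
```

Besides saving lookups, sorting on a snapshot keeps the comparator consistent
even if a file is modified while the sort is running.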


> On June 7, 2016, 1:44 a.m., Mike Percy wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java,
> >  line 156
> > 
> >
> > Can this method ever return null? If so, let's make sure that doesn't 
> > happen. It should return Collections.emptyList() if it can't find anything.

It is guaranteed that it doesn't return null (it is in the javadoc as well). 
The prerequisites of this guarantee are that lastMatchedFiles starts with an 
empty list and that getMatchingFilesNoCache keeps its guarantee of never 
returning null. I think because both are private the additional explicit check 
for if(result == null) would be unnecessary.
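
As a general illustration of the "empty list instead of null" convention
(not taken from the patch): java.io.File.list() returns null when the path is
not a directory or an I/O error occurs, so a wrapper can normalize that once
and let callers iterate without null checks. SafeListing and listNames are
hypothetical names.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical wrapper that never hands a null list to its callers. */
class SafeListing {
  static List<String> listNames(File dir) {
    // File.list() yields null if dir is not a directory or cannot be read.
    String[] names = dir.list();
    if (names == null) {
      return Collections.emptyList();
    }
    return Arrays.asList(names);
  }
}
```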


> On June 7, 2016, 1:44 a.m., Mike Percy wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java,
> >  line 79
> > 
> >
> > Add comments here for the variables that need explanation. For example, 
> > what are the units on these timestamps? Milliseconds?

comment added to member declaration


> On June 7, 2016, 1:44 a.m., Mike Percy wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java,
> >  line 86
> > 
> >
> > nit: semi valid? Sounds like inventing terminology. If so, avoid. Also, 
> > s/consist/consisting/

done


> On June 7, 2016, 1:44 a.m., Mike Percy wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java,
> >  line 103
> > 
> >
> > why final?

final removed


> On June 7, 2016, 1:44 a.m., Mike Percy wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java,
> >  line 124
> > 
> >
> > grammar nit: s/which are matching regex pattern passed as object 
> > instantiation/that match the regex pattern passed in during object 
> > instantiation/

done


> On June 7, 2016, 1:44 a.m., Mike Percy wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java,
> >  line 125
> > 
> >
> > grammar nit: s/frequently/frequent/

done


> On June 7, 2016, 1:44 a.m., Mike Percy wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java,
> >  line 127
> > 
> >
> > nit: s/instruct/trigger/ here and below

done


> On June 7, 2016, 1:44 a.m., Mike Percy wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java,
> >  line 161
> > 
> >
> > nit: spurious parenthesis before lastSeenParentDirMTime

The condition is described in the javadoc; unfortunately it is ugly but needed.


> On June 7, 2016, 1:44 a.m., Mike Percy wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java,
> >  line 164
> > 
> >
> > This sorting function seems problematic to me. It can call stat() up to 
> > n^2 times (assuming quicksort). Shouldn't we get the last modified time of 
> > each file in the list once and then sort it? Why are we sor