Re: Proposal: CompositeAccumulation for Windowed Operator
Thanks Bright. I've reviewed your PR. It looks good. Just one minor change is required; please see my comment there. On Mon, Feb 27, 2017 at 11:26 PM, Bright Chen wrote: > A JIRA has been created: https://issues.apache.org/jira/browse/APEXMALHAR-2428 > > On Mon, Feb 27, 2017 at 9:53 AM, Bright Chen wrote: > > I think Chinmay's proposal could make applications clearer and improve > > performance, since locating the key/window costs most of the time. > > A suggested usage for CompositeAccumulation could be as follows: > > *//following is sample code showing how to add sub-accumulations* > > *CompositeAccumulation accumulations = new > > CompositeAccumulation<>();* > > *AccumulationTag sumTag = > > accumulations.addAccumulation((Accumulation)new SumAccumulation());* > > *AccumulationTag countTag = > > accumulations.addAccumulation((Accumulation)new Count());* > > *AccumulationTag maxTag = accumulations.addAccumulation(new Max());* > > *AccumulationTag minTag = accumulations.addAccumulation(new Min());* > > *//following is a sample showing how to get the sub-accumulation output* > > *accumulations.getSubOutput(sumTag, outputValues)* > > *accumulations.getSubOutput(countTag, outputValues)* > > *accumulations.getSubOutput(maxTag, outputValues)* > > *accumulations.getSubOutput(minTag, outputValues)* > > Thanks > > Bright > > On Sun, Feb 26, 2017 at 10:33 PM, Chinmay Kolhatkar < > > chin...@datatorrent.com> wrote: > >> Dear Community, > >> Currently we have accumulations for individual types of accumulations, > >> but if one wants to do more than one accumulation in a single stage of > >> the Windowed Operator, it is not possible. > >> I want to propose an idea called "CompositeAccumulation", where more than one > >> accumulation can be configured, and this accumulation can rely on multiple > >> accumulations to generate the final result/output. > >> The output can take either of two forms: > >> 1. Just the list of outputs, with AccumulationTags as identifiers. > >> 2. The results of multiple accumulations merged using some user-defined logic. > >> For example, in an aggregation case, the input POJO to this accumulation can be a > >> POJO containing NumberOfOrders as a field, and in the output one might need to > >> generate a final (single) POJO which contains the results of multiple > >> accumulations, like SUM and COUNT on NumberOfOrders, as different fields of the > >> outgoing POJO. > >> I particularly see the use of this for the multiple aggregations we would > >> like to do in the SQL on Apex integration. > >> Please share your thoughts on the same. > >> Thanks, > >> Chinmay.
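For readers following the API sketched in the thread, here is a minimal, self-contained sketch of how such a CompositeAccumulation could be implemented. It assumes a simplified Accumulation interface (the real Malhar interface has additional methods such as merge() and getRetraction()), and all names beyond those in the sample usage above are illustrative, not the actual PR implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for Malhar's Accumulation interface (illustrative only).
interface Accumulation<InputT, AccumT, OutputT> {
  AccumT defaultAccumulatedValue();
  AccumT accumulate(AccumT accumulatedValue, InputT input);
  OutputT getOutput(AccumT accumulatedValue);
}

// Opaque handle identifying one sub-accumulation inside the composite.
final class AccumulationTag {
  final int index;
  AccumulationTag(int index) { this.index = index; }
}

// Applies every registered sub-accumulation to each input, so the costly
// key/window lookup in the windowed operator happens only once per tuple.
class CompositeAccumulation<InputT>
    implements Accumulation<InputT, List<Object>, List<Object>> {
  private final List<Accumulation<InputT, Object, Object>> subs = new ArrayList<>();

  @SuppressWarnings("unchecked")
  public AccumulationTag addAccumulation(Accumulation<InputT, ?, ?> sub) {
    subs.add((Accumulation<InputT, Object, Object>) sub);
    return new AccumulationTag(subs.size() - 1);
  }

  @Override
  public List<Object> defaultAccumulatedValue() {
    List<Object> values = new ArrayList<>();
    for (Accumulation<InputT, Object, Object> sub : subs) {
      values.add(sub.defaultAccumulatedValue());
    }
    return values;
  }

  @Override
  public List<Object> accumulate(List<Object> values, InputT input) {
    for (int i = 0; i < subs.size(); i++) {
      values.set(i, subs.get(i).accumulate(values.get(i), input));
    }
    return values;
  }

  @Override
  public List<Object> getOutput(List<Object> values) {
    List<Object> outputs = new ArrayList<>();
    for (int i = 0; i < subs.size(); i++) {
      outputs.add(subs.get(i).getOutput(values.get(i)));
    }
    return outputs;
  }

  // Extracts one sub-accumulation's result from the composite output.
  public Object getSubOutput(AccumulationTag tag, List<Object> outputValues) {
    return outputValues.get(tag.index);
  }
}

public class CompositeAccumulationDemo {
  public static void main(String[] args) {
    CompositeAccumulation<Long> acc = new CompositeAccumulation<>();
    AccumulationTag sumTag = acc.addAccumulation(new Accumulation<Long, Long, Long>() {
      @Override public Long defaultAccumulatedValue() { return 0L; }
      @Override public Long accumulate(Long a, Long in) { return a + in; }
      @Override public Long getOutput(Long a) { return a; }
    });
    AccumulationTag countTag = acc.addAccumulation(new Accumulation<Long, Long, Long>() {
      @Override public Long defaultAccumulatedValue() { return 0L; }
      @Override public Long accumulate(Long a, Long in) { return a + 1; }
      @Override public Long getOutput(Long a) { return a; }
    });
    List<Object> state = acc.defaultAccumulatedValue();
    for (long v : new long[] {3, 5, 7}) {
      state = acc.accumulate(state, v);
    }
    List<Object> out = acc.getOutput(state);
    System.out.println("sum=" + acc.getSubOutput(sumTag, out));     // sum=15
    System.out.println("count=" + acc.getSubOutput(countTag, out)); // count=3
  }
}
```

The AccumulationTag is just an index into the composite's internal list, which is what makes getSubOutput a constant-time lookup on the combined output.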
Re: Maven build package error with 3.5
The root cause seems to be embedded in your output: Could not transfer artifact org.apache.apex:malhar-library:pom:3.5.0 from/to central ( https://repo.maven.apache.org/maven2): sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -> [Help 1] Could this be because a corporate firewall is blocking this access? You can check this out: http://stackoverflow.com/questions/25911623/problems-using-maven-and-ssl-behind-proxy On Mon, Feb 27, 2017 at 3:15 PM, Dongming Liang wrote: > It was running well with 3.4, but is now failing with Apex 3.5 > > ➜ log-aggregator git:(apex-tcp) ✗ mvn package -DskipTests > [INFO] Scanning for projects... > [INFO] > [INFO] > > [INFO] Building Aggregator 1.0-SNAPSHOT > [INFO] > > Downloading: > https://repo.maven.apache.org/maven2/org/apache/apex/malhar-library/3.5.0/malhar-library-3.5.0.pom > Downloading: > https://repo.maven.apache.org/maven2/org/apache/apex/apex-api/3.5.0/apex-api-3.5.0.pom > Downloading: > https://repo.maven.apache.org/maven2/org/apache/apex/apex-common/3.5.0/apex-common-3.5.0.pom > Downloading: > https://repo.maven.apache.org/maven2/org/apache/apex/apex-engine/3.5.0/apex-engine-3.5.0.pom > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 10.435 s > [INFO] Finished at: 2017-02-27T15:03:06-08:00 > [INFO] Final Memory: 11M/245M > [INFO] > > [ERROR] Failed to execute goal on project log-aggregator: Could not resolve > dependencies for project > com.capitalone.vault8:log-aggregator:jar:1.0-SNAPSHOT: Failed to collect > dependencies at org.apache.apex:malhar-library:jar:3.5.0: Failed to read > artifact descriptor for org.apache.apex:malhar-library:jar:3.5.0: Could not > transfer artifact org.apache.apex:malhar-library:pom:3.5.0 from/to > central ( > https://repo.maven.apache.org/maven2): > sun.security.validator.ValidatorException: PKIX path building failed: >
sun.security.provider.certpath.SunCertPathBuilderException: unable to find > valid certification path to requested target -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. > [ERROR] > [ERROR] For more information about the errors and possible solutions, > please read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/ > DependencyResolutionException > ➜ log-aggregator git:(apex-tcp) ✗ > > The pom file is: > > > http://maven.apache.org/POM/4.0.0; > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; > xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 > http://maven.apache.org/xsd/maven-4.0.0.xsd;> > 4.0.0 > > com.mycom.teamx > log-aggregator > jar > 1.0-SNAPSHOT > > > Aggregator > Log Aggregator > > > > 3.5.0 > lib/*.jar > > > > > > org.apache.maven.plugins > maven-eclipse-plugin > 2.9 > >true > > > > maven-compiler-plugin > 3.3 > >UTF-8 >1.7 >1.7 >true >false >true >true > > > > maven-dependency-plugin > 2.8 > > > copy-dependencies > prepare-package > >copy-dependencies > > >target/deps >runtime > > > > > > > maven-assembly-plugin > > > app-package-assembly > package > >single > > >${project.artifactId}-${project.version} > -apexapp >false > > src/assemble/appPackage.xml > > > 0755 > > > >${apex.apppackage.classpath} >${apex.version} > > ${project.groupId} > > ${project.artifactId} > > ${project.version} > > ${project.name} > > ${project.description} Package-Description> > > > > > > > > > maven-antrun-plugin > 1.7 > > > package > > >
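When Maven reports a PKIX "unable to find valid certification path" error behind a corporate proxy, a quick sanity check is whether the JVM that runs Maven can load its default truststore at all. This is not from the thread, just a small diagnostic sketch; the keytool command in the trailing comment is illustrative (alias and file paths are assumptions):

```java
import java.security.KeyStore;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

public class TruststoreCheck {
  public static void main(String[] args) throws Exception {
    // Passing a null KeyStore initializes the factory with the JVM's default
    // truststore (what Maven uses unless -Djavax.net.ssl.trustStore overrides it).
    TrustManagerFactory tmf =
        TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
    tmf.init((KeyStore) null);
    int cas = 0;
    for (TrustManager tm : tmf.getTrustManagers()) {
      if (tm instanceof X509TrustManager) {
        cas += ((X509TrustManager) tm).getAcceptedIssuers().length;
      }
    }
    System.out.println("Trusted CA certificates: " + cas);
    // If a corporate proxy re-signs TLS traffic, its CA certificate must be
    // added to this truststore, e.g. (alias/file names are illustrative):
    //   keytool -importcert -alias corp-proxy -file proxy-ca.crt \
    //     -keystore $JAVA_HOME/jre/lib/security/cacerts -storepass changeit
  }
}
```

A count of zero (or an exception) points at a broken or overridden truststore rather than at Maven itself.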
Maven build package error with 3.5
It was running well with 3.4, but now failing with Apex 3.5 ➜ log-aggregator git:(apex-tcp) ✗ mvn package -DskipTests [INFO] Scanning for projects... [INFO] [INFO] [INFO] Building Aggregator 1.0-SNAPSHOT [INFO] Downloading: https://repo.maven.apache.org/maven2/org/apache/apex/malhar-library/3.5.0/malhar-library-3.5.0.pom Downloading: https://repo.maven.apache.org/maven2/org/apache/apex/apex-api/3.5.0/apex-api-3.5.0.pom Downloading: https://repo.maven.apache.org/maven2/org/apache/apex/apex-common/3.5.0/apex-common-3.5.0.pom Downloading: https://repo.maven.apache.org/maven2/org/apache/apex/apex-engine/3.5.0/apex-engine-3.5.0.pom [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 10.435 s [INFO] Finished at: 2017-02-27T15:03:06-08:00 [INFO] Final Memory: 11M/245M [INFO] [ERROR] Failed to execute goal on project log-aggregator: Could not resolve dependencies for project com.capitalone.vault8:log-aggregator:jar:1.0-SNAPSHOT: Failed to collect dependencies at org.apache.apex:malhar-library:jar:3.5.0: Failed to read artifact descriptor for org.apache.apex:malhar-library:jar:3.5.0: Could not transfer artifact org.apache.apex:malhar-library:pom:3.5.0 from/to central ( https://repo.maven.apache.org/maven2): sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
[ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException ➜ log-aggregator git:(apex-tcp) ✗ The pom file is: http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> 4.0.0 com.mycom.teamx log-aggregator jar 1.0-SNAPSHOT Aggregator Log Aggregator 3.5.0 lib/*.jar org.apache.maven.plugins maven-eclipse-plugin 2.9 true maven-compiler-plugin 3.3 UTF-8 1.7 1.7 true false true true maven-dependency-plugin 2.8 copy-dependencies prepare-package copy-dependencies target/deps runtime maven-assembly-plugin app-package-assembly package single ${project.artifactId}-${project.version}-apexapp false src/assemble/appPackage.xml 0755 ${apex.apppackage.classpath} ${apex.version} ${project.groupId} ${project.artifactId} ${project.version} ${project.name} ${project.description} maven-antrun-plugin 1.7 package run org.apache.apex malhar-library ${apex.version} * * org.apache.apex apex-api ${apex.version} org.apache.apex apex-common ${apex.version} * * junit junit 4.10 test org.apache.apex apex-engine ${apex.version} test * * joda-time joda-time 2.9.1 javax.validation validation-api 1.1.0.Final org.apache.hadoop hadoop-common 2.3.0 Thanks, - Dongming
[jira] [Commented] (APEXMALHAR-2366) Apply BloomFilter to Bucket
[ https://issues.apache.org/jira/browse/APEXMALHAR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886659#comment-15886659 ] bright chen commented on APEXMALHAR-2366: - Hi [~bhupesh] The only difference, as far as I can tell, is that this BloomFilter implementation uses SerializationBuffer to save some copying and garbage collection; I am not sure how much impact that has on performance. Another thing is that Chaitanya's BloomFilter is in Megh; it would at least need to be moved to the Malhar library before we can use it, and I am not sure whether there are any license issues either. > Apply BloomFilter to Bucket > --- > > Key: APEXMALHAR-2366 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2366 > Project: Apache Apex Malhar > Issue Type: Improvement >Reporter: bright chen >Assignee: bright chen > Original Estimate: 192h > Remaining Estimate: 192h > > The bucket get() will check the cache and then check the stored files if > the entry is not in the cache. Checking the files is a pretty heavy > operation due to file seeks. > The chance of a file check is very high if the key range is large. > I suggest applying a BloomFilter to each bucket to reduce the chance of reading from file. > If the buckets are managed by ManagedStateImpl, the number of entries per bucket would be > very large and the BloomFilter might not be useful after a while. But if the > buckets are managed by ManagedTimeUnifiedStateImpl, each bucket keeps a certain > amount of entries and a BloomFilter would be very useful. > For implementation: > Guava already has a BloomFilter, and its interface is pretty simple and > fits our case. But Guava 11 is not compatible with Guava 14 (Guava 11 uses > Sink while Guava 14 uses PrimitiveSink). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
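For readers unfamiliar with why a Bloom filter helps this bucket lookup: a negative membership answer is definitive, so the bucket can skip the expensive file seek entirely, while a positive answer only means "possibly present". Guava's com.google.common.hash.BloomFilter, discussed above, is the production option; the following is a minimal self-contained sketch with an illustrative (not production-grade) hash:

```java
import java.util.BitSet;

// Minimal Bloom filter sketch. mightContain() == false means the key is
// definitely absent, so a ManagedState bucket could skip the file seek.
class SimpleBloomFilter {
  private final BitSet bits;
  private final int numBits;
  private final int numHashes;

  SimpleBloomFilter(int numBits, int numHashes) {
    this.numBits = numBits;
    this.numHashes = numHashes;
    this.bits = new BitSet(numBits);
  }

  // FNV-style mixing with a per-function seed; illustrative only.
  private int hash(byte[] key, int seed) {
    int h = seed * 0x9E3779B9;
    for (byte b : key) {
      h = (h ^ b) * 0x01000193;
    }
    return Math.floorMod(h, numBits);
  }

  void put(byte[] key) {
    for (int i = 0; i < numHashes; i++) {
      bits.set(hash(key, i + 1));
    }
  }

  // false => definitely not present (no file read needed);
  // true  => possibly present (must still consult cache/files).
  boolean mightContain(byte[] key) {
    for (int i = 0; i < numHashes; i++) {
      if (!bits.get(hash(key, i + 1))) {
        return false;
      }
    }
    return true;
  }
}
```

The false-positive rate is tuned by sizing numBits and numHashes against the expected number of entries, which is why the per-bucket bound of ManagedTimeUnifiedStateImpl (noted in the JIRA) matters: an unbounded bucket saturates the bit set over time.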
Re: Java packages: legacy -> org.apache.apex
Let's not confuse open source with the ASF; they are not the same. Not all open source projects are part of Apache. The majority of Apache projects do use "org.apache." package names. Thank you, Vlad On 2/27/17 10:24, Sanjay Pujare wrote: +1 for bullet 1, assuming new code implies brand new classes (since it doesn't involve any backward compatibility issues). We can always review contributor PRs to make sure new code is added under the new package naming guidelines. But for 2 and 3 I have a question/comment: is there even a need to do it? There is a lot of open source code with package names like com.google.* and com.sun.*, and as far as I know there are no moves afoot to rename those packages. The renaming itself doesn't add any new functionality or technical capabilities, but it introduces instability in Apex code as well as user code. Just a thought... On Mon, Feb 27, 2017 at 8:23 AM, Chinmay Kolhatkar wrote: Thomas, I agree with you that we need this migration to be done, but I have a different opinion on how to execute it. I think if we do this in phases as described above, users might end up in more confusion. For this migration, I think we should follow these steps: 1. Whether for the operator library or core components, we should announce widely on the dev and users mailing lists that "...such a change is going to happen in the next release". 2. Take up the work all at once and not phase it. Thanks, Chinmay. On Mon, Feb 27, 2017 at 9:09 PM, Thomas Weise wrote: Hi, This topic has come up on several PRs and I think it warrants a broader discussion. At the time of incubation, the decision was to defer the change of Java packages from com.datatorrent to org.apache.apex until the next major release, to ensure backward compatibility for users. Unfortunately that has led to some confusion, as contributors continue to add new code under legacy packages. It is also a wider issue that examples for using Apex continue to refer to com.datatorrent packages, nearly one year after graduation. More and more user code is being built on top of something that needs to change; the can is being kicked down the road and users will face more changes later. I would like to propose the following: 1. All new code has to be submitted under org.apache.apex packages. 2. Not all code is under backward compatibility restrictions, and in those cases we can rename the packages right away. Examples: buffer server, engine, demos/examples, benchmarks. 3. Discuss when the core API and operators can be changed. For operators we have a bit more freedom to make changes before a major release, as most of them are marked @Evolving and users have the ability to continue using a prior version of Malhar with a newer engine due to the engine backward compatibility guarantee. Thanks, Thomas
Re: Java packages: legacy -> org.apache.apex
For Malhar, for existing operators, I prefer we do this as part of the planned refactoring that breaks the monolithic modules into baby packages, and I would also prefer deprecating the existing operators in place. This will help us achieve two things. First, the user will see all the new changes at once, as opposed to dealing with them twice (with the package rename and the dependency changes), and second, it will allow for a smoother transition as the existing code will still work in a deprecated state. It will also give a more consistent structure to Malhar. For new operators, we can go with the new package path, but we need to ensure they get moved into the baby packages as well. For demos, we can modify the paths, as the apps are typically used wholesale and the interface is typically manual interaction. For core, if we are adding new API subsystems, like the launcher API we added recently for example, we can go with the new package path, but if we are making incremental additions to existing functionality, I feel it is better to keep them in the same package. I also prefer we keep the package of the implementation classes consistent with the API, for understandability and readability of the code. So, for example, we don't change the package path of LogicalPlan, as it is an implementation of DAG. It is subjective, but it would be good if we can also do the same with classes closely related to the implementation classes. Maybe we can move these on a package-by-package basis; for example, everything in com.datatorrent.stram.engine could be moved. Completely internal components like the buffer server we can move wholesale. We can consider moving all APIs and classes when we go to the next major release, but I would like to see if we can find a way to support the existing API for one more major release in deprecated mode. Thanks On Mon, Feb 27, 2017 at 7:39 AM, Thomas Weise wrote: > Hi, > > This topic has come up on several PRs and I think it warrants a broader > discussion. > > At the time of incubation, the decision was to defer the change of Java > packages from com.datatorrent to org.apache.apex until the next major release, to > ensure backward compatibility for users. > > Unfortunately that has led to some confusion, as contributors continue to > add new code under legacy packages. > > It is also a wider issue that examples for using Apex continue to refer to > com.datatorrent packages, nearly one year after graduation. More and more > user code is being built on top of something that needs to change; the can > is being kicked down the road and users will face more changes later. > > I would like to propose the following: > > 1. All new code has to be submitted under org.apache.apex packages > > 2. Not all code is under backward compatibility restriction and in those > cases we can rename the packages right away. Examples: buffer server, > engine, demos/examples, benchmarks > > 3. Discuss when the core API and operators can be changed. For operators we > have a bit more freedom to do changes before a major release as most of > them are marked @Evolving and users have the ability to continue using > prior version of Malhar with newer engine due to engine backward > compatibility guarantee. > > Thanks, > Thomas >
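The "deprecate in place" idea discussed in this thread can be illustrated with a tiny sketch: the implementation moves to the new namespace, and the old name survives as an empty deprecated subclass so existing user code keeps compiling while emitting deprecation warnings. All class names here are hypothetical; in the real migration the two classes would live under org.apache.apex.* and com.datatorrent.* respectively.

```java
// Stand-in for the class at its new home (would live under org.apache.apex.*).
class NewPackageOperator {
  public String process(String tuple) {
    return tuple.toUpperCase();  // trivial placeholder logic
  }
}

/**
 * Stand-in for the legacy class (would stay under com.datatorrent.*).
 * Empty deprecated subclass: old imports keep compiling, and the compiler
 * nudges users toward the new package via deprecation warnings.
 *
 * @deprecated use {@link NewPackageOperator} instead.
 */
@Deprecated
class LegacyPackageOperator extends NewPackageOperator {
}

public class DeprecateInPlaceDemo {
  public static void main(String[] args) {
    // Old user code continues to work unchanged during the transition.
    LegacyPackageOperator op = new LegacyPackageOperator();
    System.out.println(op.process("apex"));  // APEX
  }
}
```

This pattern only works cleanly for non-final classes with accessible constructors, which is one reason the thread distinguishes operators (feasible) from internal components like the buffer server (can simply be moved).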
Re: [DISCUSS] Proposal for adapting Malhar operators for batch use cases
I now see your rationale for putting the filename in the window. As far as I understand, the reasons the filename is not part of the key and the Global Window is not used are: 1) The files are processed in sequence, not in parallel. 2) The windowed operator should not keep the state associated with a file once processing of that file is done. 3) The trigger should be fired for the file when the file is done processing. However, if the file is just a sequence number that has nothing to do with a timestamp, assigning a timestamp to a file is not an intuitive thing to do and would just create confusion for users, especially when it's used as an example for new users. How about having a separate class called SequenceWindow? And perhaps TimeWindow can inherit from it? David On Mon, Feb 27, 2017 at 8:58 AM, Thomas Weise wrote: > On Mon, Feb 27, 2017 at 8:50 AM, Bhupesh Chawda > wrote: > > > I think my comments related to count based windows might be causing > > confusion. Let's not discuss count based scenarios for now. > > > > Just want to make sure we are on the same page wrt. the "each file is a > > batch" use case. As mentioned by Thomas, each tuple from the same file > > has the same timestamp (which is just a sequence number) and that helps > > keep tuples from each file in a separate window. > > > > Yes, in this case it is a sequence number, but it could be a time stamp > also, depending on the file naming convention. And if it was event time > processing, the watermark would be derived from records within the file. > > Agreed, the source should have a mechanism to control the time stamp > extraction along with everything else pertaining to the watermark > generation. > > > > We could also implement a "timestampExtractor" interface to identify the > > timestamp (sequence number) for a file. 
> > > > ~ Bhupesh > > > > > > ___ > > > > Bhupesh Chawda > > > > E: bhup...@datatorrent.com | Twitter: @bhupeshsc > > > > www.datatorrent.com | apex.apache.org > > > > > > > > On Mon, Feb 27, 2017 at 9:52 PM, Thomas Weise wrote: > > > > > I don't think this is a use case for count based window. > > > > > > We have multiple files that are retrieved in a sequence and there is no > > > knowledge of the number of records per file. The requirement is to > > > aggregate each file separately and emit the aggregate when the file is > > read > > > fully. There is no concept of "end of something" for an individual key > > and > > > global window isn't applicable. > > > > > > However, as already explained and implemented by Bhupesh, this can be > > > solved using watermark and window (in this case the window timestamp > > isn't > > > a timestamp, but a file sequence, but that doesn't matter. > > > > > > Thomas > > > > > > > > > On Mon, Feb 27, 2017 at 8:05 AM, David Yan wrote: > > > > > > > I don't think this is the way to go. Global Window only means the > > > timestamp > > > > does not matter (or that there is no timestamp). It does not > > necessarily > > > > mean it's a large batch. Unless there is some notion of event time > for > > > each > > > > file, you don't want to embed the file into the window itself. > > > > > > > > If you want the result broken up by file name, and if the files are > to > > be > > > > processed in parallel, I think making the file name be part of the > key > > is > > > > the way to go. I think it's very confusing if we somehow make the > file > > to > > > > be part of the window. > > > > > > > > For count-based window, it's not implemented yet and you're welcome > to > > > add > > > > that feature. In case of count-based windows, there would be no > notion > > of > > > > time and you probably only trigger at the end of each window. 
In the > > case > > > > of count-based windows, the watermark only matters for batch since > you > > > need > > > > a way to know when the batch has ended (if the count is 10, the > number > > of > > > > tuples in the batch is let's say 105, you need a way to end the last > > > window > > > > with 5 tuples). > > > > > > > > David > > > > > > > > On Mon, Feb 27, 2017 at 2:41 AM, Bhupesh Chawda < > > bhup...@datatorrent.com > > > > > > > > wrote: > > > > > > > > > Hi David, > > > > > > > > > > Thanks for your comments. > > > > > > > > > > The wordcount example that I created based on the windowed operator > > > does > > > > > processing of word counts per file (each file as a separate batch), > > > i.e. > > > > > process counts for each file and dump into separate files. > > > > > As I understand Global window is for one large batch; i.e. all > > incoming > > > > > data falls into the same batch. This could not be processed using > > > > > GlobalWindow option as we need more than one windows. In this > case, I > > > > > configured the windowed operator to have time windows of 1ms each > and > > > > > passed data for each file
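The "timestampExtractor" idea mentioned in this exchange could look roughly like the sketch below. The interface name and the file-naming convention are assumptions for illustration: every tuple read from a file would be assigned the file's extracted value as its window "timestamp", whether that value is a sequence number or a real epoch derived from a time-based naming convention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed "timestampExtractor": derive one long
// value per file so that every tuple read from that file lands in the same
// window of the windowed operator.
interface FileTimestampExtractor {
  long extractTimestamp(String fileName);
}

class SequenceNumberExtractor implements FileTimestampExtractor {
  private static final Pattern SEQ = Pattern.compile("(\\d+)");

  @Override
  public long extractTimestamp(String fileName) {
    Matcher m = SEQ.matcher(fileName);
    if (m.find()) {
      return Long.parseLong(m.group(1));  // first digit run, e.g. "input-0007.txt" -> 7
    }
    throw new IllegalArgumentException("no sequence number in: " + fileName);
  }
}

public class TimestampExtractorDemo {
  public static void main(String[] args) {
    FileTimestampExtractor extractor = new SequenceNumberExtractor();
    System.out.println(extractor.extractTimestamp("input-0007.txt"));  // 7
  }
}
```

Swapping in a different implementation (say, one parsing a date out of the file name) is what would let the same source support true event-time semantics, as Thomas notes above.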
[GitHub] apex-malhar pull request #566: APEXMALHAR-2428 CompositeAccumulation for win...
GitHub user brightchen opened a pull request: https://github.com/apache/apex-malhar/pull/566 APEXMALHAR-2428 CompositeAccumulation for windowed operator You can merge this pull request into a Git repository by running: $ git pull https://github.com/brightchen/apex-malhar APEXMALHAR-2428 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-malhar/pull/566.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #566 commit e2b1d31e5e69c0515db68baccf1fb97f8f7077c0 Author: brightchen Date: 2017-01-27T20:37:07Z APEXMALHAR-2428 CompositeAccumulation for windowed operator --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (APEXMALHAR-2428) CompositeAccumulation for windowed operator
[ https://issues.apache.org/jira/browse/APEXMALHAR-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886304#comment-15886304 ] ASF GitHub Bot commented on APEXMALHAR-2428: GitHub user brightchen opened a pull request: https://github.com/apache/apex-malhar/pull/566 APEXMALHAR-2428 CompositeAccumulation for windowed operator You can merge this pull request into a Git repository by running: $ git pull https://github.com/brightchen/apex-malhar APEXMALHAR-2428 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-malhar/pull/566.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #566 commit e2b1d31e5e69c0515db68baccf1fb97f8f7077c0 Author: brightchen Date: 2017-01-27T20:37:07Z APEXMALHAR-2428 CompositeAccumulation for windowed operator > CompositeAccumulation for windowed operator > --- > > Key: APEXMALHAR-2428 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2428 > Project: Apache Apex Malhar > Issue Type: New Feature >Reporter: bright chen >Assignee: bright chen >
[jira] [Resolved] (APEXMALHAR-2395) create MultiAccumulation
[ https://issues.apache.org/jira/browse/APEXMALHAR-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bright chen resolved APEXMALHAR-2395. - Resolution: Duplicate Duplicate of https://issues.apache.org/jira/browse/APEXMALHAR-2428 > create MultiAccumulation > > > Key: APEXMALHAR-2395 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2395 > Project: Apache Apex Malhar > Issue Type: Task >Reporter: bright chen >Assignee: bright chen > > Create MultiAccumulation, which supports SUM, MAX, MIN, and COUNT. AVG can be > computed from SUM and COUNT at query time.
Re: Java packages: legacy -> org.apache.apex
+1 for bullet 1, assuming new code implies brand new classes (since it doesn't involve any backward compatibility issues). We can always review contributor PRs to make sure new code is added under the new package naming guidelines. But for 2 and 3 I have a question/comment: is there even a need to do it? There is a lot of open source code with package names like com.google.* and com.sun.*, and as far as I know there are no moves afoot to rename those packages. The renaming itself doesn't add any new functionality or technical capabilities, but it introduces instability in Apex code as well as user code. Just a thought... On Mon, Feb 27, 2017 at 8:23 AM, Chinmay Kolhatkar wrote: > Thomas, > > I agree with you that we need this migration to be done, but I have a > different opinion on how to execute it. > I think if we do this in phases as described above, users might end up in > more confusion. > > For this migration, I think we should follow these steps: > 1. Whether for the operator library or core components, we should announce > widely on the dev and users mailing lists that "...such a change is going to > happen in the next release". > 2. Take up the work all at once and not phase it. > > Thanks, > Chinmay. > > > > On Mon, Feb 27, 2017 at 9:09 PM, Thomas Weise wrote: > > > Hi, > > > > This topic has come up on several PRs and I think it warrants a broader > > discussion. > > > > At the time of incubation, the decision was to defer the change of Java > > packages from com.datatorrent to org.apache.apex until the next major release, to > > ensure backward compatibility for users. > > > > Unfortunately that has led to some confusion, as contributors continue to > > add new code under legacy packages. > > > > It is also a wider issue that examples for using Apex continue to refer to > > com.datatorrent packages, nearly one year after graduation. 
More and more > > user code is being built on top of something that needs to change, the > can > > is being kicked down the road and users will face more changes later. > > > > I would like to propose the following: > > > > 1. All new code has to be submitted under org.apache.apex packages > > > > 2. Not all code is under backward compatibility restriction and in those > > cases we can rename the packages right away. Examples: buffer server, > > engine, demos/examples, benchmarks > > > > 3. Discuss when the core API and operators can be changed. For operators > we > > have a bit more freedom to do changes before a major release as most of > > them are marked @Evolving and users have the ability to continue using > > prior version of Malhar with newer engine due to engine backward > > compatibility guarantee. > > > > Thanks, > > Thomas > > >
Re: Proposal: CompositeAccumulation for Windowed Operator
A JIRA has been created: https://issues.apache.org/jira/browse/APEXMALHAR-2428 On Mon, Feb 27, 2017 at 9:53 AM, Bright Chen wrote: > I think Chinmay's proposal could make applications clearer and improve > performance, since locating the key/window costs most of the time. > > A suggested usage for CompositeAccumulation could be as follows: > > *//following is sample code showing how to add sub-accumulations* > > *CompositeAccumulation accumulations = new > CompositeAccumulation<>();* > > *AccumulationTag sumTag = > accumulations.addAccumulation((Accumulation)new SumAccumulation());* > > *AccumulationTag countTag = > accumulations.addAccumulation((Accumulation)new Count());* > > *AccumulationTag maxTag = accumulations.addAccumulation(new Max());* > > *AccumulationTag minTag = accumulations.addAccumulation(new Min());* > > *//following is a sample showing how to get the sub-accumulation output* > > *accumulations.getSubOutput(sumTag, outputValues)* > > *accumulations.getSubOutput(countTag, outputValues)* > > *accumulations.getSubOutput(maxTag, outputValues)* > > *accumulations.getSubOutput(minTag, outputValues)* > > > Thanks > > Bright > > On Sun, Feb 26, 2017 at 10:33 PM, Chinmay Kolhatkar < > chin...@datatorrent.com> wrote: > >> Dear Community, >> >> Currently we have accumulations for individual types of accumulations, >> but if one wants to do more than one accumulation in a single stage of >> the Windowed Operator, it is not possible. >> >> I want to propose an idea called "CompositeAccumulation", where more than >> one >> accumulation can be configured, and this accumulation can rely on >> multiple >> accumulations to generate the final result/output. >> >> The output can take either of two forms: >> 1. Just the list of outputs, with AccumulationTags as identifiers. >> 2. The results of multiple accumulations merged using some user-defined >> logic. >> For eg. 
In aggregation case, Input POJO to this accumulation can be a >> POJO containing NumberOfOrders as field and in output one might need to >> generate a final(single) POJO which contains result of multiple >> accumulations like SUM, COUNT on NumberOfOrders as different fields of >> outgoing POJO. >> >> I particularly see the use of this for Multiple Aggregation which we would >> like to do in SQL on Apex Integration. >> >> Please share your thoughts on the same. >> >> Thanks, >> Chinmay. >> > >
[jira] [Created] (APEXMALHAR-2428) CompositeAccumulation for windowed operator
bright chen created APEXMALHAR-2428: --- Summary: CompositeAccumulation for windowed operator Key: APEXMALHAR-2428 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2428 Project: Apache Apex Malhar Issue Type: New Feature Reporter: bright chen Assignee: bright chen
Re: Proposal: CompositeAccumulation for Windowed Operator
I think Chinmay's proposal could make applications clearer and improve performance, since locating the key/window costs most of the time. A suggested usage for CompositeAccumulation could be as follows: *//following is sample code showing how to add sub-accumulations* *CompositeAccumulation accumulations = new CompositeAccumulation<>();* *AccumulationTag sumTag = accumulations.addAccumulation((Accumulation)new SumAccumulation());* *AccumulationTag countTag = accumulations.addAccumulation((Accumulation)new Count());* *AccumulationTag maxTag = accumulations.addAccumulation(new Max());* *AccumulationTag minTag = accumulations.addAccumulation(new Min());* *//following is a sample showing how to get the sub-accumulation output* *accumulations.getSubOutput(sumTag, outputValues)* *accumulations.getSubOutput(countTag, outputValues)* *accumulations.getSubOutput(maxTag, outputValues)* *accumulations.getSubOutput(minTag, outputValues)* Thanks Bright On Sun, Feb 26, 2017 at 10:33 PM, Chinmay Kolhatkar wrote: > Dear Community, > > Currently we have accumulations for individual types of accumulations, > but if one wants to do more than one accumulation in a single stage of > the Windowed Operator, it is not possible. > > I want to propose an idea called "CompositeAccumulation", where more than one > accumulation can be configured, and this accumulation can rely on multiple > accumulations to generate the final result/output. > > The output can take either of two forms: > 1. Just the list of outputs, with AccumulationTags as identifiers. > 2. The results of multiple accumulations merged using some user-defined > logic. > For eg. In aggregation case, the input POJO to this accumulation can be a > POJO containing NumberOfOrders as a field, and in the output one might need to > generate a final (single) POJO which contains the results of multiple > accumulations, like SUM and COUNT on NumberOfOrders, as different fields of the > outgoing POJO. 
>
> I particularly see the use of this for multiple aggregations, which we
> would like to do in the SQL on Apex integration.
>
> Please share your thoughts on the same.
>
> Thanks,
> Chinmay.
>
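To make the proposed API concrete, here is a minimal, self-contained sketch of how such a composite could delegate to its sub-accumulations. The class and method names (CompositeAccumulation, AccumulationTag, addAccumulation, getSubOutput) follow the proposal above, but the implementation details are assumptions for illustration, not the actual Malhar code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed CompositeAccumulation. The composite
// keeps one state slot per registered sub-accumulation, so the costly
// key/window lookup happens once per tuple instead of once per accumulation.
public class CompositeSketch {

  // Simplified stand-in for Malhar's Accumulation interface.
  interface Accumulation<InputT, AccumT> {
    AccumT defaultAccumulatedValue();
    AccumT accumulate(AccumT accumulated, InputT input);
  }

  // Opaque handle identifying one sub-accumulation inside the composite.
  static class AccumulationTag {
    final int index;
    AccumulationTag(int index) { this.index = index; }
  }

  static class CompositeAccumulation<InputT>
      implements Accumulation<InputT, List<Object>> {
    private final List<Accumulation<InputT, Object>> subs = new ArrayList<>();

    @SuppressWarnings("unchecked")
    AccumulationTag addAccumulation(Accumulation<InputT, ?> sub) {
      subs.add((Accumulation<InputT, Object>) sub);
      return new AccumulationTag(subs.size() - 1);
    }

    public List<Object> defaultAccumulatedValue() {
      List<Object> states = new ArrayList<>();
      for (Accumulation<InputT, Object> sub : subs) {
        states.add(sub.defaultAccumulatedValue());
      }
      return states;
    }

    public List<Object> accumulate(List<Object> states, InputT input) {
      // Fan the tuple out to every registered sub-accumulation.
      for (int i = 0; i < subs.size(); i++) {
        states.set(i, subs.get(i).accumulate(states.get(i), input));
      }
      return states;
    }

    // Extract one sub-accumulation's result from the composite output.
    Object getSubOutput(AccumulationTag tag, List<Object> outputValues) {
      return outputValues.get(tag.index);
    }
  }

  // Accumulate 3, 5, 7 with a sum and a count sub-accumulation.
  static long[] demo() {
    CompositeAccumulation<Long> acc = new CompositeAccumulation<>();
    AccumulationTag sumTag = acc.addAccumulation(new Accumulation<Long, Long>() {
      public Long defaultAccumulatedValue() { return 0L; }
      public Long accumulate(Long a, Long in) { return a + in; }
    });
    AccumulationTag countTag = acc.addAccumulation(new Accumulation<Long, Long>() {
      public Long defaultAccumulatedValue() { return 0L; }
      public Long accumulate(Long a, Long in) { return a + 1; }
    });
    List<Object> state = acc.defaultAccumulatedValue();
    for (long v : new long[] {3, 5, 7}) {
      state = acc.accumulate(state, v);
    }
    return new long[] {(Long) acc.getSubOutput(sumTag, state),
                       (Long) acc.getSubOutput(countTag, state)};
  }

  public static void main(String[] args) {
    long[] r = demo();
    System.out.println("sum=" + r[0] + " count=" + r[1]); // sum=15 count=3
  }
}
```

The point of the design is that the windowed operator locates the per-key/per-window state once per tuple and the composite fans the tuple out to all sub-accumulations, rather than paying that lookup once per accumulation.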
Re: [DISCUSS] Proposal for adapting Malhar operators for batch use cases
On Mon, Feb 27, 2017 at 8:50 AM, Bhupesh Chawda wrote: > I think my comments related to count based windows might be causing > confusion. Let's not discuss count based scenarios for now. > > Just want to make sure we are on the same page wrt. the "each file is a > batch" use case. As mentioned by Thomas, each tuple from the same file > has the same timestamp (which is just a sequence number) and that helps > keep tuples from each file in a separate window. > Yes, in this case it is a sequence number, but it could be a time stamp also, depending on the file naming convention. And if it was event time processing, the watermark would be derived from records within the file. Agreed, the source should have a mechanism to control the time stamp extraction along with everything else pertaining to the watermark generation. > We could also implement a "timestampExtractor" interface to identify the > timestamp (sequence number) for a file. > > ~ Bhupesh > > > ___ > > Bhupesh Chawda > > E: bhup...@datatorrent.com | Twitter: @bhupeshsc > > www.datatorrent.com | apex.apache.org > > > > On Mon, Feb 27, 2017 at 9:52 PM, Thomas Weise wrote: > > > I don't think this is a use case for count based window. > > > > We have multiple files that are retrieved in a sequence and there is no > > knowledge of the number of records per file. The requirement is to > > aggregate each file separately and emit the aggregate when the file is read > > fully. There is no concept of "end of something" for an individual key > and > > global window isn't applicable. > > > > However, as already explained and implemented by Bhupesh, this can be > > solved using watermark and window (in this case the window timestamp isn't > > a timestamp, but a file sequence, but that doesn't matter). > > > > Thomas > > > > > > On Mon, Feb 27, 2017 at 8:05 AM, David Yan wrote: > > > I don't think this is the way to go.
Global Window only means the > > timestamp > > > does not matter (or that there is no timestamp). It does not > necessarily > > > mean it's a large batch. Unless there is some notion of event time for > > each > > > file, you don't want to embed the file into the window itself. > > > > > > If you want the result broken up by file name, and if the files are to > be > > > processed in parallel, I think making the file name be part of the key > is > > > the way to go. I think it's very confusing if we somehow make the file > to > > > be part of the window. > > > > > > For count-based window, it's not implemented yet and you're welcome to > > add > > > that feature. In case of count-based windows, there would be no notion > of > > > time and you probably only trigger at the end of each window. In the > case > > > of count-based windows, the watermark only matters for batch since you > > need > > > a way to know when the batch has ended (if the count is 10, the number > of > > > tuples in the batch is let's say 105, you need a way to end the last > > window > > > with 5 tuples). > > > > > > David > > > > > > On Mon, Feb 27, 2017 at 2:41 AM, Bhupesh Chawda < > bhup...@datatorrent.com > > > > > > wrote: > > > > > > > Hi David, > > > > > > > > Thanks for your comments. > > > > > > > > The wordcount example that I created based on the windowed operator > > does > > > > processing of word counts per file (each file as a separate batch), > > i.e. > > > > process counts for each file and dump into separate files. > > > > As I understand Global window is for one large batch; i.e. all > incoming > > > > data falls into the same batch. This could not be processed using > > > > GlobalWindow option as we need more than one windows. In this case, I > > > > configured the windowed operator to have time windows of 1ms each and > > > > passed data for each file with increasing timestamps: (file1, 1), > > (file2, > > > > 2) and so on. Is there a better way of handling this scenario? 
> > > > > > > > Regarding (2 - count based windows), I think there is a trigger > option > > to > > > > process count based windows. In case I want to process every 1000 > > tuples > > > as > > > > a batch, I could set the Trigger option to CountTrigger with the > > > > accumulation set to Discarding. Is this correct? > > > > > > > > I agree that (4. Final Watermark) can be done using Global window. > > > > > > > > ~ Bhupesh > > > > > > > > ___ > > > > > > > > Bhupesh Chawda > > > > > > > > E: bhup...@datatorrent.com | Twitter: @bhupeshsc > > > > > > > > www.datatorrent.com | apex.apache.org > > > > > > > > > > > > > > > > On Mon, Feb 27, 2017 at 12:18 PM, David Yan > > wrote: > > > > > > > > > I'm worried that we are making the watermark concept too > complicated. > > > > > > > > > > Watermarks should simply just tell you what windows can be > considered > > > > > complete. > > > >
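The count-based triggering that Bhupesh describes (fire every N tuples with a discarding accumulation) can be sketched independently of the Malhar API. The following is a hypothetical illustration of the semantics, not the actual CountTrigger implementation; the class name and callback style are assumptions:

```java
import java.util.function.Consumer;

// Hypothetical sketch of count-based triggering with a discarding
// accumulation: fire an aggregate every triggerCount tuples, then reset
// the accumulated state. Illustrative only, not Malhar's Trigger API.
public class CountTriggerSketch {
  private final int triggerCount;
  private final Consumer<Long> onTrigger;
  private long sum = 0;
  private int seen = 0;

  CountTriggerSketch(int triggerCount, Consumer<Long> onTrigger) {
    this.triggerCount = triggerCount;
    this.onTrigger = onTrigger;
  }

  void process(long tuple) {
    sum += tuple;
    if (++seen == triggerCount) {
      onTrigger.accept(sum); // fire the pane
      sum = 0;               // Discarding: state is reset after each fire
      seen = 0;
    }
  }

  public static void main(String[] args) {
    CountTriggerSketch t =
        new CountTriggerSketch(3, s -> System.out.println("pane sum = " + s));
    for (long v = 1; v <= 7; v++) {
      t.process(v);
    }
    // tuples 1..3 fire 6, tuples 4..6 fire 15; tuple 7 is still pending
  }
}
```

This also shows David's caveat: a final pane smaller than the trigger count (here, tuple 7) never reaches the count on its own, which is why some end-of-batch signal such as a final watermark is needed to flush it.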
Re: [DISCUSS] Proposal for adapting Malhar operators for batch use cases
I think my comments related to count based windows might be causing confusion. Let's not discuss count based scenarios for now. Just want to make sure we are on the same page wrt. the "each file is a batch" use case. As mentioned by Thomas, each tuple from the same file has the same timestamp (which is just a sequence number) and that helps keep tuples from each file in a separate window. We could also implement a "timestampExtractor" interface to identify the timestamp (sequence number) for a file. ~ Bhupesh ___ Bhupesh Chawda E: bhup...@datatorrent.com | Twitter: @bhupeshsc www.datatorrent.com | apex.apache.org On Mon, Feb 27, 2017 at 9:52 PM, Thomas Weise wrote: > I don't think this is a use case for count based window. > > We have multiple files that are retrieved in a sequence and there is no > knowledge of the number of records per file. The requirement is to > aggregate each file separately and emit the aggregate when the file is read > fully. There is no concept of "end of something" for an individual key and > global window isn't applicable. > > However, as already explained and implemented by Bhupesh, this can be > solved using watermark and window (in this case the window timestamp isn't > a timestamp, but a file sequence, but that doesn't matter). > > Thomas > > > On Mon, Feb 27, 2017 at 8:05 AM, David Yan wrote: > > > I don't think this is the way to go. Global Window only means the > timestamp > > does not matter (or that there is no timestamp). It does not necessarily > > mean it's a large batch. Unless there is some notion of event time for > each > > file, you don't want to embed the file into the window itself. > > > > If you want the result broken up by file name, and if the files are to be > > processed in parallel, I think making the file name be part of the key is > > the way to go. I think it's very confusing if we somehow make the file to > > be part of the window.
> > > > For count-based window, it's not implemented yet and you're welcome to > add > > that feature. In case of count-based windows, there would be no notion of > > time and you probably only trigger at the end of each window. In the case > > of count-based windows, the watermark only matters for batch since you > need > > a way to know when the batch has ended (if the count is 10, the number of > > tuples in the batch is let's say 105, you need a way to end the last > window > > with 5 tuples). > > > > David > > > > On Mon, Feb 27, 2017 at 2:41 AM, Bhupesh Chawda > > > wrote: > > > > > Hi David, > > > > > > Thanks for your comments. > > > > > > The wordcount example that I created based on the windowed operator > does > > > processing of word counts per file (each file as a separate batch), > i.e. > > > process counts for each file and dump into separate files. > > > As I understand Global window is for one large batch; i.e. all incoming > > > data falls into the same batch. This could not be processed using > > > GlobalWindow option as we need more than one windows. In this case, I > > > configured the windowed operator to have time windows of 1ms each and > > > passed data for each file with increasing timestamps: (file1, 1), > (file2, > > > 2) and so on. Is there a better way of handling this scenario? > > > > > > Regarding (2 - count based windows), I think there is a trigger option > to > > > process count based windows. In case I want to process every 1000 > tuples > > as > > > a batch, I could set the Trigger option to CountTrigger with the > > > accumulation set to Discarding. Is this correct? > > > > > > I agree that (4. Final Watermark) can be done using Global window. 
> > > > > > ~ Bhupesh > > > > > > ___ > > > > > > Bhupesh Chawda > > > > > > E: bhup...@datatorrent.com | Twitter: @bhupeshsc > > > > > > www.datatorrent.com | apex.apache.org > > > > > > > > > > > > On Mon, Feb 27, 2017 at 12:18 PM, David Yan > wrote: > > > > > > > I'm worried that we are making the watermark concept too complicated. > > > > > > > > Watermarks should simply just tell you what windows can be considered > > > > complete. > > > > > > > > Point 2 is basically a count-based window. Watermarks do not play a > > role > > > > here because the window is always complete at the n-th tuple. > > > > > > > > If I understand correctly, point 3 is for batch processing of files. > > > Unless > > > > the files contain timed events, it sounds to be that this can be > > achieved > > > > with just a Global Window. For signaling EOF, a watermark with a > > > +infinity > > > > timestamp can be used so that triggers will be fired upon receipt of > > that > > > > watermark. > > > > > > > > For point 4, just like what I mentioned above, can be achieved with a > > > > watermark with a +infinity timestamp. > > > > > > > > David > > > > > > > > >
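The "timestampExtractor" idea mentioned above could look something like the following. The interface name comes from the discussion; the file-naming convention (a trailing sequence number in the file name) is an assumption for illustration, not an actual Malhar interface:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a "timestampExtractor": map each file name to a
// monotonically increasing sequence number that the windowed operator can
// treat as the tuple timestamp, keeping every file in its own window.
public class FileSequenceExtractor {
  interface TimestampExtractor {
    long extractTimestamp(String fileName);
  }

  // Assumes file names carry a trailing number, e.g. "input-7.txt" -> 7.
  static final TimestampExtractor TRAILING_NUMBER = fileName -> {
    Matcher m = Pattern.compile("(\\d+)(?=\\.[^.]*$|$)").matcher(fileName);
    if (!m.find()) {
      throw new IllegalArgumentException("no sequence number in " + fileName);
    }
    return Long.parseLong(m.group(1));
  };

  public static void main(String[] args) {
    System.out.println(TRAILING_NUMBER.extractTimestamp("input-7.txt")); // 7
    System.out.println(TRAILING_NUMBER.extractTimestamp("file2"));       // 2
  }
}
```

As noted in the thread, the same hook could return a real event-time timestamp instead of a sequence number when the file naming convention carries one.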
Re: Java packages: legacy -> org.apache.apex
Thomas, I agree with you that we need this migration to be done, but I have a different opinion on how to execute it. I think if we do this in phases as described above, users might end up even more confused.

For doing this migration, I think it should follow these steps:
1. Whether for the operator library or core components, we should announce widely on the dev and users mailing lists that "...such change is going to happen in the next release".
2. Take up the work all at once and do not phase it.

Thanks,
Chinmay.

On Mon, Feb 27, 2017 at 9:09 PM, Thomas Weise wrote:
> Hi,
>
> This topic has come up on several PRs and I think it warrants a broader
> discussion.
>
> At the time of incubation, the decision was to defer change of Java
> packages from com.datatorrent to org.apache.apex till next major release to
> ensure backward compatibility for users.
>
> Unfortunately that has led to some confusion, as contributors continue to
> add new code under legacy packages.
>
> It is also a wider issue that examples for using Apex continue to refer to
> com.datatorrent packages, nearly one year after graduation. More and more
> user code is being built on top of something that needs to change, the can
> is being kicked down the road and users will face more changes later.
>
> I would like to propose the following:
>
> 1. All new code has to be submitted under org.apache.apex packages
>
> 2. Not all code is under backward compatibility restriction and in those
> cases we can rename the packages right away. Examples: buffer server,
> engine, demos/examples, benchmarks
>
> 3. Discuss when the core API and operators can be changed. For operators we
> have a bit more freedom to do changes before a major release as most of
> them are marked @Evolving and users have the ability to continue using
> prior version of Malhar with newer engine due to engine backward
> compatibility guarantee.
>
> Thanks,
> Thomas
>
Re: [DISCUSS] Proposal for adapting Malhar operators for batch use cases
I don't think this is a use case for count based window. We have multiple files that are retrieved in a sequence and there is no knowledge of the number of records per file. The requirement is to aggregate each file separately and emit the aggregate when the file is read fully. There is no concept of "end of something" for an individual key and global window isn't applicable. However, as already explained and implemented by Bhupesh, this can be solved using watermark and window (in this case the window timestamp isn't a timestamp, but a file sequence, but that doesn't matter). Thomas On Mon, Feb 27, 2017 at 8:05 AM, David Yan wrote: > I don't think this is the way to go. Global Window only means the timestamp > does not matter (or that there is no timestamp). It does not necessarily > mean it's a large batch. Unless there is some notion of event time for each > file, you don't want to embed the file into the window itself. > > If you want the result broken up by file name, and if the files are to be > processed in parallel, I think making the file name be part of the key is > the way to go. I think it's very confusing if we somehow make the file to > be part of the window. > > For count-based window, it's not implemented yet and you're welcome to add > that feature. In case of count-based windows, there would be no notion of > time and you probably only trigger at the end of each window. In the case > of count-based windows, the watermark only matters for batch since you need > a way to know when the batch has ended (if the count is 10, the number of > tuples in the batch is let's say 105, you need a way to end the last window > with 5 tuples). > > David > > On Mon, Feb 27, 2017 at 2:41 AM, Bhupesh Chawda > > wrote: > > > Hi David, > > > > Thanks for your comments. > > > > The wordcount example that I created based on the windowed operator does > > processing of word counts per file (each file as a separate batch), i.e.
> > process counts for each file and dump into separate files. > > As I understand Global window is for one large batch; i.e. all incoming > > data falls into the same batch. This could not be processed using > > GlobalWindow option as we need more than one windows. In this case, I > > configured the windowed operator to have time windows of 1ms each and > > passed data for each file with increasing timestamps: (file1, 1), (file2, > > 2) and so on. Is there a better way of handling this scenario? > > > > Regarding (2 - count based windows), I think there is a trigger option to > > process count based windows. In case I want to process every 1000 tuples > as > > a batch, I could set the Trigger option to CountTrigger with the > > accumulation set to Discarding. Is this correct? > > > > I agree that (4. Final Watermark) can be done using Global window. > > > > ~ Bhupesh > > > > ___ > > > > Bhupesh Chawda > > > > E: bhup...@datatorrent.com | Twitter: @bhupeshsc > > > > www.datatorrent.com | apex.apache.org > > > > > > > > On Mon, Feb 27, 2017 at 12:18 PM, David Yan wrote: > > > > > I'm worried that we are making the watermark concept too complicated. > > > > > > Watermarks should simply just tell you what windows can be considered > > > complete. > > > > > > Point 2 is basically a count-based window. Watermarks do not play a > role > > > here because the window is always complete at the n-th tuple. > > > > > > If I understand correctly, point 3 is for batch processing of files. > > Unless > > > the files contain timed events, it sounds to be that this can be > achieved > > > with just a Global Window. For signaling EOF, a watermark with a > > +infinity > > > timestamp can be used so that triggers will be fired upon receipt of > that > > > watermark. > > > > > > For point 4, just like what I mentioned above, can be achieved with a > > > watermark with a +infinity timestamp. 
> > > > > > David > > > > > > > > > > > > > > > On Sat, Feb 18, 2017 at 8:04 AM, Bhupesh Chawda < > bhup...@datatorrent.com > > > > > > wrote: > > > > > > > Hi Thomas, > > > > > > > > For an input operator which is supposed to generate watermarks for > > > > downstream operators, I can think about the following watermarks that > > the > > > > operator can emit: > > > > 1. Time based watermarks (the high watermark / low watermark) > > > > 2. Number of tuple based watermarks (Every n tuples) > > > > 3. File based watermarks (Start file, end file) > > > > 4. Final watermark > > > > > > > > File based watermarks seem to be applicable for batch (file based) as > > > well, > > > > and hence I thought of looking at these first. Does this seem to be > in > > > line > > > > with the thought process? > > > > > > > > ~ Bhupesh > > > > > > > > > > > > > > > > ___ > > > > > > > > Bhupesh Chawda > > > > > > > > Software Engineer > > > > > > > > E:
Re: [DISCUSS] Proposal for adapting Malhar operators for batch use cases
I don't think this is the way to go. Global Window only means the timestamp does not matter (or that there is no timestamp). It does not necessarily mean it's a large batch. Unless there is some notion of event time for each file, you don't want to embed the file into the window itself. If you want the result broken up by file name, and if the files are to be processed in parallel, I think making the file name be part of the key is the way to go. I think it's very confusing if we somehow make the file to be part of the window. For count-based window, it's not implemented yet and you're welcome to add that feature. In case of count-based windows, there would be no notion of time and you probably only trigger at the end of each window. In the case of count-based windows, the watermark only matters for batch since you need a way to know when the batch has ended (if the count is 10, the number of tuples in the batch is let's say 105, you need a way to end the last window with 5 tuples). David On Mon, Feb 27, 2017 at 2:41 AM, Bhupesh Chawda wrote: > Hi David, > > Thanks for your comments. > > The wordcount example that I created based on the windowed operator does > processing of word counts per file (each file as a separate batch), i.e. > process counts for each file and dump into separate files. > As I understand Global window is for one large batch; i.e. all incoming > data falls into the same batch. This could not be processed using > GlobalWindow option as we need more than one window. In this case, I > configured the windowed operator to have time windows of 1ms each and > passed data for each file with increasing timestamps: (file1, 1), (file2, > 2) and so on. Is there a better way of handling this scenario? > > Regarding (2 - count based windows), I think there is a trigger option to > process count based windows. In case I want to process every 1000 tuples as > a batch, I could set the Trigger option to CountTrigger with the > accumulation set to Discarding.
Is this correct? > > I agree that (4. Final Watermark) can be done using Global window. > > ~ Bhupesh > > ___ > > Bhupesh Chawda > > E: bhup...@datatorrent.com | Twitter: @bhupeshsc > > www.datatorrent.com | apex.apache.org > > > > On Mon, Feb 27, 2017 at 12:18 PM, David Yan wrote: > > > I'm worried that we are making the watermark concept too complicated. > > > > Watermarks should simply just tell you what windows can be considered > > complete. > > > > Point 2 is basically a count-based window. Watermarks do not play a role > > here because the window is always complete at the n-th tuple. > > > > If I understand correctly, point 3 is for batch processing of files. > Unless > > the files contain timed events, it sounds to be that this can be achieved > > with just a Global Window. For signaling EOF, a watermark with a > +infinity > > timestamp can be used so that triggers will be fired upon receipt of that > > watermark. > > > > For point 4, just like what I mentioned above, can be achieved with a > > watermark with a +infinity timestamp. > > > > David > > > > > > > > > > On Sat, Feb 18, 2017 at 8:04 AM, Bhupesh Chawda > > > wrote: > > > > > Hi Thomas, > > > > > > For an input operator which is supposed to generate watermarks for > > > downstream operators, I can think about the following watermarks that > the > > > operator can emit: > > > 1. Time based watermarks (the high watermark / low watermark) > > > 2. Number of tuple based watermarks (Every n tuples) > > > 3. File based watermarks (Start file, end file) > > > 4. Final watermark > > > > > > File based watermarks seem to be applicable for batch (file based) as > > well, > > > and hence I thought of looking at these first. Does this seem to be in > > line > > > with the thought process? 
> > > > > > ~ Bhupesh > > > > > > > > > > > > ___ > > > > > > Bhupesh Chawda > > > > > > Software Engineer > > > > > > E: bhup...@datatorrent.com | Twitter: @bhupeshsc > > > > > > www.datatorrent.com | apex.apache.org > > > > > > > > > > > > On Thu, Feb 16, 2017 at 10:37 AM, Thomas Weise wrote: > > > > > > > I don't think this should be designed based on a simplistic file > > > > input-output scenario. It would be good to include a stateful > > > > transformation based on event time. > > > > > > > > More complex pipelines contain stateful transformations that depend > on > > > > windowing and watermarks. I think we need a watermark concept that is > > > based > > > > on progress in event time (or other monotonic increasing sequence) > that > > > > other operators can generically work with. > > > > > > > > Note that even file input in many cases can produce time based > > > watermarks, > > > > for example when you read part files that are bound by event time. > > > > > > > > Thanks, > > > > Thomas > > > >
Java packages: legacy -> org.apache.apex
Hi,

This topic has come up on several PRs and I think it warrants a broader discussion.

At the time of incubation, the decision was to defer the change of Java packages from com.datatorrent to org.apache.apex till the next major release to ensure backward compatibility for users. Unfortunately that has led to some confusion, as contributors continue to add new code under legacy packages.

It is also a wider issue that examples for using Apex continue to refer to com.datatorrent packages, nearly one year after graduation. More and more user code is being built on top of something that needs to change; the can is being kicked down the road, and users will face more changes later.

I would like to propose the following:

1. All new code has to be submitted under org.apache.apex packages.

2. Not all code is under the backward compatibility restriction, and in those cases we can rename the packages right away. Examples: buffer server, engine, demos/examples, benchmarks.

3. Discuss when the core API and operators can be changed. For operators we have a bit more freedom to do changes before a major release, as most of them are marked @Evolving and users have the ability to continue using a prior version of Malhar with a newer engine due to the engine backward compatibility guarantee.

Thanks,
Thomas
[jira] [Resolved] (APEXMALHAR-2424) Null pointer exception in JDBCPojoPollInputOperator is thrown when we set columnsExpression and have additional columns in fieldInfos
[ https://issues.apache.org/jira/browse/APEXMALHAR-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shubham pathak resolved APEXMALHAR-2424. Resolution: Fixed > Null pointer exception in JDBCPojoPollInputOperator is thrown when we set > columnsExpression and have additional columns in fieldInfos > - > > Key: APEXMALHAR-2424 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2424 > Project: Apache Apex Malhar > Issue Type: Bug >Reporter: Hitesh Kapoor >Assignee: Hitesh Kapoor > Fix For: 3.7.0 > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (APEXMALHAR-2424) Null pointer exception in JDBCPojoPollInputOperator is thrown when we set columnsExpression and have additional columns in fieldInfos
[ https://issues.apache.org/jira/browse/APEXMALHAR-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shubham pathak updated APEXMALHAR-2424: --- Fix Version/s: 3.7.0 > Null pointer exception in JDBCPojoPollInputOperator is thrown when we set > columnsExpression and have additional columns in fieldInfos > - > > Key: APEXMALHAR-2424 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2424 > Project: Apache Apex Malhar > Issue Type: Bug >Reporter: Hitesh Kapoor >Assignee: Hitesh Kapoor > Fix For: 3.7.0 > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (APEXMALHAR-2424) Null pointer exception in JDBCPojoPollInputOperator is thrown when we set columnsExpression and have additional columns in fieldInfos
[ https://issues.apache.org/jira/browse/APEXMALHAR-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885722#comment-15885722 ] ASF GitHub Bot commented on APEXMALHAR-2424: Github user asfgit closed the pull request at: https://github.com/apache/apex-malhar/pull/562 > Null pointer exception in JDBCPojoPollInputOperator is thrown when we set > columnsExpression and have additional columns in fieldInfos > - > > Key: APEXMALHAR-2424 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2424 > Project: Apache Apex Malhar > Issue Type: Bug >Reporter: Hitesh Kapoor >Assignee: Hitesh Kapoor > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] apex-malhar pull request #562: APEXMALHAR-2424 Extra null field getting adde...
Github user asfgit closed the pull request at: https://github.com/apache/apex-malhar/pull/562 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Resolved] (APEXMALHAR-2415) Enable PojoInnerJoin accum to allow multiple keys for join purpose
[ https://issues.apache.org/jira/browse/APEXMALHAR-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tushar Gosavi resolved APEXMALHAR-2415. --- Resolution: Fixed Fix Version/s: 3.7.0 > Enable PojoInnerJoin accum to allow multiple keys for join purpose > -- > > Key: APEXMALHAR-2415 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2415 > Project: Apache Apex Malhar > Issue Type: Sub-task > Components: algorithms >Affects Versions: 3.6.0 >Reporter: Chinmay Kolhatkar >Assignee: Hitesh Kapoor > Fix For: 3.7.0 > > > Enable PojoInnerJoin accum to allow multiple keys for join purpose -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (APEXMALHAR-2415) Enable PojoInnerJoin accum to allow multiple keys for join purpose
[ https://issues.apache.org/jira/browse/APEXMALHAR-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885663#comment-15885663 ] ASF GitHub Bot commented on APEXMALHAR-2415: Github user asfgit closed the pull request at: https://github.com/apache/apex-malhar/pull/561 > Enable PojoInnerJoin accum to allow multiple keys for join purpose > -- > > Key: APEXMALHAR-2415 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2415 > Project: Apache Apex Malhar > Issue Type: Sub-task > Components: algorithms >Affects Versions: 3.6.0 >Reporter: Chinmay Kolhatkar >Assignee: Hitesh Kapoor > > Enable PojoInnerJoin accum to allow multiple keys for join purpose -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] apex-malhar pull request #561: APEXMALHAR-2415 Taking join on multiple colum...
Github user asfgit closed the pull request at: https://github.com/apache/apex-malhar/pull/561
Re: [DISCUSS] Proposal for adapting Malhar operators for batch use cases
Hi David, Thanks for your comments. The wordcount example that I created based on the windowed operator does processing of word counts per file (each file as a separate batch), i.e. process counts for each file and dump into separate files. As I understand, Global window is for one large batch; i.e. all incoming data falls into the same batch. This could not be processed using the GlobalWindow option as we need more than one window. In this case, I configured the windowed operator to have time windows of 1ms each and passed data for each file with increasing timestamps: (file1, 1), (file2, 2) and so on. Is there a better way of handling this scenario? Regarding (2 - count based windows), I think there is a trigger option to process count based windows. In case I want to process every 1000 tuples as a batch, I could set the Trigger option to CountTrigger with the accumulation set to Discarding. Is this correct? I agree that (4. Final Watermark) can be done using Global window. ~ Bhupesh ___ Bhupesh Chawda E: bhup...@datatorrent.com | Twitter: @bhupeshsc www.datatorrent.com | apex.apache.org On Mon, Feb 27, 2017 at 12:18 PM, David Yan wrote: > I'm worried that we are making the watermark concept too complicated. > > Watermarks should simply just tell you what windows can be considered > complete. > > Point 2 is basically a count-based window. Watermarks do not play a role > here because the window is always complete at the n-th tuple. > > If I understand correctly, point 3 is for batch processing of files. Unless > the files contain timed events, it sounds to me that this can be achieved > with just a Global Window. For signaling EOF, a watermark with a +infinity > timestamp can be used so that triggers will be fired upon receipt of that > watermark. > > For point 4, just like what I mentioned above, this can be achieved with a > watermark with a +infinity > timestamp.
> > David > > > > > On Sat, Feb 18, 2017 at 8:04 AM, Bhupesh Chawda > wrote: > > > Hi Thomas, > > > > For an input operator which is supposed to generate watermarks for > > downstream operators, I can think about the following watermarks that the > > operator can emit: > > 1. Time based watermarks (the high watermark / low watermark) > > 2. Number of tuple based watermarks (Every n tuples) > > 3. File based watermarks (Start file, end file) > > 4. Final watermark > > > > File based watermarks seem to be applicable for batch (file based) as > well, > > and hence I thought of looking at these first. Does this seem to be in > line > > with the thought process? > > > > ~ Bhupesh > > > > > > > > ___ > > > > Bhupesh Chawda > > > > Software Engineer > > > > E: bhup...@datatorrent.com | Twitter: @bhupeshsc > > > > www.datatorrent.com | apex.apache.org > > > > > > > > On Thu, Feb 16, 2017 at 10:37 AM, Thomas Weise wrote: > > > > > I don't think this should be designed based on a simplistic file > > > input-output scenario. It would be good to include a stateful > > > transformation based on event time. > > > > > > More complex pipelines contain stateful transformations that depend on > > > windowing and watermarks. I think we need a watermark concept that is > > based > > > on progress in event time (or other monotonic increasing sequence) that > > > other operators can generically work with. > > > > > > Note that even file input in many cases can produce time based > > watermarks, > > > for example when you read part files that are bound by event time. > > > > > > Thanks, > > > Thomas > > > > > > > > > On Wed, Feb 15, 2017 at 4:02 AM, Bhupesh Chawda < > bhup...@datatorrent.com > > > > > > wrote: > > > > > > > For better understanding the use case for control tuples in batch, I > > am > > > > creating a prototype for a batch application using File Input and > File > > > > Output operators. 
> > > > > > > > To enable basic batch processing for File IO operators, I am > proposing > > > the > > > > following changes to File input and output operators: > > > > 1. File Input operator emits a watermark each time it opens and > closes > > a > > > > file. These can be "start file" and "end file" watermarks which > include > > > the > > > > corresponding file names. The "start file" tuple should be sent > before > > > any > > > > of the data from that file flows. > > > > 2. File Input operator can be configured to end the application > after a > > > > single or n scans of the directory (a batch). This is where the > > operator > > > > emits the final watermark (the end of application control tuple). > This > > > will > > > > also shutdown the application. > > > > 3. The File output operator handles these control tuples. "Start > file" > > > > initializes the file name for the incoming tuples. "End file" > watermark > > > > forces a finalize on that file. > > > > > > > > The user would
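The proposed "start file" / "end file" control tuples can be sketched as follows. The class names and the writer's reactions are illustrative assumptions, not the actual Malhar File I/O operator API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed control tuples: the input side brackets
// each file's data with "start file" / "end file" watermarks, and the output
// side uses them to open and finalize the corresponding output file.
public class FileWatermarkSketch {
  static abstract class ControlTuple {
    final String fileName;
    ControlTuple(String fileName) { this.fileName = fileName; }
  }
  static class StartFile extends ControlTuple { StartFile(String f) { super(f); } }
  static class EndFile extends ControlTuple { EndFile(String f) { super(f); } }

  // Downstream writer reacting to the control tuples: "start file" sets the
  // output file name for incoming tuples, "end file" forces a finalize.
  static class Writer {
    final List<String> log = new ArrayList<>();
    private String current;

    void control(ControlTuple t) {
      if (t instanceof StartFile) {
        current = t.fileName;
        log.add("open " + current);
      } else {
        log.add("finalize " + current);
        current = null;
      }
    }

    void data(String line) {
      log.add(current + " << " + line);
    }
  }

  public static void main(String[] args) {
    Writer w = new Writer();
    w.control(new StartFile("file1"));
    w.data("hello");
    w.data("world");
    w.control(new EndFile("file1"));
    w.log.forEach(System.out::println);
  }
}
```

The key ordering constraint from the proposal is preserved here: the "start file" tuple arrives before any data from that file, so the writer always knows which output file an incoming tuple belongs to.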
[jira] [Resolved] (APEXMALHAR-2414) Improve performance of PojoInnerJoin accum by using PojoUtils
[ https://issues.apache.org/jira/browse/APEXMALHAR-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tushar Gosavi resolved APEXMALHAR-2414.
---------------------------------------
    Resolution: Fixed
    Fix Version/s: 3.7.0

Change pushed through dd5341f222b8141b76ecbc5ea9653c70b3c78c44

> Improve performance of PojoInnerJoin accum by using PojoUtils
> -------------------------------------------------------------
>
>          Key: APEXMALHAR-2414
>          URL: https://issues.apache.org/jira/browse/APEXMALHAR-2414
>      Project: Apache Apex Malhar
>   Issue Type: Sub-task
>   Components: algorithms
>     Reporter: Chinmay Kolhatkar
>     Assignee: Hitesh Kapoor
>      Fix For: 3.7.0
>
> Currently the PojoInnerJoin accumulation uses Java reflection to deal with
> objects. Use PojoUtils instead.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
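For context on the optimization in APEXMALHAR-2414: the gist of replacing per-tuple reflection is to resolve the field access once and reuse a typed accessor on the hot path. The sketch below only illustrates that idea by hoisting the reflective lookup; the real PojoUtils goes further and generates bytecode for the accessors, which this does not attempt. The class and method names (`GetterSketch`, `compileGetter`) are illustrative assumptions:

```java
import java.lang.reflect.Field;
import java.util.function.Function;

// Contrast between a per-tuple reflective read and a resolve-once accessor.
public class GetterSketch {
    public static class Pojo {
        public String key;
        public Pojo(String key) { this.key = key; }
    }

    // Slow path: field lookup and reflective read on every call.
    static Object reflectiveGet(Object pojo, String fieldName) throws Exception {
        Field f = pojo.getClass().getField(fieldName); // resolved on each tuple
        return f.get(pojo);
    }

    // Faster path: resolve once up front, then each per-tuple call skips the lookup.
    static Function<Pojo, Object> compileGetter(String fieldName) throws Exception {
        Field f = Pojo.class.getField(fieldName);      // resolved once
        return pojo -> {
            try {
                return f.get(pojo);
            } catch (IllegalAccessException e) {
                throw new RuntimeException(e);
            }
        };
    }

    public static void main(String[] args) throws Exception {
        Function<Pojo, Object> getKey = compileGetter("key");
        System.out.println(reflectiveGet(new Pojo("a"), "key")); // a
        System.out.println(getKey.apply(new Pojo("b")));         // b
    }
}
```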
[jira] [Commented] (APEXMALHAR-2426) Add user document for RegexParser operator
[ https://issues.apache.org/jira/browse/APEXMALHAR-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885466#comment-15885466 ]

ASF GitHub Bot commented on APEXMALHAR-2426:
--------------------------------------------

GitHub user venkateshkottapalli opened a pull request:

    https://github.com/apache/apex-malhar/pull/565

    APEXMALHAR-2426 - RegexParser Documentation

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/venkateshkottapalli/apex-malhar APEXMALHAR-2426-RegexParserDocumentation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/apex-malhar/pull/565.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #565

commit 69adc9a7bef8c15258bd4df8b7e39d5f9871d6cc
Author: venkateshDT
Date:   2017-02-25T00:19:28Z

    RegexParser Documentation

> Add user document for RegexParser operator
> ------------------------------------------
>
>          Key: APEXMALHAR-2426
>          URL: https://issues.apache.org/jira/browse/APEXMALHAR-2426
>      Project: Apache Apex Malhar
>   Issue Type: Task
>     Reporter: Venkatesh Kottapalli
>     Assignee: Venkatesh Kottapalli
>       Labels: documentation
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Add documentation for the RegexParser operator.
> RegexParser JIRA:
> https://issues.apache.org/jira/browse/APEXMALHAR-2218
[GitHub] apex-malhar pull request #565: APEXMALHAR-2426 - RegexParser Documentation
GitHub user venkateshkottapalli opened a pull request:

    https://github.com/apache/apex-malhar/pull/565

    APEXMALHAR-2426 - RegexParser Documentation

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/venkateshkottapalli/apex-malhar APEXMALHAR-2426-RegexParserDocumentation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/apex-malhar/pull/565.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #565

commit 69adc9a7bef8c15258bd4df8b7e39d5f9871d6cc
Author: venkateshDT
Date:   2017-02-25T00:19:28Z

    RegexParser Documentation

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
[jira] [Resolved] (APEXMALHAR-2218) RegexParser- Operator to parse byte stream using Regex pattern and emit a POJO
[ https://issues.apache.org/jira/browse/APEXMALHAR-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatesh Kottapalli resolved APEXMALHAR-2218.
----------------------------------------------
    Resolution: Fixed

Merged.

> RegexParser- Operator to parse byte stream using Regex pattern and emit a POJO
> ------------------------------------------------------------------------------
>
>          Key: APEXMALHAR-2218
>          URL: https://issues.apache.org/jira/browse/APEXMALHAR-2218
>      Project: Apache Apex Malhar
>   Issue Type: New Feature
>     Reporter: Venkatesh Kottapalli
>     Assignee: Venkatesh Kottapalli
>      Fix For: 3.7.0
>
> A generic parser which takes a regex pattern as a parameter and parses the
> log file line by line.
> Input: byte array
> Output: POJO
> Parameters: a POJO schema configuration is required, and the order of the
> schema should match the lines from the log file.
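The operator resolved above boils down to applying a configured regex with capture groups to each input line and populating a POJO from the captured groups. A minimal sketch of that idea using `java.util.regex` follows; the class names, the example pattern, and the `LogEntry` fields are illustrative assumptions, not the Malhar operator's API or schema configuration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parse a log line with named capture groups and populate a POJO.
public class RegexToPojoSketch {
    // Example target POJO; in the operator the schema is user-configured.
    public static class LogEntry {
        public String ip;
        public String status;
    }

    // Example pattern: leading IP token, anything in between, trailing 3-digit status.
    private static final Pattern LINE =
        Pattern.compile("^(?<ip>\\S+) .* (?<status>\\d{3})$");

    // Returns a populated POJO, or null if the line does not match the pattern.
    public static LogEntry parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) return null;
        LogEntry e = new LogEntry();
        e.ip = m.group("ip");
        e.status = m.group("status");
        return e;
    }

    public static void main(String[] args) {
        LogEntry e = parse("10.0.0.1 GET /index.html 200");
        System.out.println(e.ip + " " + e.status); // 10.0.0.1 200
    }
}
```

As the issue description notes, the field order in the configured POJO schema has to line up with the groups the pattern captures from each line.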
[jira] [Updated] (APEXMALHAR-2218) RegexParser- Operator to parse byte stream using Regex pattern and emit a POJO
[ https://issues.apache.org/jira/browse/APEXMALHAR-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkatesh Kottapalli updated APEXMALHAR-2218:
---------------------------------------------
    Fix Version/s: 3.7.0
       Issue Type: New Feature  (was: Improvement)

> RegexParser- Operator to parse byte stream using Regex pattern and emit a POJO
> ------------------------------------------------------------------------------
>
>          Key: APEXMALHAR-2218
>          URL: https://issues.apache.org/jira/browse/APEXMALHAR-2218
>      Project: Apache Apex Malhar
>   Issue Type: New Feature
>     Reporter: Venkatesh Kottapalli
>     Assignee: Venkatesh Kottapalli
>      Fix For: 3.7.0
>
> A generic parser which takes a regex pattern as a parameter and parses the
> log file line by line.
> Input: byte array
> Output: POJO
> Parameters: a POJO schema configuration is required, and the order of the
> schema should match the lines from the log file.
[jira] [Commented] (APEXMALHAR-2218) RegexParser- Operator to parse byte stream using Regex pattern and emit a POJO
[ https://issues.apache.org/jira/browse/APEXMALHAR-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885445#comment-15885445 ]

ASF GitHub Bot commented on APEXMALHAR-2218:
--------------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/apex-malhar/pull/396

> RegexParser- Operator to parse byte stream using Regex pattern and emit a POJO
> ------------------------------------------------------------------------------
>
>          Key: APEXMALHAR-2218
>          URL: https://issues.apache.org/jira/browse/APEXMALHAR-2218
>      Project: Apache Apex Malhar
>   Issue Type: Improvement
>     Reporter: Venkatesh Kottapalli
>     Assignee: Venkatesh Kottapalli
>
> A generic parser which takes a regex pattern as a parameter and parses the
> log file line by line.
> Input: byte array
> Output: POJO
> Parameters: a POJO schema configuration is required, and the order of the
> schema should match the lines from the log file.
[GitHub] apex-malhar pull request #396: APEXMALHAR-2218-Creation of RegexSplitter ope...
Github user asfgit closed the pull request at:

    https://github.com/apache/apex-malhar/pull/396
[jira] [Commented] (APEXMALHAR-2427) Kinesis Input Operator documentation
[ https://issues.apache.org/jira/browse/APEXMALHAR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885440#comment-15885440 ]

ASF GitHub Bot commented on APEXMALHAR-2427:
--------------------------------------------

GitHub user deepak-narkhede opened a pull request:

    https://github.com/apache/apex-malhar/pull/564

    APEXMALHAR-2427 Kinesis Input Operator documentation.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/deepak-narkhede/apex-malhar APEXMALHAR-2427

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/apex-malhar/pull/564.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #564

commit a12bd8794ef2886e87db834136df365b5c08a1d4
Author: deepak-narkhede
Date:   2017-02-27T09:21:57Z

    APEXMALHAR-2427 Kinesis Input Operator documentation.

> Kinesis Input Operator documentation
> ------------------------------------
>
>          Key: APEXMALHAR-2427
>          URL: https://issues.apache.org/jira/browse/APEXMALHAR-2427
>      Project: Apache Apex Malhar
>   Issue Type: Documentation
>     Reporter: Deepak Narkhede
>     Assignee: Deepak Narkhede