[GitHub] apex-malhar pull request #319: APEXMALHAR-2085: REVIEW ONLY: Operator suppor...

2016-07-05 Thread vrozov
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/apex-malhar/pull/319#discussion_r69658261
  
--- Diff: 
library/src/test/java/org/apache/apex/malhar/lib/window/SumAccumulation.java ---
@@ -0,0 +1,55 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.apex.malhar.lib.window;
+
+/**
+ * Accumulation that does a simple sum of longs
+ */
+public class SumAccumulation implements Accumulation
+{
+  @Override
+  public Long defaultAccumulatedValue()
+  {
+return 0L;
+  }
+
+  @Override
+  public Long accumulate(Long accumulatedValue, Long input)
+  {
+return accumulatedValue + input;
--- End diff --

This is an example of the interface promoting an implementation that is 
subject to boxing and GC. It will be much more efficient to accumulate values 
in an internal long member variable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXMALHAR-2069) FileSplitterInput and TimeBasedDirectoryScanner - move operational fields initialization from constructor to setup

2016-07-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363506#comment-15363506
 ] 

ASF GitHub Bot commented on APEXMALHAR-2069:


Github user sanjaypujare closed the pull request at:

https://github.com/apache/apex-malhar/pull/329


> FileSplitterInput and TimeBasedDirectoryScanner - move operational fields 
> initialization from constructor to setup
> --
>
> Key: APEXMALHAR-2069
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2069
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Vlad Rozov
>
> For example, there is no need for scanService to be initialized in the 
> constructor. It should be done during operator setup().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2069) FileSplitterInput and TimeBasedDirectoryScanner - move operational fields initialization from constructor to setup

2016-07-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363503#comment-15363503
 ] 

ASF GitHub Bot commented on APEXMALHAR-2069:


GitHub user sanjaypujare opened a pull request:

https://github.com/apache/apex-malhar/pull/333

Fix for APEXMALHAR-2069: moved creation of scanService to setup, @vro…

@vrozov  pls review

: throw a RuntimeException if scanService is already set, fix 
unit tests to call setup only on WindowDataManager
: move the initialization of most data members from the 
constructor to setup for the TimeBasedDirectoryScanner class
: move the initialization of most data members from the ctor to 
setup for FileSplitterInput class

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sanjaypujare/apex-malhar 
APEXMALHAR-2069.improvement

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/333.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #333


commit ccbc50e67832eeb66bffe147cede486c9f1d5b9c
Author: Sanjay Pujare 
Date:   2016-06-30T01:08:30Z

Fix for APEXMALHAR-2069: moved creation of scanService to setup, @vrozov 
pls review
: throw a RuntimeException if scanService is already set, fix 
unit tests to call setup only on WindowDataManager
: move the initialization of most data members from the 
constructor to setup for the TimeBasedDirectoryScanner class
: move the initialization of most data members from the ctor to 
setup for FileSplitterInput class




> FileSplitterInput and TimeBasedDirectoryScanner - move operational fields 
> initialization from constructor to setup
> --
>
> Key: APEXMALHAR-2069
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2069
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Vlad Rozov
>
> For example, there is no need for scanService to be initialized in the 
> constructor. It should be done during operator setup().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] apex-malhar pull request #333: Fix for APEXMALHAR-2069: moved creation of sc...

2016-07-05 Thread sanjaypujare
GitHub user sanjaypujare opened a pull request:

https://github.com/apache/apex-malhar/pull/333

Fix for APEXMALHAR-2069: moved creation of scanService to setup, @vro…

@vrozov  pls review

: throw a RuntimeException if scanService is already set, fix 
unit tests to call setup only on WindowDataManager
: move the initialization of most data members from the 
constructor to setup for the TimeBasedDirectoryScanner class
: move the initialization of most data members from the ctor to 
setup for FileSplitterInput class

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sanjaypujare/apex-malhar 
APEXMALHAR-2069.improvement

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/333.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #333


commit ccbc50e67832eeb66bffe147cede486c9f1d5b9c
Author: Sanjay Pujare 
Date:   2016-06-30T01:08:30Z

Fix for APEXMALHAR-2069: moved creation of scanService to setup, @vrozov 
pls review
: throw a RuntimeException if scanService is already set, fix 
unit tests to call setup only on WindowDataManager
: move the initialization of most data members from the 
constructor to setup for the TimeBasedDirectoryScanner class
: move the initialization of most data members from the ctor to 
setup for FileSplitterInput class




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXMALHAR-2085) Implement Windowed Operators

2016-07-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363475#comment-15363475
 ] 

ASF GitHub Bot commented on APEXMALHAR-2085:


Github user vrozov commented on a diff in the pull request:

https://github.com/apache/apex-malhar/pull/319#discussion_r69656309
  
--- Diff: 
library/src/main/java/org/apache/apex/malhar/lib/window/Accumulation.java ---
@@ -0,0 +1,76 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.apex.malhar.lib.window;
+
+import org.apache.hadoop.classification.InterfaceStability;
+
+/**
+ * This interface is for the processing part of the WindowedOperator.
+ * We can assume that all stateful processing of the WindowedOperator is a 
form of accumulation.
+ *
+ * In most cases, AccumT is the same as OutputT. But in some cases, the 
accumulated type and the output type may be
+ * different. For example, if we are doing the AVERAGE of doubles, InputT 
will be double, and we need the SUM and the
+ * COUNT stored as type AccumT, and AccumT will be a pair of double and 
long, in which double is the sum of the inputs,
+ * and long is the number of inputs. OutputT will be double, because it 
represents the average of the inputs.
+ */
+@InterfaceStability.Evolving
+public interface Accumulation
+{
+  /**
+   * Returns the default accumulated value when nothing has been 
accumulated
+   *
+   * @return
+   */
+  AccumT defaultAccumulatedValue();
+
+  /**
+   * Accumulates the input to the accumulated value
+   *
+   * @param accumulatedValue
+   * @param input
+   * @return
+   */
+  AccumT accumulate(AccumT accumulatedValue, InputT input);
--- End diff --

My suggestion is to delegate to a class that implements Accumulation 
interface handling of accumulation and change interface to 
```
void accumulate(InputT input);
void merge(Accumulation accumulatedValue);
OutputT getOutput();
```
The implementation class will need to define how it handles accumulation 
and how AccumT is defined. The implementation may use Collection for AccumT or 
may use primitive types such as int, long or double to accumulate values.



> Implement Windowed Operators
> 
>
> Key: APEXMALHAR-2085
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2085
> Project: Apache Apex Malhar
>  Issue Type: New Feature
>Reporter: Siyuan Hua
>Assignee: David Yan
>
> As per our recent several discussions in the community. A group of Windowed 
> Operators that delivers the window semantic follows the google Data Flow 
> model(https://cloud.google.com/dataflow/) is very important. 
> The operators should be designed and implemented in a way for 
> High-level API
> Beam translation
> Easy to use with other popular operator
> {panel:title=Operator Hierarchy}
> Hierarchy of the operators,
> The windowed operators should cover all possible transformations that require 
> window, and batch processing is also considered as special window called 
> global window
> {code}
>+---+
>+-> |  WindowedOperator | <+
>|   ++--+  |
>|^  ^+
>|| | |
>|| | |
> +--+++--+--+  +---+-++--+-+
> |CombineOperator||GroupOperator|  |KeyedOperator||JoinOperator|
> +---++-+  +--+--++-+--+
>+-^   ^ ^
>| | |
>   ++---+   +-++   

Re: Malhar contribution guidelines

2016-07-05 Thread Ashwin Chandra Putta
Added a few comments Pramod, otherwise looks good. Dividing the guidelines
would probably make sense. Contribution process related guidelines can go
into contribution guidelines and technical implementation related
guidelines can go into implementation guidelines. Another thing that might
be helpful is a section describing a simple process for new contributors to
get started without much complexity.

Regards,
Ashwin.

On Tue, Jul 5, 2016 at 2:06 PM, Pramod Immaneni 
wrote:

> I can wait. I can also try to break it up into an abridged set of
> guidelines and add the details into the operator developer guide if
> they are missing.
>
> > On Jul 5, 2016, at 1:38 PM, Thomas Weise  wrote:
> >
> > I will also have a look at it.
> >
> > Wondering though if it shouldn't replace the existing operator developer
> > guide?
> >
> > http://apex.apache.org/docs/apex/operator_development/
> >
> > For the contributing guidelines on the web site, maybe it is better to
> have
> > a shorter checklist that gives folks the necessary pointers w/o too much
> > detail?
> >
> > http://apex.apache.org/contributing.html
> >
> > Thomas
> >
> >
> >
> > On Tue, Jul 5, 2016 at 1:07 PM, Munagala Ramanath 
> > wrote:
> >
> >> Pramod, I'll take a look this evening if you can wait that long.
> >> Ram
> >>> On Jul 5, 2016 11:24 AM, "Pramod Immaneni" 
> wrote:
> >>>
> >>> I received some feedback. Any other comments before adding these
> >> guidelines
> >>> to the project.
> >>>
> >>> Thanks
> >>>
> >>> On Fri, Jun 17, 2016 at 3:12 PM, Pramod Immaneni <
> pra...@datatorrent.com
> >>>
> >>> wrote:
> >>>
>  Hi everyone,
> 
>  I wanted to create a set of guidelines that would help folks that want
> >> to
>  contribute to Malhar. The goal is that by following these guidelines
> >> the
>  contributions will be assured a certain level of quality as the
> >> different
>  aspects to consider, common missteps and mistakes will be taken care
> of
>  which in turn would also make the review process smoother by reducing
> >> the
>  number of review iterations before the contribution gets merged. I
> >> tried
> >>> to
>  capture as much information as I thought would help developers towards
> >>> this
>  goal based on past experience and exposure, in a document.
> 
>  Please go through it and provide your feedback, it will be greatly
>  appreciated. I will go through your comments and incorporate any
> >>> necessary
>  changes. After that I hope this document will become a living document
> >> as
>  part of the contribution guidelines and evolve with the times.
> >>
> https://drive.google.com/open?id=1WjbaIogVtMDQwbvxTlQxrFqUbK9D75bk98asW9e5OxA
> 
>  Thanks
> >>
>



-- 

Regards,
Ashwin.


[jira] [Resolved] (APEXCORE-238) Update Apex community page with meetup links

2016-07-05 Thread Sasha Parfenov (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sasha Parfenov resolved APEXCORE-238.
-
Resolution: Fixed

updated with new meetup widget.

> Update Apex community page with meetup links
> 
>
> Key: APEXCORE-238
> URL: https://issues.apache.org/jira/browse/APEXCORE-238
> Project: Apache Apex Core
>  Issue Type: Task
>Reporter: Amol Kekre
>Assignee: Sasha Parfenov
>
> To enable community to develop further and have healthy meetups



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Malhar contribution guidelines

2016-07-05 Thread Sanjay Pujare
Hi Pramod

Very useful document. Some questions/comments from me (sorry for any newbie
comments):

- for the "Update an Existing Operator" section, are there any backward
compatibility constraints one should be aware of?
- it will be very useful to have each guideline illustrated by an actual
example (e.g. an example of combining 2 or more operators in a module,
initialization and teardown cases in
constructor/setup/beginWindow/endWindow/deactivate/teardown etc)
- should the guidelines say something about unit tests and how unit tests
should typically be written for operators (also include other kinds of
automated tests in-case "unit" testing is difficult)
- didn't find any references to WindowedOperator anywhere (but did find
WindowedStream). Will be good to have hyperlinks for all references
- java 1.8/1.7 compatibility requirement?
- with respect to "...only the data for the first input port..." , how is
"first" input port determined?





On Tue, Jul 5, 2016 at 11:23 AM, Pramod Immaneni 
wrote:

> I received some feedback. Any other comments before adding these guidelines
> to the project.
>
> Thanks
>
> On Fri, Jun 17, 2016 at 3:12 PM, Pramod Immaneni 
> wrote:
>
> > Hi everyone,
> >
> > I wanted to create a set of guidelines that would help folks that want to
> > contribute to Malhar. The goal is that by following these guidelines the
> > contributions will be assured a certain level of quality as the different
> > aspects to consider, common missteps and mistakes will be taken care of
> > which in turn would also make the review process smoother by reducing the
> > number of review iterations before the contribution gets merged. I tried
> to
> > capture as much information as I thought would help developers towards
> this
> > goal based on past experience and exposure, in a document.
> >
> > Please go through it and provide your feedback, it will be greatly
> > appreciated. I will go through your comments and incorporate any
> necessary
> > changes. After that I hope this document will become a living document as
> > part of the contribution guidelines and evolve with the times.
> >
> >
> >
> https://drive.google.com/open?id=1WjbaIogVtMDQwbvxTlQxrFqUbK9D75bk98asW9e5OxA
> >
> > Thanks
> >
>


[GitHub] apex-malhar pull request #319: APEXMALHAR-2085: REVIEW ONLY: Operator suppor...

2016-07-05 Thread davidyan74
Github user davidyan74 commented on a diff in the pull request:

https://github.com/apache/apex-malhar/pull/319#discussion_r69614566
  
--- Diff: 
library/src/main/java/org/apache/apex/malhar/lib/window/WindowedStorage.java ---
@@ -0,0 +1,88 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.apex.malhar.lib.window;
+
+import java.util.Map;
+
+import org.apache.hadoop.classification.InterfaceStability;
+
+/**
+ * WindowedStorage is a key-value store with the key being the window. The 
implementation of this interface should
+ * make sure checkpointing and recovery will be done correctly.
+ *
+ * @param  The type of the data that is stored per window
+ *
+ * TODO: Look at the possibility of integrating spillable data structure: 
https://issues.apache.org/jira/browse/APEXMALHAR-2026
--- End diff --

@bhupeshchawda We will for sure look at ManagedState.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXMALHAR-2085) Implement Windowed Operators

2016-07-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362957#comment-15362957
 ] 

ASF GitHub Bot commented on APEXMALHAR-2085:


Github user davidyan74 commented on a diff in the pull request:

https://github.com/apache/apex-malhar/pull/319#discussion_r69614298
  
--- Diff: 
library/src/main/java/org/apache/apex/malhar/lib/window/Accumulation.java ---
@@ -0,0 +1,76 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.apex.malhar.lib.window;
+
+import org.apache.hadoop.classification.InterfaceStability;
+
+/**
+ * This interface is for the processing part of the WindowedOperator.
+ * We can assume that all stateful processing of the WindowedOperator is a 
form of accumulation.
+ *
+ * In most cases, AccumT is the same as OutputT. But in some cases, the 
accumulated type and the output type may be
+ * different. For example, if we are doing the AVERAGE of doubles, InputT 
will be double, and we need the SUM and the
+ * COUNT stored as type AccumT, and AccumT will be a pair of double and 
long, in which double is the sum of the inputs,
+ * and long is the number of inputs. OutputT will be double, because it 
represents the average of the inputs.
+ */
+@InterfaceStability.Evolving
+public interface Accumulation
+{
+  /**
+   * Returns the default accumulated value when nothing has been 
accumulated
+   *
+   * @return
+   */
+  AccumT defaultAccumulatedValue();
+
+  /**
+   * Accumulates the input to the accumulated value
+   *
+   * @param accumulatedValue
+   * @param input
+   * @return
+   */
+  AccumT accumulate(AccumT accumulatedValue, InputT input);
--- End diff --

@vrozov Good point. I'm assuming you're asking why not: 
```java
  void accumulate(AccumT accumulatedValue, InputT input);
```
with accumulatedValue updated in place. But doing it will make it a lot 
less flexible because the underlying storage might not support this kind of 
operation. For example, if the storage supports get(key) and put(key, value) 
with get(key) returning not a reference to the actual object (possibly as a 
result of deserialization), then it would not work.



> Implement Windowed Operators
> 
>
> Key: APEXMALHAR-2085
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2085
> Project: Apache Apex Malhar
>  Issue Type: New Feature
>Reporter: Siyuan Hua
>Assignee: David Yan
>
> As per our recent several discussions in the community. A group of Windowed 
> Operators that delivers the window semantic follows the google Data Flow 
> model(https://cloud.google.com/dataflow/) is very important. 
> The operators should be designed and implemented in a way for 
> High-level API
> Beam translation
> Easy to use with other popular operator
> {panel:title=Operator Hierarchy}
> Hierarchy of the operators,
> The windowed operators should cover all possible transformations that require 
> window, and batch processing is also considered as special window called 
> global window
> {code}
>+---+
>+-> |  WindowedOperator | <+
>|   ++--+  |
>|^  ^+
>|| | |
>|| | |
> +--+++--+--+  +---+-++--+-+
> |CombineOperator||GroupOperator|  |KeyedOperator||JoinOperator|
> +---++-+  +--+--++-+--+
>+-^   ^ ^
>| | |
>   ++---+   

[GitHub] apex-malhar pull request #319: APEXMALHAR-2085: REVIEW ONLY: Operator suppor...

2016-07-05 Thread davidyan74
Github user davidyan74 commented on a diff in the pull request:

https://github.com/apache/apex-malhar/pull/319#discussion_r69614298
  
--- Diff: 
library/src/main/java/org/apache/apex/malhar/lib/window/Accumulation.java ---
@@ -0,0 +1,76 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.apex.malhar.lib.window;
+
+import org.apache.hadoop.classification.InterfaceStability;
+
+/**
+ * This interface is for the processing part of the WindowedOperator.
+ * We can assume that all stateful processing of the WindowedOperator is a 
form of accumulation.
+ *
+ * In most cases, AccumT is the same as OutputT. But in some cases, the 
accumulated type and the output type may be
+ * different. For example, if we are doing the AVERAGE of doubles, InputT 
will be double, and we need the SUM and the
+ * COUNT stored as type AccumT, and AccumT will be a pair of double and 
long, in which double is the sum of the inputs,
+ * and long is the number of inputs. OutputT will be double, because it 
represents the average of the inputs.
+ */
+@InterfaceStability.Evolving
+public interface Accumulation
+{
+  /**
+   * Returns the default accumulated value when nothing has been 
accumulated
+   *
+   * @return
+   */
+  AccumT defaultAccumulatedValue();
+
+  /**
+   * Accumulates the input to the accumulated value
+   *
+   * @param accumulatedValue
+   * @param input
+   * @return
+   */
+  AccumT accumulate(AccumT accumulatedValue, InputT input);
--- End diff --

@vrozov Good point. I'm assuming you're asking why not: 
```java
  void accumulate(AccumT accumulatedValue, InputT input);
```
with accumulatedValue updated in place. But doing it will make it a lot 
less flexible because the underlying storage might not support this kind of 
operation. For example, if the storage supports get(key) and put(key, value) 
with get(key) returning not a reference to the actual object (possibly as a 
result of deserialization), then it would not work.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] apex-malhar pull request #324: Spillable Datastructures PR for review only

2016-07-05 Thread sandeshh
Github user sandeshh commented on a diff in the pull request:

https://github.com/apache/apex-malhar/pull/324#discussion_r69600767
  
--- Diff: 
library/src/main/java/org/apache/apex/malhar/lib/state/spillable/TimeBasedPriorityQueue.java
 ---
@@ -0,0 +1,128 @@
+package org.apache.apex.malhar.lib.state.spillable;
+
+import java.util.Iterator;
+import java.util.Map;
+import java.util.Set;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+
+public class TimeBasedPriorityQueue
--- End diff --

Have you looked at 
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/DelayQueue.html ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] apex-malhar pull request #298: [APEXMALHAR-2086] Kafka output operator: 0.9....

2016-07-05 Thread sandeshh
GitHub user sandeshh reopened a pull request:

https://github.com/apache/apex-malhar/pull/298

[APEXMALHAR-2086] Kafka output operator: 0.9.0

Kafka output exactly once operator and the regular output operator.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandeshh/apex-malhar APEXMALHAR-2086

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/298.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #298


commit 6bf316da76bf1d1871f52a114a307c05ff652376
Author: sandeshh 
Date:   2016-05-25T15:56:56Z

Kafka 0.9.0 output operators and unit tests.

1. Abstract Base class
2. Kafka Output operator
3. Exactly Once output operator
 Key in the Kafka message is used by the operator to track the tuples 
written by it.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] apex-malhar pull request #330: Initial cut of Inner Join operator for REVIEW...

2016-07-05 Thread chaithu14
GitHub user chaithu14 opened a pull request:

https://github.com/apache/apex-malhar/pull/330

Initial cut of Inner Join operator for REVIEW only



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chaithu14/incubator-apex-malhar 
APEXMALHAR-2100-InnerJoin

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/330.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #330


commit 6987dce47a5789e8de578ae3ee77c2cf6507fcc0
Author: Chaitanya 
Date:   2016-07-05T12:14:46Z

Initial cut of Inner Join operator for REVIEW only




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Regarding issue APEXMALHAR-2122

2016-07-05 Thread Akshay S Harale
Hi,

I have created the pull request for issue APEXMALHAR-2122.

The reason behind this is to support the latest version of elastic search.
Also updated the AbstractElasticSearchOutputOperator.processBatch() api to
process the batch only if its not empty.
I have created the pull request for the same : apex-malhar 325


I created the pull request after running the example in local mode(junit
test case).

Now I am trying to run application using APEX cli. First got the exception
about the googles guava library:




*java.lang.NoSuchMethodError:
com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor;
  at
org.elasticsearch.threadpool.ThreadPool.(ThreadPool.java:190)
  at
org.elasticsearch.client.transport.TransportClient$Builder.build(TransportClient.java:131)
  at
com.datatorrent.contrib.elasticsearch.ElasticSearchConnectable.connect(ElasticSearchConnectable.java:123)*

The reason behind above exception is : elastic search java client requires
guava 18.0 and the Hadoop 2.2 comes with guava 11.0.2.
In our application package (apa file) we have guava 18.0 jar. But i think
while launching application apex gives preference to the guava 11.0.2.

So we updated the apex-engine.jar and changed guava version to 18.0
manually which is not good practice :)
After this we are able to see that application is running in hadoop cluster
but in log there is an exception :







*2016-06-28 15:54:28,721 ERROR com.datatorrent.netlet.AbstractClient:
Exception in event loop {id=ProcessWideEventLoop, head=1, tail=1,
capacity=1024} java.net.ConnectException: Connection refused at
sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at
com.datatorrent.netlet.DefaultEventLoop.handleSelectedKey(DefaultEventLoop.java:371)
  at
com.datatorrent.netlet.OptimizedEventLoop$SelectedSelectionKeySet.forEach(OptimizedEventLoop.java:59)
  at
com.datatorrent.netlet.OptimizedEventLoop.runEventLoop(OptimizedEventLoop.java:192)*

Also tried using maven shade plugin to include guava classes in output jar
but that also not working.

Regards,
Akshay

-- 
This e-mail, including any attached files, may contain confidential and 
privileged information for the sole use of the intended recipient. Any 
review, use, distribution, or disclosure by others is strictly prohibited. 
If you are not the intended recipient (or authorized to receive information 
for the intended recipient), please contact the sender by reply e-mail and 
delete all copies of this message.