[GitHub] incubator-apex-core pull request #338: APEXCORE-463 : fixed condition to all...
Github user shubham-pathak22 commented on a diff in the pull request: https://github.com/apache/incubator-apex-core/pull/338#discussion_r65658413 --- Diff: engine/src/main/java/com/datatorrent/stram/webapp/TypeGraph.java --- @@ -355,7 +356,8 @@ public int size() } Set result = new TreeSet<>(); for (TypeGraphVertex node : tgv.allInstantiableDescendants) { - if ((isAncestor(InputOperator.class.getName(), node.typeName) || !getAllInputPorts(node).isEmpty())) { + if ((isAncestor(InputOperator.class.getName(), node.typeName) || isAncestor(Module.class.getName(), node.typeName) --- End diff -- Renamed methods as suggested. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-apex-malhar pull request #308: [APEXMALHAR-2106][WIP] Support mult...
Github user ilganeli closed the pull request at: https://github.com/apache/incubator-apex-malhar/pull/308 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (APEXMALHAR-2106) Support merging multiple streams with StreamMerger
[ https://issues.apache.org/jira/browse/APEXMALHAR-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313126#comment-15313126 ] ASF GitHub Bot commented on APEXMALHAR-2106: GitHub user ilganeli opened a pull request: https://github.com/apache/incubator-apex-malhar/pull/308 [APEXMALHAR-2106][WIP] Support multiple streams in StreamMerger * Created a module which creates a binary tree of StreamMerger operators to support merging multiple streams * StreamMerger operators are presently allocated as CONTAINER_LOCAL. I would like for them to be THREAD_LOCAL but I was getting the following error when attempting this: ```Caused by: javax.validation.ValidationException: Locality THREAD_LOCAL invalid for operator OperatorMeta{name=Merger_Tier_0_#_0, operator=StreamMerger{name=null}, attributes={}} with multiple input streams as they origin from different owner OIO operators ``` * An example application is included but i'm still working on further tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ilganeli/incubator-apex-malhar APEXMALHAR-2106B Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-apex-malhar/pull/308.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #308 commit f991ecb3cd68dc56a1e544503b5bc7eedfe2d015 Author: Ilya GanelinDate: 2016-05-31T23:56:29Z Updated KeyValPair to override hashcode to support sticky partitining by key. commit 7dc1b463ed13502ea6fbb5d4d2588c3a28dd2f09 Author: Ilya Ganelin Date: 2016-06-02T21:29:47Z Created a module to merge multiple streams into a single stream. Created a test application demonstrating this in action. Still need to add more tests. commit 429832f91b95169daf9986aa6bc5c965721d9626 Author: Priyanka Gugale Date: 2016-05-19T00:52:08Z APEXMALHAR-2096: Add property blocksThreshold to limit input rate commit 1352e3f92aef2f6d314230dea81408a409d09466 Author: Tushar R. Gosavi Date: 2016-06-01T09:32:27Z APEXMALHAR-998 extend second type parameter from Object commit a8122d7ceee14d4c732657250848a1009dc8c14d Author: Ilya Ganelin Date: 2016-06-02T21:36:02Z Merge branch 'master' into APEXMALHAR-2106B commit eda0eb4fe47f16cd1c56392f3db879f5e9b1d2bd Author: Ilya Ganelin Date: 2016-06-02T21:37:50Z Removed extra space. > Support merging multiple streams with StreamMerger > --- > > Key: APEXMALHAR-2106 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2106 > Project: Apache Apex Malhar > Issue Type: New Feature >Reporter: Ilya Ganelin >Assignee: Ilya Ganelin > > To properly implement the Flatten transformation (and other Stream > combination operations), Apex must support merging data from multiple > sources. The StreamMerger operator can be improved to merge multiple streams, > rather than just the two streams it can handle in the present implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Parquet output operator ?
Anybody know if there are plans for a Parquet writer operator ? If so, can anyone share status and timeline ? Thanks. Ram
[jira] [Commented] (APEXMALHAR-2099) Identify overlap between Beam API and existing Apex Stream API
[ https://issues.apache.org/jira/browse/APEXMALHAR-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312803#comment-15312803 ] Ilya Ganelin commented on APEXMALHAR-2099: -- [~siyuan] Is there additional work that can be broken out from any of these components? > Identify overlap between Beam API and existing Apex Stream API > -- > > Key: APEXMALHAR-2099 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2099 > Project: Apache Apex Malhar > Issue Type: Sub-task >Reporter: Ilya Ganelin > > There should be some overlap between the Beam API and the recently released > Apex Stream API. This task captures the need to understand and document this > overlap. > AC: > * A document or JIRA comment identifying which components of the Beam API are > implement, similar, or absent within the Apex Stream API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXMALHAR-2085) Implement Windowed Operators
[ https://issues.apache.org/jira/browse/APEXMALHAR-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312801#comment-15312801 ] David Yan commented on APEXMALHAR-2085: --- Also, since the window must be assigned before any "GroupBy" operations, we probably need to add the GroupBy function to the second operator as well. So if groupby is needed, it will act as the key to the Dimension Store, together with the window. Otherwise, the key will just be the window. > Implement Windowed Operators > > > Key: APEXMALHAR-2085 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2085 > Project: Apache Apex Malhar > Issue Type: New Feature >Reporter: Siyuan Hua >Assignee: Siyuan Hua > > As per our recent several discussions in the community. A group of Windowed > Operators that delivers the window semantic follows the google Data Flow > model(https://cloud.google.com/dataflow/) is very important. > The operators should be designed and implemented in a way for > High-level API > Beam translation > Easy to use with other popular operator > {panel:title=Operator Hierarchy} > Hierarchy of the operators, > The windowed operators should cover all possible transformations that require > window, and batch processing is also considered as special window called > global window > {code} >+---+ >+-> | WindowedOperator | <+ >| ++--+ | >|^ ^+ >|| | | >|| | | > +--+++--+--+ +---+-++--+-+ > |CombineOperator||GroupOperator| |KeyedOperator||JoinOperator| > +---++-+ +--+--++-+--+ >+-^ ^ ^ >| | | > ++---+ +-++ +++ > |KeyedCombine| |KeyedGroup| | CoGroup | > ++ +--+ +-+ > {code} > Combine operation includes all operations that combine all tuples in one > window into one or small number of tuples, Group operation group all tuples > in one window, Join and CoGroup are used to join and group tuples from > different inputs. > {panel} > {panel:title=Components} > * Window Component > It includes configuration, window state that should be checkpointed, etc. It > should support NonMergibleWindow(fixed or slide) MergibleWindow(Session) > * Trigger > It should support early trigger, late trigger with customizable trigger > behaviour > * Other related components: > ** Watermark generator, can be plugged into input source to generate watermark > ** Tuple schema support: > It should handle either predefined tuple type or give a declarative API to > describe the user defined tuple class > {panel} > Most component API should be reused in High-Level API > This is the umbrella ticket, separate tickets would be created for different > components and operators respectively -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (APEXMALHAR-2066) Add jdbc poller input operator
[ https://issues.apache.org/jira/browse/APEXMALHAR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] devendra tagare updated APEXMALHAR-2066: Description: Create a JDBC poller input operator that has the following features. 1. poll from external jdbc store asynchronously in the input operator. 2. polling frequency and batch size should be configurable. 3. should be idempotent. 4. should be partition-able. 5. should be batch + polling capable. Assumptions for idempotency & partitioning, 1.User needs to provide tableName,dbConnection,setEmitColumnList,look-up key. 2.Optionally batchSize,pollInterval,Look-up key and a where clause can be given. 3.This operator uses static partitioning to arrive at range queries for exactly once reads 4.Assumption is that there is an ordered column using which range queries can be formed 5.If an emitColumnList is provided, please ensure that the keyColumn is the first column in the list 6.Range queries are formed using the JdbcMetaDataUtility Output - comma separated list of the emit columns eg columnA,columnB,columnC Per window the first and the last key processed is saved using the FSWindowDataManager - (,operatorId,windowId).This (lowerBound,upperBoundPair) is then used for recovery.The queries are constructed using the JDBCMetaDataUtility. JDBCMetaDataUtility A utility class used to retrieve the metadata for a given unique key of a SQL table. This class would emit range queries based on a primary index given. was: Create a JDBC poller input operator that has the following features. 1. poll from external jdbc store asynchronously in the input operator. 2. polling frequency and batch size should be configurable. 3. should be idempotent. 4. should be partition-able. 5. should be batch + polling capable. > Add jdbc poller input operator > -- > > Key: APEXMALHAR-2066 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2066 > Project: Apache Apex Malhar > Issue Type: Task >Reporter: Ashwin Chandra Putta >Assignee: devendra tagare > > Create a JDBC poller input operator that has the following features. > 1. poll from external jdbc store asynchronously in the input operator. > 2. polling frequency and batch size should be configurable. > 3. should be idempotent. > 4. should be partition-able. > 5. should be batch + polling capable. > Assumptions for idempotency & partitioning, > 1.User needs to provide tableName,dbConnection,setEmitColumnList,look-up key. > 2.Optionally batchSize,pollInterval,Look-up key and a where clause can be > given. > 3.This operator uses static partitioning to arrive at range queries for > exactly once reads > 4.Assumption is that there is an ordered column using which range queries can > be formed > 5.If an emitColumnList is provided, please ensure that the keyColumn is the > first column in the list > 6.Range queries are formed using the JdbcMetaDataUtility Output - comma > separated list of the emit columns eg columnA,columnB,columnC > Per window the first and the last key processed is saved using the > FSWindowDataManager - ( ,operatorId,windowId).This > (lowerBound,upperBoundPair) is then used for recovery.The queries are > constructed using the JDBCMetaDataUtility. > JDBCMetaDataUtility > A utility class used to retrieve the metadata for a given unique key of a SQL > table. This class would emit range queries based on a primary index given. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXMALHAR-2094) Quantiles sketch operator
[ https://issues.apache.org/jira/browse/APEXMALHAR-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312673#comment-15312673 ] ASF GitHub Bot commented on APEXMALHAR-2094: Github user sandeep-n commented on a diff in the pull request: https://github.com/apache/incubator-apex-malhar/pull/301#discussion_r65579101 --- Diff: sketches/src/main/java/org/apache/apex/malhar/sketches/QuantilesEstimator.java --- @@ -0,0 +1,182 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.apex.malhar.sketches; + +import com.yahoo.sketches.quantiles.QuantilesSketch; + +import com.datatorrent.api.DefaultInputPort; +import com.datatorrent.api.DefaultOutputPort; +import com.datatorrent.api.annotation.OperatorAnnotation; +import com.datatorrent.common.util.BaseOperator; + +/** + * An implementation of BaseOperator that computes a "sketch" (a representation of the probability distribution using + * a low memory footprint) of the incoming numeric data, and evaluates/outputs the cumulative distribution function and + * quantiles of the probability distribution. Leverages the quantiles sketch implementation from the Yahoo Datasketches + * Library. + * + * Input Port(s) : + * data : Data values input port. + * + * Output Port(s) : + * cdfOutput : cumulative distribution function output port. + * quantilesOutput : quantiles output port. + * + * Partitions : No, no will yield wrong results. + * + + */ +@OperatorAnnotation(partitionable = false) +public class QuantilesEstimator extends BaseOperator +{ + /** + * Output port that emits cdf estimated at the current data point + */ + public final transient DefaultOutputPort cdfOutput = new DefaultOutputPort<>(); + /** + * Emits quantiles of stream seen thus far + */ + public final transient DefaultOutputPortquantilesOutput = new DefaultOutputPort<>(); + /** + * Emits probability masses on specified intervals + */ + public final transient DefaultOutputPort pmfOutput = new DefaultOutputPort<>(); + private transient QuantilesSketch quantilesSketch = QuantilesSketch.builder().build(); + /** + * This field determines the specific quantiles to be calculated. + * Default is set to compute the standard quartiles. + */ + private double[] fractions = {0.0, 0.25, 0.50, 0.75, 1.00}; + /** + * This field determines the intervals on which the probability mass function is computed. + */ + private double[] pmfIntervals = {}; + /** + * This operator computes three different quantities which are output on separate output ports. If not using any of + * these quantities, these variables can be set to avoid unnecessary computation. + */ + private boolean computeCdf = true; + private boolean computeQuantiles = true; + private boolean computePmf = true; + public final transient DefaultInputPort data = new DefaultInputPort() + { +@Override +public void process(Double input) +{ + + quantilesSketch.update(input); + + if (computeQuantiles) { +/** + * Computes and emits quantiles of the stream seen thus far + */ +quantilesOutput.emit(quantilesSketch.getQuantiles(fractions)); + } + + if (computeCdf) { +/** + * Emits (estimate of the) cumulative distribution function evaluated at the input value, according to the + * sketched probability distribution of the stream seen thus far. + */ +cdfOutput.emit(quantilesSketch.getCDF(new double[]{input})[0]); --- End diff -- This looks ugly, but the getCDF method belongs to the QuantileSketch class from the external DataSketches library, so I'd rather stick with this signature. > Quantiles sketch operator >
[GitHub] incubator-apex-malhar pull request #301: [APEXMALHAR-2094] Quantiles operato...
Github user sandeep-n commented on a diff in the pull request: https://github.com/apache/incubator-apex-malhar/pull/301#discussion_r65579101 --- Diff: sketches/src/main/java/org/apache/apex/malhar/sketches/QuantilesEstimator.java --- @@ -0,0 +1,182 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.apex.malhar.sketches; + +import com.yahoo.sketches.quantiles.QuantilesSketch; + +import com.datatorrent.api.DefaultInputPort; +import com.datatorrent.api.DefaultOutputPort; +import com.datatorrent.api.annotation.OperatorAnnotation; +import com.datatorrent.common.util.BaseOperator; + +/** + * An implementation of BaseOperator that computes a "sketch" (a representation of the probability distribution using + * a low memory footprint) of the incoming numeric data, and evaluates/outputs the cumulative distribution function and + * quantiles of the probability distribution. Leverages the quantiles sketch implementation from the Yahoo Datasketches + * Library. + * + * Input Port(s) : + * data : Data values input port. + * + * Output Port(s) : + * cdfOutput : cumulative distribution function output port. + * quantilesOutput : quantiles output port. + * + * Partitions : No, no will yield wrong results. + * + + */ +@OperatorAnnotation(partitionable = false) +public class QuantilesEstimator extends BaseOperator +{ + /** + * Output port that emits cdf estimated at the current data point + */ + public final transient DefaultOutputPort cdfOutput = new DefaultOutputPort<>(); + /** + * Emits quantiles of stream seen thus far + */ + public final transient DefaultOutputPortquantilesOutput = new DefaultOutputPort<>(); + /** + * Emits probability masses on specified intervals + */ + public final transient DefaultOutputPort pmfOutput = new DefaultOutputPort<>(); + private transient QuantilesSketch quantilesSketch = QuantilesSketch.builder().build(); + /** + * This field determines the specific quantiles to be calculated. + * Default is set to compute the standard quartiles. + */ + private double[] fractions = {0.0, 0.25, 0.50, 0.75, 1.00}; + /** + * This field determines the intervals on which the probability mass function is computed. + */ + private double[] pmfIntervals = {}; + /** + * This operator computes three different quantities which are output on separate output ports. If not using any of + * these quantities, these variables can be set to avoid unnecessary computation. + */ + private boolean computeCdf = true; + private boolean computeQuantiles = true; + private boolean computePmf = true; + public final transient DefaultInputPort data = new DefaultInputPort() + { +@Override +public void process(Double input) +{ + + quantilesSketch.update(input); + + if (computeQuantiles) { +/** + * Computes and emits quantiles of the stream seen thus far + */ +quantilesOutput.emit(quantilesSketch.getQuantiles(fractions)); + } + + if (computeCdf) { +/** + * Emits (estimate of the) cumulative distribution function evaluated at the input value, according to the + * sketched probability distribution of the stream seen thus far. + */ +cdfOutput.emit(quantilesSketch.getCDF(new double[]{input})[0]); --- End diff -- This looks ugly, but the getCDF method belongs to the QuantileSketch class from the external DataSketches library, so I'd rather stick with this signature. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA
[jira] [Commented] (APEXMALHAR-2099) Identify overlap between Beam API and existing Apex Stream API
[ https://issues.apache.org/jira/browse/APEXMALHAR-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312609#comment-15312609 ] Siyuan Hua commented on APEXMALHAR-2099: [~ilganeli] Thanks for bring this up, I should have communicated more on my side :) Well, Here is the progress right now, [~davidyan] is going to start working on WindowedOperator today, [~thw] already has done some work in Beam translation, and I still focus on high-level API (working on the trigger expression now, precisely speaking :) ). I wouldn't say the design above is the final agreement, when we work on it, we might change something. It's a big feature and the completeness of Data Flow Model is very good. The ultimate goal is to have a Windowed Operator family that can be shared in both High-Level API and Beam translation. [~thw] I agree that maybe DSL is better than Java in this case and flexibility should be there in basic APIs > Identify overlap between Beam API and existing Apex Stream API > -- > > Key: APEXMALHAR-2099 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2099 > Project: Apache Apex Malhar > Issue Type: Sub-task >Reporter: Ilya Ganelin > > There should be some overlap between the Beam API and the recently released > Apex Stream API. This task captures the need to understand and document this > overlap. > AC: > * A document or JIRA comment identifying which components of the Beam API are > implement, similar, or absent within the Apex Stream API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (APEXMALHAR-2108) Centralize javadoc extractor setup for pom.xml
Thomas Weise created APEXMALHAR-2108: Summary: Centralize javadoc extractor setup for pom.xml Key: APEXMALHAR-2108 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2108 Project: Apache Apex Malhar Issue Type: Task Reporter: Thomas Weise Every build module contains the same boilerplate code for javadoc extraction and inclusion into the jar artifact. The relevant configuration should be moved to the parent pom. A property to turn this on/off should also be added for optimizing build time in dev environment or for those child modules that don't need it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] incubator-apex-malhar pull request #289: APEXMALHAR-2087 Hive output module
Github user asfgit closed the pull request at: https://github.com/apache/incubator-apex-malhar/pull/289 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (APEXMALHAR-2107) Apex operator for Apache Ignite
[ https://issues.apache.org/jira/browse/APEXMALHAR-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Weise updated APEXMALHAR-2107: - Assignee: Vladisav Jelisavcic > Apex operator for Apache Ignite > --- > > Key: APEXMALHAR-2107 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2107 > Project: Apache Apex Malhar > Issue Type: Sub-task > Components: query operators >Reporter: Vladisav Jelisavcic >Assignee: Vladisav Jelisavcic > > Apache Ignite (https://ignite.apache.org/) operator support for Apex. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] incubator-apex-core pull request #338: APEXCORE-463 : fixed condition to all...
Github user PramodSSImmaneni commented on a diff in the pull request: https://github.com/apache/incubator-apex-core/pull/338#discussion_r65550916 --- Diff: engine/src/main/java/com/datatorrent/stram/webapp/TypeGraph.java --- @@ -355,7 +356,8 @@ public int size() } Set result = new TreeSet<>(); for (TypeGraphVertex node : tgv.allInstantiableDescendants) { - if ((isAncestor(InputOperator.class.getName(), node.typeName) || !getAllInputPorts(node).isEmpty())) { + if ((isAncestor(InputOperator.class.getName(), node.typeName) || isAncestor(Module.class.getName(), node.typeName) --- End diff -- @siyuanh are you suggesting name of this method be changed from getAllDTInstantiableOperators. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-apex-malhar pull request #290: JdbcPOJOInputOperator polling fix
Github user asfgit closed the pull request at: https://github.com/apache/incubator-apex-malhar/pull/290 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---