[jira] [Comment Edited] (APEXMALHAR-1818) Integrate Calcite to support SQL
[ https://issues.apache.org/jira/browse/APEXMALHAR-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505682#comment-15505682 ] Chinmay Kolhatkar edited comment on APEXMALHAR-1818 at 9/20/16 5:56 AM: Here is a quick update on Apex-Calcite integration work. Currently I'm able to run SQL statement as a DAG against registered table abstractions of data endpoint and message type. Here is the SQL support that is currently implemented: # Data Endpoint (Source/Destination): #* File #* Kafka # Message Types from Data endpoint (source/destination): #* CSV # SQL Functionality Support: #* SELECT (Projection) - Select from Source #* INSERT - Insert into Destination #* WHERE (Filter) #* Scalar functions which are provided in Calcite core #* Custom sclar function can be defined as provided to SQL. # Table can be defined as abstraction of Data Endpoint (source/dest) and message type Currently Calcite integration with Apex is exposed as a small boiler plate code in populateDAG as follows: {code} SQLExecEnvironment.getEnvironment(dag) .registerTable("ORDERS", new KafkaEndpoint(broker, sourceTopic, new CSVMessageFormat(schemaIn))) .registerTable("SALES", new KafkaEndpoint(broker, destTopic, new CSVMessageFormat(schemaOut))) .registerFunction("APEXCONCAT", FileEndpointTest.class, "apex_concat_str") .executeSQL("INSERT INTO SALES " + "SELECT STREAM ROWTIME, " + "FLOOR(ROWTIME TO DAY), " + "APEXCONCAT('OILPAINT', SUBSTRING(PRODUCT, 6, 7)) " + "FROM ORDERS WHERE ID > 3 " + "AND " + "PRODUCT LIKE 'paint%'"); {code} Following is a video recording of the demo of apex-capcite integration: https://drive.google.com/open?id=0B_Tb-ZDtsUHeUVM5NWRYSFg0Z3c Currently I'm working on addition of inner join functionality. Once the inner join functionality is implemented, I think the code is good to create a Review Only PR for first cut of calcite integration. The code is here: https://github.com/chinmaykolhatkar/apex-malhar/tree/calcite/sql [~julianhyde] [~thw] [~ilganeli] [~akekre] Please share your opinion on above. Thanks, Chinmay. was (Author: chinmay): Here is a quick update on Apex-Calcite integration work. Currently I'm able to run SQL statement as a DAG against registered table abstractions of data endpoint and message type. Here is the SQL support that is currently implemented: 1. Data Endpoint (Source/Destination): - File - Kafka 2. Message Types from Data endpoint (source/destination): - CSV 3. SQL Functionality Support: - SELECT (Projection) - Select from Source - INSERT - Insert into Destination - WHERE (Filter) - Scalar functions which are provided in Calcite core - Custom sclar function can be defined as provided to SQL. 4. Table can be defined as abstraction of Data Endpoint (source/dest) and message type Currently Calcite integration with Apex is exposed as a small boiler plate code in populateDAG as follows: {code} SQLExecEnvironment.getEnvironment(dag) .registerTable("ORDERS", new KafkaEndpoint(broker, sourceTopic, new CSVMessageFormat(schemaIn))) .registerTable("SALES", new KafkaEndpoint(broker, destTopic, new CSVMessageFormat(schemaOut))) .registerFunction("APEXCONCAT", FileEndpointTest.class, "apex_concat_str") .executeSQL("INSERT INTO SALES " + "SELECT STREAM ROWTIME, " + "FLOOR(ROWTIME TO DAY), " + "APEXCONCAT('OILPAINT', SUBSTRING(PRODUCT, 6, 7)) " + "FROM ORDERS WHERE ID > 3 " + "AND " + "PRODUCT LIKE 'paint%'"); {code} Following is a video recording of the demo of apex-capcite integration: https://drive.google.com/open?id=0B_Tb-ZDtsUHeUVM5NWRYSFg0Z3c Currently I'm working on addition of inner join functionality. Once the inner join functionality is implemented, I think the code is good to create a Review Only PR for first cut of calcite integration. The code is here: https://github.com/chinmaykolhatkar/apex-malhar/tree/calcite/sql [~julianhyde] [~thw] [~ilganeli] [~akekre] Please share your opinion on above. Thanks, Chinmay. > Integrate Calcite to support SQL > > > Key: APEXMALHAR-1818 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-1818 > Project: Apache Apex Malhar > Issue Type: New Feature > Components: query operators >Reporter: Amol >Assignee: Chinmay Kolhatkar > Labels: roadmap > > Once we have ability to generate a subdag, we should take a look at > integrating Calcite into Apex. The operator that enables populate DAG, should > use Calcite to generate the DAG, given a SQL query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (APEXMALHAR-1818) Integrate Calcite to support SQL
[ https://issues.apache.org/jira/browse/APEXMALHAR-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505682#comment-15505682 ] Chinmay Kolhatkar edited comment on APEXMALHAR-1818 at 9/20/16 5:53 AM: Here is a quick update on Apex-Calcite integration work. Currently I'm able to run SQL statement as a DAG against registered table abstractions of data endpoint and message type. Here is the SQL support that is currently implemented: 1. Data Endpoint (Source/Destination): - File - Kafka 2. Message Types from Data endpoint (source/destination): - CSV 3. SQL Functionality Support: - SELECT (Projection) - Select from Source - INSERT - Insert into Destination - WHERE (Filter) - Scalar functions which are provided in Calcite core - Custom sclar function can be defined as provided to SQL. 4. Table can be defined as abstraction of Data Endpoint (source/dest) and message type Currently Calcite integration with Apex is exposed as a small boiler plate code in populateDAG as follows: {code} SQLExecEnvironment.getEnvironment(dag) .registerTable("ORDERS", new KafkaEndpoint(broker, sourceTopic, new CSVMessageFormat(schemaIn))) .registerTable("SALES", new KafkaEndpoint(broker, destTopic, new CSVMessageFormat(schemaOut))) .registerFunction("APEXCONCAT", FileEndpointTest.class, "apex_concat_str") .executeSQL("INSERT INTO SALES " + "SELECT STREAM ROWTIME, " + "FLOOR(ROWTIME TO DAY), " + "APEXCONCAT('OILPAINT', SUBSTRING(PRODUCT, 6, 7)) " + "FROM ORDERS WHERE ID > 3 " + "AND " + "PRODUCT LIKE 'paint%'"); {code} Following is a video recording of the demo of apex-capcite integration: https://drive.google.com/open?id=0B_Tb-ZDtsUHeUVM5NWRYSFg0Z3c Currently I'm working on addition of inner join functionality. Once the inner join functionality is implemented, I think the code is good to create a Review Only PR for first cut of calcite integration. The code is here: https://github.com/chinmaykolhatkar/apex-malhar/tree/calcite/sql [~julianhyde] [~thw] [~ilganeli] [~akekre] Please share your opinion on above. Thanks, Chinmay. was (Author: chinmay): Here is a quick update on Apex-Calcite integration work. Currently I'm able to run SQL statement as a DAG against registered table abstractions of data endpoint and message type. Here is the SQL support that is currently implemented: 1. Data Endpoint (Source/Destination): - File - Kafka 2. Message Types from Data endpoint (source/destination): - CSV 3. SQL Functionality Support: - SELECT (Projection) - Select from Source - INSERT - Insert into Destination - WHERE (Filter) - Scalar functions which are provided in Calcite core - Custom sclar function can be defined as provided to SQL. 4. Table can be defined as abstraction of Data Endpoint (source/dest) and message type Currently Calcite integration with Apex is exposed as a small boiler plate code in populateDAG as follows: {code} SQLExecEnvironment.getEnvironment(dag) .registerTable("ORDERS", new KafkaEndpoint(broker, sourceTopic, new CSVMessageFormat(schemaIn))) .registerTable("SALES", new KafkaEndpoint(broker, destTopic, new CSVMessageFormat(schemaOut))) .registerFunction("APEXCONCAT", FileEndpointTest.class, "apex_concat_str") .executeSQL("INSERT INTO SALES " + "SELECT STREAM ROWTIME, " + "FLOOR(ROWTIME TO DAY), " + "APEXCONCAT('OILPAINT', SUBSTRING(PRODUCT, 6, 7)) " + "FROM ORDERS WHERE ID > 3 " + "AND " + "PRODUCT LIKE 'paint%'"); {code} Following is a video recording of the demo of apex-capcite integration: https://drive.google.com/open?id=0B_Tb-ZDtsUHeUVM5NWRYSFg0Z3c Currently I'm working on addition of inner join functionality. Once the inner join functionality is implemented, I think the code is good to create a Review Only PR for first cut of calcite integration. The code is here: https://github.com/chinmaykolhatkar/apex-malhar/tree/calcite/sql [~julianhyde] [~thw] [~ilganeli] [~akekre] Please share your opinion on above. Thanks, Chinmay. > Integrate Calcite to support SQL > > > Key: APEXMALHAR-1818 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-1818 > Project: Apache Apex Malhar > Issue Type: New Feature > Components: query operators >Reporter: Amol >Assignee: Chinmay Kolhatkar > Labels: roadmap > > Once we have ability to generate a subdag, we should take a look at > integrating Calcite into Apex. The operator that enables populate DAG, should > use Calcite to generate the DAG, given a SQL query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXMALHAR-1818) Integrate Calcite to support SQL
[ https://issues.apache.org/jira/browse/APEXMALHAR-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505682#comment-15505682 ] Chinmay Kolhatkar commented on APEXMALHAR-1818: --- Here is a quick update on Apex-Calcite integration work. Currently I'm able to run SQL statement as a DAG against registered table abstractions of data endpoint and message type. Here is the SQL support that is currently implemented: 1. Data Endpoint (Source/Destination): - File - Kafka 2. Message Types from Data endpoint (source/destination): - CSV 3. SQL Functionality Support: - SELECT (Projection) - Select from Source - INSERT - Insert into Destination - WHERE (Filter) - Scalar functions which are provided in Calcite core - Custom sclar function can be defined as provided to SQL. 4. Table can be defined as abstraction of Data Endpoint (source/dest) and message type Currently Calcite integration with Apex is exposed as a small boiler plate code in populateDAG as follows: {code} SQLExecEnvironment.getEnvironment(dag) .registerTable("ORDERS", new KafkaEndpoint(broker, sourceTopic, new CSVMessageFormat(schemaIn))) .registerTable("SALES", new KafkaEndpoint(broker, destTopic, new CSVMessageFormat(schemaOut))) .registerFunction("APEXCONCAT", FileEndpointTest.class, "apex_concat_str") .executeSQL("INSERT INTO SALES " + "SELECT STREAM ROWTIME, " + "FLOOR(ROWTIME TO DAY), " + "APEXCONCAT('OILPAINT', SUBSTRING(PRODUCT, 6, 7)) " + "FROM ORDERS WHERE ID > 3 " + "AND " + "PRODUCT LIKE 'paint%'"); {code} Following is a video recording of the demo of apex-capcite integration: https://drive.google.com/open?id=0B_Tb-ZDtsUHeUVM5NWRYSFg0Z3c Currently I'm working on addition of inner join functionality. Once the inner join functionality is implemented, I think the code is good to create a Review Only PR for first cut of calcite integration. The code is here: https://github.com/chinmaykolhatkar/apex-malhar/tree/calcite/sql [~julianhyde] [~thw] [~ilganeli] [~akekre] Please share your opinion on above. Thanks, Chinmay. > Integrate Calcite to support SQL > > > Key: APEXMALHAR-1818 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-1818 > Project: Apache Apex Malhar > Issue Type: New Feature > Components: query operators >Reporter: Amol >Assignee: Chinmay Kolhatkar > Labels: roadmap > > Once we have ability to generate a subdag, we should take a look at > integrating Calcite into Apex. The operator that enables populate DAG, should > use Calcite to generate the DAG, given a SQL query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: APEXMALHAR-1818 : Apex-Calcite Integration
Hi All, I wanted to give a quick update on Apex-Calcite integration work. Currently I'm able to run SQL statement as a DAG against registered table abstractions of data endpoint and message type. Here is the SQL support that is currently implemented: 1. Data Endpoint (Source/Destination): - File - Kafka 2. Message Types from Data endpoint (source/destination): - CSV 3. SQL Functionality Support: - SELECT (Projection) - Select from Source - INSERT - Insert into Destination - WHERE (Filter) - Scalar functions which are provided in Calcite core - Custom sclar function can be defined as provided to SQL. 4. Table can be defined as abstraction of Data Endpoint (source/dest) and message type Currently Calcite integration with Apex is exposed as a small boiler plate code in populateDAG as follows: SQLExecEnvironment.getEnvironment(dag) .registerTable("ORDERS", new KafkaEndpoint(broker, sourceTopic, new CSVMessageFormat(schemaIn))) .registerTable("SALES", new KafkaEndpoint(broker, destTopic, new CSVMessageFormat(schemaOut))) .registerFunction("APEXCONCAT", FileEndpointTest.class, "apex_concat_str") .executeSQL("INSERT INTO SALES " + "SELECT STREAM ROWTIME, " + "FLOOR(ROWTIME TO DAY), " + "APEXCONCAT('OILPAINT', SUBSTRING(PRODUCT, 6, 7)) " + "FROM ORDERS WHERE ID > 3 " + "AND " + "PRODUCT LIKE 'paint%'"); Following is a video recording of the demo of apex-capcite integration: https://drive.google.com/open?id=0B_Tb-ZDtsUHeUVM5NWRYSFg0Z3c Currently I'm working on addition of inner join functionality. Once the inner join functionality is implemented, I think the code is good to create a Review Only PR for first cut of calcite integration. Please share your opinion on above. Thanks, Chinmay. On Fri, Aug 12, 2016 at 9:55 PM, Chinmay Kolhatkarwrote: > Hi All, > > I wanted to give update on Apex-Calcite Integration work being done for > visibility and feedback from the community. > > In the first phase, target is to use Calcite core library for SQL parsing > and transformation of relation algebra to apex specific component > (operators). > Once this is achieved one would be able to define input and outputs using > Calcite model file and define the processing from input to output using SQL > statement. > > The status for above work as of now is as follows: > 1. I'm able to traverse relational algebra for simple select statement. > 2. DAG is getting generated for simple statement SELECT STREAM * FROM > TABLE. > 3. DAG is getting validated. > 4. Operators are being set with properties, streams and schema is also > being set using TUPLE_CLASS attr. For schema the class is generated on the > fly and put in classpath using LIBRARY_JAR attr. > 5. Able to run generated DAG in local mode. > 6. The code is currently being developed at (WIP): > Currently for each of development and code being farely large, I've added > a new module malhar-sql in malhar in my fork. But I'm open to other > suggestions here. > https://github.com/chinmaykolhatkar/apex-malhar/tree/calcite/sql > > Next step: > 1. Run the generate DAG in distributed mode. > 2. Expand the source and destination definition (calcite model file) to > include Kafka as source schema and destination. > 3. Expand the scope to include filter operator (WHERE clause, HAVING too > if possible) and inner join when it gets merged. > 4. Write extensive unit tests for above. > > I'll send an update on this thread at every logical step of achieving > something. > > I request the community to provide the feedback on above approach/targets > and if possible take a look at the code in above link. > > Thanks, > Chinmay. > >
[GitHub] apex-malhar pull request #418: APEXMALHAR-2247 #resolve Added iteration feat...
GitHub user davidyan74 opened a pull request: https://github.com/apache/apex-malhar/pull/418 APEXMALHAR-2247 #resolve Added iteration feature in SpillableArrayListImpl and generalize SerdeListSlice to SerdeCollectionSlice @siyuanh please review and merge You can merge this pull request into a Git repository by running: $ git pull https://github.com/davidyan74/apex-malhar APEXMALHAR-2247 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-malhar/pull/418.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #418 commit 7c50c9996407cf3cd04e076580095145b8526577 Author: David YanDate: 2016-09-20T00:27:58Z APEXMALHAR-2247 #resolve Added iteration feature in SpillableArrayListImpl and generalize SerdeListSlice to SerdeCollectionSlice --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (APEXMALHAR-2247) Add iteration feature in SpillableArrayListImpl and generalize SerdeListSlice to SerdeCollectionSlice
[ https://issues.apache.org/jira/browse/APEXMALHAR-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Yan updated APEXMALHAR-2247: -- Summary: Add iteration feature in SpillableArrayListImpl and generalize SerdeListSlice to SerdeCollectionSlice (was: Add iteration feature in SpillableArrayListImpl) > Add iteration feature in SpillableArrayListImpl and generalize SerdeListSlice > to SerdeCollectionSlice > - > > Key: APEXMALHAR-2247 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2247 > Project: Apache Apex Malhar > Issue Type: New Feature >Reporter: David Yan >Assignee: David Yan > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (APEXMALHAR-2249) Create SpillableSet and SpillableSetMultimap interfaces and implementation
[ https://issues.apache.org/jira/browse/APEXMALHAR-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Yan closed APEXMALHAR-2249. - Resolution: Duplicate > Create SpillableSet and SpillableSetMultimap interfaces and implementation > -- > > Key: APEXMALHAR-2249 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2249 > Project: Apache Apex Malhar > Issue Type: Sub-task >Reporter: David Yan >Assignee: David Yan > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (APEXMALHAR-2249) Create SpillableSet and SpillableSetMultimap interfaces and implementation
David Yan created APEXMALHAR-2249: - Summary: Create SpillableSet and SpillableSetMultimap interfaces and implementation Key: APEXMALHAR-2249 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2249 Project: Apache Apex Malhar Issue Type: Sub-task Reporter: David Yan Assignee: David Yan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (APEXMALHAR-2248) Create SpillableSet and SpillableSetMultimap interfaces and implementation
David Yan created APEXMALHAR-2248: - Summary: Create SpillableSet and SpillableSetMultimap interfaces and implementation Key: APEXMALHAR-2248 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2248 Project: Apache Apex Malhar Issue Type: Sub-task Reporter: David Yan Assignee: David Yan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXMALHAR-2247) Add iteration feature in SpillableArrayListImpl and generalize SerdeListSlice to SerdeCollectionSlice
[ https://issues.apache.org/jira/browse/APEXMALHAR-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505143#comment-15505143 ] ASF GitHub Bot commented on APEXMALHAR-2247: GitHub user davidyan74 opened a pull request: https://github.com/apache/apex-malhar/pull/418 APEXMALHAR-2247 #resolve Added iteration feature in SpillableArrayListImpl and generalize SerdeListSlice to SerdeCollectionSlice @siyuanh please review and merge You can merge this pull request into a Git repository by running: $ git pull https://github.com/davidyan74/apex-malhar APEXMALHAR-2247 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-malhar/pull/418.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #418 commit 7c50c9996407cf3cd04e076580095145b8526577 Author: David YanDate: 2016-09-20T00:27:58Z APEXMALHAR-2247 #resolve Added iteration feature in SpillableArrayListImpl and generalize SerdeListSlice to SerdeCollectionSlice > Add iteration feature in SpillableArrayListImpl and generalize SerdeListSlice > to SerdeCollectionSlice > - > > Key: APEXMALHAR-2247 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2247 > Project: Apache Apex Malhar > Issue Type: New Feature >Reporter: David Yan >Assignee: David Yan > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXMALHAR-2246) The underlying map of SpillableByteArrayListMultimapImpl uses a primitive byte[] as a key, which won't work because it does not compare the underlying bytes
[ https://issues.apache.org/jira/browse/APEXMALHAR-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505125#comment-15505125 ] ASF GitHub Bot commented on APEXMALHAR-2246: GitHub user davidyan74 opened a pull request: https://github.com/apache/apex-malhar/pull/417 APEXMALHAR-2246 #resolve use Slice instead of byte[] in the underlying map of SpillableByteArrayListMultimapImpl @siyuanh Please review and merge You can merge this pull request into a Git repository by running: $ git pull https://github.com/davidyan74/apex-malhar APEXMALHAR-2246 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-malhar/pull/417.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #417 commit f59dfe1193d282fa96743be7f800b1d6e1c2882d Author: David YanDate: 2016-09-20T00:17:57Z APEXMALHAR-2246 #resolve use Slice instead of byte[] in the underlying map of SpillableByteArrayListMultimapImpl > The underlying map of SpillableByteArrayListMultimapImpl uses a primitive > byte[] as a key, which won't work because it does not compare the underlying > bytes > > > Key: APEXMALHAR-2246 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2246 > Project: Apache Apex Malhar > Issue Type: Bug >Reporter: David Yan >Assignee: David Yan > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] apex-malhar pull request #417: APEXMALHAR-2246 #resolve use Slice instead of...
GitHub user davidyan74 opened a pull request: https://github.com/apache/apex-malhar/pull/417 APEXMALHAR-2246 #resolve use Slice instead of byte[] in the underlying map of SpillableByteArrayListMultimapImpl @siyuanh Please review and merge You can merge this pull request into a Git repository by running: $ git pull https://github.com/davidyan74/apex-malhar APEXMALHAR-2246 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-malhar/pull/417.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #417 commit f59dfe1193d282fa96743be7f800b1d6e1c2882d Author: David YanDate: 2016-09-20T00:17:57Z APEXMALHAR-2246 #resolve use Slice instead of byte[] in the underlying map of SpillableByteArrayListMultimapImpl --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (APEXMALHAR-2245) WindowBoundedMapCache.remove is a no-op if the key is not in the cache, resulting in the entry not being removed in the underlying storage
[ https://issues.apache.org/jira/browse/APEXMALHAR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505111#comment-15505111 ] ASF GitHub Bot commented on APEXMALHAR-2245: GitHub user davidyan74 opened a pull request: https://github.com/apache/apex-malhar/pull/416 APEXMALHAR-2245 #resolve Add the key in removedKeys even if the key does not appear in the cache @siyuanh Please review and merge You can merge this pull request into a Git repository by running: $ git pull https://github.com/davidyan74/apex-malhar APEXMALHAR-2245 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-malhar/pull/416.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #416 commit 447b26df0f73a29e19d894663c92fd99e6384420 Author: David YanDate: 2016-09-20T00:13:22Z APEXMALHAR-2245 #resolve Add the key in removedKeys even if the key does not appear in the cache > WindowBoundedMapCache.remove is a no-op if the key is not in the cache, > resulting in the entry not being removed in the underlying storage > -- > > Key: APEXMALHAR-2245 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2245 > Project: Apache Apex Malhar > Issue Type: Bug >Reporter: David Yan >Assignee: David Yan > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] apex-malhar pull request #416: APEXMALHAR-2245 #resolve Add the key in remov...
GitHub user davidyan74 opened a pull request: https://github.com/apache/apex-malhar/pull/416 APEXMALHAR-2245 #resolve Add the key in removedKeys even if the key does not appear in the cache @siyuanh Please review and merge You can merge this pull request into a Git repository by running: $ git pull https://github.com/davidyan74/apex-malhar APEXMALHAR-2245 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-malhar/pull/416.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #416 commit 447b26df0f73a29e19d894663c92fd99e6384420 Author: David YanDate: 2016-09-20T00:13:22Z APEXMALHAR-2245 #resolve Add the key in removedKeys even if the key does not appear in the cache --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (APEXMALHAR-2246) The underlying map of SpillableByteArrayListMultimapImpl uses a primitive byte[] as a key, which won't work because it does not compare the underlying bytes
David Yan created APEXMALHAR-2246: - Summary: The underlying map of SpillableByteArrayListMultimapImpl uses a primitive byte[] as a key, which won't work because it does not compare the underlying bytes Key: APEXMALHAR-2246 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2246 Project: Apache Apex Malhar Issue Type: Bug Reporter: David Yan Assignee: David Yan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (APEXMALHAR-2191) Implement a way to get all keys with or without prefix from SpillableByteMapImpl
[ https://issues.apache.org/jira/browse/APEXMALHAR-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Yan resolved APEXMALHAR-2191. --- Resolution: Won't Fix Not valid any more > Implement a way to get all keys with or without prefix from > SpillableByteMapImpl > > > Key: APEXMALHAR-2191 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2191 > Project: Apache Apex Malhar > Issue Type: Sub-task >Reporter: David Yan >Assignee: David Yan >Priority: Critical > > WindowedKeyedStorage is basically a Map>, and we need the > capability of getting all K's given a window, and getting V given a window > and K. > Currently, the spillable implementation of WindowedKeyedStorage uses two > spillable data structures -- SpillableByteMapImpl , V> and > SpillableArrayListMultimapImpl . This will not work because we > need to be able to remove a key from a window, which SpillableArrayList does > not support, and having two separate spillable data structures if it can be > achieved by just one should be avoided in general. > We will implement the solution that supports prefix scanning (given a window, > return all keys), which requires the keys to be stored in order in managed > state. > Traversing all keys in the spillable map is needed by WindowStateMap. The > WindowedOperator needs a way to traverse all windows in its state when firing > a trigger. However, this is less urgent since the Window meta info is small > and should fit in memory even with millions of windows. This is more for the > checkpointing efficiency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (APEXMALHAR-2244) Optimize WindowedStorage and Spillable data structures for time series
[ https://issues.apache.org/jira/browse/APEXMALHAR-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Yan updated APEXMALHAR-2244: -- Description: The spillable data structures currently does not make any assumption about the key that is used in Managed State, and as a result, it uses ManagedStateImpl to interface with Managed State. But for WindowedStorage used by WindowedOperator, the key to the storage is a window, which is event time based. Using the default ManagedStateImpl would be wrong for event time based keys, since ManagedStateImpl appears to purge data based on the apex window id (process time based). In a high level, the below summarizes roughly what needs to be done: 1. a way to tell the spillable data structures to use the ManagedTimeUnifiedStateImpl 2. a way to tell the spillable data structures how to extract the timestamp from the key. Note that in the case of WindowedOperator, the timestamp should be the end timestamp of the window (beginTimeMillis + durationMillis), not the begin timestamp. 3. a way to tell the spillable data structures how to assign the time bucket given that timestamp 4. only purge a time bucket when all keys that belong to that time bucket are removed and the apex window id of the first window in which the keys are all removed has been committed was: The spillable data structures currently does not make any assumption about the key that is used in Managed State, and as a result, it uses ManagedStateImpl to interface with Managed State. But for WindowedStorage used by WindowedOperator, the key to the storage is a window, which is event time based. Using the default ManagedStateImpl would be wrong for event time based keys, since ManagedStateImpl appears to purge data based on the apex window id (process time based). In a high level, the below summarizes roughly what needs to be done: 1. a way to tell the spillable data structures to use the ManagedTimeUnifiedStateImpl 2. a way to tell the spillable data structures how to extract the timestamp from the key. Note that in the case of WindowedOperator, the timestamp should be the end timestamp of the window (beginTimeMillis + durationMillis), not the begin timestamp. 3. a way to tell the spillable data structures how to assign the time bucket given that timestamp 4. only purge a time bucket when all keys that belong to that time bucket are removed > Optimize WindowedStorage and Spillable data structures for time series > -- > > Key: APEXMALHAR-2244 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2244 > Project: Apache Apex Malhar > Issue Type: Sub-task >Reporter: David Yan >Assignee: Siyuan Hua > > The spillable data structures currently does not make any assumption about > the key that is used in Managed State, and as a result, it uses > ManagedStateImpl to interface with Managed State. But for WindowedStorage > used by WindowedOperator, the key to the storage is a window, which is event > time based. Using the default ManagedStateImpl would be wrong for event time > based keys, since ManagedStateImpl appears to purge data based on the apex > window id (process time based). > In a high level, the below summarizes roughly what needs to be done: > 1. a way to tell the spillable data structures to use the > ManagedTimeUnifiedStateImpl > 2. a way to tell the spillable data structures how to extract the timestamp > from the key. Note that in the case of WindowedOperator, the timestamp should > be the end timestamp of the window (beginTimeMillis + durationMillis), not > the begin timestamp. > 3. a way to tell the spillable data structures how to assign the time bucket > given that timestamp > 4. only purge a time bucket when all keys that belong to that time bucket are > removed and the apex window id of the first window in which the keys are all > removed has been committed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (APEXMALHAR-2244) Optimize WindowedStorage and Spillable data structures for time series
[ https://issues.apache.org/jira/browse/APEXMALHAR-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Yan updated APEXMALHAR-2244: -- Description: The spillable data structures currently does not make any assumption about the key that is used in Managed State, and as a result, it uses ManagedStateImpl to interface with Managed State. But for WindowedStorage used by WindowedOperator, the key to the storage is a window, which is event time based. Using the default ManagedStateImpl would be wrong for event time based keys, since ManagedStateImpl appears to purge data based on the apex window id (process time based). In a high level, the below summarizes roughly what needs to be done: 1. a way to tell the spillable data structures to use the ManagedTimeUnifiedStateImpl 2. a way to tell the spillable data structures how to extract the timestamp from the key. Note that in the case of WindowedOperator, the timestamp should be the end timestamp of the window (beginTimeMillis + durationMillis), not the begin timestamp. 3. a way to tell the spillable data structures how to assign the time bucket given that timestamp 4. only purge a time bucket when all keys that belong to that time bucket are removed was: The spillable data structures currently does not make any assumption about the key that is used in Managed State, and as a result, it uses ManagedStateImpl to interface with Managed State. But for WindowedStorage used by WindowedOperator, the key to the storage is a window, which is time based. Using the default ManagedStateImpl would be wrong for event time based keys, since ManagedStateImpl appears to purge data based on the apex window id (process time based). In a high level, the below summarizes roughly what needs to be done: 1. a way to tell the spillable data structures to use the ManagedTimeUnifiedStateImpl 2. a way to tell the spillable data structures how to extract the timestamp from the key. Note that in the case of WindowedOperator, the timestamp should be the end timestamp of the window (beginTimeMillis + durationMillis), not the begin timestamp. 3. a way to tell the spillable data structures how to assign the time bucket given that timestamp 4. only purge a time bucket when all keys that belong to that time bucket are removed > Optimize WindowedStorage and Spillable data structures for time series > -- > > Key: APEXMALHAR-2244 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2244 > Project: Apache Apex Malhar > Issue Type: Sub-task >Reporter: David Yan >Assignee: Siyuan Hua > > The spillable data structures currently does not make any assumption about > the key that is used in Managed State, and as a result, it uses > ManagedStateImpl to interface with Managed State. But for WindowedStorage > used by WindowedOperator, the key to the storage is a window, which is event > time based. Using the default ManagedStateImpl would be wrong for event time > based keys, since ManagedStateImpl appears to purge data based on the apex > window id (process time based). > In a high level, the below summarizes roughly what needs to be done: > 1. a way to tell the spillable data structures to use the > ManagedTimeUnifiedStateImpl > 2. a way to tell the spillable data structures how to extract the timestamp > from the key. Note that in the case of WindowedOperator, the timestamp should > be the end timestamp of the window (beginTimeMillis + durationMillis), not > the begin timestamp. > 3. a way to tell the spillable data structures how to assign the time bucket > given that timestamp > 4. only purge a time bucket when all keys that belong to that time bucket are > removed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (APEXMALHAR-2244) Optimize WindowedStorage and Spillable data structures for time series
David Yan created APEXMALHAR-2244: - Summary: Optimize WindowedStorage and Spillable data structures for time series Key: APEXMALHAR-2244 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2244 Project: Apache Apex Malhar Issue Type: Sub-task Reporter: David Yan Assignee: Siyuan Hua The spillable data structures currently does not make any assumption about the key that is used in Managed State, and as a result, it uses ManagedStateImpl to interface with Managed State. But for WindowedStorage used by WindowedOperator, the key to the storage is a window, which is time based. Using the default ManagedStateImpl would be wrong for event time based keys, since ManagedStateImpl appears to purge data based on the apex window id (process time based). In a high level, the below summarizes roughly what needs to be done: 1. a way to tell the spillable data structures to use the ManagedTimeUnifiedStateImpl 2. a way to tell the spillable data structures how to extract the timestamp from the key. Note that in the case of WindowedOperator, the timestamp should be the end timestamp of the window (beginTimeMillis + durationMillis), not the begin timestamp. 3. a way to tell the spillable data structures how to assign the time bucket given that timestamp 4. only purge a time bucket when all keys that belong to that time bucket are removed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXMALHAR-2243) change name StoreOperator.setExeModeStr(String) to setExecModeStr
[ https://issues.apache.org/jira/browse/APEXMALHAR-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504400#comment-15504400 ] ASF GitHub Bot commented on APEXMALHAR-2243: Github user asfgit closed the pull request at: https://github.com/apache/apex-malhar/pull/415 > change name StoreOperator.setExeModeStr(String) to setExecModeStr > - > > Key: APEXMALHAR-2243 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2243 > Project: Apache Apex Malhar > Issue Type: Bug >Reporter: bright chen >Assignee: bright chen >Priority: Minor > > change name StoreOperator.setExeModeStr(String) to setExecModeStr for > consistence. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (APEXMALHAR-2243) change name StoreOperator.setExeModeStr(String) to setExecModeStr
[ https://issues.apache.org/jira/browse/APEXMALHAR-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Weise closed APEXMALHAR-2243. Resolution: Fixed > change name StoreOperator.setExeModeStr(String) to setExecModeStr > - > > Key: APEXMALHAR-2243 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2243 > Project: Apache Apex Malhar > Issue Type: Bug >Reporter: bright chen >Assignee: bright chen >Priority: Minor > > change name StoreOperator.setExeModeStr(String) to setExecModeStr for > consistence. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] apex-malhar pull request #415: APEXMALHAR-2243 #resolve #comment change name...
Github user asfgit closed the pull request at: https://github.com/apache/apex-malhar/pull/415 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (APEXMALHAR-2243) change name StoreOperator.setExeModeStr(String) to setExecModeStr
[ https://issues.apache.org/jira/browse/APEXMALHAR-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503896#comment-15503896 ] ASF GitHub Bot commented on APEXMALHAR-2243: GitHub user brightchen opened a pull request: https://github.com/apache/apex-malhar/pull/415 APEXMALHAR-2243 #resolve #comment change name StoreOperator.setExeMod… …eStr(String) to setExecModeStr You can merge this pull request into a Git repository by running: $ git pull https://github.com/brightchen/apex-malhar APEXMALHAR-2243 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-malhar/pull/415.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #415 commit 0c4b3fce2a8f73dae18d7570ca54d76cf9098e5b Author: brightchenDate: 2016-09-19T16:17:31Z APEXMALHAR-2243 #resolve #comment change name StoreOperator.setExeModeStr(String) to setExecModeStr > change name StoreOperator.setExeModeStr(String) to setExecModeStr > - > > Key: APEXMALHAR-2243 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2243 > Project: Apache Apex Malhar > Issue Type: Bug >Reporter: bright chen >Assignee: bright chen >Priority: Minor > > change name StoreOperator.setExeModeStr(String) to setExecModeStr for > consistence. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] apex-malhar pull request #415: APEXMALHAR-2243 #resolve #comment change name...
GitHub user brightchen opened a pull request: https://github.com/apache/apex-malhar/pull/415 APEXMALHAR-2243 #resolve #comment change name StoreOperator.setExeMod⦠â¦eStr(String) to setExecModeStr You can merge this pull request into a Git repository by running: $ git pull https://github.com/brightchen/apex-malhar APEXMALHAR-2243 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-malhar/pull/415.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #415 commit 0c4b3fce2a8f73dae18d7570ca54d76cf9098e5b Author: brightchenDate: 2016-09-19T16:17:31Z APEXMALHAR-2243 #resolve #comment change name StoreOperator.setExeModeStr(String) to setExecModeStr --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Assigned] (APEXMALHAR-2243) change name StoreOperator.setExeModeStr(String) to setExecModeStr
[ https://issues.apache.org/jira/browse/APEXMALHAR-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bright chen reassigned APEXMALHAR-2243: --- Assignee: bright chen > change name StoreOperator.setExeModeStr(String) to setExecModeStr > - > > Key: APEXMALHAR-2243 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2243 > Project: Apache Apex Malhar > Issue Type: Bug >Reporter: bright chen >Assignee: bright chen >Priority: Minor > > change name StoreOperator.setExeModeStr(String) to setExecModeStr for > consistence. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (APEXMALHAR-2243) change name StoreOperator.setExeModeStr(String) to setExecModeStr
bright chen created APEXMALHAR-2243: --- Summary: change name StoreOperator.setExeModeStr(String) to setExecModeStr Key: APEXMALHAR-2243 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2243 Project: Apache Apex Malhar Issue Type: Bug Reporter: bright chen Priority: Minor change name StoreOperator.setExeModeStr(String) to setExecModeStr for consistence. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXCORE-526) Publish javadoc for releases on ASF infrastructure
[ https://issues.apache.org/jira/browse/APEXCORE-526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503804#comment-15503804 ] Thomas Weise commented on APEXCORE-526: --- We can setup the project on buildbot to build javadoc:aggregate and then move it to public-html space. https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/template.conf > Publish javadoc for releases on ASF infrastructure > --- > > Key: APEXCORE-526 > URL: https://issues.apache.org/jira/browse/APEXCORE-526 > Project: Apache Apex Core > Issue Type: Documentation >Reporter: Thomas Weise > > Every release should have the javadocs published and we should have it linked > from the download page, as is the case with user docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (APEXMALHAR-2242) Add documentation for 0.9 version of Kafka Input Operator.
Chaitanya created APEXMALHAR-2242: - Summary: Add documentation for 0.9 version of Kafka Input Operator. Key: APEXMALHAR-2242 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2242 Project: Apache Apex Malhar Issue Type: Documentation Reporter: Chaitanya Assignee: Chaitanya -- This message was sent by Atlassian JIRA (v6.3.4#6332)