[jira] [Comment Edited] (APEXMALHAR-1818) Integrate Calcite to support SQL

2016-09-19 Thread Chinmay Kolhatkar (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505682#comment-15505682
 ] 

Chinmay Kolhatkar edited comment on APEXMALHAR-1818 at 9/20/16 5:56 AM:


Here is a quick update on Apex-Calcite integration work.

Currently I'm able to run SQL statement as a DAG against registered table 
abstractions of data endpoint and message type.

Here is the SQL support that is currently implemented:
# Data Endpoint (Source/Destination):
#* File
#* Kafka
# Message Types from Data endpoint (source/destination):
#* CSV
# SQL Functionality Support:
#* SELECT (Projection) - Select from Source
#* INSERT - Insert into Destination
#* WHERE (Filter)
#* Scalar functions which are provided in Calcite core
#* Custom sclar function can be defined as provided to SQL.
# Table can be defined as abstraction of Data Endpoint (source/dest) and 
message type

Currently Calcite integration with Apex is exposed as a small boiler plate code 
in populateDAG as follows:
{code}
SQLExecEnvironment.getEnvironment(dag)
  .registerTable("ORDERS", new KafkaEndpoint(broker, sourceTopic, new 
CSVMessageFormat(schemaIn)))
  .registerTable("SALES", new KafkaEndpoint(broker, destTopic, new 
CSVMessageFormat(schemaOut)))
  .registerFunction("APEXCONCAT", FileEndpointTest.class, 
"apex_concat_str")
  .executeSQL("INSERT INTO SALES " + "SELECT STREAM ROWTIME, " + 
"FLOOR(ROWTIME TO DAY), " +
  "APEXCONCAT('OILPAINT', SUBSTRING(PRODUCT, 6, 7)) " + "FROM 
ORDERS WHERE ID > 3 " + "AND " +
  "PRODUCT LIKE 'paint%'");
{code}
Following is a video recording of the demo of apex-capcite integration:
https://drive.google.com/open?id=0B_Tb-ZDtsUHeUVM5NWRYSFg0Z3c

Currently I'm working on addition of inner join functionality.
Once the inner join functionality is implemented, I think the code is good to 
create a Review Only PR for first cut of calcite integration.

The code is here: 
https://github.com/chinmaykolhatkar/apex-malhar/tree/calcite/sql

[~julianhyde] [~thw] [~ilganeli] [~akekre] Please share your opinion on above.

Thanks,
Chinmay.



was (Author: chinmay):
Here is a quick update on Apex-Calcite integration work.

Currently I'm able to run SQL statement as a DAG against registered table 
abstractions of data endpoint and message type.

Here is the SQL support that is currently implemented:
1. Data Endpoint (Source/Destination):
 - File
 - Kafka
2. Message Types from Data endpoint (source/destination):
 - CSV
3. SQL Functionality Support:
 - SELECT (Projection) - Select from Source
 - INSERT - Insert into Destination
 - WHERE (Filter)
 - Scalar functions which are provided in Calcite core
 - Custom sclar function can be defined as provided to SQL.
4. Table can be defined as abstraction of Data Endpoint (source/dest) and 
message type

Currently Calcite integration with Apex is exposed as a small boiler plate code 
in populateDAG as follows:
{code}
SQLExecEnvironment.getEnvironment(dag)
  .registerTable("ORDERS", new KafkaEndpoint(broker, sourceTopic, new 
CSVMessageFormat(schemaIn)))
  .registerTable("SALES", new KafkaEndpoint(broker, destTopic, new 
CSVMessageFormat(schemaOut)))
  .registerFunction("APEXCONCAT", FileEndpointTest.class, 
"apex_concat_str")
  .executeSQL("INSERT INTO SALES " + "SELECT STREAM ROWTIME, " + 
"FLOOR(ROWTIME TO DAY), " +
  "APEXCONCAT('OILPAINT', SUBSTRING(PRODUCT, 6, 7)) " + "FROM 
ORDERS WHERE ID > 3 " + "AND " +
  "PRODUCT LIKE 'paint%'");
{code}
Following is a video recording of the demo of apex-capcite integration:
https://drive.google.com/open?id=0B_Tb-ZDtsUHeUVM5NWRYSFg0Z3c

Currently I'm working on addition of inner join functionality.
Once the inner join functionality is implemented, I think the code is good to 
create a Review Only PR for first cut of calcite integration.

The code is here: 
https://github.com/chinmaykolhatkar/apex-malhar/tree/calcite/sql

[~julianhyde] [~thw] [~ilganeli] [~akekre] Please share your opinion on above.

Thanks,
Chinmay.


> Integrate Calcite to support SQL
> 
>
> Key: APEXMALHAR-1818
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-1818
> Project: Apache Apex Malhar
>  Issue Type: New Feature
>  Components: query operators
>Reporter: Amol
>Assignee: Chinmay Kolhatkar
>  Labels: roadmap
>
> Once we have ability to generate a subdag, we should take a look at 
> integrating Calcite into Apex. The operator that enables populate DAG, should 
> use Calcite to generate the DAG, given a SQL query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (APEXMALHAR-1818) Integrate Calcite to support SQL

2016-09-19 Thread Chinmay Kolhatkar (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505682#comment-15505682
 ] 

Chinmay Kolhatkar edited comment on APEXMALHAR-1818 at 9/20/16 5:53 AM:


Here is a quick update on Apex-Calcite integration work.

Currently I'm able to run SQL statement as a DAG against registered table 
abstractions of data endpoint and message type.

Here is the SQL support that is currently implemented:
1. Data Endpoint (Source/Destination):
 - File
 - Kafka
2. Message Types from Data endpoint (source/destination):
 - CSV
3. SQL Functionality Support:
 - SELECT (Projection) - Select from Source
 - INSERT - Insert into Destination
 - WHERE (Filter)
 - Scalar functions which are provided in Calcite core
 - Custom sclar function can be defined as provided to SQL.
4. Table can be defined as abstraction of Data Endpoint (source/dest) and 
message type

Currently Calcite integration with Apex is exposed as a small boiler plate code 
in populateDAG as follows:
{code}
SQLExecEnvironment.getEnvironment(dag)
  .registerTable("ORDERS", new KafkaEndpoint(broker, sourceTopic, new 
CSVMessageFormat(schemaIn)))
  .registerTable("SALES", new KafkaEndpoint(broker, destTopic, new 
CSVMessageFormat(schemaOut)))
  .registerFunction("APEXCONCAT", FileEndpointTest.class, 
"apex_concat_str")
  .executeSQL("INSERT INTO SALES " + "SELECT STREAM ROWTIME, " + 
"FLOOR(ROWTIME TO DAY), " +
  "APEXCONCAT('OILPAINT', SUBSTRING(PRODUCT, 6, 7)) " + "FROM 
ORDERS WHERE ID > 3 " + "AND " +
  "PRODUCT LIKE 'paint%'");
{code}
Following is a video recording of the demo of apex-capcite integration:
https://drive.google.com/open?id=0B_Tb-ZDtsUHeUVM5NWRYSFg0Z3c

Currently I'm working on addition of inner join functionality.
Once the inner join functionality is implemented, I think the code is good to 
create a Review Only PR for first cut of calcite integration.

The code is here: 
https://github.com/chinmaykolhatkar/apex-malhar/tree/calcite/sql

[~julianhyde] [~thw] [~ilganeli] [~akekre] Please share your opinion on above.

Thanks,
Chinmay.



was (Author: chinmay):
Here is a quick update on Apex-Calcite integration work.

Currently I'm able to run SQL statement as a DAG against registered table 
abstractions of data endpoint and message type.

Here is the SQL support that is currently implemented:
1. Data Endpoint (Source/Destination):
   - File
   - Kafka
2. Message Types from Data endpoint (source/destination):
   - CSV
3. SQL Functionality Support:
   - SELECT (Projection) - Select from Source
   - INSERT - Insert into Destination
   - WHERE (Filter)
   - Scalar functions which are provided in Calcite core
   - Custom sclar function can be defined as provided to SQL.
4. Table can be defined as abstraction of Data Endpoint (source/dest) and 
message type

Currently Calcite integration with Apex is exposed as a small boiler plate code 
in populateDAG as follows:
{code}
SQLExecEnvironment.getEnvironment(dag)
  .registerTable("ORDERS", new KafkaEndpoint(broker, sourceTopic, new 
CSVMessageFormat(schemaIn)))
  .registerTable("SALES", new KafkaEndpoint(broker, destTopic, new 
CSVMessageFormat(schemaOut)))
  .registerFunction("APEXCONCAT", FileEndpointTest.class, 
"apex_concat_str")
  .executeSQL("INSERT INTO SALES " + "SELECT STREAM ROWTIME, " + 
"FLOOR(ROWTIME TO DAY), " +
  "APEXCONCAT('OILPAINT', SUBSTRING(PRODUCT, 6, 7)) " + "FROM 
ORDERS WHERE ID > 3 " + "AND " +
  "PRODUCT LIKE 'paint%'");
{code}
Following is a video recording of the demo of apex-capcite integration:
https://drive.google.com/open?id=0B_Tb-ZDtsUHeUVM5NWRYSFg0Z3c

Currently I'm working on addition of inner join functionality.
Once the inner join functionality is implemented, I think the code is good to 
create a Review Only PR for first cut of calcite integration.

The code is here: 
https://github.com/chinmaykolhatkar/apex-malhar/tree/calcite/sql

[~julianhyde] [~thw] [~ilganeli] [~akekre] Please share your opinion on above.

Thanks,
Chinmay.


> Integrate Calcite to support SQL
> 
>
> Key: APEXMALHAR-1818
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-1818
> Project: Apache Apex Malhar
>  Issue Type: New Feature
>  Components: query operators
>Reporter: Amol
>Assignee: Chinmay Kolhatkar
>  Labels: roadmap
>
> Once we have ability to generate a subdag, we should take a look at 
> integrating Calcite into Apex. The operator that enables populate DAG, should 
> use Calcite to generate the DAG, given a SQL query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-1818) Integrate Calcite to support SQL

2016-09-19 Thread Chinmay Kolhatkar (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505682#comment-15505682
 ] 

Chinmay Kolhatkar commented on APEXMALHAR-1818:
---

Here is a quick update on Apex-Calcite integration work.

Currently I'm able to run SQL statement as a DAG against registered table 
abstractions of data endpoint and message type.

Here is the SQL support that is currently implemented:
1. Data Endpoint (Source/Destination):
   - File
   - Kafka
2. Message Types from Data endpoint (source/destination):
   - CSV
3. SQL Functionality Support:
   - SELECT (Projection) - Select from Source
   - INSERT - Insert into Destination
   - WHERE (Filter)
   - Scalar functions which are provided in Calcite core
   - Custom sclar function can be defined as provided to SQL.
4. Table can be defined as abstraction of Data Endpoint (source/dest) and 
message type

Currently Calcite integration with Apex is exposed as a small boiler plate code 
in populateDAG as follows:
{code}
SQLExecEnvironment.getEnvironment(dag)
  .registerTable("ORDERS", new KafkaEndpoint(broker, sourceTopic, new 
CSVMessageFormat(schemaIn)))
  .registerTable("SALES", new KafkaEndpoint(broker, destTopic, new 
CSVMessageFormat(schemaOut)))
  .registerFunction("APEXCONCAT", FileEndpointTest.class, 
"apex_concat_str")
  .executeSQL("INSERT INTO SALES " + "SELECT STREAM ROWTIME, " + 
"FLOOR(ROWTIME TO DAY), " +
  "APEXCONCAT('OILPAINT', SUBSTRING(PRODUCT, 6, 7)) " + "FROM 
ORDERS WHERE ID > 3 " + "AND " +
  "PRODUCT LIKE 'paint%'");
{code}
Following is a video recording of the demo of apex-capcite integration:
https://drive.google.com/open?id=0B_Tb-ZDtsUHeUVM5NWRYSFg0Z3c

Currently I'm working on addition of inner join functionality.
Once the inner join functionality is implemented, I think the code is good to 
create a Review Only PR for first cut of calcite integration.

The code is here: 
https://github.com/chinmaykolhatkar/apex-malhar/tree/calcite/sql

[~julianhyde] [~thw] [~ilganeli] [~akekre] Please share your opinion on above.

Thanks,
Chinmay.


> Integrate Calcite to support SQL
> 
>
> Key: APEXMALHAR-1818
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-1818
> Project: Apache Apex Malhar
>  Issue Type: New Feature
>  Components: query operators
>Reporter: Amol
>Assignee: Chinmay Kolhatkar
>  Labels: roadmap
>
> Once we have ability to generate a subdag, we should take a look at 
> integrating Calcite into Apex. The operator that enables populate DAG, should 
> use Calcite to generate the DAG, given a SQL query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: APEXMALHAR-1818 : Apex-Calcite Integration

2016-09-19 Thread Chinmay Kolhatkar
Hi All,

I wanted to give a quick update on Apex-Calcite integration work.

Currently I'm able to run SQL statement as a DAG against registered table
abstractions of data endpoint and message type.

Here is the SQL support that is currently implemented:
1. Data Endpoint (Source/Destination):
   - File
   - Kafka
2. Message Types from Data endpoint (source/destination):
   - CSV
3. SQL Functionality Support:
   - SELECT (Projection) - Select from Source
   - INSERT - Insert into Destination
   - WHERE (Filter)
   - Scalar functions which are provided in Calcite core
   - Custom sclar function can be defined as provided to SQL.
4. Table can be defined as abstraction of Data Endpoint (source/dest) and
message type

Currently Calcite integration with Apex is exposed as a small boiler plate
code in populateDAG as follows:

SQLExecEnvironment.getEnvironment(dag)
  .registerTable("ORDERS", new KafkaEndpoint(broker, sourceTopic,
new CSVMessageFormat(schemaIn)))
  .registerTable("SALES", new KafkaEndpoint(broker, destTopic, new
CSVMessageFormat(schemaOut)))
  .registerFunction("APEXCONCAT", FileEndpointTest.class,
"apex_concat_str")
  .executeSQL("INSERT INTO SALES " + "SELECT STREAM ROWTIME, " +
"FLOOR(ROWTIME TO DAY), " +
  "APEXCONCAT('OILPAINT', SUBSTRING(PRODUCT, 6, 7)) " + "FROM
ORDERS WHERE ID > 3 " + "AND " +
  "PRODUCT LIKE 'paint%'");

Following is a video recording of the demo of apex-capcite integration:
https://drive.google.com/open?id=0B_Tb-ZDtsUHeUVM5NWRYSFg0Z3c

Currently I'm working on addition of inner join functionality.
Once the inner join functionality is implemented, I think the code is good
to create a Review Only PR for first cut of calcite integration.

Please share your opinion on above.

Thanks,
Chinmay.


On Fri, Aug 12, 2016 at 9:55 PM, Chinmay Kolhatkar 
wrote:

> Hi All,
>
> I wanted to give update on Apex-Calcite Integration work being done for
> visibility and feedback from the community.
>
> In the first phase, target is to use Calcite core library for SQL parsing
> and transformation of relation algebra to apex specific component
> (operators).
> Once this is achieved one would be able to define input and outputs using
> Calcite model file and define the processing from input to output using SQL
> statement.
>
> The status for above work as of now is as follows:
> 1. I'm able to traverse relational algebra for simple select statement.
> 2. DAG is getting generated for simple statement SELECT STREAM * FROM
> TABLE.
> 3. DAG is getting validated.
> 4. Operators are being set with properties, streams and schema is also
> being set using TUPLE_CLASS attr. For schema the class is generated on the
> fly and put in classpath using LIBRARY_JAR attr.
> 5. Able to run generated DAG in local mode.
> 6. The code is currently being developed at (WIP):
> Currently for each of development and code being farely large, I've added
> a new module malhar-sql in malhar in my fork. But I'm open to other
> suggestions here.
> https://github.com/chinmaykolhatkar/apex-malhar/tree/calcite/sql
>
> Next step:
> 1. Run the generate DAG in distributed mode.
> 2. Expand the source and destination definition (calcite model file) to
> include Kafka as source schema and destination.
> 3. Expand the scope to include filter operator (WHERE clause, HAVING too
> if possible) and inner join when it gets merged.
> 4. Write extensive unit tests for above.
>
> I'll send an update on this thread at every logical step of achieving
> something.
>
> I request the community to provide the feedback on above approach/targets
> and if possible take a look at the code in above link.
>
> Thanks,
> Chinmay.
>
>


[GitHub] apex-malhar pull request #418: APEXMALHAR-2247 #resolve Added iteration feat...

2016-09-19 Thread davidyan74
GitHub user davidyan74 opened a pull request:

https://github.com/apache/apex-malhar/pull/418

APEXMALHAR-2247 #resolve Added iteration feature in SpillableArrayListImpl 
and generalize SerdeListSlice to SerdeCollectionSlice

@siyuanh please review and merge

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davidyan74/apex-malhar APEXMALHAR-2247

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/418.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #418


commit 7c50c9996407cf3cd04e076580095145b8526577
Author: David Yan 
Date:   2016-09-20T00:27:58Z

APEXMALHAR-2247 #resolve Added iteration feature in SpillableArrayListImpl 
and generalize SerdeListSlice to SerdeCollectionSlice




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (APEXMALHAR-2247) Add iteration feature in SpillableArrayListImpl and generalize SerdeListSlice to SerdeCollectionSlice

2016-09-19 Thread David Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Yan updated APEXMALHAR-2247:
--
Summary: Add iteration feature in SpillableArrayListImpl and generalize 
SerdeListSlice to SerdeCollectionSlice  (was: Add iteration feature in 
SpillableArrayListImpl)

> Add iteration feature in SpillableArrayListImpl and generalize SerdeListSlice 
> to SerdeCollectionSlice
> -
>
> Key: APEXMALHAR-2247
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2247
> Project: Apache Apex Malhar
>  Issue Type: New Feature
>Reporter: David Yan
>Assignee: David Yan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (APEXMALHAR-2249) Create SpillableSet and SpillableSetMultimap interfaces and implementation

2016-09-19 Thread David Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Yan closed APEXMALHAR-2249.
-
Resolution: Duplicate

> Create SpillableSet and SpillableSetMultimap interfaces and implementation
> --
>
> Key: APEXMALHAR-2249
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2249
> Project: Apache Apex Malhar
>  Issue Type: Sub-task
>Reporter: David Yan
>Assignee: David Yan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2249) Create SpillableSet and SpillableSetMultimap interfaces and implementation

2016-09-19 Thread David Yan (JIRA)
David Yan created APEXMALHAR-2249:
-

 Summary: Create SpillableSet and SpillableSetMultimap interfaces 
and implementation
 Key: APEXMALHAR-2249
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2249
 Project: Apache Apex Malhar
  Issue Type: Sub-task
Reporter: David Yan
Assignee: David Yan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2248) Create SpillableSet and SpillableSetMultimap interfaces and implementation

2016-09-19 Thread David Yan (JIRA)
David Yan created APEXMALHAR-2248:
-

 Summary: Create SpillableSet and SpillableSetMultimap interfaces 
and implementation
 Key: APEXMALHAR-2248
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2248
 Project: Apache Apex Malhar
  Issue Type: Sub-task
Reporter: David Yan
Assignee: David Yan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2247) Add iteration feature in SpillableArrayListImpl and generalize SerdeListSlice to SerdeCollectionSlice

2016-09-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505143#comment-15505143
 ] 

ASF GitHub Bot commented on APEXMALHAR-2247:


GitHub user davidyan74 opened a pull request:

https://github.com/apache/apex-malhar/pull/418

APEXMALHAR-2247 #resolve Added iteration feature in SpillableArrayListImpl 
and generalize SerdeListSlice to SerdeCollectionSlice

@siyuanh please review and merge

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davidyan74/apex-malhar APEXMALHAR-2247

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/418.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #418


commit 7c50c9996407cf3cd04e076580095145b8526577
Author: David Yan 
Date:   2016-09-20T00:27:58Z

APEXMALHAR-2247 #resolve Added iteration feature in SpillableArrayListImpl 
and generalize SerdeListSlice to SerdeCollectionSlice




> Add iteration feature in SpillableArrayListImpl and generalize SerdeListSlice 
> to SerdeCollectionSlice
> -
>
> Key: APEXMALHAR-2247
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2247
> Project: Apache Apex Malhar
>  Issue Type: New Feature
>Reporter: David Yan
>Assignee: David Yan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2246) The underlying map of SpillableByteArrayListMultimapImpl uses a primitive byte[] as a key, which won't work because it does not compare the underlying bytes

2016-09-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505125#comment-15505125
 ] 

ASF GitHub Bot commented on APEXMALHAR-2246:


GitHub user davidyan74 opened a pull request:

https://github.com/apache/apex-malhar/pull/417

APEXMALHAR-2246 #resolve use Slice instead of byte[] in the underlying map 
of SpillableByteArrayListMultimapImpl

@siyuanh Please review and merge

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davidyan74/apex-malhar APEXMALHAR-2246

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/417.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #417


commit f59dfe1193d282fa96743be7f800b1d6e1c2882d
Author: David Yan 
Date:   2016-09-20T00:17:57Z

APEXMALHAR-2246 #resolve use Slice instead of byte[] in the underlying map 
of SpillableByteArrayListMultimapImpl




> The underlying map of SpillableByteArrayListMultimapImpl uses a primitive 
> byte[] as a key, which won't work because it does not compare the underlying 
> bytes
> 
>
> Key: APEXMALHAR-2246
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2246
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: David Yan
>Assignee: David Yan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] apex-malhar pull request #417: APEXMALHAR-2246 #resolve use Slice instead of...

2016-09-19 Thread davidyan74
GitHub user davidyan74 opened a pull request:

https://github.com/apache/apex-malhar/pull/417

APEXMALHAR-2246 #resolve use Slice instead of byte[] in the underlying map 
of SpillableByteArrayListMultimapImpl

@siyuanh Please review and merge

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davidyan74/apex-malhar APEXMALHAR-2246

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/417.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #417


commit f59dfe1193d282fa96743be7f800b1d6e1c2882d
Author: David Yan 
Date:   2016-09-20T00:17:57Z

APEXMALHAR-2246 #resolve use Slice instead of byte[] in the underlying map 
of SpillableByteArrayListMultimapImpl




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXMALHAR-2245) WindowBoundedMapCache.remove is a no-op if the key is not in the cache, resulting in the entry not being removed in the underlying storage

2016-09-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505111#comment-15505111
 ] 

ASF GitHub Bot commented on APEXMALHAR-2245:


GitHub user davidyan74 opened a pull request:

https://github.com/apache/apex-malhar/pull/416

APEXMALHAR-2245 #resolve Add the key in removedKeys even if the key does 
not appear in the cache

@siyuanh Please review and merge

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davidyan74/apex-malhar APEXMALHAR-2245

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/416.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #416


commit 447b26df0f73a29e19d894663c92fd99e6384420
Author: David Yan 
Date:   2016-09-20T00:13:22Z

APEXMALHAR-2245 #resolve Add the key in removedKeys even if the key does 
not appear in the cache




> WindowBoundedMapCache.remove is a no-op if the key is not in the cache, 
> resulting in the entry not being removed in the underlying storage
> --
>
> Key: APEXMALHAR-2245
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2245
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: David Yan
>Assignee: David Yan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] apex-malhar pull request #416: APEXMALHAR-2245 #resolve Add the key in remov...

2016-09-19 Thread davidyan74
GitHub user davidyan74 opened a pull request:

https://github.com/apache/apex-malhar/pull/416

APEXMALHAR-2245 #resolve Add the key in removedKeys even if the key does 
not appear in the cache

@siyuanh Please review and merge

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davidyan74/apex-malhar APEXMALHAR-2245

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/416.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #416


commit 447b26df0f73a29e19d894663c92fd99e6384420
Author: David Yan 
Date:   2016-09-20T00:13:22Z

APEXMALHAR-2245 #resolve Add the key in removedKeys even if the key does 
not appear in the cache




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (APEXMALHAR-2246) The underlying map of SpillableByteArrayListMultimapImpl uses a primitive byte[] as a key, which won't work because it does not compare the underlying bytes

2016-09-19 Thread David Yan (JIRA)
David Yan created APEXMALHAR-2246:
-

 Summary: The underlying map of SpillableByteArrayListMultimapImpl 
uses a primitive byte[] as a key, which won't work because it does not compare 
the underlying bytes
 Key: APEXMALHAR-2246
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2246
 Project: Apache Apex Malhar
  Issue Type: Bug
Reporter: David Yan
Assignee: David Yan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (APEXMALHAR-2191) Implement a way to get all keys with or without prefix from SpillableByteMapImpl

2016-09-19 Thread David Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Yan resolved APEXMALHAR-2191.
---
Resolution: Won't Fix

Not valid any more

> Implement a way to get all keys with or without prefix from 
> SpillableByteMapImpl
> 
>
> Key: APEXMALHAR-2191
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2191
> Project: Apache Apex Malhar
>  Issue Type: Sub-task
>Reporter: David Yan
>Assignee: David Yan
>Priority: Critical
>
> WindowedKeyedStorage is basically a Map>, and we need the 
> capability of getting all K's given a window, and getting V given a window 
> and K.
> Currently, the spillable implementation of WindowedKeyedStorage uses two 
> spillable data structures -- SpillableByteMapImpl, V> and 
> SpillableArrayListMultimapImpl.  This will not work because we 
> need to be able to remove a key from a window, which SpillableArrayList does 
> not support, and having two separate spillable data structures if it can be 
> achieved by just one should be avoided in general. 
> We will implement the solution that supports prefix scanning (given a window, 
> return all keys), which requires the keys to be stored in order in managed 
> state.
> Traversing all keys in the spillable map is needed by WindowStateMap. The 
> WindowedOperator needs a way to traverse all windows in its state when firing 
> a trigger. However, this is less urgent since the Window meta info is small 
> and should fit in memory even with millions of windows. This is more for the 
> checkpointing efficiency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2244) Optimize WindowedStorage and Spillable data structures for time series

2016-09-19 Thread David Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Yan updated APEXMALHAR-2244:
--
Description: 
The spillable data structures currently does not make any assumption about the 
key that is used in Managed State, and as a result, it uses ManagedStateImpl to 
interface with Managed State. But for WindowedStorage used by WindowedOperator, 
the key to the storage is a window, which is event time based. Using the 
default ManagedStateImpl would be wrong for event time based keys, since 
ManagedStateImpl appears to purge data based on the apex window id (process 
time based).

In a high level, the below summarizes roughly what needs to be done:

1. a way to tell the spillable data structures to use the 
ManagedTimeUnifiedStateImpl
2. a way to tell the spillable data structures how to extract the timestamp 
from the key. Note that in the case of WindowedOperator, the timestamp should 
be the end timestamp of the window (beginTimeMillis + durationMillis), not the 
begin timestamp.
3. a way to tell the spillable data structures how to assign the time bucket 
given that timestamp
4. only purge a time bucket when all keys that belong to that time bucket are 
removed and the apex window id of the first window in which the keys are all 
removed has been committed



  was:
The spillable data structures currently does not make any assumption about the 
key that is used in Managed State, and as a result, it uses ManagedStateImpl to 
interface with Managed State. But for WindowedStorage used by WindowedOperator, 
the key to the storage is a window, which is event time based. Using the 
default ManagedStateImpl would be wrong for event time based keys, since 
ManagedStateImpl appears to purge data based on the apex window id (process 
time based).

In a high level, the below summarizes roughly what needs to be done:

1. a way to tell the spillable data structures to use the 
ManagedTimeUnifiedStateImpl
2. a way to tell the spillable data structures how to extract the timestamp 
from the key. Note that in the case of WindowedOperator, the timestamp should 
be the end timestamp of the window (beginTimeMillis + durationMillis), not the 
begin timestamp.
3. a way to tell the spillable data structures how to assign the time bucket 
given that timestamp
4. only purge a time bucket when all keys that belong to that time bucket are 
removed




> Optimize WindowedStorage and Spillable data structures for time series
> --
>
> Key: APEXMALHAR-2244
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2244
> Project: Apache Apex Malhar
>  Issue Type: Sub-task
>Reporter: David Yan
>Assignee: Siyuan Hua
>
> The spillable data structures currently does not make any assumption about 
> the key that is used in Managed State, and as a result, it uses 
> ManagedStateImpl to interface with Managed State. But for WindowedStorage 
> used by WindowedOperator, the key to the storage is a window, which is event 
> time based. Using the default ManagedStateImpl would be wrong for event time 
> based keys, since ManagedStateImpl appears to purge data based on the apex 
> window id (process time based).
> In a high level, the below summarizes roughly what needs to be done:
> 1. a way to tell the spillable data structures to use the 
> ManagedTimeUnifiedStateImpl
> 2. a way to tell the spillable data structures how to extract the timestamp 
> from the key. Note that in the case of WindowedOperator, the timestamp should 
> be the end timestamp of the window (beginTimeMillis + durationMillis), not 
> the begin timestamp.
> 3. a way to tell the spillable data structures how to assign the time bucket 
> given that timestamp
> 4. only purge a time bucket when all keys that belong to that time bucket are 
> removed and the apex window id of the first window in which the keys are all 
> removed has been committed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2244) Optimize WindowedStorage and Spillable data structures for time series

2016-09-19 Thread David Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Yan updated APEXMALHAR-2244:
--
Description: 
The spillable data structures currently does not make any assumption about the 
key that is used in Managed State, and as a result, it uses ManagedStateImpl to 
interface with Managed State. But for WindowedStorage used by WindowedOperator, 
the key to the storage is a window, which is event time based. Using the 
default ManagedStateImpl would be wrong for event time based keys, since 
ManagedStateImpl appears to purge data based on the apex window id (process 
time based).

In a high level, the below summarizes roughly what needs to be done:

1. a way to tell the spillable data structures to use the 
ManagedTimeUnifiedStateImpl
2. a way to tell the spillable data structures how to extract the timestamp 
from the key. Note that in the case of WindowedOperator, the timestamp should 
be the end timestamp of the window (beginTimeMillis + durationMillis), not the 
begin timestamp.
3. a way to tell the spillable data structures how to assign the time bucket 
given that timestamp
4. only purge a time bucket when all keys that belong to that time bucket are 
removed



  was:
The spillable data structures currently does not make any assumption about the 
key that is used in Managed State, and as a result, it uses ManagedStateImpl to 
interface with Managed State. But for WindowedStorage used by WindowedOperator, 
the key to the storage is a window, which is time based. Using the default 
ManagedStateImpl would be wrong for event time based keys, since 
ManagedStateImpl appears to purge data based on the apex window id (process 
time based).

In a high level, the below summarizes roughly what needs to be done:

1. a way to tell the spillable data structures to use the 
ManagedTimeUnifiedStateImpl
2. a way to tell the spillable data structures how to extract the timestamp 
from the key. Note that in the case of WindowedOperator, the timestamp should 
be the end timestamp of the window (beginTimeMillis + durationMillis), not the 
begin timestamp.
3. a way to tell the spillable data structures how to assign the time bucket 
given that timestamp
4. only purge a time bucket when all keys that belong to that time bucket are 
removed




> Optimize WindowedStorage and Spillable data structures for time series
> --
>
> Key: APEXMALHAR-2244
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2244
> Project: Apache Apex Malhar
>  Issue Type: Sub-task
>Reporter: David Yan
>Assignee: Siyuan Hua
>
> The spillable data structures currently does not make any assumption about 
> the key that is used in Managed State, and as a result, it uses 
> ManagedStateImpl to interface with Managed State. But for WindowedStorage 
> used by WindowedOperator, the key to the storage is a window, which is event 
> time based. Using the default ManagedStateImpl would be wrong for event time 
> based keys, since ManagedStateImpl appears to purge data based on the apex 
> window id (process time based).
> In a high level, the below summarizes roughly what needs to be done:
> 1. a way to tell the spillable data structures to use the 
> ManagedTimeUnifiedStateImpl
> 2. a way to tell the spillable data structures how to extract the timestamp 
> from the key. Note that in the case of WindowedOperator, the timestamp should 
> be the end timestamp of the window (beginTimeMillis + durationMillis), not 
> the begin timestamp.
> 3. a way to tell the spillable data structures how to assign the time bucket 
> given that timestamp
> 4. only purge a time bucket when all keys that belong to that time bucket are 
> removed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2244) Optimize WindowedStorage and Spillable data structures for time series

2016-09-19 Thread David Yan (JIRA)
David Yan created APEXMALHAR-2244:
-

 Summary: Optimize WindowedStorage and Spillable data structures 
for time series
 Key: APEXMALHAR-2244
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2244
 Project: Apache Apex Malhar
  Issue Type: Sub-task
Reporter: David Yan
Assignee: Siyuan Hua


The spillable data structures currently does not make any assumption about the 
key that is used in Managed State, and as a result, it uses ManagedStateImpl to 
interface with Managed State. But for WindowedStorage used by WindowedOperator, 
the key to the storage is a window, which is time based. Using the default 
ManagedStateImpl would be wrong for event time based keys, since 
ManagedStateImpl appears to purge data based on the apex window id (process 
time based).

In a high level, the below summarizes roughly what needs to be done:

1. a way to tell the spillable data structures to use the 
ManagedTimeUnifiedStateImpl
2. a way to tell the spillable data structures how to extract the timestamp 
from the key. Note that in the case of WindowedOperator, the timestamp should 
be the end timestamp of the window (beginTimeMillis + durationMillis), not the 
begin timestamp.
3. a way to tell the spillable data structures how to assign the time bucket 
given that timestamp
4. only purge a time bucket when all keys that belong to that time bucket are 
removed





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2243) change name StoreOperator.setExeModeStr(String) to setExecModeStr

2016-09-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504400#comment-15504400
 ] 

ASF GitHub Bot commented on APEXMALHAR-2243:


Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/415


> change name StoreOperator.setExeModeStr(String) to setExecModeStr
> -
>
> Key: APEXMALHAR-2243
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2243
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: bright chen
>Assignee: bright chen
>Priority: Minor
>
> change name StoreOperator.setExeModeStr(String) to setExecModeStr for 
> consistence.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (APEXMALHAR-2243) change name StoreOperator.setExeModeStr(String) to setExecModeStr

2016-09-19 Thread Thomas Weise (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise closed APEXMALHAR-2243.

Resolution: Fixed

> change name StoreOperator.setExeModeStr(String) to setExecModeStr
> -
>
> Key: APEXMALHAR-2243
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2243
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: bright chen
>Assignee: bright chen
>Priority: Minor
>
> change name StoreOperator.setExeModeStr(String) to setExecModeStr for 
> consistence.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] apex-malhar pull request #415: APEXMALHAR-2243 #resolve #comment change name...

2016-09-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/415


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXMALHAR-2243) change name StoreOperator.setExeModeStr(String) to setExecModeStr

2016-09-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503896#comment-15503896
 ] 

ASF GitHub Bot commented on APEXMALHAR-2243:


GitHub user brightchen opened a pull request:

https://github.com/apache/apex-malhar/pull/415

APEXMALHAR-2243 #resolve #comment change name StoreOperator.setExeMod…

…eStr(String) to setExecModeStr

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brightchen/apex-malhar APEXMALHAR-2243

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/415.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #415


commit 0c4b3fce2a8f73dae18d7570ca54d76cf9098e5b
Author: brightchen 
Date:   2016-09-19T16:17:31Z

APEXMALHAR-2243 #resolve #comment change name 
StoreOperator.setExeModeStr(String) to setExecModeStr




> change name StoreOperator.setExeModeStr(String) to setExecModeStr
> -
>
> Key: APEXMALHAR-2243
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2243
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: bright chen
>Assignee: bright chen
>Priority: Minor
>
> change name StoreOperator.setExeModeStr(String) to setExecModeStr for 
> consistence.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] apex-malhar pull request #415: APEXMALHAR-2243 #resolve #comment change name...

2016-09-19 Thread brightchen
GitHub user brightchen opened a pull request:

https://github.com/apache/apex-malhar/pull/415

APEXMALHAR-2243 #resolve #comment change name StoreOperator.setExeMod…

…eStr(String) to setExecModeStr

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brightchen/apex-malhar APEXMALHAR-2243

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/415.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #415


commit 0c4b3fce2a8f73dae18d7570ca54d76cf9098e5b
Author: brightchen 
Date:   2016-09-19T16:17:31Z

APEXMALHAR-2243 #resolve #comment change name 
StoreOperator.setExeModeStr(String) to setExecModeStr




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Assigned] (APEXMALHAR-2243) change name StoreOperator.setExeModeStr(String) to setExecModeStr

2016-09-19 Thread bright chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bright chen reassigned APEXMALHAR-2243:
---

Assignee: bright chen

> change name StoreOperator.setExeModeStr(String) to setExecModeStr
> -
>
> Key: APEXMALHAR-2243
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2243
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: bright chen
>Assignee: bright chen
>Priority: Minor
>
> change name StoreOperator.setExeModeStr(String) to setExecModeStr for 
> consistence.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2243) change name StoreOperator.setExeModeStr(String) to setExecModeStr

2016-09-19 Thread bright chen (JIRA)
bright chen created APEXMALHAR-2243:
---

 Summary: change name StoreOperator.setExeModeStr(String) to 
setExecModeStr
 Key: APEXMALHAR-2243
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2243
 Project: Apache Apex Malhar
  Issue Type: Bug
Reporter: bright chen
Priority: Minor


change name StoreOperator.setExeModeStr(String) to setExecModeStr for 
consistence.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXCORE-526) Publish javadoc for releases on ASF infrastructure

2016-09-19 Thread Thomas Weise (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503804#comment-15503804
 ] 

Thomas Weise commented on APEXCORE-526:
---

We can setup the project on buildbot to build javadoc:aggregate and then move 
it to public-html space.

https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/template.conf


> Publish javadoc for releases on ASF infrastructure 
> ---
>
> Key: APEXCORE-526
> URL: https://issues.apache.org/jira/browse/APEXCORE-526
> Project: Apache Apex Core
>  Issue Type: Documentation
>Reporter: Thomas Weise
>
> Every release should have the javadocs published and we should have it linked 
> from the download page, as is the case with user docs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2242) Add documentation for 0.9 version of Kafka Input Operator.

2016-09-19 Thread Chaitanya (JIRA)
Chaitanya created APEXMALHAR-2242:
-

 Summary: Add documentation for 0.9 version of Kafka Input Operator.
 Key: APEXMALHAR-2242
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2242
 Project: Apache Apex Malhar
  Issue Type: Documentation
Reporter: Chaitanya
Assignee: Chaitanya






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)