Re: Proposing Changes To Heron
Thanks, Josh, for taking the initiative to get this started! SQL on Heron will be a great feature!

The plan sounds great to me. Let's first get an initial version of Heron SQL out, and then we can worry about custom / user-defined sources and sinks. We can even start talking about UDFs (user-defined functions) at that point!

Best,
Jerry

On Sun, Feb 25, 2018 at 9:05 PM, Josh Fischer wrote:
> Please see this google drive link for adding comments. I will copy and
> paste the drive doc below as well.
>
> https://docs.google.com/document/d/1PxLCyR_H-mOgPjyFj3DhWXryKW21CH2zFWwzTnqjfEA/edit?usp=sharing
Re: Proposing Changes To Heron
+1 to SQL on Heron!

2018-02-25 21:05 GMT-08:00 Josh Fischer:
> Please see this google drive link for adding comments. I will copy and
> paste the drive doc below as well.
>
> https://docs.google.com/document/d/1PxLCyR_H-mOgPjyFj3DhWXryKW21CH2zFWwzTnqjfEA/edit?usp=sharing

--
With my best Regards
--
Fu Maosong
Twitter Inc.
Mobile: +001-415-244-7520
Proposing Changes To Heron
Please see this Google Drive link for adding comments. I will copy and paste the drive doc below as well.

https://docs.google.com/document/d/1PxLCyR_H-mOgPjyFj3DhWXryKW21CH2zFWwzTnqjfEA/edit?usp=sharing

Proposal Below

I am writing this document to propose changes and to start conversations on adding functionality similar to Storm SQL to Heron. We would call it Heron SQL. After reviewing how the code is structured in Storm, I have some suggestions and questions relating to the implementation in the Heron code base.

- High Level Overview Of Code Workflow (Keeping Similar to Storm)
  - We would parse the SQL with Calcite to create the logical and physical plans.
  - We would then convert the logical and physical plans to a Heron topology.
  - We would then submit the Heron topology to the Heron system.
- Some thoughts on code structure and overall functionality
  - I think we should place the Heron SQL code base in a top-level directory in the repo.
  - I will have to add the command “sql” to the Heron command line code in Python.
  - As a first-pass implementation, users can interact with Heron SQL via the following command:
    - heron sql
  - We will also support the explain command for displaying the query plan; this will not deploy the topology:
    - heron sql --explain
  - After the first-pass implementation is working smoothly, we can then add an interactive command line interface that accepts SQL on the fly when the SQL file argument is omitted:
    - heron sql
  - We would support all of the existing functionality in Storm SQL today, with the exception of being dependent on Trident. We would use Storm SQL as a way to deploy topologies into Heron, similar to how you deploy topologies with the Streamlet, Topology, and ECO APIs.
- Questions
  - Do we see any issue with this plan to implement?
  - I believe we would have to supply an external jar at times to connect to external data sources, such as reuse of Kafka libraries or database drivers. I see that Storm has a few external connectors for MongoDB, Kafka, Redis, and HDFS. Do we want to limit users to what we decide to build as connectors, or do we want to give them the ability to load external jars at submit time? I don’t think Heron offers the ability to pass extra jars via the “--jars” or “--artifacts” flags like Storm does today. Would this be the correct way to pull in external jars? Does anyone have a different idea? I’m thinking that this might be a v2 feature after we get Heron SQL working well. Ideas, thoughts, or concerns?
  - Is there anything I missed?
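
To make the proposed command behavior a bit more concrete, here is a minimal sketch of how the three invocations described above might be told apart. It uses only Python's standard argparse module and is not the actual Heron CLI code. The positional name sql_file, the stage names, and the helper functions build_parser and steps_for are all assumptions made up for illustration. The proposal only says that there is a SQL file argument, that --explain shows the plan without deploying, and that omitting the file would start an interactive mode.

```python
import argparse


def build_parser():
    """Build a stand-in parser for the proposed `heron sql` command."""
    parser = argparse.ArgumentParser(prog="heron")
    subparsers = parser.add_subparsers(dest="command")
    sql = subparsers.add_parser("sql", help="run a SQL file as a topology")
    # Hypothetical positional: the proposal mentions a "sql file argument"
    # whose omission would start the interactive mode.
    sql.add_argument("sql_file", nargs="?", default=None)
    # Show the query plan without deploying anything.
    sql.add_argument("--explain", action="store_true")
    return parser


def steps_for(args):
    """Return the (hypothetical) workflow stages an invocation would run."""
    if args.sql_file is None:
        # No SQL file given: fall back to the interactive interface.
        return ["interactive_shell"]
    stages = ["parse_sql", "build_plans"]
    if args.explain:
        # Explain stops after planning; no topology is built or submitted.
        return stages + ["print_plan"]
    return stages + ["build_topology", "submit_topology"]


if __name__ == "__main__":
    parser = build_parser()
    for argv in (["sql", "query.sql"],
                 ["sql", "query.sql", "--explain"],
                 ["sql"]):
        print(argv, "->", steps_for(parser.parse_args(argv)))
```

The point of the sketch is only that explain shares the parse and plan stages with a normal submit and then stops, so the same planning code could serve both paths.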