Re: How does one leverage Calcite to federate queries over multiple underlying engines?

2022-08-02 Thread Gavin Ray
I've been meaning to write and publish something on this since I had the
same question + it seems to come up often

There are essentially two important things you need to know about Calcite
to work with federated data:

1. Calcite schemas work like filesystem hierarchies. You have a
"rootSchema" which is like the system root.
Datasources you want to query need to be added as child schemas of this
rootSchema.\

In the case of something like a standard DB (Postgres/MySQL), what you'll
get may not be one schema, but a "tree" of schemas.
What I mean by this is that there are wrapper/utility methods to give you
back a "Calcite" representation of the database structure as schemas

For a Postgres DB, that has a structure like:

my_postgres_db1
 /  \
 public   schema_2

You'll get a Schema that has two child schemas.
To do this, you use JdbcCatalogSchema: JdbcCatalogSchema (Apache Calcite
API)


For a DB like MySQL, which doesn't have the notion of "Databases",
you'd instead use JdbcSchema: JdbcSchema (Apache Calcite API)


2. Don't close your Calcite Connection object. It's stateful, and whatever
you've done to add new datasources to it will vanish if the connection is
closed.
This is especially important to keep in mind because it's
AutoCloseable. Don't use try-with-resources blocks with it, or call .close()

You can see an example of some of this stuff here:
https://github.com/GavinRay97/GraphQLCalcite/blob/92b18a850d5546403b9760d96dc5b3a59533f3ca/src/main/kotlin/calcite/CalciteSchemaManager.kt#L26-L60

Please ignore "addDatabase" manually iterating schemas and not using
JdbcCatalogSchema
Also the methods which run manual RelNode expressions instead of SQL aren't
necessary

I asked about this and Julian said that running SQL through a
java.sql.Connection object amounts to the same thing

Hope this overview helps, let me know if anything is not clear.
Also if anyone else has advice on this topic, I'd be keen to hear it too
(or if I've made mistakes here)


On Tue, Aug 2, 2022 at 2:58 PM Pranav Deshpande <
deshpande.v.pra...@gmail.com> wrote:

> Hi Team,
> Say that I have a query which references 5 tables, and 3 of these are in
> Cassandra and 1 is a CSV file. Out of these, the cassandra tables are split
> across 2 different cassandra DB instances.
>
> How does calcite federation work in this case?
>
> Do I have to instantiate multiple connections? Is there an example for
> this? How does calcite know which tablescan/filter etc. to call?
>
> Regards,
> Pranav
>


How does one leverage Calcite to federate queries over multiple underlying engines?

2022-08-02 Thread Pranav Deshpande
Hi Team,
Say that I have a query which references 5 tables, and 3 of these are in
Cassandra and 1 is a CSV file. Out of these, the cassandra tables are split
across 2 different cassandra DB instances.

How does calcite federation work in this case?

Do I have to instantiate multiple connections? Is there an example for
this? How does calcite know which tablescan/filter etc. to call?

Regards,
Pranav


Re: calcite+graal native-image

2022-08-02 Thread Jacques Nadeau
I definitely suggest that you look at how Substrait compiles. There are a
number of customizations on top of vanilla Calcite that are done. Key
things to observe include:
* Configuration of runtime/compile time classes at build. [1]
* Registration of addition reflection, etc items [2]
* Use of proxying metadata handler (as opposed to code generating one) [3]

The specific problem you are hitting looks like the third bullet above
based on the stacktrace.

Note that anywhere Calcite uses Janino/code generation has to have an
alternative path to function. This means that if you want to use other
parts of Calcite, you'll have to supplement for this requirement (for
example, I believe constant reduction uses janino and thus would need an
alternative path).

[1]
https://github.com/substrait-io/substrait-java/blob/main/isthmus/build.gradle.kts#L42
[2]
https://github.com/substrait-io/substrait-java/blob/main/isthmus/src/main/java/io/substrait/isthmus/RegisterAtRuntime.java
[3]
https://github.com/substrait-io/substrait-java/blob/main/isthmus/src/main/java/io/substrait/isthmus/SqlConverterBase.java#L63

On Tue, Jul 19, 2022 at 11:05 AM Egor Ryashin 
wrote:

> Hi all,
>
> I’ve been trying to build a native executable with GraalVM and got stuck
> with this exception when running the executable:
>
> java.lang.RuntimeException: java.lang.ClassNotFoundException:
> org.apache.calcite.rel.metadata.janino.GeneratedMetadata_CollationHandler
> at
> org.apache.calcite.rel.metadata.JaninoRelMetadataProvider.compile(JaninoRelMetadataProvider.java:180)
> at
> org.apache.calcite.rel.metadata.JaninoRelMetadataProvider.generateCompileAndInstantiate(JaninoRelMetadataProvider.java:138)
> at
> org.apache.calcite.rel.metadata.JaninoRelMetadataProvider.lambda$static$0(JaninoRelMetadataProvider.java:72)
> at
> com.google.common.cache.CacheLoader$FunctionToCacheLoader.load(CacheLoader.java:165)
> at
> com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3529)
> at
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2278)
> at
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2155)
> at
> com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2045)
> at com.google.common.cache.LocalCache.get(LocalCache.java:3951)
> at
> com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
> at
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
> at
> org.apache.calcite.rel.metadata.JaninoRelMetadataProvider.revise(JaninoRelMetadataProvider.java:189)
>
> I found a ticket that probably relates to the issue
> https://issues.apache.org/jira/browse/CALCITE-4539 <
> https://issues.apache.org/jira/browse/CALCITE-4539>
> I also found that substrait.io  (substrait-java)
> already compiles Calcite dependency to a native executable. Right now, I’m
> confused, is it possible to compile Calcite to a working native executable
> or not? It would be great if somebody could give some insight on how to do
> that. I tried to apply the substrait-java configs and used the GraalVM
> trace agent generated configs too but I still get the above exception
> nonetheless.
>
> Thanks
>
>


Re: [VOTE] Release Apache Calcite 1.31.0 (release candidate 3)

2022-08-02 Thread Andrei Sereda
Thanks everyone for testing and voting for this release.

The vote is currently closed and I will send a separate email with the
results.

On Mon, Aug 1, 2022 at 4:32 AM Chunwei Lei  wrote:

> Thanks Andrei for working on this.
>
> Mac OS X 10.16 x86_64,jdk 1.8.0_271,Gradle 7.3
>
> - Checksum and signature: ok
> - Gradle test: ok
> - Went over release note: ok
>
> My vote is: +1 (binding)
>
>
> Best,
> Chunwei
>
>
> On Sat, Jul 30, 2022 at 10:10 PM Stamatis Zampetakis 
> wrote:
>
> > Ubuntu 20.04.4 LTS, jdk1.8.0_261, Gradle wrapper, Gradle 7.4.2
> >
> >  * Checked signatures and checksums OK
> >  * Went over release note OK
> >  * Built from git tag and run tests (./gradlew clean build) OK
> >  * Built from source artifacts and run unit tests + slow tests OK
> >  * Checked diff between git repo and release sources OK
> >
> > +1 (binding)
> >
> > Best,
> > Stamatis
> >
> > On Sat, Jul 30, 2022 at 12:32 PM Francis Chuang <
> francischu...@apache.org>
> > wrote:
> >
> > > My vote is: +1 (binding)
> > >
> > > - Verified GPG signature - OK
> > > - Verified SHA512 - OK
> > > - Diffed source release and git repository - OK
> > > - Checked release notes on tag
> > > (
> > >
> >
> https://github.com/apache/calcite/blob/calcite-1.31.0-rc3/site/_docs/history.md
> > )
> > >
> > > - OK
> > > - Ran tests (gradle check) - OK
> > > - Spot checked Nexus artifacts - OK
> > >
> > > Environment:
> > > Eclipse-temurin:17-jammy docker container in WSL2 (Ubuntu 20.04) on
> > > Windows 10 21h2
> > >
> > >  > docker version
> > > Client: Docker Engine - Community
> > >   Cloud integration: v1.0.28
> > >   Version:   20.10.17
> > >   API version:   1.41
> > >   Go version:go1.17.11
> > >   Git commit:100c701
> > >   Built: Mon Jun  6 23:03:17 2022
> > >   OS/Arch:   linux/amd64
> > >   Context:   default
> > >   Experimental:  true
> > >
> > > Server: Docker Desktop
> > >   Engine:
> > >Version:  20.10.17
> > >API version:  1.41 (minimum version 1.12)
> > >Go version:   go1.17.11
> > >Git commit:   a89b842
> > >Built:Mon Jun  6 23:01:23 2022
> > >OS/Arch:  linux/amd64
> > >Experimental: false
> > >   containerd:
> > >Version:  1.6.6
> > >GitCommit:10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
> > >   runc:
> > >Version:  1.1.2
> > >GitCommit:v1.1.2-0-ga916309
> > >   docker-init:
> > >Version:  0.19.0
> > >GitCommit:de40ad0
> > >
> > >  > gradle -v
> > >
> > > 
> > > Gradle 7.4.2
> > > 
> > >
> > > Build time:   2022-03-31 15:25:29 UTC
> > > Revision: 540473b8118064efcc264694cbcaa4b677f61041
> > >
> > > Kotlin:   1.5.31
> > > Groovy:   3.0.9
> > > Ant:  Apache Ant(TM) version 1.10.11 compiled on July 10 2021
> > > JVM:  17.0.3 (Eclipse Adoptium 17.0.3+7)
> > > OS:   Linux 5.10.102.1-microsoft-standard-WSL2 amd64
> > >
> > >  > java -version
> > > openjdk version "17.0.3" 2022-04-19
> > > OpenJDK Runtime Environment Temurin-17.0.3+7 (build 17.0.3+7)
> > > OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (build 17.0.3+7, mixed mode,
> > > sharing)
> > >
> > > Francis
> > >
> > > On 30/07/2022 6:23 am, Julian Hyde wrote:
> > > > Downloaded, checked sums and signatures, LICENSE, NOTICE, README,
> > > howto.md, history.md; compiled and ran tests using OpenJDK 18, Gradle
> > 7.4.2
> > > on Ubuntu Linux 5.4.0 x86_64; ran rat.
> > > >
> > > > +1 (binding)
> > > >
> > > > Julian
> > > >
> > > >
> > > >> On Jul 29, 2022, at 8:48 AM, Benchao Li 
> wrote:
> > > >>
> > > >> +1 (non-binding)
> > > >>
> > > >> - checked signature and checksum: OK
> > > >> - build and test from source: OK
> > > >> - diff sources with release tag: OK
> > > >> - checked NOTICE file and licenses: OK
> > > >> - checked files in repo staging: OK
> > > >>
> > > >> Ruben Q L  于2022年7月29日周五 22:39写道:
> > > >>
> > > >>> Thanks again Andrei for working on this.
> > > >>>
> > > >>> - Checksum and signature: ok
> > > >>> - Gradle test: ok
> > > >>> - Calcite-based application test suite: ok
> > > >>> - Diff between git repo and release sources: ok
> > > >>>
> > > >>> My vote is: +1 (binding)
> > > >>>
> > > >>> Best,
> > > >>> Ruben
> > > >>>
> > > >>>
> > > >>> On Fri, Jul 29, 2022 at 1:50 PM Enrico Olivelli <
> eolive...@gmail.com
> > >
> > > >>> wrote:
> > > >>>
> > >  +1 (non binding)
> > > 
> > >  All tests of HerdDB pass without any code changes
> > >  CI results on this PR:
> https://github.com/diennea/herddb/pull/789
> > > 
> > > 
> > >  Thanks for driving the release
> > >  Enrico
> > > 
> > >  Il giorno ven 29 lug 2022 alle ore 14:07 Andrei Sereda
> > >   ha scritto:
> > > >
> > > > Hi all,
> > > >
> > > > I have created a build for Apache Calcite 1.31.0, release
> 

[RESULT] [VOTE] Release Apache Calcite 1.31.0 (release candidate 3)

2022-08-02 Thread Andrei Sereda
Thanks to everyone who has tested the release candidate and given
their comments and votes.

The tally is as follows.

5 binding +1s:
Ruben Q L
Julian Hyde
Francis Chuang
Stamatis Zempetakis
Chunwei Lei

3 non-binding +1s:
Andrei Sereda
Enrico Olivelli
Benchao Li

No 0s or -1s.

Therefore, I am delighted to announce that the proposal to release
Apache Calcite 1.31.0 has passed.

Thanks everyone. We’ll now roll the release out to the mirrors.

There was some feedback during voting. I shall open a separate
thread to discuss.

Andrei


[jira] [Created] (CALCITE-5225) No match found for function signature TO_CHAR()

2022-08-02 Thread Jira
杨晓晨 created CALCITE-5225:


 Summary: No match found for function signature TO_CHAR()
 Key: CALCITE-5225
 URL: https://issues.apache.org/jira/browse/CALCITE-5225
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.30.0
 Environment: {code:java}
public class Example1 {
    public static void main(String[] args) throws Exception {
        String sql = "SELECT a.ID, to_char( a.NAME ), to_char( SYSDATE, 
'/mm/dd'), b.AGE FROM HDC_ODS.DEMO a LEFT JOIN HDC_ODS.DEMO1 b ON a.ID = 
b.ID";
        runProjectQueryWithLex(Lex.ORACLE, sql);
    }
    private static void runProjectQueryWithLex(Lex lex, String sql)
            throws SqlParseException, ValidationException, 
RelConversionException {
        SqlParser.Config javaLex = 
SqlParser.configBuilder().setLex(lex).build();
        Planner planner = getPlanner(null, javaLex, 
Programs.ofRules(Programs.RULE_SET));
        SqlNode parse = planner.parse(sql);
        SqlNode validate = planner.validate(parse);
        System.out.println(validate);
    }
    private static Planner getPlanner(List traitDefs,
                                      SqlParser.Config parserConfig, Program... 
programs) {
        CalciteSchema calciteSchema = CalciteSchema.createRootSchema(true, 
true);
        SchemaPlus rootSchema = calciteSchema.plus();
        //final SchemaPlus rootSchema = Frameworks.createRootSchema(true);
        DataSource dataSource = 
JdbcSchema.dataSource("jdbc:oracle:thin:@//XXX:1521/ORCL?fun=oracle",
                "oracle.jdbc.driver.OracleDriver", "xxx", "");
        final JdbcCatalogSchema schema =
                JdbcCatalogSchema.create(null, "", dataSource, "PUBLIC");
        final CalciteSchema rootSchema0 =
                CalciteSchema.createRootSchema(false, false, "", schema);
        final val schemaMap = rootSchema0.getSubSchemaMap();
        schemaMap.forEach((key, value) -> {
            rootSchema.add(key, value.schema);
        });
        SqlOperatorTable opTab =
                SqlLibraryOperatorTableFactory.INSTANCE.getOperatorTable(
                        
EnumSet.of(SqlLibrary.ORACLE,SqlLibrary.STANDARD,SqlLibrary.SPATIAL));

        final FrameworkConfig config = Frameworks.newConfigBuilder()
                .parserConfig(parserConfig)
                .defaultSchema(rootSchema)
                .traitDefs(traitDefs)
                .programs(programs)
                .operatorTable(opTab)
                .build();
        return Frameworks.getPlanner(config);
    }
}
{code}
 

Oracle Table

DEMO:

!image-2022-08-02-17-07-11-242.png!

DEMO1:

!image-2022-08-02-17-08-02-305.png!

 

 

 

 
Reporter: 杨晓晨
 Attachments: image-2022-08-02-17-07-11-242.png, 
image-2022-08-02-17-08-02-305.png

I want to implement a function that can parse and verify Oracle SQL.
But there are some problems in verifying the Oracle function, it seems that 
some functions are not supported, but it can run normally in Oracle! How should 
I solve it? Modify which parameters of Calcite, it is best to support most of 
the functions of Oracle!!
Many thanks!

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: New Polymorphic Tables functionality -- interesting use cases or am I misunderstanding?

2022-08-02 Thread Jing Zhang
Hi, Ray,

> I assume somebody could write custom function to do something like this:
>
> SELECT *
> FROM TABLE (
>HttpReader(
>   method => 'GET'
>   url => 'https://api.mysite.com/users/1'
>   )
> )
>
> Where "https://api.mysite.com/users/1; returns JSON data like:
>
> { "id": 1, "name": "Some Person", "is_registered": true }
>
> And then we can infer table definition:
>
> TABLE (
>   id INT
>   name TEXT
>   is_registered BOOLEAN
> )
>
> And read JSON as row(s):
>
> id | name | is_registered
> 1  | some | true
>
I think it could be done by a polymorphic table function.
Row type of result table is not declared when the function is created.
It depends on the function arguments, in your case, is 'url' literal
argument.
The row type of result table could be inferred when compile the specified
query.
About how to get the json schema?
Perhaps directly get the json result of literal 'url' argument, analyze the
schema.
The json result could be cached because it would be used again in the later
execution phase.

Best,
Jing Zhang

Gavin Ray  于2022年8月1日周一 04:40写道:

> Thank you for the comprehensive response, Jing =)
>
> For example, We often need to consume a CSV file.
> > Generally, the first line of the file contains a list of column names,
> and
> > subsequent lines of the file contain data.
> > However, different CSV files may have different schemas.
> > We could offer a table function named CSVreader which interprets this
> file
>
>
>  FROM TABLE (
> >   CSVreader (File => 'abc.csv')
> > )
>
>
> Ah yes this seems about what I was hoping for!
> I assume somebody could write custom function to do something like this:
>
> SELECT *
> FROM TABLE (
>HttpReader(
>   method => 'GET'
>   url => 'https://api.mysite.com/users/1'
>   )
> )
>
> Where "https://api.mysite.com/users/1; returns JSON data like:
>
> { "id": 1, "name": "Some Person", "is_registered": true }
>
> And then we can infer table definition:
>
> TABLE (
>   id INT
>   name TEXT
>   is_registered BOOLEAN
> )
>
> And read JSON as row(s):
>
> id | name | is_registered
> 1  | some | true
>
>
> On Sun, Jul 31, 2022 at 10:37 AM Jing Zhang  wrote:
>
> > Hi,
> > Thanks for your attention.
> > Yes, we hope to provide full support for polymorphic table functions.
> > We have made some progress in version 1.31.0 which will be released soon.
> > Any feedback is welcome.
> >
> > I'm not sure whether the polymorphic table function could satisfy your
> > requirement because I don't know your demands very well yet.
> > It would be very helpful if you could provide some more detailed
> > description.
> >
> > Here, I would like to share something about polymorphic table function.
> > A polymorphic table function is a function that returns a table whose row
> > type is not declared when the function is created.
> > Rather, the row type of the result may depend on the function arguments
> in
> > the invocation of a PTF,
> > and therefore may vary depending on the precise syntax containing the PTF
> > invocation.
> > In addition, a PTF may have generic table parameters (i.e., no row type
> > declared when the PTF is created),
> > and the row type of the result might depend on the row type(s) of the
> input
> > tables.
> >
> > For example, We often need to consume a CSV file.
> > Generally, the first line of the file contains a list of column names,
> and
> > subsequent lines of the file contain data.
> > However, different CSV files may have different schemas.
> > We could offer a table function named CSVreader which interprets this
> file
> > as a table.
> > For example, there is a csv file named abc.csv with the following
> contents:
> >
> > docno,name,due_date,principle,interest
> > 123,Mary,01/01/2014,234.56,345.67
> > 234,Edgar,01/01/2014,654.32,543.21
> >
> >
> > the query author may write a query such as the following:
> >
> >
> > SELECT *
> > FROM TABLE (
> >   CSVreader (
> > File => 'abc.csv',
> > Floats => DESCRIPTOR ("principle", "interest")
> > Dates => DESCRIPTOR ("due_date")
> >   )
> > ) AS S
> >
> >
> > The result will be
> > docnonamedue_dateprincipleinterest
> > 123 Mary 01/01/2014 234.56 345.67
> > 234 Edgar 01/01/2014 654.32 543.21
> >
> >
> > More examples could be found in the description of CALCITE-4865[1].
> > Users could define a custom polymorphic table function by inheriting a
> > related interface[2].
> >
> > Best,
> > Jing Zhang
> >
> > [1] https://issues.apache.org/jira/browse/CALCITE-4865
> > [2]
> >
> >
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/SqlTableFunction.java
> >
> > Gavin Ray  于2022年7月30日周六 07:27写道:
> >
> > > I see there's a recent commit which references this:
> > > [CALCITE-4864] Supports Polymorphic Table function - ASF JIRA (
> > apache.org)
> > > 
> > >
> > > Something Teiid [0] did was to use JSON_TABLE() from dynamic data at
> > > runtime.
> > >
> > > I had an opportunity to speak to the