[jira] [Commented] (SPARK-9182) filter and groupBy on DataFrames are not passed through to jdbc source
[ https://issues.apache.org/jira/browse/SPARK-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177455#comment-17177455 ]
ZhouDaHong commented on SPARK-9182:
---
Hello, it seems the problem is that the "sal" column is of a NUMERIC type. When the SQL is generated, non-equality comparisons against that numeric value are not matched, so they are not pushed down. Try changing the "sal" column to int or double.

> filter and groupBy on DataFrames are not passed through to jdbc source
> --
>
> Key: SPARK-9182
> URL: https://issues.apache.org/jira/browse/SPARK-9182
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.1
> Reporter: Greg Rahn
> Assignee: Yijie Shen
> Priority: Critical
>
> When running all of these API calls, the only predicates passed through
> to the backend jdbc source are equality predicates. All filters in these
> commands should be able to be passed through to the jdbc database source.
> {code}
> val url="jdbc:postgresql:grahn"
> val prop = new java.util.Properties
> val emp = sqlContext.read.jdbc(url, "emp", prop)
> emp.filter(emp("sal") === 5000).show()
> emp.filter(emp("sal") < 5000).show()
> emp.filter("sal = 3000").show()
> emp.filter("sal > 2500").show()
> emp.filter("sal >= 2500").show()
> emp.filter("sal < 2500").show()
> emp.filter("sal <= 2500").show()
> emp.filter("sal != 3000").show()
> emp.filter("sal between 3000 and 5000").show()
> emp.filter("ename in ('SCOTT','BLAKE')").show()
> {code}
> We see from the PostgreSQL query log that the following is run, and that only
> equality predicates are passed through.
> {code}
> LOG: execute <unnamed>: SET extra_float_digits = 3
> LOG: execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp WHERE sal = 5000
> LOG: execute <unnamed>: SET extra_float_digits = 3
> LOG: execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG: execute <unnamed>: SET extra_float_digits = 3
> LOG: execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp WHERE sal = 3000
> LOG: execute <unnamed>: SET extra_float_digits = 3
> LOG: execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG: execute <unnamed>: SET extra_float_digits = 3
> LOG: execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG: execute <unnamed>: SET extra_float_digits = 3
> LOG: execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG: execute <unnamed>: SET extra_float_digits = 3
> LOG: execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG: execute <unnamed>: SET extra_float_digits = 3
> LOG: execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG: execute <unnamed>: SET extra_float_digits = 3
> LOG: execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG: execute <unnamed>: SET extra_float_digits = 3
> LOG: execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
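Every operator in the report has a direct SQL spelling, so in principle nothing prevents a JDBC source from pushing all of them down. The following is a simplified, hypothetical compiler that illustrates this. {{Filter}}, {{EqualTo}}, {{GreaterThan}} and the other case classes are local stand-ins named after the classes in {{org.apache.spark.sql.sources}}, but they are not Spark's classes. The generated SQL also does not reproduce Spark's exact output; for example, column quoting is omitted.

```scala
// Local stand-ins named after a few org.apache.spark.sql.sources.Filter cases.
// They are NOT Spark's classes; they only model the idea of filter pushdown.
sealed trait Filter
final case class EqualTo(attribute: String, value: Any) extends Filter
final case class GreaterThan(attribute: String, value: Any) extends Filter
final case class GreaterThanOrEqual(attribute: String, value: Any) extends Filter
final case class LessThan(attribute: String, value: Any) extends Filter
final case class LessThanOrEqual(attribute: String, value: Any) extends Filter
final case class In(attribute: String, values: Seq[Any]) extends Filter
final case class Not(child: Filter) extends Filter
final case class And(left: Filter, right: Filter) extends Filter

object WhereClause {
  // Render a constant as SQL: strings are single-quoted (embedded quotes doubled),
  // everything else uses its toString form.
  def literal(v: Any): String = v match {
    case s: String => "'" + s.replace("'", "''") + "'"
    case other     => other.toString
  }

  // Compile one filter to a SQL predicate, or None if this sketch cannot express it.
  def compile(f: Filter): Option[String] = f match {
    case EqualTo(a, v)            => Some(s"$a = ${literal(v)}")
    case GreaterThan(a, v)        => Some(s"$a > ${literal(v)}")
    case GreaterThanOrEqual(a, v) => Some(s"$a >= ${literal(v)}")
    case LessThan(a, v)           => Some(s"$a < ${literal(v)}")
    case LessThanOrEqual(a, v)    => Some(s"$a <= ${literal(v)}")
    case In(a, vs) if vs.nonEmpty => Some(vs.map(literal).mkString(s"$a IN (", ", ", ")"))
    case Not(c)                   => compile(c).map(p => s"NOT ($p)")
    case And(l, r)                => for (lp <- compile(l); rp <- compile(r)) yield s"($lp) AND ($rp)"
    case _                        => None // e.g. an empty IN list
  }

  // Build the query a JDBC source could send: every compilable filter is AND-ed
  // into the WHERE clause; filters that do not compile are simply left out.
  def query(table: String, columns: Seq[String], filters: Seq[Filter]): String = {
    val predicates = filters.flatMap(compile)
    val where = if (predicates.isEmpty) "" else predicates.mkString(" WHERE ", " AND ", "")
    s"SELECT ${columns.mkString(",")} FROM $table$where"
  }
}
```

Under these assumptions, {{sal != 3000}} corresponds roughly to {{Not(EqualTo("sal", 3000))}}, and {{sal between 3000 and 5000}} corresponds to an {{And}} of {{GreaterThanOrEqual}} and {{LessThanOrEqual}}.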
[jira] [Commented] (SPARK-9182) filter and groupBy on DataFrames are not passed through to jdbc source
[ https://issues.apache.org/jira/browse/SPARK-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037670#comment-15037670 ]
Hyukjin Kwon commented on SPARK-9182:
---
Filed here https://issues.apache.org/jira/browse/SPARK-12126.
[jira] [Commented] (SPARK-9182) filter and groupBy on DataFrames are not passed through to jdbc source
[ https://issues.apache.org/jira/browse/SPARK-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031526#comment-15031526 ]
Cheng Lian commented on SPARK-9182:
---
Converting all {{expressions.Filter}} to {{sources.Filter}} basically means that we would need to write a new expression library, which might not be worth the effort. This reminds me of the experimental {{CatalystScan}} trait, which was once used by {{ParquetRelation}} to handle partition pruning before {{HadoopFsRelation}} was added. Since JDBC is a built-in data source, maybe we can use a similar trick to pass Catalyst expressions directly to it, rather than {{sources.Filter}}, so that it can make smarter decisions.
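The following is a minimal sketch of the difference described above, built on a toy expression tree. {{Attr}}, {{Lit}}, {{Add}}, {{Gt}} and {{ScalaUdf}} are hypothetical classes, not Catalyst's. Flattening a predicate to a {{sources.Filter}}-style value keeps only the shape {{column > constant}}. A source that receives the whole tree can instead render every sub-tree it understands and decline only what it cannot run.

```scala
// Toy expression tree standing in for Catalyst expressions (hypothetical classes).
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(value: Int) extends Expr                   // numeric literals only, for brevity
final case class Add(left: Expr, right: Expr) extends Expr
final case class Gt(left: Expr, right: Expr) extends Expr
final case class ScalaUdf(name: String, arg: Expr) extends Expr // something a database cannot run

// Flat filter in the spirit of sources.GreaterThan: a column compared with a constant.
final case class GreaterThanFilter(attribute: String, value: Int)

object Pushdown {
  // Filter-style translation: only the exact shape `column > constant` survives.
  def toSourceFilter(e: Expr): Option[GreaterThanFilter] = e match {
    case Gt(Attr(a), Lit(v)) => Some(GreaterThanFilter(a, v))
    case _                   => None
  }

  // Expression-style translation: the source sees the whole tree and renders every
  // sub-tree it understands, declining (None) as soon as it meets something it cannot run.
  def toSql(e: Expr): Option[String] = e match {
    case Attr(a)        => Some(a)
    case Lit(v)         => Some(v.toString)
    case Add(l, r)      => for (ls <- toSql(l); rs <- toSql(r)) yield s"($ls + $rs)"
    case Gt(l, r)       => for (ls <- toSql(l); rs <- toSql(r)) yield s"$ls > $rs"
    case ScalaUdf(_, _) => None
  }
}
```

Under this model, a predicate such as {{a + b > 3}} is lost by the filter-style translation but can still be pushed when the expression itself is handed to the source.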
[jira] [Commented] (SPARK-9182) filter and groupBy on DataFrames are not passed through to jdbc source
[ https://issues.apache.org/jira/browse/SPARK-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031266#comment-15031266 ]
Hyukjin Kwon commented on SPARK-9182:
---
Just a thought: currently {{DataSourceStrategy.translateFilter}} only converts the {{expressions.Filter}} predicates that are commonly convertible to {{sources.Filter}}. However, individual data sources may be able to process more (or fewer) filters than that common set. As far as I know, the {{unhandledFilters}} interface was added by https://issues.apache.org/jira/browse/SPARK-10978 so that a source can exclude filters it cannot handle. To push down more filters, for JDBC and for other sources such as Elasticsearch or Solr, would it be better to convert all {{expressions.Filter}} to {{sources.Filter}}, hiding the internals, and then let {{unhandledFilters}} select the filters each source can process? This logic would also need to be added to Spark's internal data sources (namely, fixing the Parquet and ORC data sources to get rid of duplicated filtering). Even so, it would be advantageous because it removes redundant Spark-side filtering: currently, the internal data sources filter data twice, once on the Spark side and once on the data source side.
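The following sketches the {{unhandledFilters}} contract described above, using simplified stand-ins for the filter classes. {{RelationLike}} only mirrors the shape of {{BaseRelation.unhandledFilters}}: by default it reports every filter as unhandled, so Spark evaluates all of them again. {{JdbcLikeRelation}} and its choice of supported filters are hypothetical.

```scala
// Simplified stand-ins for a few sources.Filter cases (not Spark's classes).
sealed trait Filter
final case class EqualTo(attribute: String, value: Any) extends Filter
final case class GreaterThan(attribute: String, value: Any) extends Filter
final case class StringContains(attribute: String, value: String) extends Filter

// Mirrors the shape of BaseRelation.unhandledFilters: by default a relation
// claims nothing, so Spark re-evaluates every filter itself.
trait RelationLike {
  def unhandledFilters(filters: Array[Filter]): Array[Filter] = filters
}

// Hypothetical JDBC-like relation that (in this sketch) evaluates comparisons
// remotely but not substring matches.
object JdbcLikeRelation extends RelationLike {
  private def canHandle(f: Filter): Boolean = f match {
    case _: EqualTo | _: GreaterThan => true
    case _                           => false
  }

  // Report only what the remote side cannot do; Spark keeps evaluating those.
  override def unhandledFilters(filters: Array[Filter]): Array[Filter] =
    filters.filterNot(canHandle)
}
```

In this model, Spark would re-apply only the returned filters and trust the source with the rest, which is how the double filtering mentioned in the comment would go away.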
[jira] [Commented] (SPARK-9182) filter and groupBy on DataFrames are not passed through to jdbc source
[ https://issues.apache.org/jira/browse/SPARK-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955646#comment-14955646 ]
Davies Liu commented on SPARK-9182:
---
For JDBC, I think we could push more (for example, a + b > 3) into the remote database, including casts. This is more useful for JDBC than for file-based data sources, so it may be worth spending more effort on it.
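The following hypothetical renderer illustrates pushing both arithmetic and casts, as suggested above. It works on a toy expression tree; {{Col}}, {{Num}}, {{Plus}}, {{Greater}} and {{CastTo}} are assumptions, not Spark classes. PostgreSQL-style type names are also assumed. A real implementation would also have to check that the database's CAST semantics (rounding, overflow) agree with Spark's before pushing a cast.

```scala
// Hypothetical toy types and expressions (not Spark's classes).
sealed trait SqlType
final case class DecimalT(precision: Int, scale: Int) extends SqlType
case object IntegerT extends SqlType
case object DoubleT extends SqlType

sealed trait Expr
final case class Col(name: String) extends Expr
final case class Num(value: BigDecimal) extends Expr
final case class Plus(left: Expr, right: Expr) extends Expr
final case class Greater(left: Expr, right: Expr) extends Expr
final case class CastTo(child: Expr, to: SqlType) extends Expr

object RemoteSql {
  // Type names assume PostgreSQL-compatible spellings.
  def typeName(t: SqlType): String = t match {
    case DecimalT(p, s) => s"DECIMAL($p, $s)"
    case IntegerT       => "INTEGER"
    case DoubleT        => "DOUBLE PRECISION"
  }

  // Render the whole tree, casts included, so it can be evaluated remotely.
  def render(e: Expr): String = e match {
    case Col(n)        => n
    case Num(v)        => v.toString
    case Plus(l, r)    => s"(${render(l)} + ${render(r)})"
    case Greater(l, r) => s"${render(l)} > ${render(r)}"
    case CastTo(c, t)  => s"CAST(${render(c)} AS ${typeName(t)})"
  }
}
```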
[jira] [Commented] (SPARK-9182) filter and groupBy on DataFrames are not passed through to jdbc source
[ https://issues.apache.org/jira/browse/SPARK-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740577#comment-14740577 ]
Apache Spark commented on SPARK-9182:
---
User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/8718
[jira] [Commented] (SPARK-9182) filter and groupBy on DataFrames are not passed through to jdbc source
[ https://issues.apache.org/jira/browse/SPARK-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694727#comment-14694727 ]
Yijie Shen commented on SPARK-9182:
---
Reverted it via https://github.com/apache/spark/pull/8157
[jira] [Commented] (SPARK-9182) filter and groupBy on DataFrames are not passed through to jdbc source
[ https://issues.apache.org/jira/browse/SPARK-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694739#comment-14694739 ]
Cheng Lian commented on SPARK-9182:
---
[~grahn] Unfortunately, we found a regression in the previous fix and had to revert it. Until a proper fix is delivered, this issue can be worked around by explicitly casting the literal values in the filter. Namely, use
{noformat}
emp.filter("sal > cast(2500 as decimal(7, 2))")
{noformat}
instead of
{noformat}
emp.filter("sal > 2500")
{noformat}
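The following shows one plausible reading of why this workaround helps. It is stated as an assumption, not as a description of Spark's code. The assumption is that, for a non-equality comparison between the NUMERIC column and an integer literal, type coercion wraps the column itself in a cast. A translator that only recognises {{column op literal}} then gives up. Casting the literal explicitly leaves the column bare, and the cast on the constant can be folded away. The toy classes and the {{widerDecimal}} type label are hypothetical.

```scala
// Toy shapes of an analyzed predicate (hypothetical classes, not Catalyst's).
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(value: BigDecimal) extends Expr
final case class Cast(child: Expr, targetType: String) extends Expr
final case class GreaterThan(left: Expr, right: Expr) extends Expr

object CastWorkaround {
  // Stand-in for constant folding: a cast whose child folds to a literal becomes
  // that literal (precision and scale checks are omitted in this sketch).
  def fold(e: Expr): Expr = e match {
    case Cast(c, t) =>
      fold(c) match {
        case lit: Lit => lit
        case folded   => Cast(folded, t)
      }
    case GreaterThan(l, r) => GreaterThan(fold(l), fold(r))
    case other             => other
  }

  // Stand-in for a translateFilter-style matcher: only a bare column compared
  // with a literal is recognised; a Cast around the column blocks pushdown.
  def pushable(e: Expr): Option[String] = fold(e) match {
    case GreaterThan(Attr(a), Lit(v)) => Some(s"$a > $v")
    case _                            => None
  }
}
```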
[jira] [Commented] (SPARK-9182) filter and groupBy on DataFrames are not passed through to jdbc source
[ https://issues.apache.org/jira/browse/SPARK-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14662871#comment-14662871 ]
Apache Spark commented on SPARK-9182:
---
User 'yjshen' has created a pull request for this issue:
https://github.com/apache/spark/pull/8049
[jira] [Commented] (SPARK-9182) filter and groupBy on DataFrames are not passed through to jdbc source
[ https://issues.apache.org/jira/browse/SPARK-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660278#comment-14660278 ]
Cheng Lian commented on SPARK-9182:
---
Hey [~grahn], sorry for the late reply; I somehow missed your last two comments. Thanks for the detailed information. I'm now able to reproduce this issue locally and have confirmed that it's related to NUMERIC. I'm trying to deliver a fix for this.
[jira] [Commented] (SPARK-9182) filter and groupBy on DataFrames are not passed through to jdbc source
[ https://issues.apache.org/jira/browse/SPARK-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639065#comment-14639065 ]

Greg Rahn commented on SPARK-9182:
--

Looks like it's related to NUMERIC data types from a quick test.
[jira] [Commented] (SPARK-9182) filter and groupBy on DataFrames are not passed through to jdbc source
[ https://issues.apache.org/jira/browse/SPARK-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639036#comment-14639036 ]

Greg Rahn commented on SPARK-9182:
--

{code}
grahn=# \d emp
             Table "public.emp"
  Column  |         Type          | Modifiers
----------+-----------------------+-----------
 empno    | numeric(4,0)          |
 ename    | character varying(10) |
 job      | character varying(9)  |
 mgr      | numeric(4,0)          |
 hiredate | date                  |
 sal      | numeric(7,2)          |
 comm     | numeric(7,2)          |
 deptno   | numeric(2,0)          |

grahn=# select * from emp;
 empno | ename  |    job    | mgr  |  hiredate  |   sal   |  comm   | deptno
-------+--------+-----------+------+------------+---------+---------+--------
  7369 | SMITH  | CLERK     | 7902 | 1980-12-17 |  800.00 |         |     20
  7499 | ALLEN  | SALESMAN  | 7698 | 1981-02-20 | 1600.00 |  300.00 |     30
  7521 | WARD   | SALESMAN  | 7698 | 1981-02-22 | 1250.00 |  500.00 |     30
  7566 | JONES  | MANAGER   | 7839 | 1981-04-02 | 2975.00 |         |     20
  7654 | MARTIN | SALESMAN  | 7698 | 1981-09-28 | 1250.00 | 1400.00 |     30
  7698 | BLAKE  | MANAGER   | 7839 | 1981-05-01 | 2850.00 |         |     30
  7782 | CLARK  | MANAGER   | 7839 | 1981-06-09 | 2450.00 |         |     10
  7788 | SCOTT  | ANALYST   | 7566 | 1982-12-09 | 3000.00 |         |     20
  7839 | KING   | PRESIDENT |      | 1981-11-17 | 5000.00 |         |     10
  7844 | TURNER | SALESMAN  | 7698 | 1981-09-08 | 1500.00 |    0.00 |     30
  7876 | ADAMS  | CLERK     | 7788 | 1983-01-12 | 1100.00 |         |     20
  7900 | JAMES  | CLERK     | 7698 | 1981-12-03 |  950.00 |         |     30
  7902 | FORD   | ANALYST   | 7566 | 1981-12-03 | 3000.00 |         |     20
  7934 | MILLER | CLERK     | 7782 | 1982-01-23 | 1300.00 |         |     10
(14 rows)
{code}
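Greg's schema shows that `sal` is `numeric(7,2)`, not an integer or text column. One hypothesis consistent with the later confirmation that the problem is related to NUMERIC (stated here as an assumption, not as documented Spark behavior): comparing a DECIMAL(7,2) column with an integer literal requires coercing both sides to a common decimal type, and if that type differs from the column's own type, the column ends up wrapped in a cast. The sketch below computes such a common type under an assumed widening rule; `Dec`, `wider`, `columnNeedsCast` and the constants are hypothetical names chosen for illustration.

```scala
// Illustrative model of decimal type widening. The rule is an ASSUMPTION for
// illustration, not necessarily what Spark 1.4.1's analyzer does.
object DecimalWidening {
  final case class Dec(precision: Int, scale: Int)

  // Assumed: an INT literal is treated as DECIMAL(10,0), enough digits for Int.MaxValue.
  val IntAsDecimal: Dec = Dec(10, 0)

  // Assumed upper bound on decimal precision.
  val MaxPrecision: Int = 38

  // Common type of two decimals: keep the larger scale and the larger integral range.
  def wider(a: Dec, b: Dec): Dec = {
    val scale = math.max(a.scale, b.scale)
    val range = math.max(a.precision - a.scale, b.precision - b.scale)
    Dec(math.min(range + scale, MaxPrecision), scale)
  }

  // If the common type differs from the column's own type, the column must be cast
  // before comparing, so the predicate no longer refers to the bare column.
  def columnNeedsCast(column: Dec, literal: Dec): Boolean = wider(column, literal) != column
}
```

Under these assumptions, `sal` (DECIMAL(7,2)) compared with `5000` widens to DECIMAL(12,2), so the column itself would need a cast. The sketch does not by itself explain why the equality filters in the log were still pushed down; that depends on how the specific Spark version coerces each comparison operator.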
[jira] [Commented] (SPARK-9182) filter and groupBy on DataFrames are not passed through to jdbc source
[ https://issues.apache.org/jira/browse/SPARK-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638427#comment-14638427 ]

Cheng Lian commented on SPARK-9182:
---

I suspect that the type of column {{sal}} is {{text}} rather than any numeric type. Reproduced this issue with the following setup.

Table DDL:
{code:sql}
create table t (a int, b real, c double precision, d text);
{code}

Test data:
{code:sql}
insert into t values (1, 1.1, 1.2, '1000');
insert into t values (2, 2.1, 2.2, '2000');
{code}

Spark shell snippet:
{code}
val url = "jdbc:postgresql:postgres"
val props = new java.util.Properties
val t = sqlContext.read.jdbc(url, "t", props)
t.filter('d 1500).show()
{code}

Corresponding PostgreSQL log:
{noformat}
LOG: execute <unnamed>: SET extra_float_digits = 3
LOG: execute <unnamed>: SELECT "a","b","c","d" FROM t
{noformat}
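Cheng Lian's repro points at a general mechanism: a data source can only receive predicates that the planner knows how to translate. The minimal sketch below assumes, for illustration, that only the shape `column op literal` (or its mirror `literal op column`) becomes a pushed filter, and that anything else, including a column wrapped in a cast by type coercion, is evaluated by Spark after fetching every row, which would match the unfiltered `SELECT ... FROM t` in the log. `Expr`, `Attr`, `Lit`, `Cast`, `Lt`, `Eq` and `Pushed` are hypothetical names, not Catalyst's classes.

```scala
// Hypothetical mini expression tree, NOT Spark's Catalyst classes.
object TranslateSketch {
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Lit(value: Any) extends Expr
  final case class Cast(child: Expr, toType: String) extends Expr
  final case class Lt(left: Expr, right: Expr) extends Expr
  final case class Eq(left: Expr, right: Expr) extends Expr

  // A predicate handed to the source, as (column, operator, value).
  final case class Pushed(column: String, op: String, value: Any)

  // Only bare `column op literal` shapes are translated. Anything else, such as a
  // column wrapped in Cast, yields None and stays in Spark as a post-scan filter.
  def translate(e: Expr): Option[Pushed] = e match {
    case Eq(Attr(a), Lit(v)) => Some(Pushed(a, "=", v))
    case Eq(Lit(v), Attr(a)) => Some(Pushed(a, "=", v))
    case Lt(Attr(a), Lit(v)) => Some(Pushed(a, "<", v))
    case Lt(Lit(v), Attr(a)) => Some(Pushed(a, ">", v)) // `v < a` is the same as `a > v`
    case _                   => None
  }
}
```

With this rule, `sal < 5000` is pushed, while the same comparison on `Cast(Attr("sal"), "decimal(12,2)")` is not, so the database is asked for every row.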
[jira] [Commented] (SPARK-9182) filter and groupBy on DataFrames are not passed through to jdbc source
[ https://issues.apache.org/jira/browse/SPARK-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638421#comment-14638421 ]

Cheng Lian commented on SPARK-9182:
---

[~grahn] Could you please provide the schema of the table? In particular, I'd like to know the data types of the involved columns.
[jira] [Commented] (SPARK-9182) filter and groupBy on DataFrames are not passed through to jdbc source
[ https://issues.apache.org/jira/browse/SPARK-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636688#comment-14636688 ]

Cheng Lian commented on SPARK-9182:
---

I'm looking into this.
[jira] [Commented] (SPARK-9182) filter and groupBy on DataFrames are not passed through to jdbc source
[ https://issues.apache.org/jira/browse/SPARK-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636686#comment-14636686 ]

Cheng Lian commented on SPARK-9182:
---

I'm looking into this.