[jira] [Created] (SPARK-45437) Upgrade SNAPPY to 1.1.10.5 to pick up fix re Linux PLE64
N Campbell created SPARK-45437:
----------------------------------

Summary: Upgrade SNAPPY to 1.1.10.5 to pick up fix re Linux PLE64
Key: SPARK-45437
URL: https://issues.apache.org/jira/browse/SPARK-45437
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.5.0
Reporter: N Campbell

SPARK-45323 moved to Snappy 1.1.10.4 and proposes adding it to Spark 3.5.1. Snappy releases prior to 1.1.10.5 will not work on Linux PLE64; moving to Snappy 1.1.10.5 will address that issue: https://github.com/xerial/snappy-java/pull/515

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
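For readers who need the patched library before a Spark release picks it up, overriding the transitive dependency is one option. A minimal sketch for a Maven build; the use of `dependencyManagement` is an assumption about your project layout, not something from this ticket:

```xml
<!-- Hypothetical pom.xml fragment: force the patched snappy-java onto the classpath -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.xerial.snappy</groupId>
      <artifactId>snappy-java</artifactId>
      <version>1.1.10.5</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Verify what actually lands on the classpath with `mvn dependency:tree` before relying on the override.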
[jira] [Commented] (SPARK-45323) Upgrade snappy to 1.1.10.4
[ https://issues.apache.org/jira/browse/SPARK-45323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772424#comment-17772424 ]

N Campbell commented on SPARK-45323:
------------------------------------

Is the same fix going to be backported into a 3.5.x release, given Spark 3.5 was released with Snappy 1.1.10.3?

> Upgrade snappy to 1.1.10.4
> --------------------------
>
> Key: SPARK-45323
> URL: https://issues.apache.org/jira/browse/SPARK-45323
> Project: Spark
> Issue Type: Dependency upgrade
> Components: Build
> Affects Versions: 4.0.0, 3.5.1
> Reporter: Bjørn Jørgensen
> Assignee: Bjørn Jørgensen
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Security Fix
> Fixed SnappyInputStream so as not to allocate too large memory when decompressing data with an extremely large chunk size, by @tunnelshade (code change)
> This does not affect users only using the Snappy.compress/uncompress methods
[jira] [Reopened] (SPARK-20856) support statement using nested joins
[ https://issues.apache.org/jira/browse/SPARK-20856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

N Campbell reopened SPARK-20856:
--------------------------------

Per prior comment. This is an enhancement request asking Spark SQL to provide better parity with the nested joined-table syntax that many systems support.

> support statement using nested joins
> ------------------------------------
>
> Key: SPARK-20856
> URL: https://issues.apache.org/jira/browse/SPARK-20856
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: N Campbell
> Priority: Major
> Labels: bulk-closed
>
> While DB2, ORACLE etc. support a join expressed as follows, Spark SQL does not.
> Not supported:
> select * from
> cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint
> on tbint.rnum = tint.rnum
> on tint.rnum = tsint.rnum
> versus written as shown:
> select * from
> cert.tsint tsint inner join cert.tint tint on tsint.rnum = tint.rnum inner join cert.tbint tbint on tint.rnum = tbint.rnum
>
> ERROR_STATE, SQL state: org.apache.spark.sql.catalyst.parser.ParseException: extraneous input 'on' expecting {<EOF>, ',', '.', '[', 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'OR', 'AND', 'IN', NOT, 'BETWEEN', 'LIKE', RLIKE, 'IS', 'JOIN', 'CROSS', 'INNER', 'LEFT', 'RIGHT', 'FULL', 'NATURAL', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, '+', '-', '*', '/', '%', 'DIV', '&', '|', '^', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'ANTI'}(line 4, pos 5)
> == SQL ==
> select * from
> cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint
> on tbint.rnum = tint.rnum
> on tint.rnum = tsint.rnum
> -^^^
> , Query: select * from
> cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint
> on tbint.rnum = tint.rnum
> on tint.rnum = tsint.rnum.
> SQLState: HY000
> ErrorCode: 500051
[jira] [Commented] (SPARK-20829) var_samp returns Nan while other vendors return a null value
[ https://issues.apache.org/jira/browse/SPARK-20829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844875#comment-16844875 ]

N Campbell commented on SPARK-20829:
------------------------------------

No reason was given. Either:
A. Apache Spark does not want to align with ISO SQL and will document the delta, or
B. Apache Spark will add an option for those who want ISO SQL behaviour.

> var_samp returns Nan while other vendors return a null value
> ------------------------------------------------------------
>
> Key: SPARK-20829
> URL: https://issues.apache.org/jira/browse/SPARK-20829
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: N Campbell
> Priority: Minor
> Labels: bulk-closed
> Attachments: TSUPPLY
>
> SELECT
>   sno AS SNO,
>   pno AS PNO,
>   VAR_SAMP(qty) AS C1
> FROM
>   tsupply
> GROUP BY
>   sno,
>   pno
>
> create table if not exists TSUPPLY (RNUM int , SNO string, PNO string, JNO string, QTY int )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
> STORED AS textfile ;
[jira] [Commented] (SPARK-20856) support statement using nested joins
[ https://issues.apache.org/jira/browse/SPARK-20856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844874#comment-16844874 ]

N Campbell commented on SPARK-20856:
------------------------------------

This enhancement was bulk-closed as incomplete, with no reason given. It is unclear whether the Apache Spark team is saying they have no intent of ever implementing the enhancement, or whether a script clobbered things.

> support statement using nested joins
> ------------------------------------
>
> Key: SPARK-20856
> URL: https://issues.apache.org/jira/browse/SPARK-20856
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: N Campbell
> Priority: Major
> Labels: bulk-closed
>
> While DB2, ORACLE etc. support a join expressed as follows, Spark SQL does not.
> Not supported:
> select * from
> cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint
> on tbint.rnum = tint.rnum
> on tint.rnum = tsint.rnum
> versus written as shown:
> select * from
> cert.tsint tsint inner join cert.tint tint on tsint.rnum = tint.rnum inner join cert.tbint tbint on tint.rnum = tbint.rnum
>
> ERROR_STATE, SQL state: org.apache.spark.sql.catalyst.parser.ParseException: extraneous input 'on' expecting {<EOF>, ',', '.', '[', 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'OR', 'AND', 'IN', NOT, 'BETWEEN', 'LIKE', RLIKE, 'IS', 'JOIN', 'CROSS', 'INNER', 'LEFT', 'RIGHT', 'FULL', 'NATURAL', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, '+', '-', '*', '/', '%', 'DIV', '&', '|', '^', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'ANTI'}(line 4, pos 5)
> == SQL ==
> select * from
> cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint
> on tbint.rnum = tint.rnum
> on tint.rnum = tsint.rnum
> -^^^
> , Query: select * from
> cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint
> on tbint.rnum = tint.rnum
> on tint.rnum = tsint.rnum.
> SQLState: HY000
> ErrorCode: 500051
[jira] [Commented] (SPARK-20827) cannot express HAVING without a GROUP BY clause
[ https://issues.apache.org/jira/browse/SPARK-20827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844872#comment-16844872 ]

N Campbell commented on SPARK-20827:
------------------------------------

This enhancement was bulk-closed as incomplete, with no reason given. It is unclear whether the Apache Spark team is saying they have no intent of ever implementing the enhancement, or whether a script clobbered things.

> cannot express HAVING without a GROUP BY clause
> -----------------------------------------------
>
> Key: SPARK-20827
> URL: https://issues.apache.org/jira/browse/SPARK-20827
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: N Campbell
> Priority: Minor
> Labels: bulk-closed
>
> Spark SQL does not support a HAVING clause without a GROUP BY, which is valid SQL and supported by other engines (ORACLE, DB2, ...).
>
> SELECT
>   '' AS `C1`
> FROM
>   `cert`.`tparts`
> HAVING
>   COUNT(`pno`) > 0
>
> SQL state: java.lang.UnsupportedOperationException: Cannot evaluate expression: count(input[0, string, true]), Query: SELECT '' AS `C1` FROM `cert`.`tparts` HAVING COUNT(`pno`) > 0.
> SQLState: HY000
> ErrorCode: 500051
[jira] [Created] (SPARK-20856) support statement using nested joins
N Campbell created SPARK-20856:
----------------------------------

Summary: support statement using nested joins
Key: SPARK-20856
URL: https://issues.apache.org/jira/browse/SPARK-20856
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.1.0
Reporter: N Campbell

While DB2, ORACLE etc. support a join expressed as follows, Spark SQL does not.

Not supported:

select * from
cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint
on tbint.rnum = tint.rnum
on tint.rnum = tsint.rnum

versus written as shown:

select * from
cert.tsint tsint inner join cert.tint tint on tsint.rnum = tint.rnum inner join cert.tbint tbint on tint.rnum = tbint.rnum

ERROR_STATE, SQL state: org.apache.spark.sql.catalyst.parser.ParseException: extraneous input 'on' expecting {<EOF>, ',', '.', '[', 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'OR', 'AND', 'IN', NOT, 'BETWEEN', 'LIKE', RLIKE, 'IS', 'JOIN', 'CROSS', 'INNER', 'LEFT', 'RIGHT', 'FULL', 'NATURAL', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, '+', '-', '*', '/', '%', 'DIV', '&', '|', '^', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'ANTI'}(line 4, pos 5)

== SQL ==
select * from
cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint
on tbint.rnum = tint.rnum
on tint.rnum = tsint.rnum
-^^^
, Query: select * from
cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint
on tbint.rnum = tint.rnum
on tint.rnum = tsint.rnum.

SQLState: HY000
ErrorCode: 500051
[jira] [Updated] (SPARK-20829) var_samp returns Nan while other vendors return a null value
[ https://issues.apache.org/jira/browse/SPARK-20829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

N Campbell updated SPARK-20829:
-------------------------------
Attachment: TSUPPLY

> var_samp returns Nan while other vendors return a null value
> ------------------------------------------------------------
>
> Key: SPARK-20829
> URL: https://issues.apache.org/jira/browse/SPARK-20829
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: N Campbell
> Priority: Minor
> Attachments: TSUPPLY
>
> SELECT
>   sno AS SNO,
>   pno AS PNO,
>   VAR_SAMP(qty) AS C1
> FROM
>   tsupply
> GROUP BY
>   sno,
>   pno
>
> create table if not exists TSUPPLY (RNUM int , SNO string, PNO string, JNO string, QTY int )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
> STORED AS textfile ;
[jira] [Updated] (SPARK-20829) var_samp returns Nan while other vendors return a null value
[ https://issues.apache.org/jira/browse/SPARK-20829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

N Campbell updated SPARK-20829:
-------------------------------
Summary: var_samp returns Nan while other vendors return a null value (was: var_sampe returns Nan while other vendors return a null value)

> var_samp returns Nan while other vendors return a null value
> ------------------------------------------------------------
>
> Key: SPARK-20829
> URL: https://issues.apache.org/jira/browse/SPARK-20829
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: N Campbell
> Priority: Minor
>
> SELECT
>   sno AS SNO,
>   pno AS PNO,
>   VAR_SAMP(qty) AS C1
> FROM
>   tsupply
> GROUP BY
>   sno,
>   pno
>
> create table if not exists TSUPPLY (RNUM int , SNO string, PNO string, JNO string, QTY int )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
> STORED AS textfile ;
[jira] [Created] (SPARK-20829) var_sampe returns Nan while other vendors return a null value
N Campbell created SPARK-20829:
----------------------------------

Summary: var_sampe returns Nan while other vendors return a null value
Key: SPARK-20829
URL: https://issues.apache.org/jira/browse/SPARK-20829
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.1.0
Reporter: N Campbell
Priority: Minor

SELECT
  sno AS SNO,
  pno AS PNO,
  VAR_SAMP(qty) AS C1
FROM
  tsupply
GROUP BY
  sno,
  pno

create table if not exists TSUPPLY (RNUM int , SNO string, PNO string, JNO string, QTY int )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
STORED AS textfile ;
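The disagreement concerns single-row groups, where sample variance divides by n - 1 = 0: the ISO-style answer is NULL, while a naive float computation of 0/0 yields NaN. A hypothetical Python sketch of the expected semantics (the function name mirrors the SQL aggregate; the data is invented):

```python
def var_samp(xs):
    """Sample variance with the ISO SQL convention: NULL (None) for n < 2.

    A group with a single row has zero degrees of freedom; vendors such as
    ORACLE and DB2 return NULL here, whereas a 0/0 float division is how the
    NaN in this report can arise.
    """
    n = len(xs)
    if n < 2:
        return None  # what other vendors return for a one-row group
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

print(var_samp([5]))     # single-row group: None, not NaN
print(var_samp([1, 3]))  # two-row group: 2.0
```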
[jira] [Created] (SPARK-20828) Concatenated grouping sets scenario not supported
N Campbell created SPARK-20828:
----------------------------------

Summary: Concatenated grouping sets scenario not supported
Key: SPARK-20828
URL: https://issues.apache.org/jira/browse/SPARK-20828
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.1.0
Reporter: N Campbell

The following scenario, supported by other vendors (i.e. ORACLE, DB2, ...), is not supported by Spark SQL:

WITH SQL1 AS (
  SELECT sno AS C1, pno AS C2, SUM(qty) AS C3
  FROM cert.tsupply
  GROUP BY ROLLUP(sno), CUBE(pno)
)
SELECT SQL1.C1 AS C1, SQL1.C2 AS C2, SQL1.C3 AS C3
FROM SQL1

Error: [Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: ERROR_STATE, SQL state: org.apache.spark.sql.AnalysisException: expression 'tsupply.`sno`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;
'Project ['SQL1.C1 AS C1#1517671, 'SQL1.C2 AS C2#1517672, 'SQL1.C3 AS C3#1517673]
+- 'SubqueryAlias SQL1
   +- 'Aggregate [rollup(sno#1517678), cube(pno#1517679)], [sno#1517678 AS C1#1517674, pno#1517679 AS C2#1517675, sum(cast(qty#1517681 as bigint)) AS C3#1517676L]
      +- MetastoreRelation cert, tsupply
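The expansion the query asks for is mechanical: ROLLUP(sno) denotes the grouping sets {(sno), ()}, CUBE(pno) denotes {(pno), ()}, and concatenating them in one GROUP BY takes the cross product. A hypothetical Python sketch of that expansion (column names are just labels here):

```python
from itertools import product

# ROLLUP(sno) expands to the grouping sets [(sno,), ()];
# CUBE(pno) expands to [(pno,), ()].
rollup_sno = [("sno",), ()]
cube_pno = [("pno",), ()]

# Concatenated grouping sets in one GROUP BY = cross product of the two lists.
grouping_sets = [a + b for a, b in product(rollup_sno, cube_pno)]
print(grouping_sets)  # four grouping sets in total
```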
[jira] [Created] (SPARK-20827) cannot express HAVING without a GROUP BY clause
N Campbell created SPARK-20827:
----------------------------------

Summary: cannot express HAVING without a GROUP BY clause
Key: SPARK-20827
URL: https://issues.apache.org/jira/browse/SPARK-20827
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.1.0
Reporter: N Campbell

Spark SQL does not support a HAVING clause without a GROUP BY, which is valid SQL and supported by other engines (ORACLE, DB2, ...).

SELECT
  '' AS `C1`
FROM
  `cert`.`tparts`
HAVING
  COUNT(`pno`) > 0

SQL state: java.lang.UnsupportedOperationException: Cannot evaluate expression: count(input[0, string, true]), Query: SELECT '' AS `C1` FROM `cert`.`tparts` HAVING COUNT(`pno`) > 0.
SQLState: HY000
ErrorCode: 500051
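The requested semantics are simple to state: with no GROUP BY, the whole table forms one implicit group, the aggregate is computed once, and HAVING keeps or drops the single result row. A hypothetical Python sketch of that behaviour (the `pno_values` stand-in for `cert.tparts` is invented):

```python
# Invented stand-in for the pno column of cert.tparts.
pno_values = ["p1", "p2", None]

# COUNT(pno) ignores NULLs, per SQL aggregate semantics.
count_pno = sum(1 for v in pno_values if v is not None)

# HAVING COUNT(pno) > 0 over the single implicit group:
# either one row of SELECT '' AS C1, or an empty result.
result = [("",)] if count_pno > 0 else []
print(result)
```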
[jira] [Commented] (SPARK-9686) Spark Thrift server doesn't return correct JDBC metadata
[ https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15990849#comment-15990849 ]

N Campbell commented on SPARK-9686:
-----------------------------------

Is this likely to be fixed? It currently forces companies to purchase commercial JDBC drivers as a workaround.

> Spark Thrift server doesn't return correct JDBC metadata
> --------------------------------------------------------
>
> Key: SPARK-9686
> URL: https://issues.apache.org/jira/browse/SPARK-9686
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2
> Reporter: pin_zhang
> Assignee: Cheng Lian
> Priority: Critical
> Attachments: SPARK-9686.1.patch.txt
>
> 1. Start start-thriftserver.sh
> 2. Connect with beeline
> 3. Create a table
> 4. show tables; the newly created table is returned
> 5.
> Class.forName("org.apache.hive.jdbc.HiveDriver");
> String URL = "jdbc:hive2://localhost:1/default";
> Properties info = new Properties();
> Connection conn = DriverManager.getConnection(URL, info);
> ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(), null, null, null);
> Problem:
> No tables are returned by this API; this worked in Spark 1.3
[jira] [Created] (SPARK-20545) union set operator should default to DISTINCT and not ALL semantics
N Campbell created SPARK-20545:
----------------------------------

Summary: union set operator should default to DISTINCT and not ALL semantics
Key: SPARK-20545
URL: https://issues.apache.org/jira/browse/SPARK-20545
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.1.0
Reporter: N Campbell

A set operation (i.e. UNION) over two queries that produce identical row values should return the distinct set of rows and not all rows. ISO SQL set operation semantics default to DISTINCT; the Spark implementation is defaulting to ALL. While Spark allows the DISTINCT keyword, and some might assume ALL is faster, the semantically wrong result set is produced per the standard (and per commercial SQL systems including ORACLE, DB2, Teradata, SQL Server etc.).

select tsint.csint from cert.tsint
union
select tint.cint from cert.tint

csint
-1
0
1
10
-1
0
1
10

vs

select tsint.csint from cert.tsint
union distinct
select tint.cint from cert.tint

csint
-1
1
10
0
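The standard default can be cross-checked against another engine. A minimal sketch using Python's bundled SQLite, which dedupes a bare UNION and keeps duplicates only under UNION ALL (table and column names mirror the report; the data is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table tsint (csint int)")
con.execute("create table tint (cint int)")
con.executemany("insert into tsint values (?)", [(-1,), (0,), (1,), (10,)])
con.executemany("insert into tint values (?)", [(-1,), (0,), (1,), (10,)])

# Bare UNION defaults to DISTINCT per the standard; UNION ALL keeps duplicates.
union = con.execute(
    "select csint from tsint union select cint from tint").fetchall()
union_all = con.execute(
    "select csint from tsint union all select cint from tint").fetchall()

print(len(union), len(union_all))  # 4 distinct rows vs 8 total rows
```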
[jira] [Created] (SPARK-10777) order by fails when column is aliased and projection includes windowed aggregate
N Campbell created SPARK-10777:
----------------------------------

Summary: order by fails when column is aliased and projection includes windowed aggregate
Key: SPARK-10777
URL: https://issues.apache.org/jira/browse/SPARK-10777
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.5.0
Reporter: N Campbell

This statement fails in Spark (works fine in ORACLE, DB2):

select r as c1, min ( s ) over () as c2
from ( select rnum r, sum ( cint ) s from certstring.tint group by rnum ) t
order by r

Error: org.apache.spark.sql.AnalysisException: cannot resolve 'r' given input columns c1, c2; line 3 pos 9
SQLState: null
ErrorCode: 0

Forcing the aliased column name works around the defect:

select r as c1, min ( s ) over () as c2
from ( select rnum r, sum ( cint ) s from certstring.tint group by rnum ) t
order by c1

These work fine:

select r as c1, min ( s ) over () as c2
from ( select rnum r, sum ( cint ) s from certstring.tint group by rnum ) t
order by c1

select r as c1, s as c2
from ( select rnum r, sum ( cint ) s from certstring.tint group by rnum ) t
order by r

create table if not exists TINT ( RNUM int , CINT int )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
STORED AS ORC ;
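The expected name resolution can be cross-checked with Python's bundled SQLite, which accepts ordering by either the subquery column `r` or the alias `c1`. A minimal sketch (window function omitted to keep it small; the data is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table tint (rnum int, cint int)")
con.executemany("insert into tint values (?, ?)", [(1, 5), (2, 3)])

# Ordering by the underlying subquery column name, as in the failing report.
by_r = con.execute(
    "select r as c1, s as c2 from "
    "(select rnum r, sum(cint) s from tint group by rnum) t "
    "order by r").fetchall()

# Ordering by the outer alias, the reported workaround.
by_c1 = con.execute(
    "select r as c1, s as c2 from "
    "(select rnum r, sum(cint) s from tint group by rnum) t "
    "order by c1").fetchall()

print(by_r == by_c1)  # both spellings resolve to the same ordering
```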
[jira] [Created] (SPARK-10744) parser error (constant * column is null interpreted as constant * boolean)
N Campbell created SPARK-10744:
----------------------------------

Summary: parser error (constant * column is null interpreted as constant * boolean)
Key: SPARK-10744
URL: https://issues.apache.org/jira/browse/SPARK-10744
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.5.0
Reporter: N Campbell
Priority: Minor

Spark SQL inherits the same defect as Hive, where this statement will not parse/execute. See HIVE-9530.

select c1 from t1 where 1 * cnnull is null

-vs-

select c1 from t1 where (1 * cnnull) is null
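The expected precedence, with `IS NULL` binding more loosely than `*`, can be cross-checked against another engine. A minimal sketch using Python's bundled SQLite:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# IS NULL binds more loosely than *, so this must parse as (1 * NULL) IS NULL,
# which is true -- not as 1 * (NULL IS NULL).
val = con.execute("select 1 * null is null").fetchone()[0]
print(val)  # SQLite represents the boolean TRUE as 1
```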
[jira] [Created] (SPARK-10747) add support for window specification to include how NULLS are ordered
N Campbell created SPARK-10747:
----------------------------------

Summary: add support for window specification to include how NULLS are ordered
Key: SPARK-10747
URL: https://issues.apache.org/jira/browse/SPARK-10747
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.5.0
Reporter: N Campbell

You cannot express how NULLs are to be sorted in the window order specification, and have to use a compensating expression to simulate it.

Error: org.apache.spark.sql.AnalysisException: line 1:76 missing ) at 'nulls' near 'nulls' line 1:82 missing EOF at 'last' near 'nulls';
SQLState: null

(Same limitation as Hive, reported in Apache JIRA HIVE-9535.)

This fails:

select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by c3 desc nulls last) from tolap

The compensating expression:

select rnum, c1, c2, c3, dense_rank() over(partition by c1 order by case when c3 is null then 1 else 0 end) from tolap
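The compensating expression amounts to sorting on a two-part key: a NULL flag first, then the value. A minimal Python sketch of DESC NULLS LAST over invented sample values:

```python
# Emulate ORDER BY c3 DESC NULLS LAST: sort on (is-null flag, negated value),
# mirroring the CASE WHEN c3 IS NULL THEN 1 ELSE 0 END workaround.
vals = [3, None, 1, None, 2]
ordered = sorted(vals, key=lambda v: (v is None, 0 if v is None else -v))
print(ordered)  # non-null values descending, then the NULLs
```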
[jira] [Updated] (SPARK-10507) reject temporal expressions such as timestamp - timestamp at parse time
[ https://issues.apache.org/jira/browse/SPARK-10507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

N Campbell updated SPARK-10507:
-------------------------------
Description:

TIMESTAMP - TIMESTAMP in ISO SQL should return an interval type, which Spark does not support. A similar expression in Hive 0.13 fails with "Error: Could not create ResultSet: Required field 'type' is unset! Struct:TPrimitiveTypeEntry(type:null)", and Spark has similar "challenges". While Hive 1.2.1 has added some interval type support, it is far from complete with respect to ISO SQL.

The ability to compute the period of time (years, days, weeks, hours, ...) between timestamps, or to add/subtract intervals from a timestamp, is extremely common in business applications.

Currently, a value expression such as

select timestampcol - timestampcol from t

will fail during execution rather than at parse time. While the error thrown states that fact, it would be better for these value expressions to be rejected at parse time, along with an indication of the expression that is causing the parser error.
Operation: execute
Errors: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6214.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6214.0 (TID 21208, sandbox.hortonworks.com): java.lang.RuntimeException: Type TimestampType does not support numeric operations
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.expressions.Subtract.numeric$lzycompute(arithmetic.scala:138)
at org.apache.spark.sql.catalyst.expressions.Subtract.numeric(arithmetic.scala:136)
at org.apache.spark.sql.catalyst.expressions.Subtract.eval(arithmetic.scala:150)
at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:113)
at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68)
at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:52)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:813)
at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:813)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498)

create table if not exists TTS ( RNUM int , CTS timestamp )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
STORED AS orc ;

was:
TIMESTAMP - TIMESTAMP in ISO SQL is an interval type. Hive 0.13 fails with "Error: Could not create ResultSet: Required field 'type' is unset! Struct:TPrimitiveTypeEntry(type:null)" and Spark has similar "challenges".

select cts - cts from tts

Operation: execute
Errors: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6214.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6214.0 (TID 21208, sandbox.hortonworks.com): java.lang.RuntimeException: Type TimestampType does not support numeric operations
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.expressions.Subtract.numeric$lzycompute(arithmetic.scala:138)
at org.apache.spark.sql.catalyst.expressions.Subtract.numeric(arithmetic.scala:136)
at org.apache.spark.sql.catalyst.expressions.Subtract.eval(arithmetic.scala:150)
at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:113)
at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68)
at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:52)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at
[jira] [Created] (SPARK-10502) tidy up the exception message text to be less verbose/"User friendly"
N Campbell created SPARK-10502:
----------------------------------

Summary: tidy up the exception message text to be less verbose/"user friendly"
Key: SPARK-10502
URL: https://issues.apache.org/jira/browse/SPARK-10502
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.3.1
Reporter: N Campbell
Priority: Minor

When a statement fails to parse, it would be preferable if the exception text were more aligned with other vendors in indicating the syntax error, without including the verbose parse tree.

select tbint.rnum,tbint.cbint, nth_value( tbint.cbint, '4' ) over ( order by tbint.rnum) from certstring.tbint

Errors: org.apache.spark.sql.AnalysisException: Unsupported language features in query: select tbint.rnum,tbint.cbint, nth_value( tbint.cbint, '4' ) over ( order by tbint.rnum) from certstring.tbint TOK_QUERY 1, 0,40, 94 TOK_FROM 1, 36,40, 94 TOK_TABREF 1, 38,40, 94 TOK_TABNAME 1, 38,40, 94 certstring 1, 38,38, 94 tbint 1, 40,40, 105 TOK_INSERT 0, -1,34, 0 TOK_DESTINATION 0, -1,-1, 0 TOK_DIR 0, -1,-1, 0 TOK_TMP_FILE 0, -1,-1, 0 TOK_SELECT 1, 0,34, 12 TOK_SELEXPR 1, 2,4, 12 . 1, 2,4, 12 TOK_TABLE_OR_COL 1, 2,2, 7 tbint 1, 2,2, 7 rnum 1, 4,4, 13 TOK_SELEXPR 1, 6,8, 23 . 1, 6,8, 23 TOK_TABLE_OR_COL 1, 6,6, 18 tbint 1, 6,6, 18 cbint 1, 8,8, 24 TOK_SELEXPR 1, 11,34, 31 TOK_FUNCTION 1, 11,34, 31 nth_value 1, 11,11, 31 . 1, 14,16, 47 TOK_TABLE_OR_COL 1, 14,14, 42 tbint 1, 14,14, 42 cbint 1, 16,16, 48 '4' 1, 19,19, 55 TOK_WINDOWSPEC 1, 25,34, 82 TOK_PARTITIONINGSPEC 1, 27,33, 82 TOK_ORDERBY 1, 27,33, 82 TOK_TABSORTCOLNAMEASC 1, 31,33, 82 . 1, 31,33, 82 TOK_TABLE_OR_COL 1, 31,31, 77 tbint 1, 31,31, 77 rnum 1, 33,33, 83 scala.NotImplementedError: No parse rules for ASTNode type: 882, text: TOK_WINDOWSPEC : TOK_WINDOWSPEC 1, 25,34, 82 TOK_PARTITIONINGSPEC 1, 27,33, 82 TOK_ORDERBY 1, 27,33, 82 TOK_TABSORTCOLNAMEASC 1, 31,33, 82 . 
1, 31,33, 82 TOK_TABLE_OR_COL 1, 31,31, 77 tbint 1, 31,31, 77 rnum 1, 33,33, 83 " + org.apache.spark.sql.hive.HiveQl$.nodeToExpr(HiveQl.scala:1261)
[jira] [Created] (SPARK-10503) incorrect predicate evaluation involving NULL value
N Campbell created SPARK-10503:
----------------------------------

Summary: incorrect predicate evaluation involving NULL value
Key: SPARK-10503
URL: https://issues.apache.org/jira/browse/SPARK-10503
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: N Campbell

Query an ORC table in Hive using the following SQL statement via the Spark SQL thrift-server. The row where rnum=0 has a c1 value of null. The result set returned by Spark includes a row where rnum=0 and c1=0, which is incorrect.

select tint.rnum, tint.rnum from tint where tint.cint in ( tint.cint )

Table in Hive:

create table if not exists TINT ( RNUM int , CINT smallint )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
STORED AS orc ;

Data loaded into the ORC table:

0|\N
1|-1
2|0
3|1
4|10
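The expected three-valued logic can be cross-checked with Python's bundled SQLite: `NULL IN (NULL)` evaluates to NULL, not TRUE, so the NULL row is filtered out. A minimal sketch mirroring the table above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table tint (rnum int, cint int)")
con.executemany("insert into tint values (?, ?)",
                [(0, None), (1, -1), (2, 0), (3, 1), (4, 10)])

# For the rnum=0 row, cint IN (cint) is NULL IN (NULL) -> NULL, so WHERE
# must exclude it; only the four non-NULL rows should come back.
rows = con.execute(
    "select rnum from tint where cint in (cint)").fetchall()
print(rows)
```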
[jira] [Updated] (SPARK-10508) incorrect evaluation of searched case expression
[ https://issues.apache.org/jira/browse/SPARK-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

N Campbell updated SPARK-10508:
-------------------------------
Summary: incorrect evaluation of searched case expression (was: incorrect evaluation of search case expression)

> incorrect evaluation of searched case expression
> ------------------------------------------------
>
> Key: SPARK-10508
> URL: https://issues.apache.org/jira/browse/SPARK-10508
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.3.1
> Reporter: N Campbell
>
> The following case expression never evaluates to 'test1' when cdec is -1 or 10, as it will in Hive 0.13. Instead it returns 'other' for all rows.
> select rnum, cdec, case when cdec in ( -1,10,0.1 ) then 'test1' else 'other' end from tdec
> create table if not exists TDEC ( RNUM int , CDEC decimal(7, 2 ))
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
> STORED AS orc ;
> 0|\N
> 1|-1.00
> 2|0.00
> 3|1.00
> 4|0.10
> 5|10.00
[jira] [Created] (SPARK-10508) incorrect evaluation of search case expression
N Campbell created SPARK-10508:
----------------------------------

Summary: incorrect evaluation of search case expression
Key: SPARK-10508
URL: https://issues.apache.org/jira/browse/SPARK-10508
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.3.1
Reporter: N Campbell

The following case expression never evaluates to 'test1' when cdec is -1 or 10, as it will in Hive 0.13. Instead it returns 'other' for all rows.

select rnum, cdec, case when cdec in ( -1,10,0.1 ) then 'test1' else 'other' end from tdec

create table if not exists TDEC ( RNUM int , CDEC decimal(7, 2 ))
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
STORED AS orc ;

0|\N
1|-1.00
2|0.00
3|1.00
4|0.10
5|10.00
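Per SQL semantics the IN comparison is numeric, so the scale of the stored decimal (-1.00 vs the literal -1) should not matter. A minimal Python sketch using decimal.Decimal to illustrate the expected branch:

```python
from decimal import Decimal

# Decimal equality is numeric, not textual: -1.00 equals -1 despite the scale.
cdec = Decimal("-1.00")
matches = cdec in (Decimal("-1"), Decimal("10"), Decimal("0.1"))

# The searched CASE should therefore take the 'test1' branch for this row.
branch = "test1" if matches else "other"
print(branch)
```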
[jira] [Created] (SPARK-10504) aggregate where NULL is defined as the value expression aborts when SUM used
N Campbell created SPARK-10504:
--
Summary: aggregate where NULL is defined as the value expression aborts when SUM used
Key: SPARK-10504
URL: https://issues.apache.org/jira/browse/SPARK-10504
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.3.1
Reporter: N Campbell
Priority: Minor

In ISO-SQL the context would determine an implicit type for NULL, or a vendor might require an explicit type via CAST (NULL AS INTEGER). Spark appears to presume a long type for expressions such as select min(NULL), max(NULL), but a query such as the following aborts:

select sum ( null ) from tversion

Operation: execute
Errors:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5232.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5232.0 (TID 18531, sandbox.hortonworks.com): scala.MatchError: NullType (of class org.apache.spark.sql.types.NullType$)
 at org.apache.spark.sql.catalyst.expressions.Cast.org$apache$spark$sql$catalyst$expressions$Cast$$cast(Cast.scala:403)
 at org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:422)
 at org.apache.spark.sql.catalyst.expressions.Cast.cast(Cast.scala:422)
 at org.apache.spark.sql.catalyst.expressions.Cast.eval(Cast.scala:426)
 at org.apache.spark.sql.catalyst.expressions.Coalesce.eval(nullFunctions.scala:51)
 at org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:119)
 at org.apache.spark.sql.catalyst.expressions.Coalesce.eval(nullFunctions.scala:51)
 at org.apache.spark.sql.catalyst.expressions.MutableLiteral.update(literals.scala:82)
 at org.apache.spark.sql.catalyst.expressions.SumFunction.update(aggregates.scala:581)
 at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$6.apply(Aggregate.scala:133)
 at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$6.apply(Aggregate.scala:126)
 at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
 at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:64)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
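A workaround sketch, assuming the same tversion table: give the untyped NULL an explicit type with CAST, as the ISO-SQL note above suggests, so the analyzer resolves SUM over an integer column rather than NullType:

```sql
-- Hypothetical workaround: an explicit CAST avoids the NullType
-- MatchError because the aggregate now sees a concrete numeric type.
select sum(cast(null as int)) from tversion;
```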
[jira] [Created] (SPARK-10505) windowed form of count ( star ) fails with No handler for udf class
N Campbell created SPARK-10505:
--
Summary: windowed form of count ( star ) fails with No handler for udf class
Key: SPARK-10505
URL: https://issues.apache.org/jira/browse/SPARK-10505
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.3.1
Reporter: N Campbell

The following statement will parse/execute in Hive 0.13 but fails in Spark. Create a simple ORC table in Hive:

create table if not exists TOLAP (RNUM int , C1 string, C2 string, C3 int, C4 int) TERMINATED BY '\n' STORED AS orc ;

select rnum, c1, c2, c3, count(*) over(partition by c1) from tolap

Error: java.lang.RuntimeException: No handler for udf class org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount
SQLState: null
ErrorCode: 0
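A possible workaround sketch, assuming the TOLAP table above: counting a constant literal instead of * expresses the same per-partition row count, and may avoid routing COUNT(*) through the Hive GenericUDAFCount handler that Spark rejects here:

```sql
-- Hypothetical workaround: count(1) yields the same partition row
-- count as count(*) because the literal is never NULL.
select rnum, c1, c2, c3,
       count(1) over (partition by c1)
from tolap;
```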
[jira] [Created] (SPARK-10507) timestamp - timestamp
N Campbell created SPARK-10507:
--
Summary: timestamp - timestamp
Key: SPARK-10507
URL: https://issues.apache.org/jira/browse/SPARK-10507
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.3.1
Reporter: N Campbell

TIMESTAMP - TIMESTAMP in ISO-SQL is an interval type. Hive 0.13 fails with Error: Could not create ResultSet: Required field 'type' is unset! Struct:TPrimitiveTypeEntry(type:null), and Spark has similar "challenges".

select cts - cts from tts

Operation: execute
Errors:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6214.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6214.0 (TID 21208, sandbox.hortonworks.com): java.lang.RuntimeException: Type TimestampType does not support numeric operations
 at scala.sys.package$.error(package.scala:27)
 at org.apache.spark.sql.catalyst.expressions.Subtract.numeric$lzycompute(arithmetic.scala:138)
 at org.apache.spark.sql.catalyst.expressions.Subtract.numeric(arithmetic.scala:136)
 at org.apache.spark.sql.catalyst.expressions.Subtract.eval(arithmetic.scala:150)
 at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:113)
 at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68)
 at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:52)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
 at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
 at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
 at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
 at scala.collection.AbstractIterator.to(Iterator.scala:1157)
 at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
 at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
 at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
 at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
 at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:813)
 at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:813)
 at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498)

Table DDL:

create table if not exists TTS ( RNUM int , CTS timestamp )TERMINATED BY '\n' STORED AS orc ;
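A workaround sketch, assuming the TTS table above: since TimestampType does not support numeric operations here, convert both sides to epoch seconds with unix_timestamp and subtract the resulting longs instead of the TIMESTAMP values themselves:

```sql
-- Hypothetical workaround: the difference is expressed in whole
-- seconds (a long), not an ISO-SQL interval, and fractional
-- seconds are lost in the conversion.
select unix_timestamp(cts) - unix_timestamp(cts) from tts;
```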