[jira] [Resolved] (SPARK-5756) Analyzer should not throw scala.NotImplementedError for illegitimate sql
[ https://issues.apache.org/jira/browse/SPARK-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wangfei resolved SPARK-5756.
Resolution: Fixed

Analyzer should not throw scala.NotImplementedError for illegitimate sql

Key: SPARK-5756
URL: https://issues.apache.org/jira/browse/SPARK-5756
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

```SELECT CAST(x AS STRING) FROM src``` raises a NotImplementedError:

CliDriver: scala.NotImplementedError: an implementation is missing
  at scala.Predef$.$qmark$qmark$qmark(Predef.scala:252)
  at org.apache.spark.sql.catalyst.expressions.PrettyAttribute.dataType(namedExpressions.scala:221)
  at org.apache.spark.sql.catalyst.expressions.Cast.resolved$lzycompute(Cast.scala:30)
  at org.apache.spark.sql.catalyst.expressions.Cast.resolved(Cast.scala:30)
  at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:68)
  at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:68)
  at scala.collection.LinearSeqOptimized$class.exists(LinearSeqOptimized.scala:80)
  at scala.collection.immutable.List.exists(List.scala:84)
  at org.apache.spark.sql.catalyst.expressions.Expression.childrenResolved(Expression.scala:68)
  at org.apache.spark.sql.catalyst.expressions.Expression.resolved$lzycompute(Expression.scala:56)
  at org.apache.spark.sql.catalyst.expressions.Expression.resolved(Expression.scala:56)
  at org.apache.spark.sql.catalyst.expressions.NamedExpression.typeSuffix(namedExpressions.scala:62)
  at org.apache.spark.sql.catalyst.expressions.Alias.toString(namedExpressions.scala:124)
  at org.apache.spark.sql.catalyst.expressions.Expression.prettyString(Expression.scala:78)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1$$anonfun$7.apply(Analyzer.scala:83)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1$$anonfun$7.apply(Analyzer.scala:83)
  at scala.collection.immutable.Stream.map(Stream.scala:376)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:81)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:204)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:81)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:79)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
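For readers unfamiliar with the failure mode in the trace above: `PrettyAttribute.dataType` was left as Scala's `???` placeholder, which throws `scala.NotImplementedError` the moment it is evaluated, so the error-reporting path itself crashed instead of printing a useful analysis error. A minimal Java analogy of that pattern; the class below is a hypothetical sketch, not the Catalyst code, and Java's closest stand-in for `???` is throwing `UnsupportedOperationException`:

```java
public class StubDemo {
    // Hypothetical mirror of the stubbed attribute; NOT the real Catalyst class.
    static class PrettyAttributeSketch {
        private final String name;

        PrettyAttributeSketch(String name) {
            this.name = name;
        }

        // Placeholder body, like Scala's `???`: throws the moment it is called.
        String dataType() {
            throw new UnsupportedOperationException("an implementation is missing");
        }

        // Error reporting that innocently touches dataType() dies with the
        // placeholder error instead of producing a helpful message.
        String describe() {
            return name + ": " + dataType();
        }
    }

    // Returns true when describe() crashes on the unimplemented stub.
    public static boolean crashesOnStub() {
        try {
            new PrettyAttributeSketch("x").describe();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(crashesOnStub()); // prints true
    }
}
```

The fix direction implied by the issue is to make pretty-printing paths avoid evaluating unimplemented members, so that illegitimate SQL surfaces a real analysis error.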
[jira] [Created] (SPARK-5756) Analyzer should not throw scala.NotImplementedError for legitimate sql
wangfei created SPARK-5756:
Summary: Analyzer should not throw scala.NotImplementedError for legitimate sql
Key: SPARK-5756
URL: https://issues.apache.org/jira/browse/SPARK-5756
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

```SELECT CAST(x AS STRING) FROM src``` raises a NotImplementedError (same CliDriver stack trace as in the resolution notice above, from scala.Predef$.$qmark$qmark$qmark through Analyzer$CheckResolution$.apply).
[jira] [Updated] (SPARK-5756) Analyzer should not throw scala.NotImplementedError for illegitimate sql
[ https://issues.apache.org/jira/browse/SPARK-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wangfei updated SPARK-5756:
Summary: Analyzer should not throw scala.NotImplementedError for illegitimate sql (was: Analyzer should not throw scala.NotImplementedError for legitimate sql)

Key: SPARK-5756
URL: https://issues.apache.org/jira/browse/SPARK-5756
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

```SELECT CAST(x AS STRING) FROM src``` raises a NotImplementedError (same CliDriver stack trace as in the resolution notice above).
[jira] [Created] (SPARK-5649) Throw exception when a datatype cast cannot be applied
wangfei created SPARK-5649:
Summary: Throw exception when a datatype cast cannot be applied
Key: SPARK-5649
URL: https://issues.apache.org/jira/browse/SPARK-5649
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

Throw an exception when a datatype cast cannot be applied, to inform the user of the cast problem in their SQL.
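The improvement above can be sketched as an explicit castability check that fails fast with a clear message. Everything below (the `DataType` enum, the `canCast` table, and `checkCast`) is a hypothetical illustration of the idea, not Spark's actual `Cast` implementation:

```java
public class CastCheckSketch {
    enum DataType { INT, STRING, BINARY }

    // Hypothetical castability table; the real rules live in Catalyst's Cast.
    static boolean canCast(DataType from, DataType to) {
        if (from == to) return true;
        if (to == DataType.STRING) return true;           // most types render as strings
        if (from == DataType.BINARY && to == DataType.INT) return false;
        return true;
    }

    // Fail fast with a clear message instead of a confusing downstream error.
    static void checkCast(DataType from, DataType to) {
        if (!canCast(from, to)) {
            throw new IllegalArgumentException("cannot cast " + from + " to " + to);
        }
    }

    public static void main(String[] args) {
        checkCast(DataType.INT, DataType.STRING);         // fine, no exception
        try {
            checkCast(DataType.BINARY, DataType.INT);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());           // prints cannot cast BINARY to INT
        }
    }
}
```

The point is that the check runs at analysis time, so the user sees which cast in their SQL is invalid rather than a runtime failure later.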
[jira] [Created] (SPARK-5617) test failure of SQLQuerySuite
wangfei created SPARK-5617:
Summary: test failure of SQLQuerySuite
Key: SPARK-5617
URL: https://issues.apache.org/jira/browse/SPARK-5617
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

SQLQuerySuite test failure:

[info] - simple select (22 milliseconds)
[info] - sorting (722 milliseconds)
[info] - external sorting (728 milliseconds)
[info] - limit (95 milliseconds)
[info] - date row *** FAILED *** (35 milliseconds)
[info]   Results do not match for query:
[info]   'Limit 1
[info]    'Project [CAST(2015-01-28, DateType) AS c0#3630]
[info]     'UnresolvedRelation [testData], None
[info]
[info]   == Analyzed Plan ==
[info]   Limit 1
[info]    Project [CAST(2015-01-28, DateType) AS c0#3630]
[info]     LogicalRDD [key#0,value#1], MapPartitionsRDD[1] at mapPartitions at ExistingRDD.scala:35
[info]
[info]   == Physical Plan ==
[info]   Limit 1
[info]    Project [16463 AS c0#3630]
[info]     PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] at mapPartitions at ExistingRDD.scala:35
[info]
[info]   == Results ==
[info]   !== Correct Answer - 1 ==   == Spark Answer - 1 ==
[info]   ![2015-01-28]               [2015-01-27] (QueryTest.scala:77)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:495)
[info]   at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
[info]   at org.scalatest.Assertions$class.fail(Assertions.scala:1328)
[info]   at org.scalatest.FunSuite.fail(FunSuite.scala:1555)
[info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:77)
[info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:95)
[info]   at org.apache.spark.sql.SQLQuerySuite$$anonfun$23.apply$mcV$sp(SQLQuerySuite.scala:300)
[info]   at org.apache.spark.sql.SQLQuerySuite$$anonfun$23.apply(SQLQuerySuite.scala:300)
[info]   at org.apache.spark.sql.SQLQuerySuite$$anonfun$23.apply(SQLQuerySuite.scala:300)
[info]   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
[info]   at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
[info]   at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
[info]   at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
[info]   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNode
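The `16463` in the physical plan above is the date encoded as days since the Unix epoch, and the wrong answer (`2015-01-27` instead of `2015-01-28`) looks like the classic timezone off-by-one that appears when such a day count is converted through a timezone-dependent type. A small sketch with `java.time` (whose `LocalDate` carries no timezone, so the round trip is exact):

```java
import java.time.LocalDate;

public class EpochDayDemo {
    // Days since 1970-01-01 for an ISO date string.
    static long epochDays(String isoDate) {
        return LocalDate.parse(isoDate).toEpochDay();
    }

    // The reverse conversion; LocalDate is timezone-free, so no day is lost.
    static String fromEpochDays(long days) {
        return LocalDate.ofEpochDay(days).toString();
    }

    public static void main(String[] args) {
        // 2015-01-28 is 16463 days after 1970-01-01, the literal that shows up
        // in the physical plan's Project [16463 AS c0#3630].
        System.out.println(epochDays("2015-01-28"));   // prints 16463
        System.out.println(fromEpochDays(16463L));     // prints 2015-01-28
    }
}
```

By contrast, converting the same day count through a type anchored to the JVM default timezone (such as `java.sql.Date`) is where a day can shift, which is consistent with the failure reported above.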
[jira] [Created] (SPARK-5592) java.net.URISyntaxException when inserting data into a partitioned table
wangfei created SPARK-5592:
Summary: java.net.URISyntaxException when inserting data into a partitioned table
Key: SPARK-5592
URL: https://issues.apache.org/jira/browse/SPARK-5592
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

create table sc as select * from (
  select '2011-01-11', '2011-01-11+14:18:26' from src tablesample (1 rows)
  union all
  select '2011-01-11', '2011-01-11+15:18:26' from src tablesample (1 rows)
  union all
  select '2011-01-11', '2011-01-11+16:18:26' from src tablesample (1 rows)
) s;
create table sc_part (key string) partitioned by (ts string) stored as rcfile;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table sc_part partition(ts) select * from sc;

java.net.URISyntaxException: Relative path in absolute URI: ts=2011-01-11+15:18:26
  at org.apache.hadoop.fs.Path.initialize(Path.java:206)
  at org.apache.hadoop.fs.Path.<init>(Path.java:172)
  at org.apache.hadoop.fs.Path.<init>(Path.java:94)
  at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.org$apache$spark$sql$hive$SparkHiveDynamicPartitionWriterContainer$$newWriter$1(hiveWriterContainers.scala:230)
  at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer$$anonfun$getLocalFileWriter$1.apply(hiveWriterContainers.scala:243)
  at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer$$anonfun$getLocalFileWriter$1.apply(hiveWriterContainers.scala:243)
  at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
  at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
  at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.getLocalFileWriter(hiveWriterContainers.scala:243)
  at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:113)
  at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:105)
  at scala.collection.Iterator$class.foreach(Iterator.scala:727)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
  at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:105)
  at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:87)
  at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:87)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
  at org.apache.spark.scheduler.Task.run(Task.scala:64)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:722)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ts=2011-01-11+15:18:26
  at java.net.URI.checkPath(URI.java:1804)
  at java.net.URI.<init>(URI.java:752)
  at org.apache.hadoop.fs.Path.initialize(Path.java:203)
  ... 21 more
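The failure comes from the raw `+` and `:` in the dynamic partition value being handed to `org.apache.hadoop.fs.Path` unescaped. Hive escapes special characters in partition directory names before building paths; as an illustration only (Hive's real escaping is `FileUtils.escapePathName`, not `URLEncoder`, and the two differ in detail), percent-encoding the value yields a string that is safe inside a URI path:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PartitionPathDemo {
    // Percent-encode a partition value so the resulting directory name
    // contains no characters that confuse URI parsing.
    static String escape(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // The value from the bug report: '+' and ':' are the troublemakers,
        // since an unescaped ':' in the first path segment makes the URI
        // parser reject the relative path.
        System.out.println(escape("2011-01-11+15:18:26"));
        // prints 2011-01-11%2B15%3A18%3A26
    }
}
```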
[jira] [Created] (SPARK-5591) NoSuchObjectException for CTAS
wangfei created SPARK-5591:
Summary: NoSuchObjectException for CTAS
Key: SPARK-5591
URL: https://issues.apache.org/jira/browse/SPARK-5591
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

NoSuchObjectException for CTAS:

create table sc as select * from (
  select '2011-01-11', '2011-01-11+14:18:26' from src tablesample (1 rows)
  union all
  select '2011-01-11', '2011-01-11+15:18:26' from src tablesample (1 rows)
  union all
  select '2011-01-11', '2011-01-11+16:18:26' from src tablesample (1 rows)
) s;

This gets the following exception:

15/02/04 19:44:02 ERROR Hive: NoSuchObjectException(message:default.sc table not found)
  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1560)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601)
  at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
  at $Proxy8.get_table(Unknown Source)
  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:997)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601)
  at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
  at $Proxy9.getTable(Unknown Source)
  at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:976)
  at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)
  at org.apache.spark.sql.hive.HiveMetastoreCatalog.tableExists(HiveMetastoreCatalog.scala:152)
  at org.apache.spark.sql.hive.HiveContext$$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$tableExists(HiveContext.scala:309)
  at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.tableExists(Catalog.scala:121)
  at org.apache.spark.sql.hive.HiveContext$$anon$2.tableExists(HiveContext.scala:309)
  at org.apache.spark.sql.hive.execution.CreateTableAsSelect.run(CreateTableAsSelect.scala:63)
  at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:53)
[jira] [Created] (SPARK-5583) Support unique join in hive context
wangfei created SPARK-5583:
Summary: Support unique join in hive context
Key: SPARK-5583
URL: https://issues.apache.org/jira/browse/SPARK-5583
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

Support unique join in the hive context:

FROM UNIQUEJOIN PRESERVE T1 a (a.key), PRESERVE T2 b (b.key), PRESERVE T3 c (c.key)
SELECT a.key, b.key, c.key;
[jira] [Created] (SPARK-5587) Support changing the database owner
wangfei created SPARK-5587:
Summary: Support changing the database owner
Key: SPARK-5587
URL: https://issues.apache.org/jira/browse/SPARK-5587
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

Support changing the database owner:

create database db_alter_onr;
describe database db_alter_onr;
alter database db_alter_onr set owner user user1;
describe database db_alter_onr;
alter database db_alter_onr set owner role role1;
describe database db_alter_onr;
[jira] [Updated] (SPARK-5383) support alias for udfs with multi output columns
[ https://issues.apache.org/jira/browse/SPARK-5383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wangfei updated SPARK-5383:
Summary: support alias for udfs with multi output columns (was: Multi alias names support)

Key: SPARK-5383
URL: https://issues.apache.org/jira/browse/SPARK-5383
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

Spark SQL does not currently support multiple alias names; the following SQL fails in spark-sql:

select key as (k1, k2), value as (v1, v2) from src limit 5
[jira] [Updated] (SPARK-5383) support alias for udfs with multi output columns
[ https://issues.apache.org/jira/browse/SPARK-5383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wangfei updated SPARK-5383:
Description: When a UDF outputs multiple columns, we currently cannot alias them in spark-sql; see the following SQL: select stack(1, key, value, key, value) as (a, b, c, d) from src limit 5;
(was: now spark sql does not support multi alias names, The following sql failed in spark-sql: select key as (k1, k2), value as (v1, v2) from src limit 5)

support alias for udfs with multi output columns

Key: SPARK-5383
URL: https://issues.apache.org/jira/browse/SPARK-5383
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

When a UDF outputs multiple columns, we currently cannot alias them in spark-sql; see the following SQL:

select stack(1, key, value, key, value) as (a, b, c, d) from src limit 5;
[jira] [Created] (SPARK-5383) Multi alias names support
wangfei created SPARK-5383:
Summary: Multi alias names support
Key: SPARK-5383
URL: https://issues.apache.org/jira/browse/SPARK-5383
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

Spark SQL does not currently support multiple alias names; the following SQL fails in spark-sql:

select key as (k1, k2), value as (v1, v2) from src limit 5
[jira] [Created] (SPARK-5367) support star expression in udf
wangfei created SPARK-5367:
Summary: support star expression in udf
Key: SPARK-5367
URL: https://issues.apache.org/jira/browse/SPARK-5367
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

Spark SQL does not currently support star expressions in UDFs; the following SQL fails:

`select concat(*) from src`
[jira] [Updated] (SPARK-5367) support star expression in udf
[ https://issues.apache.org/jira/browse/SPARK-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wangfei updated SPARK-5367:
Description: Spark SQL does not currently support star expressions in UDFs; the following SQL fails: ```select concat( * ) from src```
(was: now spark sql does not support star expression in udf, the following sql will get error `select concat(*) from src`)

support star expression in udf

Key: SPARK-5367
URL: https://issues.apache.org/jira/browse/SPARK-5367
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

Spark SQL does not currently support star expressions in UDFs; the following SQL fails:

```select concat( * ) from src```
[jira] [Updated] (SPARK-5373) literal in agg grouping expressions leads to incorrect result
[ https://issues.apache.org/jira/browse/SPARK-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wangfei updated SPARK-5373:
Description: select key, count( * ) from src group by key, 1 returns the wrong answer!
(was: select key, count(*) from src group by key, 1 will get the wrong answer!)

literal in agg grouping expressions leads to incorrect result

Key: SPARK-5373
URL: https://issues.apache.org/jira/browse/SPARK-5373
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

select key, count( * ) from src group by key, 1 returns the wrong answer!
[jira] [Created] (SPARK-5373) literal in agg grouping expressions leads to incorrect result
wangfei created SPARK-5373:
Summary: literal in agg grouping expressions leads to incorrect result
Key: SPARK-5373
URL: https://issues.apache.org/jira/browse/SPARK-5373
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

select key, count(*) from src group by key, 1 returns the wrong answer!
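The invariant this bug violates can be stated simply: adding a constant literal to the grouping key must not change the groups, so `GROUP BY key, 1` must produce the same counts as `GROUP BY key`. A plain-Java sketch of that invariant (the data and names are made up for illustration; this is not Spark's aggregation code):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class GroupByLiteralDemo {
    // Count rows per key, optionally tacking a constant literal onto the
    // grouping key, mimicking GROUP BY key vs GROUP BY key, 1.
    static Map<List<Object>, Long> countBy(List<String> keys, boolean withLiteral) {
        Function<String, List<Object>> groupKey =
            k -> withLiteral ? List.of(k, 1) : List.of(k);
        return keys.stream()
                   .collect(Collectors.groupingBy(groupKey, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "a", "b");
        // Both groupings must count "a" twice; the constant adds no information.
        System.out.println(countBy(keys, false).get(List.<Object>of("a")));    // prints 2
        System.out.println(countBy(keys, true).get(List.<Object>of("a", 1)));  // prints 2
    }
}
```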
[jira] [Created] (SPARK-5285) Remove GroupExpression in catalyst
wangfei created SPARK-5285:
Summary: Remove GroupExpression in catalyst
Key: SPARK-5285
URL: https://issues.apache.org/jira/browse/SPARK-5285
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

Remove GroupExpression in catalyst.
[jira] [Updated] (SPARK-5251) Using `tableIdentifier` in hive metastore
[ https://issues.apache.org/jira/browse/SPARK-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wangfei updated SPARK-5251:
Target Version/s: 1.3.0

Using `tableIdentifier` in hive metastore

Key: SPARK-5251
URL: https://issues.apache.org/jira/browse/SPARK-5251
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

Using `tableIdentifier` in hive metastore.
[jira] [Updated] (SPARK-5251) Using `tableIdentifier` in hive metastore
[ https://issues.apache.org/jira/browse/SPARK-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wangfei updated SPARK-5251:
Target Version/s: (was: 1.3.0)

Using `tableIdentifier` in hive metastore

Key: SPARK-5251
URL: https://issues.apache.org/jira/browse/SPARK-5251
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

Using `tableIdentifier` in hive metastore.
[jira] [Created] (SPARK-5251) Using `tableIdentifier` in hive metastore
wangfei created SPARK-5251:
Summary: Using `tableIdentifier` in hive metastore
Key: SPARK-5251
URL: https://issues.apache.org/jira/browse/SPARK-5251
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

Using `tableIdentifier` in hive metastore.
[jira] [Created] (SPARK-5240) Adding `createDataSourceTable` interface to Catalog
wangfei created SPARK-5240:
Summary: Adding `createDataSourceTable` interface to Catalog
Key: SPARK-5240
URL: https://issues.apache.org/jira/browse/SPARK-5240
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

Adding a `createDataSourceTable` interface to Catalog.
[jira] [Commented] (SPARK-4861) Refactor command in spark sql
[ https://issues.apache.org/jira/browse/SPARK-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272862#comment-14272862 ]
wangfei commented on SPARK-4861:
[~yhuai] Of course, if possible, but I have not found a way to remove it: in HiveCommandStrategy we need to distinguish hive metastore tables from temporary tables, so for now HiveCommandStrategy stays. Any ideas?

Refactor command in spark sql

Key: SPARK-4861
URL: https://issues.apache.org/jira/browse/SPARK-4861
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.1.1
Reporter: wangfei
Fix For: 1.3.0

Fix a todo in spark sql: remove ```Command``` and use ```RunnableCommand``` instead.
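The refactor mentioned above (drop `Command`, run everything through `RunnableCommand`) follows a common pattern: each command becomes a self-executing node with a single `run` entry point, so the planner needs no per-command dispatch. A hypothetical Java sketch of the pattern (names and behavior are illustrative, not Spark's API):

```java
import java.util.List;

public class RunnableCommandSketch {
    // A command knows how to execute itself and return its result rows,
    // so the execution layer can treat every command uniformly.
    interface RunnableCommand {
        List<String> run();
    }

    // Example command: SET key=value echoes the assignment back.
    static final class SetCommand implements RunnableCommand {
        private final String key;
        private final String value;

        SetCommand(String key, String value) {
            this.key = key;
            this.value = value;
        }

        public List<String> run() {
            return List.of(key + "=" + value);
        }
    }

    public static void main(String[] args) {
        RunnableCommand cmd = new SetCommand("spark.sql.shuffle.partitions", "8");
        System.out.println(cmd.run()); // prints [spark.sql.shuffle.partitions=8]
    }
}
```

The comment in the thread points at the remaining wrinkle: a strategy that must tell metastore tables apart from temporary tables cannot be folded into this uniform interface without extra context.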
[jira] [Commented] (SPARK-4572) [SQL] spark-sql exits when it encounters an error
[ https://issues.apache.org/jira/browse/SPARK-4572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270960#comment-14270960 ]
wangfei commented on SPARK-4572:
Which version did you get this error in? It should be fixed already.

[SQL] spark-sql exits when it encounters an error

Key: SPARK-4572
URL: https://issues.apache.org/jira/browse/SPARK-4572
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: Fuqing Yang
Original Estimate: 1h
Remaining Estimate: 1h

While using spark-sql, it usually exits when a SQL statement fails, and you need to rerun spark-sql. This is not convenient; we should catch the exceptions and change the default behavior.
[jira] [Commented] (SPARK-4574) Adding support for defining schema in foreign DDL commands.
[ https://issues.apache.org/jira/browse/SPARK-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270967#comment-14270967 ]
wangfei commented on SPARK-4574:
[~pwendell] Got it, thanks.

Adding support for defining schema in foreign DDL commands.

Key: SPARK-4574
URL: https://issues.apache.org/jira/browse/SPARK-4574
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 1.1.0
Reporter: wangfei

Adding support for defining a schema in foreign DDL commands. Foreign DDL currently supports commands like:

CREATE TEMPORARY TABLE avroTable
USING org.apache.spark.sql.avro
OPTIONS (path ../hive/src/test/resources/data/files/episodes.avro)

Let the user define the schema instead of inferring it from the file, so we can support DDL commands as follows:

CREATE TEMPORARY TABLE avroTable(a int, b string)
USING org.apache.spark.sql.avro
OPTIONS (path ../hive/src/test/resources/data/files/episodes.avro)
[jira] [Commented] (SPARK-1442) Add Window function support
[ https://issues.apache.org/jira/browse/SPARK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270973#comment-14270973 ]
wangfei commented on SPARK-1442:
Why were both PRs closed?

Add Window function support

Key: SPARK-1442
URL: https://issues.apache.org/jira/browse/SPARK-1442
Project: Spark
Issue Type: New Feature
Components: SQL
Reporter: Chengxiang Li
Attachments: Window Function.pdf

Similar to Hive, add window function support for catalyst.
https://issues.apache.org/jira/browse/HIVE-4197
https://issues.apache.org/jira/browse/HIVE-896
[jira] [Closed] (SPARK-4673) Optimizing limit using coalesce
[ https://issues.apache.org/jira/browse/SPARK-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei closed SPARK-4673. -- Resolution: Fixed Since coalesce(1) makes the job run with a single thread and does not always speed up limit, closing this one. Optimizing limit using coalesce --- Key: SPARK-4673 URL: https://issues.apache.org/jira/browse/SPARK-4673 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.3.0 Currently limit uses a ShuffledRDD with a HashPartitioner to repartition to 1 partition, which leads to a shuffle.
[jira] [Closed] (SPARK-5000) Alias support string literal in spark sql
[ https://issues.apache.org/jira/browse/SPARK-5000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei closed SPARK-5000. -- Resolution: Fixed Alias support string literal in spark sql - Key: SPARK-5000 URL: https://issues.apache.org/jira/browse/SPARK-5000 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.0 Reporter: wangfei Support string literals as aliases in the Spark SQL parser: select key, value as 'vvv' from tableA;
[jira] [Commented] (SPARK-5000) Alias support string literal in spark sql
[ https://issues.apache.org/jira/browse/SPARK-5000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14270718#comment-14270718 ] wangfei commented on SPARK-5000: backticks can already do this, so closing this one. Alias support string literal in spark sql - Key: SPARK-5000 URL: https://issues.apache.org/jira/browse/SPARK-5000 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.0 Reporter: wangfei Support string literals as aliases in the Spark SQL parser: select key, value as 'vvv' from tableA;
[jira] [Created] (SPARK-5165) Add support for rollup and cube in sqlcontext
wangfei created SPARK-5165: -- Summary: Add support for rollup and cube in sqlcontext Key: SPARK-5165 URL: https://issues.apache.org/jira/browse/SPARK-5165 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 1.2.0 Reporter: wangfei Add support for rollup and cube in SQLContext
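For context, ROLLUP and CUBE expand one GROUP BY into several grouping sets. A minimal Scala sketch of that expansion; the names `rollupSets` and `cubeSets` are illustrative, not Spark API:

```scala
// Illustrative only: the grouping sets that ROLLUP and CUBE over (a, b) expand to.
// ROLLUP(a, b) aggregates over (a, b), (a) and the grand total ().
def rollupSets[A](keys: List[A]): List[List[A]] =
  (keys.length to 0 by -1).map(keys.take).toList

// CUBE(a, b) aggregates over every subset of the grouping keys.
def cubeSets[A](keys: List[A]): List[List[A]] =
  keys.foldRight(List(List.empty[A]))((k, acc) => acc.map(k :: _) ++ acc)
```

So `rollupSets(List("a", "b"))` yields the prefixes `(a, b)`, `(a)`, `()`, while `cubeSets` yields all four subsets.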
[jira] [Created] (SPARK-5029) Enable from follow multiple brackets
wangfei created SPARK-5029: -- Summary: Enable from follow multiple brackets Key: SPARK-5029 URL: https://issues.apache.org/jira/browse/SPARK-5029 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.0 Reporter: wangfei Enable FROM to be followed by multiple bracketed subqueries, such as: select key from ((select * from testData limit 1) union all (select * from testData limit 1)) x limit 1
[jira] [Created] (SPARK-5000) Alias support string literal in spark sql parser
wangfei created SPARK-5000: -- Summary: Alias support string literal in spark sql parser Key: SPARK-5000 URL: https://issues.apache.org/jira/browse/SPARK-5000 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.0 Reporter: wangfei Support string literals as aliases in the Spark SQL parser: select key, value as 'vvv' from tableA;
[jira] [Updated] (SPARK-5000) Alias support string literal in spark sql
[ https://issues.apache.org/jira/browse/SPARK-5000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei updated SPARK-5000: --- Summary: Alias support string literal in spark sql (was: Alias support string literal in spark sql parser) Alias support string literal in spark sql - Key: SPARK-5000 URL: https://issues.apache.org/jira/browse/SPARK-5000 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.0 Reporter: wangfei Alias support string literal in spark sql parser: select key , value as 'vvv' from tableA;
[jira] [Created] (SPARK-4984) add a pop-up containing the full job description when it is very long
wangfei created SPARK-4984: -- Summary: add a pop-up containing the full job description when it is very long Key: SPARK-4984 URL: https://issues.apache.org/jira/browse/SPARK-4984 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.2.0 Reporter: wangfei add a pop-up containing the full job description when it is very long
[jira] [Created] (SPARK-4975) HiveInspectorSuite test failure
wangfei created SPARK-4975: -- Summary: HiveInspectorSuite test failure Key: SPARK-4975 URL: https://issues.apache.org/jira/browse/SPARK-4975 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.0 Reporter: wangfei HiveInspectorSuite test failure: [info] - wrap / unwrap null, constant null and writables *** FAILED *** (21 milliseconds) [info] 1 did not equal 0 (HiveInspectorSuite.scala:136) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) [info] at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) [info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) [info] at org.apache.spark.sql.hive.HiveInspectorSuite.checkValues(HiveInspectorSuite.scala:136) [info] at org.apache.spark.sql.hive.HiveInspectorSuite$$anonfun$checkValues$1.apply(HiveInspectorSuite.scala:124) [info] at org.apache.spark.sql.hive.HiveInspectorSuite$$anonfun$checkValues$1.apply(HiveInspectorSuite.scala:123) [info] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) [info] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) [info] at scala.collection.immutable.List.foreach(List.scala:318) [info] at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) [info] at scala.collection.AbstractTraversable.map(Traversable.scala:105) [info] at org.apache.spark.sql.hive.HiveInspectorSuite.checkValues(HiveInspectorSuite.scala:123) [info] at org.apache.spark.sql.hive.HiveInspectorSuite$$anonfun$3.apply$mcV$sp(HiveInspectorSuite.scala:163) [info] at org.apache.spark.sql.hive.HiveInspectorSuite$$anonfun$3.apply(HiveInspectorSuite.scala:148) [info] at org.apache.spark.sql.hive.HiveInspectorSuite$$anonfun$3.apply(HiveInspectorSuite.scala:148) [info] at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) [info] at 
org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) [info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) [info] at org.scalatest.Suite$class.withFixture(Suite.scala:1122) [info] at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) [info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) [info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) [info] at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) [info] at org.scalatest.FunSuite.runTest(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) [info] at scala.collection.immutable.List.foreach(List.scala:318) [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) [info] at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) [info] at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) [info] at org.scalatest.Suite$class.run(Suite.scala:1424)
[jira] [Created] (SPARK-4935) When hive.cli.print.header is configured, spark-sql aborts if passed an invalid sql
wangfei created SPARK-4935: -- Summary: When hive.cli.print.header is configured, spark-sql aborts if passed an invalid sql Key: SPARK-4935 URL: https://issues.apache.org/jira/browse/SPARK-4935 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0, 1.2.0 Reporter: wangfei Fix For: 1.3.0 When hive.cli.print.header is configured, spark-sql aborts if passed an invalid sql
[jira] [Created] (SPARK-4937) Adding optimization to simplify the filter condition
wangfei created SPARK-4937: -- Summary: Adding optimization to simplify the filter condition Key: SPARK-4937 URL: https://issues.apache.org/jira/browse/SPARK-4937 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.3.0 Adding optimization to simplify the filter condition: 1. fold conditions that can only yield a constant Boolean, such as: a < 3 && a > 5 => False, a < 1 || a > 0 => True; 2. simplify And/Or conditions, such as this sql (one of hive-testbench): select sum(l_extendedprice * (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#32' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 7 and l_quantity <= 7 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#35' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 15 and l_quantity <= 15 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#24' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 26 and l_quantity <= 26 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ); Before the optimization this plan is a CartesianProduct; in my local test the sql hung and could not produce a result. After the optimization the CartesianProduct is replaced by a ShuffledHashJoin, which needs only 20+ seconds to run this sql.
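The payoff of the second rewrite is that the conjunct shared by every branch of the OR, here p_partkey = l_partkey, can be pulled out, exposing the equi-join condition. A hedged Scala sketch of that factoring, (A && X) || (A && Y) => A && (X || Y), modeling predicates as plain strings rather than Catalyst expressions:

```scala
// Illustrative sketch: each disjunct is a set of conjuncts; pull out the
// conjuncts common to every disjunct so (A && X) || (A && Y) becomes A && (X || Y).
def extractCommonConjuncts(
    disjuncts: List[Set[String]]): (Set[String], List[Set[String]]) = {
  val common = disjuncts.reduce(_ intersect _)
  (common, disjuncts.map(_ -- common))
}
```

Applied to the query above, the common part contains the join predicate, which is what lets the planner pick a hash join instead of a Cartesian product.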
[jira] [Created] (SPARK-4938) Adding optimization to simplify the filter condition
wangfei created SPARK-4938: -- Summary: Adding optimization to simplify the filter condition Key: SPARK-4938 URL: https://issues.apache.org/jira/browse/SPARK-4938 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.3.0 Adding optimization to simplify the filter condition: 1. fold conditions that can only yield a constant Boolean, such as: a < 3 && a > 5 => False, a < 1 || a > 0 => True; 2. simplify And/Or conditions, such as this sql (one of hive-testbench): select sum(l_extendedprice * (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#32' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 7 and l_quantity <= 7 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#35' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 15 and l_quantity <= 15 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#24' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 26 and l_quantity <= 26 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ); Before the optimization this plan is a CartesianProduct; in my local test the sql hung and could not produce a result. After the optimization the CartesianProduct is replaced by a ShuffledHashJoin, which needs only 20+ seconds to run this sql.
[jira] [Commented] (SPARK-4938) Adding optimization to simplify the filter condition
[ https://issues.apache.org/jira/browse/SPARK-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14257042#comment-14257042 ] wangfei commented on SPARK-4938: Duplicate Adding optimization to simplify the filter condition Key: SPARK-4938 URL: https://issues.apache.org/jira/browse/SPARK-4938 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.3.0 Adding optimization to simplify the filter condition: 1. fold conditions that can only yield a constant Boolean, such as: a < 3 && a > 5 => False, a < 1 || a > 0 => True; 2. simplify And/Or conditions, such as this sql (one of hive-testbench): select sum(l_extendedprice * (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#32' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 7 and l_quantity <= 7 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#35' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 15 and l_quantity <= 15 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#24' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 26 and l_quantity <= 26 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ); Before the optimization this plan is a CartesianProduct; in my local test the sql hung and could not produce a result. After the optimization the CartesianProduct is replaced by a ShuffledHashJoin, which needs only 20+ seconds to run this sql.
[jira] [Created] (SPARK-4861) Refactor command in spark sql
wangfei created SPARK-4861: -- Summary: Refactor command in spark sql Key: SPARK-4861 URL: https://issues.apache.org/jira/browse/SPARK-4861 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.1 Reporter: wangfei Fix For: 1.3.0 Fix a TODO in spark sql: remove ```Command``` and use ```RunnableCommand``` instead.
[jira] [Created] (SPARK-4845) Adding a parallelismRatio to control the partitions num of shuffledRDD
wangfei created SPARK-4845: -- Summary: Adding a parallelismRatio to control the partitions num of shuffledRDD Key: SPARK-4845 URL: https://issues.apache.org/jira/browse/SPARK-4845 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.3.0 Adding a parallelismRatio to control the number of partitions of a ShuffledRDD, with the rule: Math.max(1, parallelismRatio * number of partitions of the largest upstream RDD). The ratio defaults to 1.0 to stay compatible with the old behavior; once we have good experience with it, we can change the default.
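The stated rule is small enough to write down directly; a sketch with illustrative names, not the proposed Spark API:

```scala
// Math.max(1, parallelismRatio * number of partitions of the largest upstream RDD);
// a ratio of 1.0 reproduces the old behavior exactly, and the floor of 1
// guarantees at least one partition even for tiny ratios.
def shufflePartitions(parallelismRatio: Double, largestUpstreamPartitions: Int): Int =
  math.max(1, (parallelismRatio * largestUpstreamPartitions).toInt)
```

For example, with 200 upstream partitions a ratio of 0.5 shrinks the shuffle to 100 partitions, while 1.0 keeps all 200.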
[jira] [Created] (SPARK-4695) Get result using executeCollect in spark sql
wangfei created SPARK-4695: -- Summary: Get result using executeCollect in spark sql Key: SPARK-4695 URL: https://issues.apache.org/jira/browse/SPARK-4695 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.3.0 We should use executeCollect to collect the result, because executeCollect is a custom implementation of collect in Spark SQL that performs better than RDD's collect.
[jira] [Updated] (SPARK-4695) Get result using executeCollect in spark sql
[ https://issues.apache.org/jira/browse/SPARK-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei updated SPARK-4695: --- Issue Type: Improvement (was: Bug) Get result using executeCollect in spark sql -- Key: SPARK-4695 URL: https://issues.apache.org/jira/browse/SPARK-4695 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.3.0 We should use executeCollect to collect the result, because executeCollect is a custom implementation of collect in Spark SQL that performs better than RDD's collect.
[jira] [Created] (SPARK-4673) Optimizing limit using coalesce
wangfei created SPARK-4673: -- Summary: Optimizing limit using coalesce Key: SPARK-4673 URL: https://issues.apache.org/jira/browse/SPARK-4673 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.3.0 Currently limit uses a ShuffledRDD with a HashPartitioner to repartition to 1 partition, which leads to a shuffle.
[jira] [Created] (SPARK-4618) Make foreign DDL commands options case-insensitive
wangfei created SPARK-4618: -- Summary: Make foreign DDL commands options case-insensitive Key: SPARK-4618 URL: https://issues.apache.org/jira/browse/SPARK-4618 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.3.0 Make foreign DDL commands options case-insensitive so the following command works: ``` create temporary table normal_parquet USING org.apache.spark.sql.parquet OPTIONS ( PATH '/xxx/data' ) ```
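One way to get this behavior is to normalize option keys on lookup; a minimal sketch, with an illustrative class name (Spark's actual implementation may differ):

```scala
// Illustrative: store option keys lower-cased so PATH, Path and path all resolve
// to the same value regardless of how the user wrote the OPTIONS clause.
class CaseInsensitiveOptions(options: Map[String, String]) {
  private val normalized = options.map { case (k, v) => (k.toLowerCase, v) }
  def get(key: String): Option[String] = normalized.get(key.toLowerCase)
}
```

Normalizing once at construction keeps each lookup a plain hash-map access.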
[jira] [Created] (SPARK-4574) Adding support for defining schema in foreign DDL commands.
wangfei created SPARK-4574: -- Summary: Adding support for defining schema in foreign DDL commands. Key: SPARK-4574 URL: https://issues.apache.org/jira/browse/SPARK-4574 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 Adding support for defining schema in foreign DDL commands. Now foreign DDL supports commands like: CREATE TEMPORARY TABLE avroTable USING org.apache.spark.sql.avro OPTIONS (path ../hive/src/test/resources/data/files/episodes.avro) Let users define the schema instead of inferring it from the file, so we can support DDL commands as follows: CREATE TEMPORARY TABLE avroTable(a int, b string) USING org.apache.spark.sql.avro OPTIONS (path ../hive/src/test/resources/data/files/episodes.avro)
[jira] [Created] (SPARK-4552) query for empty parquet table in spark sql hive get IllegalArgumentException
wangfei created SPARK-4552: -- Summary: query for empty parquet table in spark sql hive get IllegalArgumentException Key: SPARK-4552 URL: https://issues.apache.org/jira/browse/SPARK-4552 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 Run: create table test_parquet(key int, value string) stored as parquet; select * from test_parquet; and get an error as follows: java.lang.IllegalArgumentException: Could not find Parquet metadata at path file:/user/hive/warehouse/test_parquet at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.sql.parquet.ParquetTypesConverter$.readMetaData(ParquetTypes.sc
[jira] [Created] (SPARK-4553) query for parquet table with string fields in spark sql hive get binary result
wangfei created SPARK-4553: -- Summary: query for parquet table with string fields in spark sql hive get binary result Key: SPARK-4553 URL: https://issues.apache.org/jira/browse/SPARK-4553 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 Run: create table test_parquet(key int, value string) stored as parquet; insert into table test_parquet select * from src; select * from test_parquet; and get a result as follows: ... 282 [B@38fda3b 138 [B@1407a24 238 [B@12de6fb 419 [B@6c97695 15 [B@4885067 118 [B@156a8d3 72 [B@65d20dd 90 [B@4c18906 307 [B@60b24cc 19 [B@59cf51b 435 [B@39fdf37 10 [B@4f799d7 277 [B@3950951 273 [B@596bf4b 306 [B@3e91557 224 [B@3781d61 309 [B@2d0d128
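The `[B@...` tokens are the default `toString` of `Array[Byte]`: the Parquet reader hands string columns back as raw byte arrays, and the fix is to decode them rather than print the array reference. A sketch of the decoding step, assuming UTF-8 (which is what Parquet's string annotation uses); the function name is illustrative:

```scala
import java.nio.charset.StandardCharsets

// Decode the raw bytes returned for a Parquet string column instead of relying
// on Array[Byte].toString, which prints "[B@<hashcode>".
def decodeParquetString(bytes: Array[Byte]): String =
  new String(bytes, StandardCharsets.UTF_8)
```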
[jira] [Created] (SPARK-4554) Set fair scheduler pool for JDBC client session in hive 13
wangfei created SPARK-4554: -- Summary: Set fair scheduler pool for JDBC client session in hive 13 Key: SPARK-4554 URL: https://issues.apache.org/jira/browse/SPARK-4554 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 Now the hive 13 shim does not support setting a fair scheduler pool.
[jira] [Created] (SPARK-4559) Adding support for ucase and lcase
wangfei created SPARK-4559: -- Summary: Adding support for ucase and lcase Key: SPARK-4559 URL: https://issues.apache.org/jira/browse/SPARK-4559 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 Adding support for ucase and lcase in spark sql
[jira] [Created] (SPARK-4449) specify port range in spark
wangfei created SPARK-4449: -- Summary: specify port range in spark Key: SPARK-4449 URL: https://issues.apache.org/jira/browse/SPARK-4449 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 In some cases we need to specify the port range used by Spark.
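A common way to honor a port range is to walk it and bind to the first free port. A hedged sketch with plain `java.net`, not the Spark configuration that this issue requests; the function name and range are illustrative:

```scala
import java.net.ServerSocket

// Try each port in [start, end] and return a socket bound to the first free one.
// The iterator keeps this lazy, so no port past the first success is touched.
def bindInRange(start: Int, end: Int): Option[ServerSocket] =
  (start to end).iterator
    .map { port =>
      try Some(new ServerSocket(port))
      catch { case _: java.io.IOException => None }
    }
    .collectFirst { case Some(socket) => socket }
```

The caller is responsible for closing the returned socket; a `None` result means every port in the range was already taken.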
[jira] [Created] (SPARK-4443) Statistics bug for external table in spark sql hive
wangfei created SPARK-4443: -- Summary: Statistics bug for external table in spark sql hive Key: SPARK-4443 URL: https://issues.apache.org/jira/browse/SPARK-4443 Project: Spark Issue Type: Bug Components: SQL Reporter: wangfei
[jira] [Updated] (SPARK-4443) Statistics bug for external table in spark sql hive
[ https://issues.apache.org/jira/browse/SPARK-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei updated SPARK-4443: --- Description: When the table is external, the `totalSize` is always zero, which will influence the join strategy (always use broadcast join for external tables) (was: When table is external, `totalSize` is always zero, which will influence join strategy(always use broadcast join for external table)) Statistics bug for external table in spark sql hive --- Key: SPARK-4443 URL: https://issues.apache.org/jira/browse/SPARK-4443 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 When the table is external, the `totalSize` is always zero, which will influence the join strategy (always use broadcast join for external tables)
[jira] [Updated] (SPARK-4443) Statistics bug for external table in spark sql hive
[ https://issues.apache.org/jira/browse/SPARK-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei updated SPARK-4443: --- Description: When the table is external, `totalSize` is always zero, which will influence the join strategy (always use broadcast join for external tables) Target Version/s: 1.2.0 Affects Version/s: 1.1.0 Fix Version/s: 1.2.0 Statistics bug for external table in spark sql hive --- Key: SPARK-4443 URL: https://issues.apache.org/jira/browse/SPARK-4443 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 When the table is external, `totalSize` is always zero, which will influence the join strategy (always use broadcast join for external tables)
[jira] [Created] (SPARK-4292) incorrect result set in JDBC/ODBC
wangfei created SPARK-4292: -- Summary: incorrect result set in JDBC/ODBC Key: SPARK-4292 URL: https://issues.apache.org/jira/browse/SPARK-4292 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 select * from src gets a result as follows: | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 |
[jira] [Updated] (SPARK-4261) make right version info for beeline
[ https://issues.apache.org/jira/browse/SPARK-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei updated SPARK-4261: --- Description: Running the Spark SQL JDBC/ODBC server, the output is: JackydeMacBook-Pro:spark1 jackylee$ bin/beeline Spark assembly has been built with Hive, including Datanucleus jars on classpath Beeline version ??? by Apache Hive We should provide the right version info for beeline. make right version info for beeline --- Key: SPARK-4261 URL: https://issues.apache.org/jira/browse/SPARK-4261 Project: Spark Issue Type: Bug Components: Build, SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 Running the Spark SQL JDBC/ODBC server, the output is: JackydeMacBook-Pro:spark1 jackylee$ bin/beeline Spark assembly has been built with Hive, including Datanucleus jars on classpath Beeline version ??? by Apache Hive We should provide the right version info for beeline.
[jira] [Created] (SPARK-4261) make right version info for beeline
wangfei created SPARK-4261: -- Summary: make right version info for beeline Key: SPARK-4261 URL: https://issues.apache.org/jira/browse/SPARK-4261 Project: Spark Issue Type: Bug Components: Build, SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0
[jira] [Updated] (SPARK-4237) Generate right Manifest File for maven building
[ https://issues.apache.org/jira/browse/SPARK-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei updated SPARK-4237: --- Description: Now building Spark with Maven produces the Manifest File of guava; we should generate the right Manifest File for the Maven build was: Running with spark sql jdbc/odbc, the output will be JackydeMacBook-Pro:spark1 jackylee$ bin/beeline Spark assembly has been built with Hive, including Datanucleus jars on classpath Beeline version ??? by Apache Hive we should add Manifest File for Maven building Generate right Manifest File for maven building --- Key: SPARK-4237 URL: https://issues.apache.org/jira/browse/SPARK-4237 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 Now building Spark with Maven produces the Manifest File of guava; we should generate the right Manifest File for the Maven build
[jira] [Created] (SPARK-4225) jdbc/odbc error when using maven build spark
wangfei created SPARK-4225: -- Summary: jdbc/odbc error when using maven build spark Key: SPARK-4225 URL: https://issues.apache.org/jira/browse/SPARK-4225 Project: Spark Issue Type: Bug Components: Build, SQL Affects Versions: 1.1.0 Reporter: wangfei Priority: Blocker Fix For: 1.2.0 Use the following command to build spark: mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.1 -Phive -DskipTests clean package then use beeline to connect to the thrift server and get this error: 14/11/04 11:30:31 INFO ObjectStore: Initialized ObjectStore 14/11/04 11:30:31 INFO AbstractService: Service:ThriftBinaryCLIService is started. 14/11/04 11:30:31 INFO AbstractService: Service:HiveServer2 is started. 14/11/04 11:30:31 INFO HiveThriftServer2: HiveThriftServer2 started 14/11/04 11:30:31 INFO ThriftCLIService: ThriftBinaryCLIService listening on 0.0.0.0/0.0.0.0:1 14/11/04 11:33:26 INFO ThriftCLIService: Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V6 14/11/04 11:33:26 INFO HiveMetaStore: No user is added in admin role, since config is empty 14/11/04 11:33:26 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr. 14/11/04 11:33:26 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr. 14/11/04 11:33:26 ERROR TThreadPoolServer: Thrift error occurred during processing of message. org.apache.thrift.protocol.TProtocolException: Cannot write a TUnion with no set value! 
at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:240) at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:213) at org.apache.thrift.TUnion.write(TUnion.java:152) at org.apache.hive.service.cli.thrift.TGetInfoResp$TGetInfoRespStandardScheme.write(TGetInfoResp.java:456) at org.apache.hive.service.cli.thrift.TGetInfoResp$TGetInfoRespStandardScheme.write(TGetInfoResp.java:406) at org.apache.hive.service.cli.thrift.TGetInfoResp.write(TGetInfoResp.java:341) at org.apache.hive.service.cli.thrift.TCLIService$GetInfo_result$GetInfo_resultStandardScheme.write(TCLIService.java:3754) at org.apache.hive.service.cli.thrift.TCLIService$GetInfo_result$GetInfo_resultStandardScheme.write(TCLIService.java:3718) at org.apache.hive.service.cli.thrift.TCLIService$GetInfo_result.write(TCLIService.java:3669) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744)
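For readers unfamiliar with the Thrift failure above: TUnion.write refuses to serialize a union whose value was never set, which is exactly the TProtocolException in the trace. A minimal Python sketch of that contract (the TaggedUnion class and its method names are illustrative, not Thrift's actual API):

```python
class TaggedUnion:
    """Illustrative Thrift-style union: at most one field is set at a time."""

    def __init__(self):
        self._field = None
        self._value = None

    def set_field(self, field, value):
        # Setting a field replaces any previously set one.
        self._field, self._value = field, value

    def write(self):
        # Thrift's TUnion.write raises TProtocolException in the same case:
        # serializing a union with no value set is a protocol violation.
        if self._field is None:
            raise ValueError("Cannot write a TUnion with no set value!")
        return {self._field: self._value}
```

One plausible reading of this ticket is that the Maven-built assembly wired together mismatched Hive/Thrift classes, so the GetInfo response reached the serializer with no field set; the sbt build did not show this.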
[jira] [Commented] (SPARK-4225) jdbc/odbc error when using maven build spark
[ https://issues.apache.org/jira/browse/SPARK-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196693#comment-14196693 ] wangfei commented on SPARK-4225: It seems there is some difference between building with sbt and building with Maven. jdbc/odbc error when using maven build spark Key: SPARK-4225 URL: https://issues.apache.org/jira/browse/SPARK-4225 Project: Spark Issue Type: Bug Components: Build, SQL Affects Versions: 1.1.0 Reporter: wangfei Priority: Blocker Fix For: 1.2.0 Build Spark with the following command: mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.1 -Phive -DskipTests clean package then use Beeline to connect to the Thrift server; it fails with this error: 14/11/04 11:30:31 INFO ObjectStore: Initialized ObjectStore 14/11/04 11:30:31 INFO AbstractService: Service:ThriftBinaryCLIService is started. 14/11/04 11:30:31 INFO AbstractService: Service:HiveServer2 is started. 14/11/04 11:30:31 INFO HiveThriftServer2: HiveThriftServer2 started 14/11/04 11:30:31 INFO ThriftCLIService: ThriftBinaryCLIService listening on 0.0.0.0/0.0.0.0:1 14/11/04 11:33:26 INFO ThriftCLIService: Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V6 14/11/04 11:33:26 INFO HiveMetaStore: No user is added in admin role, since config is empty 14/11/04 11:33:26 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr. 14/11/04 11:33:26 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr. 14/11/04 11:33:26 ERROR TThreadPoolServer: Thrift error occurred during processing of message. org.apache.thrift.protocol.TProtocolException: Cannot write a TUnion with no set value! 
at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:240) at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:213) at org.apache.thrift.TUnion.write(TUnion.java:152) at org.apache.hive.service.cli.thrift.TGetInfoResp$TGetInfoRespStandardScheme.write(TGetInfoResp.java:456) at org.apache.hive.service.cli.thrift.TGetInfoResp$TGetInfoRespStandardScheme.write(TGetInfoResp.java:406) at org.apache.hive.service.cli.thrift.TGetInfoResp.write(TGetInfoResp.java:341) at org.apache.hive.service.cli.thrift.TCLIService$GetInfo_result$GetInfo_resultStandardScheme.write(TCLIService.java:3754) at org.apache.hive.service.cli.thrift.TCLIService$GetInfo_result$GetInfo_resultStandardScheme.write(TCLIService.java:3718) at org.apache.hive.service.cli.thrift.TCLIService$GetInfo_result.write(TCLIService.java:3669) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744)
[jira] [Comment Edited] (SPARK-4225) jdbc/odbc error when using maven build spark
[ https://issues.apache.org/jira/browse/SPARK-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196693#comment-14196693 ] wangfei edited comment on SPARK-4225 at 11/4/14 7:46 PM: - It seems there is some difference between building Spark with sbt and building it with Maven. was (Author: scwf): it seems there is some difference between using sbt and maven. jdbc/odbc error when using maven build spark Key: SPARK-4225 URL: https://issues.apache.org/jira/browse/SPARK-4225 Project: Spark Issue Type: Bug Components: Build, SQL Affects Versions: 1.1.0 Reporter: wangfei Priority: Blocker Fix For: 1.2.0 Build Spark with the following command: mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.1 -Phive -DskipTests clean package then use Beeline to connect to the Thrift server; it fails with this error: 14/11/04 11:30:31 INFO ObjectStore: Initialized ObjectStore 14/11/04 11:30:31 INFO AbstractService: Service:ThriftBinaryCLIService is started. 14/11/04 11:30:31 INFO AbstractService: Service:HiveServer2 is started. 14/11/04 11:30:31 INFO HiveThriftServer2: HiveThriftServer2 started 14/11/04 11:30:31 INFO ThriftCLIService: ThriftBinaryCLIService listening on 0.0.0.0/0.0.0.0:1 14/11/04 11:33:26 INFO ThriftCLIService: Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V6 14/11/04 11:33:26 INFO HiveMetaStore: No user is added in admin role, since config is empty 14/11/04 11:33:26 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr. 14/11/04 11:33:26 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr. 14/11/04 11:33:26 ERROR TThreadPoolServer: Thrift error occurred during processing of message. org.apache.thrift.protocol.TProtocolException: Cannot write a TUnion with no set value! 
at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:240) at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:213) at org.apache.thrift.TUnion.write(TUnion.java:152) at org.apache.hive.service.cli.thrift.TGetInfoResp$TGetInfoRespStandardScheme.write(TGetInfoResp.java:456) at org.apache.hive.service.cli.thrift.TGetInfoResp$TGetInfoRespStandardScheme.write(TGetInfoResp.java:406) at org.apache.hive.service.cli.thrift.TGetInfoResp.write(TGetInfoResp.java:341) at org.apache.hive.service.cli.thrift.TCLIService$GetInfo_result$GetInfo_resultStandardScheme.write(TCLIService.java:3754) at org.apache.hive.service.cli.thrift.TCLIService$GetInfo_result$GetInfo_resultStandardScheme.write(TCLIService.java:3718) at org.apache.hive.service.cli.thrift.TCLIService$GetInfo_result.write(TCLIService.java:3669) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744)
[jira] [Created] (SPARK-4237) add Manifest File for Maven building
wangfei created SPARK-4237: -- Summary: add Manifest File for Maven building Key: SPARK-4237 URL: https://issues.apache.org/jira/browse/SPARK-4237 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 When running the Spark SQL JDBC/ODBC tools, the output is: JackydeMacBook-Pro:spark1 jackylee$ bin/beeline Spark assembly has been built with Hive, including Datanucleus jars on classpath Beeline version ??? by Apache Hive we should add a manifest file for the Maven build
[jira] [Commented] (SPARK-4237) add Manifest File for Maven building
[ https://issues.apache.org/jira/browse/SPARK-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14197715#comment-14197715 ] wangfei commented on SPARK-4237: The title is not correct; it should be 'Generate right Manifest File for maven building'. The manifest file currently in use is Guava's, which leads to the issues described in my PR. add Manifest File for Maven building Key: SPARK-4237 URL: https://issues.apache.org/jira/browse/SPARK-4237 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 When running the Spark SQL JDBC/ODBC tools, the output is: JackydeMacBook-Pro:spark1 jackylee$ bin/beeline Spark assembly has been built with Hive, including Datanucleus jars on classpath Beeline version ??? by Apache Hive we should add a manifest file for the Maven build
[jira] [Updated] (SPARK-4237) Generate right Manifest File for maven building
[ https://issues.apache.org/jira/browse/SPARK-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei updated SPARK-4237: --- Summary: Generate right Manifest File for maven building (was: add Manifest File for Maven building) Generate right Manifest File for maven building --- Key: SPARK-4237 URL: https://issues.apache.org/jira/browse/SPARK-4237 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 When running the Spark SQL JDBC/ODBC tools, the output is: JackydeMacBook-Pro:spark1 jackylee$ bin/beeline Spark assembly has been built with Hive, including Datanucleus jars on classpath Beeline version ??? by Apache Hive we should add a manifest file for the Maven build
[jira] [Created] (SPARK-4191) move wrapperFor to HiveInspectors to reuse them
wangfei created SPARK-4191: -- Summary: move wrapperFor to HiveInspectors to reuse them Key: SPARK-4191 URL: https://issues.apache.org/jira/browse/SPARK-4191 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 Move wrapperFor from InsertIntoHiveTable to HiveInspectors so it can be reused when writing data with an ObjectInspector (such as ORC support).
[jira] [Updated] (SPARK-4191) move wrapperFor to HiveInspectors to reuse them
[ https://issues.apache.org/jira/browse/SPARK-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei updated SPARK-4191: --- Issue Type: Improvement (was: Bug) move wrapperFor to HiveInspectors to reuse them --- Key: SPARK-4191 URL: https://issues.apache.org/jira/browse/SPARK-4191 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 Move wrapperFor from InsertIntoHiveTable to HiveInspectors so it can be reused when writing data with an ObjectInspector (such as ORC support).
[jira] [Resolved] (SPARK-3652) upgrade spark sql hive version to 0.13.1
[ https://issues.apache.org/jira/browse/SPARK-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei resolved SPARK-3652. Resolution: Fixed upgrade spark sql hive version to 0.13.1 Key: SPARK-3652 URL: https://issues.apache.org/jira/browse/SPARK-3652 Project: Spark Issue Type: Dependency upgrade Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Spark SQL's Hive version is currently 0.12.0; compiling against 0.13.1 produces errors.
[jira] [Commented] (SPARK-3322) ConnectionManager logs an error when the application ends
[ https://issues.apache.org/jira/browse/SPARK-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192701#comment-14192701 ] wangfei commented on SPARK-3322: Yes, closing this. ConnectionManager logs an error when the application ends - Key: SPARK-3322 URL: https://issues.apache.org/jira/browse/SPARK-3322 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: wangfei Although it does not affect the result, ConnectionManager always logs an error when the application ends. Sometimes it logs only 'ConnectionManagerId(vm2,40992) not found', and sometimes it also logs a CancelledKeyException. The log output is as follows: 14/08/29 16:54:53 ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(vm2,40992) not found 14/08/29 16:54:53 INFO ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@457245f9 java.nio.channels.CancelledKeyException at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386) at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
[jira] [Closed] (SPARK-2460) Optimize SparkContext.hadoopFile api
[ https://issues.apache.org/jira/browse/SPARK-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei closed SPARK-2460. -- Resolution: Fixed Optimize SparkContext.hadoopFile api - Key: SPARK-2460 URL: https://issues.apache.org/jira/browse/SPARK-2460 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: wangfei Fix For: 1.2.0 1. Use SparkContext.hadoopRDD() instead of instantiating HadoopRDD directly in SparkContext.hadoopFile. 2. Broadcast the JobConf in HadoopRDD, not the Configuration.
[jira] [Created] (SPARK-4177) update build doc for JDBC/CLI already supporting hive 13
wangfei created SPARK-4177: -- Summary: update build doc for JDBC/CLI already supporting hive 13 Key: SPARK-4177 URL: https://issues.apache.org/jira/browse/SPARK-4177 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 Fix the build doc, since JDBC/CLI already supports Hive 0.13.
[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib
[ https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178183#comment-14178183 ] wangfei commented on SPARK-4001: Thanks Sean Owen for explaining! A frequent itemset algorithm works by scanning the input data set; there is no probabilistic model in nature. To answer Xiangrui Meng’s earlier questions: 1. These algorithms are used for finding major patterns / association rules in a data set. For a real use case, some analytic applications in the telecom domain use them to find subscriber behavior from a data set combining service records, network traffic records, and demographic data. Please refer to this Chinese article for an example: http://www.ss-lw.com/wxxw-361.html Sometimes we also use a frequent itemset algorithm to prepare feature input for another algorithm that selects features and does other ML tasks, like training a classifier, as in this paper: http://dl.acm.org/citation.cfm?id=1401922 2. Since Apriori is a basic algorithm for frequent itemset mining, I am not aware of any parallel implementation of it. But I think the algorithm fits Spark’s data-parallel model since it only needs to scan the input data set. As for FP-Growth, there is a Parallel FP-Growth from Haoyuan Li: http://dl.acm.org/citation.cfm?id=1454027 I will probably refer to this paper to implement FP-Growth in Spark. 3. The Apriori computation complexity is about O(N*k), where N is the number of items in the input data and k is the depth of the frequent item tree to search. FP-Growth complexity is about O(N), so it is more efficient compared to Apriori. For space efficiency, FP-Growth is also more efficient than Apriori. But with smaller data and more frequent itemsets, Apriori can be more efficient, because FP-Growth needs some time to construct an FP-tree out of the input data set. And another advantage of Apriori is that it can output association rules while FP-Growth cannot. 
Although these two are basic algorithms (FP-Growth is the more complex one), I think it will be handy if MLlib can include them, since there is no frequent itemset mining algorithm in Spark yet, especially for distributed environments. Please suggest how to handle this issue. Thanks a lot. Add Apriori algorithm to Spark MLlib Key: SPARK-4001 URL: https://issues.apache.org/jira/browse/SPARK-4001 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Jacky Li Assignee: Jacky Li Apriori is the classic algorithm for frequent item set mining in a transactional data set. It will be useful if the Apriori algorithm is added to MLlib in Spark
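The level-wise scanning behaviour described in the comment above can be sketched in a few lines of single-machine Python (an illustration of the algorithm, not the MLlib implementation proposed in this ticket; min_support here is an absolute transaction count):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Plain Apriori: level-wise candidate generation over a transaction list.

    Each pass k re-scans the full data set to count k-item candidates, which
    is the O(N * k) behaviour described in the comment above.
    """
    transactions = [frozenset(t) for t in transactions]
    # Start from the singleton candidates.
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while current:
        # One full scan of the data set per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # Generate (k+1)-item candidates by joining surviving k-itemsets.
        keys = list(survivors)
        current = {a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1}
        k += 1
    return frequent
```

For example, over the transactions {a,b}, {a,c}, {a,b,c}, {b} with min_support=2, the result contains {a}, {b}, {c}, {a,b}, and {a,c} but not {b,c}, which occurs only once.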
[jira] [Comment Edited] (SPARK-4001) Add Apriori algorithm to Spark MLlib
[ https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178183#comment-14178183 ] wangfei edited comment on SPARK-4001 at 10/21/14 9:38 AM: -- . was (Author: scwf): Thanks Sean Owen for explaining! Frequent itemset algorithm works by scanning the input data set, there is no probabilistic model in nature. To answer Xiangrui Meng’s earlier questions: 1. These algorithm is used for finding major patterns / association rules in a data set. For a real use case, some analytic applications in telecom domain use them to find subscriber behavior from the data set combining service record, network traffic record, and demographic data. Please refer to this Chinese article for example: http://www.ss-lw.com/wxxw-361.html And, sometimes we use frequent itemset algorithm for preparing features input to other algorithm which selects feature and do other ML task like training a classifier, like this paper: http://dl.acm.org/citation.cfm?id=1401922, 2. Since Apriori is a basic algorithm for frequent itemset mining, I am not aware of any parallel implementation for it. But I think the algorithm fits Spark’s data parallel model since it only need to scan the input data set. And for FP-Growth, I do know there is a Parallel FP-Growth from Haoyuan Li: http://dl.acm.org/citation.cfm?id=1454027 . I think I probably will refer to this paper to implement FP-Growth in Spark 3. The Apriori computation complexity is about O(N*k) where N is the number of item in input data and k is the depth of the frequent item tree to search. FP-Grwoth complexity is about O(N), it is more efficient comparing to Apriori. For space efficiency, FP-growth is also more efficient than Apriori. But in case of smaller data and if frequent itemset is more, Apriori is more efficient. This is because FP-Growth need to construct a FP Tree out of the input data set, and it needs some time. 
And another advantage of Apriori is that it can output association rules while FP-Growth can not. Although these two algorithms are basic algo (FP-Growth is more complex), I think it will be handy if mllib can include them since there is no frequent itemset mining algo in Spark yet, and especially in distributed environment. Please suggest how to handle this issue. Thanks a lot. Add Apriori algorithm to Spark MLlib Key: SPARK-4001 URL: https://issues.apache.org/jira/browse/SPARK-4001 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Jacky Li Assignee: Jacky Li Apriori is the classic algorithm for frequent item set mining in a transactional data set. It will be useful if Apriori algorithm is added to MLLib in Spark
[jira] [Issue Comment Deleted] (SPARK-4001) Add Apriori algorithm to Spark MLlib
[ https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei updated SPARK-4001: --- Comment: was deleted (was: .) Add Apriori algorithm to Spark MLlib Key: SPARK-4001 URL: https://issues.apache.org/jira/browse/SPARK-4001 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Jacky Li Assignee: Jacky Li Apriori is the classic algorithm for frequent item set mining in a transactional data set. It will be useful if Apriori algorithm is added to MLLib in Spark
[jira] [Created] (SPARK-4041) convert attributes names in table scan lowercase when compare with relation attributes
wangfei created SPARK-4041: -- Summary: convert attributes names in table scan lowercase when compare with relation attributes Key: SPARK-4041 URL: https://issues.apache.org/jira/browse/SPARK-4041 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.1.1
[jira] [Created] (SPARK-4042) append columns ids and names before broadcast
wangfei created SPARK-4042: -- Summary: append columns ids and names before broadcast Key: SPARK-4042 URL: https://issues.apache.org/jira/browse/SPARK-4042 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.1.1
[jira] [Updated] (SPARK-4042) append columns ids and names before broadcast
[ https://issues.apache.org/jira/browse/SPARK-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei updated SPARK-4042: --- Description: Appended column ids and names are not broadcast because we append them after creating the table reader append columns ids and names before broadcast - Key: SPARK-4042 URL: https://issues.apache.org/jira/browse/SPARK-4042 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Appended column ids and names are not broadcast because we append them after creating the table reader
[jira] [Updated] (SPARK-4042) append columns ids and names before broadcast
[ https://issues.apache.org/jira/browse/SPARK-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei updated SPARK-4042: --- Description: Appended column ids and names are not broadcast because we append them after creating the table reader. As a result, the configuration broadcast to the executor side does not contain the configs for the appended column ids and names. was: appended columns ids and names will not broadcast because we append them after create table reader. This leads to the config broadcasted to executor side dose not contain the configs of appended columns and names append columns ids and names before broadcast - Key: SPARK-4042 URL: https://issues.apache.org/jira/browse/SPARK-4042 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Appended column ids and names are not broadcast because we append them after creating the table reader. As a result, the configuration broadcast to the executor side does not contain the configs for the appended column ids and names.
[jira] [Updated] (SPARK-4042) append columns ids and names before broadcast
[ https://issues.apache.org/jira/browse/SPARK-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei updated SPARK-4042: --- Description: Appended column ids and names are not broadcast because we append them after creating the table reader. As a result, the configuration broadcast to the executor side does not contain the configs for the appended column ids and names. was: appended columns ids and names will not broadcast because we append them after create table reader. This leads to the config broadcasted to executor side dose not contain the configs of appended columns and names. append columns ids and names before broadcast - Key: SPARK-4042 URL: https://issues.apache.org/jira/browse/SPARK-4042 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Appended column ids and names are not broadcast because we append them after creating the table reader. As a result, the configuration broadcast to the executor side does not contain the configs for the appended column ids and names.
[jira] [Updated] (SPARK-4042) append columns ids and names before broadcast
[ https://issues.apache.org/jira/browse/SPARK-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei updated SPARK-4042: --- Description: Appended column ids and names are not broadcast because we append them after creating the table reader. As a result, the configuration broadcast to the executor side does not contain the configs for the appended column ids and names. was:appended columns ids and names will not broadcast because we append them after create table reader append columns ids and names before broadcast - Key: SPARK-4042 URL: https://issues.apache.org/jira/browse/SPARK-4042 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Appended column ids and names are not broadcast because we append them after creating the table reader. As a result, the configuration broadcast to the executor side does not contain the configs for the appended column ids and names.
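The underlying pattern in SPARK-4042 is general: a broadcast captures a snapshot of the configuration, so keys appended afterwards never reach the executors. A toy sketch in plain Python (broadcast here is just a stand-in for Spark's broadcast mechanism, and the Hive-style key name is illustrative):

```python
import copy

def broadcast(conf):
    # Stand-in for Spark's broadcast: executors effectively see a snapshot
    # of the configuration taken at broadcast time, not a live reference.
    return copy.deepcopy(conf)

driver_conf = {"table": "src"}
executor_conf = broadcast(driver_conf)          # snapshot taken here

# Appending column ids after the broadcast -- the SPARK-4042 bug pattern:
driver_conf["hive.io.file.readcolumn.ids"] = "0,1"

# The executor-side copy never sees the key appended too late.
assert "hive.io.file.readcolumn.ids" not in executor_conf
```

The fix described in the ticket follows directly: append the column ids and names before the configuration is broadcast, so the snapshot already contains them.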
[jira] [Updated] (SPARK-3935) Unused variable in PairRDDFunctions.scala
[ https://issues.apache.org/jira/browse/SPARK-3935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei updated SPARK-3935: --- Description: There is an unused variable (count) in the saveAsHadoopDataset function in PairRDDFunctions.scala. It would be better to add a log statement that records the number of records written. was: There is a unused variable (count) in saveAsHadoopDataset function in PairRDDFunctions.scala. It is better to add a log statement to record the line of the read file. Unused variable in PairRDDFunctions.scala - Key: SPARK-3935 URL: https://issues.apache.org/jira/browse/SPARK-3935 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: wangfei Priority: Minor There is an unused variable (count) in the saveAsHadoopDataset function in PairRDDFunctions.scala. It would be better to add a log statement that records the number of records written.
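The suggested fix for the unused count variable can be sketched as follows (a hypothetical helper in Python, not the actual PairRDDFunctions code): keep the counter, but use it in a log line recording how many records were written.

```python
import logging

logger = logging.getLogger("saveAsHadoopDataset")

def write_partition(records, writer=None):
    """Counts records while writing them and logs the total, so the
    counter is no longer an unused variable."""
    count = 0
    for record in records:
        if writer is not None:
            writer(record)   # hand the record to the underlying writer
        count += 1
    logger.info("Wrote %d records in this partition", count)
    return count
```

The same idea applies in the Scala original: the existing count accumulator simply needs a logInfo call after the write loop instead of being discarded.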
[jira] [Updated] (SPARK-3826) enable hive-thriftserver support hive-0.13.1
[ https://issues.apache.org/jira/browse/SPARK-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei updated SPARK-3826: --- Affects Version/s: (was: 1.1.1) 1.1.0 enable hive-thriftserver support hive-0.13.1 Key: SPARK-3826 URL: https://issues.apache.org/jira/browse/SPARK-3826 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Currently hive-thriftserver does not support hive-0.13; make it support both 0.12 and 0.13
[jira] [Created] (SPARK-3899) wrong links in streaming doc
wangfei created SPARK-3899: -- Summary: wrong links in streaming doc Key: SPARK-3899 URL: https://issues.apache.org/jira/browse/SPARK-3899 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 1.1.0 Reporter: wangfei
[jira] [Created] (SPARK-3809) make HiveThriftServer2Suite work correctly
wangfei created SPARK-3809: -- Summary: make HiveThriftServer2Suite work correctly Key: SPARK-3809 URL: https://issues.apache.org/jira/browse/SPARK-3809 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 Currently HiveThriftServer2Suite is a fake test; the HiveThriftServer is not actually started there
[jira] [Created] (SPARK-3826) enable hive-thriftserver support hive-0.13.1
wangfei created SPARK-3826: -- Summary: enable hive-thriftserver support hive-0.13.1 Key: SPARK-3826 URL: https://issues.apache.org/jira/browse/SPARK-3826 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.1 Reporter: wangfei Currently hive-thriftserver does not support hive-0.13; make it support both 0.12 and 0.13
[jira] [Closed] (SPARK-3793) use hiveconf when parse hive ql
[ https://issues.apache.org/jira/browse/SPARK-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei closed SPARK-3793. -- Resolution: Fixed This should be fixed by #2241. use hiveconf when parse hive ql --- Key: SPARK-3793 URL: https://issues.apache.org/jira/browse/SPARK-3793 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Spark's Hive support currently parses SQL using def getAst(sql: String): ASTNode = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql)) This is fine in hive-0.12 but leads to an NPE in hive-0.13, so add a HiveConf here to make it more general and compatible with both hive-0.12 and hive-0.13
[jira] [Created] (SPARK-3806) minor bug and exception in CliSuite
wangfei created SPARK-3806:
---------------------------

Summary: minor bug and exception in CliSuite
Key: SPARK-3806
URL: https://issues.apache.org/jira/browse/SPARK-3806
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.1.0
Reporter: wangfei

CliSuite throws an exception as follows:

Exception in thread Thread-6 java.lang.IndexOutOfBoundsException: 6
    at scala.collection.mutable.ResizableArray$class.apply(ResizableArray.scala:43)
    at scala.collection.mutable.ArrayBuffer.apply(ArrayBuffer.scala:47)
    at org.apache.spark.sql.hive.thriftserver.CliSuite.org$apache$spark$sql$hive$thriftserver$CliSuite$$captureOutput$1(CliSuite.scala:67)
    at org.apache.spark.sql.hive.thriftserver.CliSuite$$anonfun$4.apply(CliSuite.scala:78)
    at org.apache.spark.sql.hive.thriftserver.CliSuite$$anonfun$4.apply(CliSuite.scala:78)
    at scala.sys.process.ProcessLogger$$anon$1.out(ProcessLogger.scala:96)
    at scala.sys.process.BasicIO$$anonfun$processOutFully$1.apply(BasicIO.scala:135)
    at scala.sys.process.BasicIO$$anonfun$processOutFully$1.apply(BasicIO.scala:135)
    at scala.sys.process.BasicIO$.readFully$1(BasicIO.scala:175)
    at scala.sys.process.BasicIO$.processLinesFully(BasicIO.scala:179)
    at scala.sys.process.BasicIO$$anonfun$processFully$1.apply(BasicIO.scala:164)
    at scala.sys.process.BasicIO$$anonfun$processFully$1.apply(BasicIO.scala:162)
    at scala.sys.process.ProcessBuilderImpl$Simple$$anonfun$3.apply$mcV$sp(ProcessBuilderImpl.scala:73)
    at scala.sys.process.ProcessImpl$Spawn$$anon$1.run(ProcessImpl.scala:22)
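The IndexOutOfBoundsException above comes from indexing an ArrayBuffer of expected output patterns past its last element while capturing process output. A minimal sketch of the failure mode and a guarded alternative (hypothetical names, not CliSuite's actual code):

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for CliSuite's captureOutput: match each process
// output line against a buffer of expected patterns, advancing an index as
// each pattern is seen. Calling expected(next) after the last match throws
// IndexOutOfBoundsException; lift returns None instead of throwing.
val expected = ArrayBuffer("start", "ready", "done")
var next = 0

def onLine(line: String): Unit =
  expected.lift(next).foreach { pattern =>
    if (line.contains(pattern)) next += 1
  }

// Extra output after the last expected pattern no longer throws.
Seq("start", "noise", "ready", "done", "extra line").foreach(onLine)
```

After the loop, `next` equals 3 (all patterns matched) and the trailing "extra line" is ignored rather than crashing the output-capture thread.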
[jira] [Updated] (SPARK-3806) minor bug in CliSuite
[ https://issues.apache.org/jira/browse/SPARK-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wangfei updated SPARK-3806:
---------------------------
Summary: minor bug in CliSuite  (was: minor bug and exception in CliSuite)

minor bug in CliSuite
---------------------

Key: SPARK-3806
URL: https://issues.apache.org/jira/browse/SPARK-3806
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.1.0
Reporter: wangfei

CliSuite throws an exception as follows:

Exception in thread Thread-6 java.lang.IndexOutOfBoundsException: 6
    at scala.collection.mutable.ResizableArray$class.apply(ResizableArray.scala:43)
    at scala.collection.mutable.ArrayBuffer.apply(ArrayBuffer.scala:47)
    at org.apache.spark.sql.hive.thriftserver.CliSuite.org$apache$spark$sql$hive$thriftserver$CliSuite$$captureOutput$1(CliSuite.scala:67)
    at org.apache.spark.sql.hive.thriftserver.CliSuite$$anonfun$4.apply(CliSuite.scala:78)
    at org.apache.spark.sql.hive.thriftserver.CliSuite$$anonfun$4.apply(CliSuite.scala:78)
    at scala.sys.process.ProcessLogger$$anon$1.out(ProcessLogger.scala:96)
    at scala.sys.process.BasicIO$$anonfun$processOutFully$1.apply(BasicIO.scala:135)
    at scala.sys.process.BasicIO$$anonfun$processOutFully$1.apply(BasicIO.scala:135)
    at scala.sys.process.BasicIO$.readFully$1(BasicIO.scala:175)
    at scala.sys.process.BasicIO$.processLinesFully(BasicIO.scala:179)
    at scala.sys.process.BasicIO$$anonfun$processFully$1.apply(BasicIO.scala:164)
    at scala.sys.process.BasicIO$$anonfun$processFully$1.apply(BasicIO.scala:162)
    at scala.sys.process.ProcessBuilderImpl$Simple$$anonfun$3.apply$mcV$sp(ProcessBuilderImpl.scala:73)
    at scala.sys.process.ProcessImpl$Spawn$$anon$1.run(ProcessImpl.scala:22)
[jira] [Created] (SPARK-3792) enable JavaHiveQLSuite
wangfei created SPARK-3792:
---------------------------

Summary: enable JavaHiveQLSuite
Key: SPARK-3792
URL: https://issues.apache.org/jira/browse/SPARK-3792
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.1.0
Reporter: wangfei
[jira] [Created] (SPARK-3793) add para hiveconf when parse hive ql
wangfei created SPARK-3793:
---------------------------

Summary: add para hiveconf when parse hive ql
Key: SPARK-3793
URL: https://issues.apache.org/jira/browse/SPARK-3793
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.1.0
Reporter: wangfei

Spark's Hive support currently parses SQL with:

    def getAst(sql: String): ASTNode = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql))

This is fine with hive-0.12 but leads to an NPE with hive-0.13, so a hiveconf parameter is added here to make parsing compatible with both hive-0.12 and hive-0.13.
[jira] [Updated] (SPARK-3793) use hiveconf when parse hive ql
[ https://issues.apache.org/jira/browse/SPARK-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wangfei updated SPARK-3793:
---------------------------
Summary: use hiveconf when parse hive ql  (was: add para hiveconf when parse hive ql)

use hiveconf when parse hive ql
-------------------------------

Key: SPARK-3793
URL: https://issues.apache.org/jira/browse/SPARK-3793
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.1.0
Reporter: wangfei

Spark's Hive support currently parses SQL with:

    def getAst(sql: String): ASTNode = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql))

This is fine with hive-0.12 but leads to an NPE with hive-0.13, so a hiveconf parameter is added here to make parsing compatible with both hive-0.12 and hive-0.13.
[jira] [Updated] (SPARK-3793) add para hiveconf when parse hive ql
[ https://issues.apache.org/jira/browse/SPARK-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wangfei updated SPARK-3793:
---------------------------
Summary: add para hiveconf when parse hive ql  (was: use hiveconf when parse hive ql)

add para hiveconf when parse hive ql
------------------------------------

Key: SPARK-3793
URL: https://issues.apache.org/jira/browse/SPARK-3793
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.1.0
Reporter: wangfei

Spark's Hive support currently parses SQL with:

    def getAst(sql: String): ASTNode = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql))

This is fine with hive-0.12 but leads to an NPE with hive-0.13, so a hiveconf parameter is added here to make parsing compatible with both hive-0.12 and hive-0.13.
[jira] [Updated] (SPARK-3793) use hiveconf when parse hive ql
[ https://issues.apache.org/jira/browse/SPARK-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wangfei updated SPARK-3793:
---------------------------
Summary: use hiveconf when parse hive ql  (was: add para hiveconf when parse hive ql)

use hiveconf when parse hive ql
-------------------------------

Key: SPARK-3793
URL: https://issues.apache.org/jira/browse/SPARK-3793
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.1.0
Reporter: wangfei

Spark's Hive support currently parses SQL with:

    def getAst(sql: String): ASTNode = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql))

This is fine with hive-0.12 but leads to an NPE with hive-0.13, so a hiveconf parameter is added here to make parsing compatible with both hive-0.12 and hive-0.13.
[jira] [Created] (SPARK-3765) add testing with sbt to doc
wangfei created SPARK-3765:
---------------------------

Summary: add testing with sbt to doc
Key: SPARK-3765
URL: https://issues.apache.org/jira/browse/SPARK-3765
Project: Spark
Issue Type: Improvement
Affects Versions: 1.1.0
Reporter: wangfei
[jira] [Created] (SPARK-3766) Snappy is also the default compression codec for broadcast variables
wangfei created SPARK-3766:
---------------------------

Summary: Snappy is also the default compression codec for broadcast variables
Key: SPARK-3766
URL: https://issues.apache.org/jira/browse/SPARK-3766
Project: Spark
Issue Type: Improvement
Affects Versions: 1.1.0
Reporter: wangfei
[jira] [Updated] (SPARK-3766) Snappy is also the default compression codec for broadcast variables
[ https://issues.apache.org/jira/browse/SPARK-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wangfei updated SPARK-3766:
---------------------------
Component/s: Documentation

Snappy is also the default compression codec for broadcast variables
--------------------------------------------------------------------

Key: SPARK-3766
URL: https://issues.apache.org/jira/browse/SPARK-3766
Project: Spark
Issue Type: Improvement
Components: Documentation
Affects Versions: 1.1.0
Reporter: wangfei
[jira] [Updated] (SPARK-3765) add testing with sbt to doc
[ https://issues.apache.org/jira/browse/SPARK-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wangfei updated SPARK-3765:
---------------------------
Component/s: Documentation

add testing with sbt to doc
---------------------------

Key: SPARK-3765
URL: https://issues.apache.org/jira/browse/SPARK-3765
Project: Spark
Issue Type: Improvement
Components: Documentation
Affects Versions: 1.1.0
Reporter: wangfei
[jira] [Created] (SPARK-3755) Do not bind port 1 - 1024 to server in spark
wangfei created SPARK-3755:
---------------------------

Summary: Do not bind port 1 - 1024 to server in spark
Key: SPARK-3755
URL: https://issues.apache.org/jira/browse/SPARK-3755
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.1.0
Reporter: wangfei

A non-root user starting the Jetty server on a port in the range 1-1024 will get the exception:

    java.net.SocketException: Permission denied
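The failure is the usual Unix rule that ports 1-1023 are privileged and only root may bind them. A small sketch of a guard a service could apply before binding (hypothetical helper names, not Spark's code):

```scala
// Ports 1-1023 are privileged on Unix-like systems; only root can bind them.
// Port 0 asks the OS to pick an ephemeral port, which is always allowed.
def isPrivilegedPort(port: Int): Boolean = port >= 1 && port <= 1023

// Fail fast with a clear message instead of surfacing a raw
// java.net.SocketException: Permission denied from deep inside Jetty.
def checkBindable(port: Int): Unit =
  require(port == 0 || !isPrivilegedPort(port),
    s"Port $port is privileged (1-1023); a non-root user cannot bind it")
```

`checkBindable(8080)` passes silently, while `checkBindable(80)` throws an IllegalArgumentException with the explanatory message.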
[jira] [Created] (SPARK-3756) check exception is caused by an address-port collision when binding properly
wangfei created SPARK-3756:
---------------------------

Summary: check exception is caused by an address-port collision when binding properly
Key: SPARK-3756
URL: https://issues.apache.org/jira/browse/SPARK-3756
Project: Spark
Issue Type: Bug
Reporter: wangfei
[jira] [Updated] (SPARK-3756) check exception is caused by an address-port collision when binding properly
[ https://issues.apache.org/jira/browse/SPARK-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wangfei updated SPARK-3756:
---------------------------
Affects Version/s: 1.1.0

check exception is caused by an address-port collision when binding properly
----------------------------------------------------------------------------

Key: SPARK-3756
URL: https://issues.apache.org/jira/browse/SPARK-3756
Project: Spark
Issue Type: Bug
Affects Versions: 1.1.0
Reporter: wangfei
[jira] [Updated] (SPARK-3756) check exception is caused by an address-port collision when binding properly
[ https://issues.apache.org/jira/browse/SPARK-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wangfei updated SPARK-3756:
---------------------------
Description: a tiny bug in method isBindCollision
Target Version/s: 1.2.0

check exception is caused by an address-port collision when binding properly
----------------------------------------------------------------------------

Key: SPARK-3756
URL: https://issues.apache.org/jira/browse/SPARK-3756
Project: Spark
Issue Type: Bug
Affects Versions: 1.1.0
Reporter: wangfei

There is a tiny bug in the method isBindCollision.
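The method in question decides whether a bind failure was an address-port collision (as opposed to, say, a permissions error). A sketch of the idea, assuming the usual JVM behavior of wrapping the root BindException inside other exceptions (names are illustrative, not Spark's exact code):

```scala
import java.net.BindException

// Walk the cause chain: a collision shows up as a BindException whose
// message mentions "Address already in use", possibly wrapped several
// levels deep. Returning false on null terminates the recursion at the
// end of the chain.
def isBindCollision(e: Throwable): Boolean = e match {
  case null => false
  case b: BindException =>
    Option(b.getMessage).exists(_.contains("Address already in use")) ||
      isBindCollision(b.getCause)
  case other => isBindCollision(other.getCause)
}
```

With this shape, a `RuntimeException` wrapping a `BindException("Address already in use")` is classified as a collision, while a bare `BindException("Permission denied")` is not, so retry-on-collision logic does not loop on errors that retrying cannot fix.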
[jira] [Updated] (SPARK-3756) check exception is caused by an address-port collision properly
[ https://issues.apache.org/jira/browse/SPARK-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wangfei updated SPARK-3756:
---------------------------
Summary: check exception is caused by an address-port collision properly  (was: check exception is caused by an address-port collision when binding properly)

check exception is caused by an address-port collision properly
---------------------------------------------------------------

Key: SPARK-3756
URL: https://issues.apache.org/jira/browse/SPARK-3756
Project: Spark
Issue Type: Bug
Affects Versions: 1.1.0
Reporter: wangfei

There is a tiny bug in the method isBindCollision.