[jira] [Created] (SPARK-8865) Fix bug: init SimpleConsumerConfig with kafka params
guowei created SPARK-8865:
Summary: Fix bug: init SimpleConsumerConfig with kafka params
Key: SPARK-8865
URL: https://issues.apache.org/jira/browse/SPARK-8865
Project: Spark
Issue Type: Bug
Components: Streaming
Reporter: guowei
Fix For: 1.4.0
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8833) Kafka Direct API support offset in zookeeper
guowei created SPARK-8833:
Summary: Kafka Direct API support offset in zookeeper
Key: SPARK-8833
URL: https://issues.apache.org/jira/browse/SPARK-8833
Project: Spark
Issue Type: Bug
Components: Streaming
Affects Versions: 1.4.0
Reporter: guowei

The Kafka Direct API only supports consuming a topic from the latest or earliest offset, but users usually need to resume from the last consumed offset when restarting a streaming app.
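A minimal sketch of the resume behavior this report asks for, written in Python with hypothetical names (`choose_starting_offsets` and the saved-offset map are stand-ins, not Spark or Kafka API): on restart, prefer each partition's last saved offset, and fall back to the earliest retained offset when nothing was saved or the saved offset has already been aged out of the log.

```python
# Sketch: decide where a direct Kafka stream should resume per partition.
# `saved_offsets` models offsets previously written to ZooKeeper;
# `earliest_offsets` models the earliest offsets still retained by Kafka.
# Hypothetical helper, not part of any real Spark/Kafka API.

def choose_starting_offsets(saved_offsets, earliest_offsets):
    """Resume from saved offsets where available; otherwise start from earliest."""
    chosen = {}
    for partition, earliest in earliest_offsets.items():
        saved = saved_offsets.get(partition)
        # A saved offset older than the earliest retained message is stale.
        if saved is not None and saved >= earliest:
            chosen[partition] = saved
        else:
            chosen[partition] = earliest
    return chosen

offsets = choose_starting_offsets({0: 42}, {0: 10, 1: 0})
```

On a first run (no saved offsets) this degrades to exactly the "earliest" behavior the Direct API already provides.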
[jira] [Updated] (SPARK-1442) Add Window function support
[ https://issues.apache.org/jira/browse/SPARK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
guowei updated SPARK-1442:
Attachment: (was: Window Function.pdf)

Add Window function support
Key: SPARK-1442
URL: https://issues.apache.org/jira/browse/SPARK-1442
Project: Spark
Issue Type: New Feature
Components: SQL
Reporter: Chengxiang Li
Priority: Blocker

Similar to Hive, add window function support for Catalyst.
https://issues.apache.org/jira/browse/HIVE-4197
https://issues.apache.org/jira/browse/HIVE-896
[jira] [Commented] (SPARK-5710) Combines two adjacent `Cast` expressions into one
[ https://issues.apache.org/jira/browse/SPARK-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339649#comment-14339649 ]
guowei commented on SPARK-5710:
How about limiting the merge to adjacent casts that were added by `typeCoercionRules`? We could add a flag on `Cast` to mark the ones added by `typeCoercionRules`.

Combines two adjacent `Cast` expressions into one
Key: SPARK-5710
URL: https://issues.apache.org/jira/browse/SPARK-5710
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.2.1
Reporter: guowei
Priority: Minor

A plan produced by the `analyzer` with `typeCoercionRules` may contain many `cast` expressions; we can combine the adjacent ones. For example:

create table test(a decimal(3,1));
explain select * from test where a*2-1>1;

== Physical Plan ==
Filter (CAST(CAST((CAST(CAST((CAST(a#5, DecimalType()) * 2), DecimalType(21,1)), DecimalType()) - 1), DecimalType(22,1)), DecimalType()) > 1)
 HiveTableScan [a#5], (MetastoreRelation default, test, None), None
[jira] [Created] (SPARK-5710) Combines two adjacent `Cast` expressions into one
guowei created SPARK-5710:
Summary: Combines two adjacent `Cast` expressions into one
Key: SPARK-5710
URL: https://issues.apache.org/jira/browse/SPARK-5710
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.3.0
Reporter: guowei

A plan produced by the `analyzer` with `typeCoercionRules` may contain many `cast` expressions; we can combine the adjacent ones. For example:

create table test(a decimal(3,1));
explain select * from test where a*2-1>1;

== Physical Plan ==
Filter (CAST(CAST((CAST(CAST((CAST(a#5, DecimalType()) * 2), DecimalType(21,1)), DecimalType()) - 1), DecimalType(22,1)), DecimalType()) > 1)
 HiveTableScan [a#5], (MetastoreRelation default, test, None), None
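The proposed optimization can be sketched outside of Catalyst. Below, cast expressions are modeled as plain Python tuples `("cast", child, type)`, and a recursive rewrite collapses adjacent casts so only the outermost target type survives. This is an illustration only; a real Catalyst rule would additionally have to verify that dropping the intermediate cast preserves semantics, which is exactly the concern behind limiting the merge to casts added by `typeCoercionRules`.

```python
# Sketch of the proposed rule: Cast(Cast(x, t1), t2) -> Cast(x, t2).
# Expressions are modeled as nested tuples ("cast", child, target_type);
# representation and names are hypothetical, not Catalyst code.

def collapse_casts(expr):
    if isinstance(expr, tuple) and expr[0] == "cast":
        _, child, target = expr
        child = collapse_casts(child)  # simplify bottom-up first
        if isinstance(child, tuple) and child[0] == "cast":
            # Adjacent casts: keep only the outermost target type.
            return ("cast", child[1], target)
        return ("cast", child, target)
    return expr

nested = ("cast", ("cast", ("cast", "a", "decimal(21,1)"), "decimal"), "decimal(22,1)")
flat = collapse_casts(nested)  # ("cast", "a", "decimal(22,1)")
```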
[jira] [Created] (SPARK-5203) union with different decimal type report error
guowei created SPARK-5203:
Summary: union with different decimal type report error
Key: SPARK-5203
URL: https://issues.apache.org/jira/browse/SPARK-5203
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: guowei

A case like this fails:

create table test (a decimal(10,1));
select a from test union all select a*2 from test;

15/01/12 16:28:54 ERROR SparkSQLDriver: Failed in [select a from test union all select a*2 from test]
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: *, tree:
'Project [*]
 'Subquery _u1
  'Union
   Project [a#1]
    MetastoreRelation default, test, None
   Project [CAST((CAST(a#2, DecimalType()) * CAST(CAST(2, DecimalType(10,0)), DecimalType())), DecimalType(21,1)) AS _c0#0]
    MetastoreRelation default, test, None
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:85)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:83)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:81)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:410)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:410)
at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:411)
at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:411)
at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:412)
at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:412)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:417)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:415)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:421)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:421)
at org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:369)
at org.apache.spark.sql.hive.thriftserver.AbstractSparkSQLDriver.run(AbstractSparkSQLDriver.scala:58)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:275)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:211)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
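The failure above comes from the two sides of the union carrying different decimal types (decimal(10,1) for `a` versus the widened decimal(21,1) for `a*2`) that the analyzer does not reconcile. The usual fix is to widen both sides to a common type. A sketch of the standard widening rule (keep the larger integral-digit count and the larger scale), assuming this is the rule a fix would apply:

```python
# Sketch: compute a decimal type wide enough for both union branches.
# (p, s) = (precision, scale); integral digits = p - s.
# Illustrative of the common widening rule, not the actual Catalyst fix.

def widen_decimal(p1, s1, p2, s2):
    scale = max(s1, s2)
    integral = max(p1 - s1, p2 - s2)
    return (integral + scale, scale)

# decimal(10,1) union decimal(21,1) -> decimal(21,1)
widened = widen_decimal(10, 1, 21, 1)
```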
[jira] [Created] (SPARK-5118) Create table test stored as parquet as select ... report error
guowei created SPARK-5118:
Summary: Create table test stored as parquet as select ... report error
Key: SPARK-5118
URL: https://issues.apache.org/jira/browse/SPARK-5118
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.3.0
Reporter: guowei
[jira] [Updated] (SPARK-5118) Create table test stored as parquet as select ... report error
[ https://issues.apache.org/jira/browse/SPARK-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
guowei updated SPARK-5118:
Description: Caused by: java.lang.RuntimeException: Unhandled clauses: TOK_TBLPARQUETFILE

Create table test stored as parquet as select ... report error
Key: SPARK-5118
URL: https://issues.apache.org/jira/browse/SPARK-5118
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.3.0
Reporter: guowei

Caused by: java.lang.RuntimeException: Unhandled clauses: TOK_TBLPARQUETFILE
[jira] [Commented] (SPARK-5066) Can not get all key that has same hashcode when reading key ordered from different Streaming.
[ https://issues.apache.org/jira/browse/SPARK-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263787#comment-14263787 ]
guowei commented on SPARK-5066:
It should return k4 first in your example.

Can not get all key that has same hashcode when reading key ordered from different Streaming.
Key: SPARK-5066
URL: https://issues.apache.org/jira/browse/SPARK-5066
Project: Spark
Issue Type: Bug
Affects Versions: 1.2.0
Reporter: DoingDone9
Priority: Critical

When spill is enabled, data ordered by hash code is spilled to disk. We need to get every key with the same hash code from the different tmp files when merging values, but the code only reads the run of keys with the minimum hash code within each tmp file, so we cannot read all keys.

Example: if file1 has [k1, k2, k3] and file2 has [k4, k5, k1], and hashcode of k4 < hashcode of k5 < hashcode of k1 < hashcode of k2 < hashcode of k3, we just read k1 from file1 and k4 from file2, and cannot read all copies of k1.

Code:

private val inputStreams = (Seq(sortedMap) ++ spilledMaps).map(it => it.buffered)
inputStreams.foreach { it =>
  val kcPairs = new ArrayBuffer[(K, C)]
  readNextHashCode(it, kcPairs)
  if (kcPairs.length > 0) {
    mergeHeap.enqueue(new StreamBuffer(it, kcPairs))
  }
}

private def readNextHashCode(it: BufferedIterator[(K, C)], buf: ArrayBuffer[(K, C)]): Unit = {
  if (it.hasNext) {
    var kc = it.next()
    buf += kc
    val minHash = hashKey(kc)
    while (it.hasNext && it.head._1.hashCode() == minHash) {
      kc = it.next()
      buf += kc
    }
  }
}
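For reference, hash-ordered spilled streams are typically merged with a priority queue so that every stream currently positioned at the overall minimum hash code is drained before moving on. A small self-contained Python sketch of that pattern (illustrative only; not the ExternalAppendOnlyMap code under discussion):

```python
import heapq

# Sketch: merge several (key, value) streams, each sorted by key hash code,
# grouping together every pair whose key hash equals the current minimum
# across all streams. One head entry per stream lives in the heap;
# (hash, stream_index) makes heap entries totally ordered.

def merge_by_hash(streams):
    iters = [iter(s) for s in streams]
    heads = []
    for i, it in enumerate(iters):
        item = next(it, None)
        if item is not None:
            heapq.heappush(heads, (hash(item[0]), i, item))
    while heads:
        min_hash = heads[0][0]
        group = []
        # Drain every stream currently positioned at min_hash.
        while heads and heads[0][0] == min_hash:
            _, i, item = heapq.heappop(heads)
            group.append(item)
            nxt = next(iters[i], None)
            if nxt is not None:
                heapq.heappush(heads, (hash(nxt[0]), i, nxt))
        yield min_hash, group
```

Because a stream's successor is pushed back before the inner loop re-checks the heap top, a stream containing several keys with the minimum hash contributes all of them to the same group.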
[jira] [Created] (SPARK-4988) Create table ..as select ..from..order by .. limit 10 report error when one col is a Decimal
guowei created SPARK-4988:
Summary: Create table ..as select ..from..order by .. limit 10 report error when one col is a Decimal
Key: SPARK-4988
URL: https://issues.apache.org/jira/browse/SPARK-4988
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: guowei

A table 'test' has a decimal type col.

create table test1 as select * from test order by a limit 10;

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, localhost): java.lang.ClassCastException: scala.math.BigDecimal cannot be cast to org.apache.spark.sql.catalyst.types.decimal.Decimal
at org.apache.spark.sql.hive.HiveInspectors$$anonfun$wrapperFor$2.apply(HiveInspectors.scala:339)
at org.apache.spark.sql.hive.HiveInspectors$$anonfun$wrapperFor$2.apply(HiveInspectors.scala:339)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:111)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:108)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:108)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:87)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:87)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:195)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
[jira] [Commented] (SPARK-4988) Create table ..as select ..from..order by .. limit 10 report error when one col is a Decimal
[ https://issues.apache.org/jira/browse/SPARK-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259918#comment-14259918 ]
guowei commented on SPARK-4988:
In `ScalaReflection.scala`, the method `convertToScala` has `case (d: Decimal, _: DecimalType) => d.toBigDecimal`, so `HiveShim.createDecimal(o.asInstanceOf[Decimal].toBigDecimal.underlying())` in `HiveInspectors.scala` reports an error.

Create table ..as select ..from..order by .. limit 10 report error when one col is a Decimal
Key: SPARK-4988
URL: https://issues.apache.org/jira/browse/SPARK-4988
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: guowei

A table 'test' has a decimal type col.

create table test1 as select * from test order by a limit 10;

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, localhost): java.lang.ClassCastException: scala.math.BigDecimal cannot be cast to org.apache.spark.sql.catalyst.types.decimal.Decimal
at org.apache.spark.sql.hive.HiveInspectors$$anonfun$wrapperFor$2.apply(HiveInspectors.scala:339)
at org.apache.spark.sql.hive.HiveInspectors$$anonfun$wrapperFor$2.apply(HiveInspectors.scala:339)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:111)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:108)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:108)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:87)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:87)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:195)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
[jira] [Created] (SPARK-4928) Operation <,>,<=,>= with Decimal report error
guowei created SPARK-4928:
Summary: Operation <,>,<=,>= with Decimal report error
Key: SPARK-4928
URL: https://issues.apache.org/jira/browse/SPARK-4928
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.3.0
Reporter: guowei

create table test (a Decimal(10,1));
select * from test where a>1;

WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Types do not match DecimalType(10,1) != DecimalType(10,0), tree: (input[0] > 1)
at org.apache.spark.sql.catalyst.expressions.Expression.c2(Expression.scala:249)
at org.apache.spark.sql.catalyst.expressions.GreaterThan.eval(predicates.scala:204)
at org.apache.spark.sql.catalyst.expressions.InterpretedPredicate$$anonfun$apply$1.apply(predicates.scala:30)
at org.apache.spark.sql.catalyst.expressions.InterpretedPredicate$$anonfun$apply$1.apply(predicates.scala:30)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:794)
at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:794)
at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1324)
at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1324)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:195)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
[jira] [Commented] (SPARK-4903) RDD remains cached after DROP TABLE
[ https://issues.apache.org/jira/browse/SPARK-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255379#comment-14255379 ]
guowei commented on SPARK-4903:
`uncache table test` works OK on my workspace.

RDD remains cached after DROP TABLE
Key: SPARK-4903
URL: https://issues.apache.org/jira/browse/SPARK-4903
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.3.0
Environment: Spark master @ Dec 17 (3cd516191baadf8496ccdae499771020e89acd7e)
Reporter: Evert Lammerts
Priority: Critical

In beeline, when I run:
{code:sql}
CREATE TABLE test AS select col from table;
CACHE TABLE test
DROP TABLE test
{code}
The table is removed but the RDD is still cached. Running UNCACHE is not possible anymore (the table is not found in the metastore).
[jira] [Commented] (SPARK-2087) Clean Multi-user semantics for thrift JDBC/ODBC server.
[ https://issues.apache.org/jira/browse/SPARK-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251343#comment-14251343 ]
guowei commented on SPARK-2087:
I'm not sure that a full SQLContext per session is a good idea; one SQLConf per session is OK.
1. I don't think CACHE TABLE ... AS SELECT ... is just a temporary table; we treat it as a cached table in many cases.
2. A SQL session usually does not live for a long time; does it need to have its own temporary tables?

Clean Multi-user semantics for thrift JDBC/ODBC server.
Key: SPARK-2087
URL: https://issues.apache.org/jira/browse/SPARK-2087
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: Michael Armbrust
Priority: Minor

Configuration and temporary tables should exist per-user. Cached tables should be shared across users.
[jira] [Created] (SPARK-4815) ThriftServer use only one SessionState to run sql using hive
guowei created SPARK-4815:
Summary: ThriftServer use only one SessionState to run sql using hive
Key: SPARK-4815
URL: https://issues.apache.org/jira/browse/SPARK-4815
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.3.0
Reporter: guowei

The ThriftServer uses only one SessionState to run SQL through Hive, even though the statements come from different Hive sessions. This causes mistakes: for example, when one user runs `use database` in one beeline client, the current database changes in the other beeline clients too.
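The fix direction is to key mutable session state by connection rather than sharing one instance. A toy Python sketch (all names hypothetical, not the actual HiveThriftServer2 code) showing why per-session state keeps `use database` isolated, and why dropping the entry on session close also keeps the session map from growing without bound:

```python
# Sketch: per-session state for a multi-user SQL server, so a
# "USE <database>" on one connection cannot affect another.
# SessionState/Server are illustrative stand-ins.

class SessionState:
    def __init__(self):
        self.current_database = "default"

class Server:
    def __init__(self):
        self._sessions = {}

    def open_session(self, session_id):
        self._sessions[session_id] = SessionState()

    def close_session(self, session_id):
        # Removing the entry on close also prevents an ever-growing map.
        self._sessions.pop(session_id, None)

    def use_database(self, session_id, db):
        self._sessions[session_id].current_database = db

    def current_database(self, session_id):
        return self._sessions[session_id].current_database

srv = Server()
srv.open_session("beeline-1")
srv.open_session("beeline-2")
srv.use_database("beeline-1", "marketing")
# beeline-2 still sees "default"
```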
[jira] [Created] (SPARK-4756) sessionToActivePool grow infinitely, even as sessions expire
guowei created SPARK-4756:
Summary: sessionToActivePool grow infinitely, even as sessions expire
Key: SPARK-4756
URL: https://issues.apache.org/jira/browse/SPARK-4756
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.3.0
Reporter: guowei

sessionToActivePool in SparkSQLOperationManager grows without bound, even as sessions expire.
[jira] [Updated] (SPARK-1442) Add Window function support
[ https://issues.apache.org/jira/browse/SPARK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
guowei updated SPARK-1442:
Attachment: (was: Window Function.pdf)

Add Window function support
Key: SPARK-1442
URL: https://issues.apache.org/jira/browse/SPARK-1442
Project: Spark
Issue Type: New Feature
Components: SQL
Reporter: Chengxiang Li
Attachments: Window Function.pdf

Similar to Hive, add window function support for Catalyst.
https://issues.apache.org/jira/browse/HIVE-4197
https://issues.apache.org/jira/browse/HIVE-896
[jira] [Updated] (SPARK-1442) Add Window function support
[ https://issues.apache.org/jira/browse/SPARK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
guowei updated SPARK-1442:
Attachment: Window Function.pdf

Add Window function support
Key: SPARK-1442
URL: https://issues.apache.org/jira/browse/SPARK-1442
Project: Spark
Issue Type: New Feature
Components: SQL
Reporter: Chengxiang Li
Attachments: Window Function.pdf

Similar to Hive, add window function support for Catalyst.
https://issues.apache.org/jira/browse/HIVE-4197
https://issues.apache.org/jira/browse/HIVE-896
[jira] [Commented] (SPARK-3292) Shuffle Tasks run incessantly even though there's no inputs
[ https://issues.apache.org/jira/browse/SPARK-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136999#comment-14136999 ]
guowei commented on SPARK-3292:
[~saisai_shao] I tested the scenario with windowing operators, and it seems OK. WindowedDStream computes based on the sliced RDDs cached between the from-time and end-time, so it seems it does not need to commit a job.

Shuffle Tasks run incessantly even though there's no inputs
Key: SPARK-3292
URL: https://issues.apache.org/jira/browse/SPARK-3292
Project: Spark
Issue Type: Improvement
Components: Streaming
Affects Versions: 1.0.2
Reporter: guowei

Shuffle operations such as repartition, groupBy, join and cogroup, for example: if I want the shuffle outputs saved as Hadoop files, many empty files are generated even though there are no inputs. It's too expensive.
[jira] [Commented] (SPARK-3292) Shuffle Tasks run incessantly even though there's no inputs
[ https://issues.apache.org/jira/browse/SPARK-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138371#comment-14138371 ]
guowei commented on SPARK-3292:
Below is what I changed in DStream to decide whether to commit a job or not; the RDD for the time tag is still generated for windowing operators:

generatedRDDs.put(time, newRDD)
if (newRDD.partitions.size > 0) {
  Some(newRDD)
} else {
  None
}

Shuffle Tasks run incessantly even though there's no inputs
Key: SPARK-3292
URL: https://issues.apache.org/jira/browse/SPARK-3292
Project: Spark
Issue Type: Improvement
Components: Streaming
Affects Versions: 1.0.2
Reporter: guowei

Shuffle operations such as repartition, groupBy, join and cogroup, for example: if I want the shuffle outputs saved as Hadoop files, many empty files are generated even though there are no inputs. It's too expensive.
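The quoted change reduces to a one-line guard: only hand the batch's RDD onward when it actually has partitions, so an empty batch launches no job and writes no empty output files. A trivial Python model of that guard (stand-in names only, not DStream internals):

```python
# Model of the guard above: a batch RDD is represented by its list of
# partitions; an empty batch yields None instead of a job to commit.

def maybe_rdd(partitions):
    """Return the batch only when it has at least one partition."""
    return partitions if len(partitions) > 0 else None

batches = [maybe_rdd([]), maybe_rdd([["rec1"], ["rec2"]])]
jobs = [b for b in batches if b is not None]  # only the non-empty batch
```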
[jira] [Commented] (SPARK-3292) Shuffle Tasks run incessantly even though there's no inputs
[ https://issues.apache.org/jira/browse/SPARK-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117308#comment-14117308 ]
guowei commented on SPARK-3292:
I've created PR 2192 to fix it, but I don't know how to link it.

Shuffle Tasks run incessantly even though there's no inputs
Key: SPARK-3292
URL: https://issues.apache.org/jira/browse/SPARK-3292
Project: Spark
Issue Type: Improvement
Components: Streaming
Affects Versions: 1.0.2
Reporter: guowei

Shuffle operations such as repartition, groupBy, join and cogroup, for example: if I want the shuffle outputs saved as Hadoop files, many empty files are generated even though there are no inputs. It's too expensive.
[jira] [Created] (SPARK-3292) Shuffle Tasks run indefinitely even though there's no inputs
guowei created SPARK-3292:
Summary: Shuffle Tasks run indefinitely even though there's no inputs
Key: SPARK-3292
URL: https://issues.apache.org/jira/browse/SPARK-3292
Project: Spark
Issue Type: Improvement
Components: Streaming
Affects Versions: 1.0.2
Reporter: guowei

Shuffle operations such as repartition, groupBy, join and cogroup are too expensive; for example, if I want the outputs saved as Hadoop files, then many empty files are generated.
--
This message was sent by Atlassian JIRA (v6.2#6252)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3292) Shuffle Tasks run incessantly even though there's no inputs
[ https://issues.apache.org/jira/browse/SPARK-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
guowei updated SPARK-3292:
Summary: Shuffle Tasks run incessantly even though there's no inputs (was: Shuffle Tasks run indefinitely even though there's no inputs)

Shuffle Tasks run incessantly even though there's no inputs
Key: SPARK-3292
URL: https://issues.apache.org/jira/browse/SPARK-3292
Project: Spark
Issue Type: Improvement
Components: Streaming
Affects Versions: 1.0.2
Reporter: guowei

Shuffle operations such as repartition, groupBy, join and cogroup are too expensive; for example, if I want the outputs saved as Hadoop files, then many empty files are generated.
[jira] [Updated] (SPARK-3292) Shuffle Tasks run incessantly even though there's no inputs
[ https://issues.apache.org/jira/browse/SPARK-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guowei updated SPARK-3292: -- Description: Shuffle operations such as repartition, groupBy, join, and cogroup are too expensive; for example, if I want to save the outputs as Hadoop files, many empty files are generated. was: Shuffle operations such as repartition, groupBy, join, and cogroup are too expensive; for example, if I want to save the outputs as Hadoop files, many empty files are generated. Shuffle Tasks run incessantly even though there's no inputs --- Key: SPARK-3292 URL: https://issues.apache.org/jira/browse/SPARK-3292 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 1.0.2 Reporter: guowei Shuffle operations such as repartition, groupBy, join, and cogroup are too expensive; for example, if I want to save the outputs as Hadoop files, many empty files are generated. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3292) Shuffle Tasks run incessantly even though there's no inputs
[ https://issues.apache.org/jira/browse/SPARK-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guowei updated SPARK-3292: -- Description: Shuffle operations such as repartition, groupBy, join, and cogroup are too expensive. For example, if I want to save the shuffle outputs as Hadoop files, many empty files are generated even though there is no input. was: Shuffle operations such as repartition, groupBy, join, and cogroup are too expensive; for example, if I want to save the outputs as Hadoop files, many empty files are generated. Shuffle Tasks run incessantly even though there's no inputs --- Key: SPARK-3292 URL: https://issues.apache.org/jira/browse/SPARK-3292 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 1.0.2 Reporter: guowei Shuffle operations such as repartition, groupBy, join, and cogroup are too expensive. For example, if I want to save the shuffle outputs as Hadoop files, many empty files are generated even though there is no input. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3292) Shuffle Tasks run incessantly even though there's no inputs
[ https://issues.apache.org/jira/browse/SPARK-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guowei updated SPARK-3292: -- Description: Shuffle operations such as repartition, groupBy, join, and cogroup: for example, if I want to save the shuffle outputs as Hadoop files, many empty files are generated even though there is no input. It's too expensive. was: Shuffle operations such as repartition, groupBy, join, and cogroup are too expensive. For example, if I want to save the shuffle outputs as Hadoop files, many empty files are generated even though there is no input. Shuffle Tasks run incessantly even though there's no inputs --- Key: SPARK-3292 URL: https://issues.apache.org/jira/browse/SPARK-3292 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 1.0.2 Reporter: guowei Shuffle operations such as repartition, groupBy, join, and cogroup: for example, if I want to save the shuffle outputs as Hadoop files, many empty files are generated even though there is no input. It's too expensive. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
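The fix SPARK-3292 asks for amounts to a guard: only micro-batches that actually contain records should trigger the shuffle/save work, rather than an empty output file being written for every batch interval. The sketch below is a plain-Python model of that decision, not Spark API code; in a real streaming job it would correspond to checking whether the batch RDD is empty before invoking the save action.

```python
def batches_to_write(batches):
    """Model of the guard SPARK-3292 asks for: only micro-batches that
    actually contain records should produce output files, instead of an
    empty Hadoop file being written for every batch interval.

    `batches` is a list of micro-batches (each a list of records);
    the return value is the indices of the batches that would be saved.
    """
    return [i for i, batch in enumerate(batches) if batch]

# Only the non-empty intervals trigger a save:
# batches_to_write([["a", "b"], [], [], ["c"]]) -> [0, 3]
```

With this guard, the two empty intervals in the example produce no files at all, which is exactly the behavior the issue description wants from shuffle outputs.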
[jira] [Created] (SPARK-2986) setting properties seems not effective
guowei created SPARK-2986: - Summary: setting properties seems not effective Key: SPARK-2986 URL: https://issues.apache.org/jira/browse/SPARK-2986 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.2 Reporter: guowei Setting properties such as set spark.sql.shuffle.partitions=100 seems to have no effect. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
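SPARK-2986 reports that a SET statement is silently ignored by Spark SQL. As a hedged illustration of what the reporter expects to happen, the sketch below parses a SET statement into a key/value pair that a session could then apply to its configuration. The function name and regex are illustrative only, not Spark's actual implementation.

```python
import re

def parse_set_command(sql):
    """Parse a SQL 'SET key=value' statement into a (key, value) pair.

    Illustrative model only: SPARK-2986 expects a statement like
    'set spark.sql.shuffle.partitions=100' to update the session
    configuration rather than being silently ignored.
    """
    m = re.match(r"(?i)^\s*set\s+([\w.]+)\s*=\s*(\S+)\s*$", sql)
    if m is None:
        return None  # not a SET statement
    return m.group(1), m.group(2)

# parse_set_command("set spark.sql.shuffle.partitions=100")
#   -> ("spark.sql.shuffle.partitions", "100")
```

The bug, under this model, would be the step after parsing: the parsed pair never reaches the configuration that the shuffle stage actually consults, so the partition count stays at its default.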
[jira] [Created] (SPARK-2364) ShuffledDStream run tasks only when dstream has partition items
guowei created SPARK-2364: - Summary: ShuffledDStream run tasks only when dstream has partition items Key: SPARK-2364 URL: https://issues.apache.org/jira/browse/SPARK-2364 Project: Spark Issue Type: Improvement Components: Streaming Reporter: guowei ShuffledDStream runs tasks regardless of whether the DStream's partitions contain any items. -- This message was sent by Atlassian JIRA (v6.2#6252)