[jira] [Created] (SPARK-8865) Fix bug: init SimpleConsumerConfig with kafka params

2015-07-07 Thread guowei (JIRA)
guowei created SPARK-8865:
-

 Summary: Fix bug:  init SimpleConsumerConfig with kafka params
 Key: SPARK-8865
 URL: https://issues.apache.org/jira/browse/SPARK-8865
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Reporter: guowei
 Fix For: 1.4.0









[jira] [Created] (SPARK-8833) Kafka Direct API support offset in zookeeper

2015-07-06 Thread guowei (JIRA)
guowei created SPARK-8833:
-

 Summary: Kafka Direct API support offset in zookeeper
 Key: SPARK-8833
 URL: https://issues.apache.org/jira/browse/SPARK-8833
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.4.0
Reporter: guowei


The Kafka Direct API only supports consuming a topic from the latest or the
earliest offset, but users usually need to resume from the last consumed
offset when restarting a streaming app.
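
A minimal sketch of the requested restart behaviour, using the
createDirectStream overload that takes explicit starting offsets. The
streaming context ssc, the kafkaParams map, and the loadSavedOffsets helper
(which would read the last committed offsets back from ZooKeeper or another
store) are assumptions for illustration, not existing Spark API:

{code:scala}
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Offsets saved by the previous run (e.g. in ZooKeeper); loadSavedOffsets
// is a hypothetical helper, not part of Spark.
val fromOffsets: Map[TopicAndPartition, Long] = loadSavedOffsets()

// Resume the direct stream exactly where the last run stopped.
val stream = KafkaUtils.createDirectStream[
    String, String, StringDecoder, StringDecoder, (String, String)](
  ssc, kafkaParams, fromOffsets,
  (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))
{code}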






[jira] [Updated] (SPARK-1442) Add Window function support

2015-04-21 Thread guowei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guowei updated SPARK-1442:
--
Attachment: (was: Window Function.pdf)

 Add Window function support
 ---

 Key: SPARK-1442
 URL: https://issues.apache.org/jira/browse/SPARK-1442
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Chengxiang Li
Priority: Blocker

 Similar to Hive, add window function support for Catalyst.
 https://issues.apache.org/jira/browse/HIVE-4197
 https://issues.apache.org/jira/browse/HIVE-896






[jira] [Commented] (SPARK-5710) Combines two adjacent `Cast` expressions into one

2015-02-26 Thread guowei (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339649#comment-14339649
 ] 

guowei commented on SPARK-5710:
---

How about limiting the merge to adjacent casts that are only added by
`typeCoercionRules`?
We could add a label to `Cast` to mark the ones introduced by
`typeCoercionRules`.
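
For illustration, a minimal sketch of the kind of rule under discussion,
assuming Catalyst's Cast(child, dataType) shape; this is hypothetical, not
the actual patch, and a real rule would also need the proposed label so it
only touches coercion-added casts:

{code:scala}
import org.apache.spark.sql.catalyst.expressions.Cast
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Collapse Cast(Cast(e, _), t) into Cast(e, t). Unsafe in general --
// the inner cast may truncate or round -- which is exactly why the
// comment proposes restricting it to casts added by typeCoercionRules.
object CombineCasts extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    case Cast(Cast(child, _), dataType) => Cast(child, dataType)
  }
}
{code}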

 Combines two adjacent `Cast` expressions into one
 -

 Key: SPARK-5710
 URL: https://issues.apache.org/jira/browse/SPARK-5710
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.2.1
Reporter: guowei
Priority: Minor

 A plan after the `analyzer` runs `typeCoercionRules` may produce many `cast`
 expressions; we can combine the adjacent ones.
 For example:
 create table test(a decimal(3,1));
 explain select * from test where a*2-1>1;
 == Physical Plan ==
 Filter (CAST(CAST((CAST(CAST((CAST(a#5, DecimalType()) * 2),
 DecimalType(21,1)), DecimalType()) - 1), DecimalType(22,1)), DecimalType()) >
 1)
  HiveTableScan [a#5], (MetastoreRelation default, test, None), None






[jira] [Created] (SPARK-5710) Combines two adjacent `Cast` expressions into one

2015-02-09 Thread guowei (JIRA)
guowei created SPARK-5710:
-

 Summary: Combines two adjacent `Cast` expressions into one
 Key: SPARK-5710
 URL: https://issues.apache.org/jira/browse/SPARK-5710
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: guowei


A plan after the `analyzer` runs `typeCoercionRules` may produce many `cast`
expressions; we can combine the adjacent ones.

For example:
create table test(a decimal(3,1));
explain select * from test where a*2-1>1;

== Physical Plan ==
Filter (CAST(CAST((CAST(CAST((CAST(a#5, DecimalType()) * 2),
DecimalType(21,1)), DecimalType()) - 1), DecimalType(22,1)), DecimalType()) > 1)
 HiveTableScan [a#5], (MetastoreRelation default, test, None), None







[jira] [Created] (SPARK-5203) union with different decimal type report error

2015-01-12 Thread guowei (JIRA)
guowei created SPARK-5203:
-

 Summary: union with different decimal type report error
 Key: SPARK-5203
 URL: https://issues.apache.org/jira/browse/SPARK-5203
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: guowei


Cases like this:
create table test (a decimal(10,1));
select a from test union all select a*2 from test;

15/01/12 16:28:54 ERROR SparkSQLDriver: Failed in [select a from test union all 
select a*2 from test]
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved 
attributes: *, tree:
'Project [*]
 'Subquery _u1
  'Union 
   Project [a#1]
MetastoreRelation default, test, None
   Project [CAST((CAST(a#2, DecimalType()) * CAST(CAST(2, DecimalType(10,0)), 
DecimalType())), DecimalType(21,1)) AS _c0#0]
MetastoreRelation default, test, None

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:85)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:83)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:81)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
at 
scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
at 
scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
at scala.collection.immutable.List.foreach(List.scala:318)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
at 
org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:410)
at 
org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:410)
at 
org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:411)
at 
org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:411)
at 
org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:412)
at 
org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:412)
at 
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:417)
at 
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:415)
at 
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:421)
at 
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:421)
at 
org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:369)
at 
org.apache.spark.sql.hive.thriftserver.AbstractSparkSQLDriver.run(AbstractSparkSQLDriver.scala:58)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:275)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:211)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
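
A hedged workaround sketch, assuming a HiveContext named hiveContext: give
both branches of the UNION the same explicit decimal type so the analyzer
sees matching schemas instead of DecimalType(10,1) versus the widened
DecimalType(21,1):

{code:scala}
// Both branches are cast to one explicit type; DECIMAL(21,1) is wide
// enough to hold a*2 for a DECIMAL(10,1) column.
val result = hiveContext.sql(
  """SELECT CAST(a AS DECIMAL(21,1)) FROM test
    |UNION ALL
    |SELECT CAST(a * 2 AS DECIMAL(21,1)) FROM test""".stripMargin)
{code}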








[jira] [Created] (SPARK-5118) Create table test stored as parquet as select ... report error

2015-01-06 Thread guowei (JIRA)
guowei created SPARK-5118:
-

 Summary: Create table test stored as parquet as select ... 
report error
 Key: SPARK-5118
 URL: https://issues.apache.org/jira/browse/SPARK-5118
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: guowei









[jira] [Updated] (SPARK-5118) Create table test stored as parquet as select ... report error

2015-01-06 Thread guowei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guowei updated SPARK-5118:
--
Description: Caused by: java.lang.RuntimeException: Unhandled clauses: 
TOK_TBLPARQUETFILE
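
For reference, a minimal repro sketch matching the summary above, assuming a
HiveContext named hiveContext and any queryable source table src (both are
assumptions for illustration):

{code:scala}
// The STORED AS PARQUET clause in a CTAS statement is rejected by the
// HiveQL parser before the query even runs:
hiveContext.sql(
  "CREATE TABLE test STORED AS PARQUET AS SELECT key, value FROM src")
// => java.lang.RuntimeException: Unhandled clauses: TOK_TBLPARQUETFILE
{code}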

 Create table test stored as parquet as select ... report error
 

 Key: SPARK-5118
 URL: https://issues.apache.org/jira/browse/SPARK-5118
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: guowei

 Caused by: java.lang.RuntimeException: Unhandled clauses: TOK_TBLPARQUETFILE






[jira] [Commented] (SPARK-5066) Cannot get all keys with the same hash code when reading ordered keys from different streams

2015-01-04 Thread guowei (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14263787#comment-14263787
 ] 

guowei commented on SPARK-5066:
---

It should return k4 first in your example.

 Cannot get all keys with the same hash code when reading ordered keys from
 different streams
 --------------------------------------------------------------------------

 Key: SPARK-5066
 URL: https://issues.apache.org/jira/browse/SPARK-5066
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: DoingDone9
Priority: Critical

 When spilling is enabled, data ordered by hash code is spilled to disk. When
 merging values we need to read every key with a given hash code from all of
 the tmp files, but the code only reads the keys sharing the minimum hash code
 within each tmp file, so we cannot read all of them.
 Example:
 If file1 has [k1, k2, k3] and file2 has [k4, k5, k1],
 and hashcode of k4 < hashcode of k5 < hashcode of k1 < hashcode of k2 <
 hashcode of k3,
 we just read k1 from file1 and k4 from file2, and cannot read all the k1s.
 Code:
 private val inputStreams = (Seq(sortedMap) ++ spilledMaps).map(it => it.buffered)
 // Seed the merge heap with the first hash-code group from each stream.
 inputStreams.foreach { it =>
   val kcPairs = new ArrayBuffer[(K, C)]
   readNextHashCode(it, kcPairs)
   if (kcPairs.length > 0) {
     mergeHeap.enqueue(new StreamBuffer(it, kcPairs))
   }
 }
 // Read all (K, C) pairs whose key shares the stream's minimum hash code.
 private def readNextHashCode(it: BufferedIterator[(K, C)], buf: ArrayBuffer[(K, C)]): Unit = {
   if (it.hasNext) {
     var kc = it.next()
     buf += kc
     val minHash = hashKey(kc)
     while (it.hasNext && it.head._1.hashCode() == minHash) {
       kc = it.next()
       buf += kc
     }
   }
 }






[jira] [Created] (SPARK-4988) Create table ..as select ..from..order by .. limit 10 report error when one col is a Decimal

2014-12-28 Thread guowei (JIRA)
guowei created SPARK-4988:
-

 Summary: Create table ..as select ..from..order by .. limit 10 
report error when one col is a Decimal
 Key: SPARK-4988
 URL: https://issues.apache.org/jira/browse/SPARK-4988
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: guowei


A table 'test' with a decimal-typed column:
create table test1 as select * from test order by a limit 10;

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 
2, localhost): java.lang.ClassCastException: scala.math.BigDecimal cannot be 
cast to org.apache.spark.sql.catalyst.types.decimal.Decimal
at 
org.apache.spark.sql.hive.HiveInspectors$$anonfun$wrapperFor$2.apply(HiveInspectors.scala:339)
at 
org.apache.spark.sql.hive.HiveInspectors$$anonfun$wrapperFor$2.apply(HiveInspectors.scala:339)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:111)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:108)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at 
org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:108)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:87)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:87)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:195)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)






[jira] [Commented] (SPARK-4988) Create table ..as select ..from..order by .. limit 10 report error when one col is a Decimal

2014-12-28 Thread guowei (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259918#comment-14259918
 ] 

guowei commented on SPARK-4988:
---

In `ScalaReflection.scala`, method `convertToScala` has `case (d: Decimal, _:
DecimalType) => d.toBigDecimal`,
so `HiveShim.createDecimal(o.asInstanceOf[Decimal].toBigDecimal.underlying())`
in `HiveInspectors.scala` throws an error.
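
A hedged sketch of the shape of a fix, with 1.2-era package names taken from
the stack trace and the comment above (access modifiers aside); the
toHiveDecimal helper and its argument are assumptions for illustration:

{code:scala}
import org.apache.spark.sql.catalyst.types.decimal.Decimal
import org.apache.spark.sql.hive.HiveShim

// After convertToScala the value is a scala.math.BigDecimal, so blindly
// doing value.asInstanceOf[Decimal] throws the ClassCastException above.
// Accepting both representations avoids that:
def toHiveDecimal(value: Any) = value match {
  case d: Decimal                => HiveShim.createDecimal(d.toBigDecimal.underlying())
  case bd: scala.math.BigDecimal => HiveShim.createDecimal(bd.underlying())
}
{code}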



 Create table ..as select ..from..order by .. limit 10 report error when one 
 col is a Decimal
 --

 Key: SPARK-4988
 URL: https://issues.apache.org/jira/browse/SPARK-4988
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: guowei

 A table 'test' with a decimal-typed column:
 create table test1 as select * from test order by a limit 10;
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
 stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 
 (TID 2, localhost): java.lang.ClassCastException: scala.math.BigDecimal 
 cannot be cast to org.apache.spark.sql.catalyst.types.decimal.Decimal
   at 
 org.apache.spark.sql.hive.HiveInspectors$$anonfun$wrapperFor$2.apply(HiveInspectors.scala:339)
   at 
 org.apache.spark.sql.hive.HiveInspectors$$anonfun$wrapperFor$2.apply(HiveInspectors.scala:339)
   at 
 org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:111)
   at 
 org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:108)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at 
 org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
   at 
 org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:108)
   at 
 org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:87)
   at 
 org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:87)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
   at org.apache.spark.scheduler.Task.run(Task.scala:56)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:195)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)






[jira] [Created] (SPARK-4928) Operation >,<,>=,<= with Decimal report error

2014-12-22 Thread guowei (JIRA)
guowei created SPARK-4928:
-

 Summary: Operation >,<,>=,<= with Decimal report error
 Key: SPARK-4928
 URL: https://issues.apache.org/jira/browse/SPARK-4928
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: guowei


create table test (a Decimal(10,1));
select * from test where a>1;

WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): 
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Types do not 
match DecimalType(10,1) != DecimalType(10,0), tree: (input[0] > 1)
at 
org.apache.spark.sql.catalyst.expressions.Expression.c2(Expression.scala:249)
at 
org.apache.spark.sql.catalyst.expressions.GreaterThan.eval(predicates.scala:204)
at 
org.apache.spark.sql.catalyst.expressions.InterpretedPredicate$$anonfun$apply$1.apply(predicates.scala:30)
at 
org.apache.spark.sql.catalyst.expressions.InterpretedPredicate$$anonfun$apply$1.apply(predicates.scala:30)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:794)
at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:794)
at 
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1324)
at 
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1324)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:195)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
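
A hedged workaround sketch, assuming a HiveContext named hiveContext: make
the literal's decimal type match the column explicitly instead of relying on
coercion between DecimalType(10,1) and DecimalType(10,0):

{code:scala}
// The explicit cast aligns the literal with the column type, so the
// comparison no longer trips the "Types do not match" check.
val rows = hiveContext.sql(
  "SELECT * FROM test WHERE a > CAST(1 AS DECIMAL(10,1))")
{code}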






[jira] [Commented] (SPARK-4903) RDD remains cached after DROP TABLE

2014-12-21 Thread guowei (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255379#comment-14255379
 ] 

guowei commented on SPARK-4903:
---

uncache table test

It is OK in my workspace.

 RDD remains cached after DROP TABLE
 -

 Key: SPARK-4903
 URL: https://issues.apache.org/jira/browse/SPARK-4903
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
 Environment: Spark master @ Dec 17 
 (3cd516191baadf8496ccdae499771020e89acd7e)
Reporter: Evert Lammerts
Priority: Critical

 In beeline, when I run:
 {code:sql}
 CREATE TABLE test AS select col from table;
 CACHE TABLE test
 DROP TABLE test
 {code}
 The table is removed but the RDD is still cached. Running UNCACHE is no
 longer possible (table not found in the metastore).






[jira] [Commented] (SPARK-2087) Clean Multi-user semantics for thrift JDBC/ODBC server.

2014-12-17 Thread guowei (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251343#comment-14251343
 ] 

guowei commented on SPARK-2087:
---

I'm not sure that a full SQLContext per session is a good idea; one SQLConf per
session is OK.
1. I don't think CACHE TABLE ... AS SELECT ... creates just a temporary table;
we treat it as a cached table in many cases.
2. A SQL session usually does not live long. Does it need its own temporary
tables?

 Clean Multi-user semantics for thrift JDBC/ODBC server.
 ---

 Key: SPARK-2087
 URL: https://issues.apache.org/jira/browse/SPARK-2087
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Michael Armbrust
Priority: Minor

 Configuration and temporary tables should exist per-user.  Cached tables 
 should be shared across users.






[jira] [Created] (SPARK-4815) ThriftServer use only one SessionState to run sql using hive

2014-12-10 Thread guowei (JIRA)
guowei created SPARK-4815:
-

 Summary: ThriftServer use only one SessionState to run sql using 
hive 
 Key: SPARK-4815
 URL: https://issues.apache.org/jira/browse/SPARK-4815
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: guowei


ThriftServer uses only one SessionState to run SQL through Hive, even though
requests come from different Hive sessions.
This causes mistakes: for example, when one user runs `use database` in one
Beeline client, the current database changes in the other Beeline clients too.






[jira] [Created] (SPARK-4756) sessionToActivePool grow infinitely, even as sessions expire

2014-12-04 Thread guowei (JIRA)
guowei created SPARK-4756:
-

 Summary: sessionToActivePool  grow infinitely, even as sessions 
expire
 Key: SPARK-4756
 URL: https://issues.apache.org/jira/browse/SPARK-4756
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: guowei


sessionToActivePool in SparkSQLOperationManager grows indefinitely, even as
sessions expire.








[jira] [Updated] (SPARK-1442) Add Window function support

2014-11-28 Thread guowei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guowei updated SPARK-1442:
--
Attachment: (was: Window Function.pdf)

 Add Window function support
 ---

 Key: SPARK-1442
 URL: https://issues.apache.org/jira/browse/SPARK-1442
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Chengxiang Li
 Attachments: Window Function.pdf


 Similar to Hive, add window function support for Catalyst.
 https://issues.apache.org/jira/browse/HIVE-4197
 https://issues.apache.org/jira/browse/HIVE-896






[jira] [Updated] (SPARK-1442) Add Window function support

2014-11-03 Thread guowei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guowei updated SPARK-1442:
--
Attachment: (was: Window Function.pdf)

 Add Window function support
 ---

 Key: SPARK-1442
 URL: https://issues.apache.org/jira/browse/SPARK-1442
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Chengxiang Li
 Attachments: Window Function.pdf


 Similar to Hive, add window function support for Catalyst.
 https://issues.apache.org/jira/browse/HIVE-4197
 https://issues.apache.org/jira/browse/HIVE-896






[jira] [Updated] (SPARK-1442) Add Window function support

2014-11-03 Thread guowei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guowei updated SPARK-1442:
--
Attachment: Window Function.pdf

 Add Window function support
 ---

 Key: SPARK-1442
 URL: https://issues.apache.org/jira/browse/SPARK-1442
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Chengxiang Li
 Attachments: Window Function.pdf


 Similar to Hive, add window function support for Catalyst.
 https://issues.apache.org/jira/browse/HIVE-4197
 https://issues.apache.org/jira/browse/HIVE-896






[jira] [Updated] (SPARK-1442) Add Window function support

2014-10-29 Thread guowei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guowei updated SPARK-1442:
--
Attachment: (was: Window Function.pdf)

 Add Window function support
 ---

 Key: SPARK-1442
 URL: https://issues.apache.org/jira/browse/SPARK-1442
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Chengxiang Li
 Attachments: Window Function.pdf


 Similar to Hive, add window function support for Catalyst.
 https://issues.apache.org/jira/browse/HIVE-4197
 https://issues.apache.org/jira/browse/HIVE-896






[jira] [Updated] (SPARK-1442) Add Window function support

2014-10-29 Thread guowei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guowei updated SPARK-1442:
--
Attachment: Window Function.pdf

 Add Window function support
 ---

 Key: SPARK-1442
 URL: https://issues.apache.org/jira/browse/SPARK-1442
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Chengxiang Li
 Attachments: Window Function.pdf


 Similar to Hive, add window function support for Catalyst.
 https://issues.apache.org/jira/browse/HIVE-4197
 https://issues.apache.org/jira/browse/HIVE-896






[jira] [Updated] (SPARK-1442) Add Window function support

2014-10-28 Thread guowei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guowei updated SPARK-1442:
--
Attachment: Window Function.pdf

 Add Window function support
 ---

 Key: SPARK-1442
 URL: https://issues.apache.org/jira/browse/SPARK-1442
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Chengxiang Li
 Attachments: Window Function.pdf


 Similar to Hive, add window function support for Catalyst.
 https://issues.apache.org/jira/browse/HIVE-4197
 https://issues.apache.org/jira/browse/HIVE-896






[jira] [Commented] (SPARK-3292) Shuffle Tasks run incessantly even though there's no inputs

2014-09-17 Thread guowei (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136999#comment-14136999
 ] 

guowei commented on SPARK-3292:
---

[~saisai_shao]
I tested the scenario with windowing operators, and it seems OK.

WindowedDStream computes from the sliced RDDs cached between the from time and
the end time, so it does not seem to need to commit a job.

 Shuffle Tasks run incessantly even though there's no inputs
 ---

 Key: SPARK-3292
 URL: https://issues.apache.org/jira/browse/SPARK-3292
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 1.0.2
Reporter: guowei

 Such as repartition, groupBy, join, and cogroup.
 For example:
 if I want to save the shuffle outputs as a Hadoop file, many empty files are
 generated even though there are no inputs.
 It's too expensive.






[jira] [Commented] (SPARK-3292) Shuffle Tasks run incessantly even though there's no inputs

2014-09-17 Thread guowei (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138371#comment-14138371
 ] 

guowei commented on SPARK-3292:
---

Below is what I changed in DStream:
whether or not the job is committed, the RDD for the time tag is still
generated for windowing operators.

  generatedRDDs.put(time, newRDD)
  if (newRDD.partitions.size > 0) {
    Some(newRDD)
  } else {
    None
  }

 Shuffle Tasks run incessantly even though there's no inputs
 ---

 Key: SPARK-3292
 URL: https://issues.apache.org/jira/browse/SPARK-3292
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 1.0.2
Reporter: guowei

 Such as repartition, groupBy, join, and cogroup.
 For example:
 if I want to save the shuffle outputs as a Hadoop file, many empty files are
 generated even though there are no inputs.
 It's too expensive.






[jira] [Commented] (SPARK-3292) Shuffle Tasks run incessantly even though there's no inputs

2014-09-01 Thread guowei (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14117308#comment-14117308
 ] 

guowei commented on SPARK-3292:
---

I've created PR 2192 to fix it, but I don't know how to link it.

 Shuffle Tasks run incessantly even though there's no inputs
 ---

 Key: SPARK-3292
 URL: https://issues.apache.org/jira/browse/SPARK-3292
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 1.0.2
Reporter: guowei

 Such as repartition, groupBy, join, and cogroup.
 For example:
 if I want to save the shuffle outputs as a Hadoop file, many empty files are
 generated even though there are no inputs.
 It's too expensive.






[jira] [Created] (SPARK-3292) Shuffle Tasks run indefinitely even though there's no inputs

2014-08-28 Thread guowei (JIRA)
guowei created SPARK-3292:
-

 Summary: Shuffle Tasks run indefinitely even though there's no 
inputs
 Key: SPARK-3292
 URL: https://issues.apache.org/jira/browse/SPARK-3292
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 1.0.2
Reporter: guowei


Such as repartition, groupBy, join, and cogroup.
It's too expensive: for example, if I want to save outputs as a Hadoop file,
then many empty files are generated.






[jira] [Updated] (SPARK-3292) Shuffle Tasks run incessantly even though there's no inputs

2014-08-28 Thread guowei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guowei updated SPARK-3292:
--

Summary: Shuffle Tasks run incessantly even though there's no inputs  (was: 
Shuffle Tasks run indefinitely even though there's no inputs)

 Shuffle Tasks run incessantly even though there's no inputs
 ---

 Key: SPARK-3292
 URL: https://issues.apache.org/jira/browse/SPARK-3292
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 1.0.2
Reporter: guowei

 such as repartition groupby join and cogroup
 it's too expensive , for example if i want outputs save as hadoop file ,then 
 many emtpy file generate.






[jira] [Updated] (SPARK-3292) Shuffle Tasks run incessantly even though there's no inputs

2014-08-28 Thread guowei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guowei updated SPARK-3292:
--

Description: 
such as repartition groupby join and cogroup
it's too expensive , 
for example if i want outputs save as hadoop file ,then many emtpy file 
generate.

  was:
such as repartition groupby join and cogroup
it's too expensive , for example if i want outputs save as hadoop file ,then 
many emtpy file generate.


 Shuffle Tasks run incessantly even though there's no inputs
 ---

 Key: SPARK-3292
 URL: https://issues.apache.org/jira/browse/SPARK-3292
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 1.0.2
Reporter: guowei

 such as repartition groupby join and cogroup
 it's too expensive , 
 for example if i want outputs save as hadoop file ,then many emtpy file 
 generate.






[jira] [Updated] (SPARK-3292) Shuffle Tasks run incessantly even though there's no inputs

2014-08-28 Thread guowei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guowei updated SPARK-3292:
--

Description: 
such as repartition groupby join and cogroup
it's too expensive , 
for example. if i want the shuffle outputs save as hadoop file ,even though  
there is no inputs , many emtpy file generate 

  was:
such as repartition groupby join and cogroup
it's too expensive , 
for example if i want outputs save as hadoop file ,then many emtpy file 
generate.


 Shuffle Tasks run incessantly even though there's no inputs
 ---

 Key: SPARK-3292
 URL: https://issues.apache.org/jira/browse/SPARK-3292
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 1.0.2
Reporter: guowei

 such as repartition groupby join and cogroup
 it's too expensive , 
 for example. if i want the shuffle outputs save as hadoop file ,even though  
 there is no inputs , many emtpy file generate 






[jira] [Updated] (SPARK-3292) Shuffle Tasks run incessantly even though there's no inputs

2014-08-28 Thread guowei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guowei updated SPARK-3292:
--

Description: 
such as repartition groupby join and cogroup
for example. 
if i want the shuffle outputs save as hadoop file ,even though  there is no 
inputs , many emtpy file generate too.
it's too expensive , 

  was:
such as repartition groupby join and cogroup
it's too expensive , 
for example. if i want the shuffle outputs save as hadoop file ,even though  
there is no inputs , many emtpy file generate 


 Shuffle Tasks run incessantly even though there's no inputs
 ---

 Key: SPARK-3292
 URL: https://issues.apache.org/jira/browse/SPARK-3292
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 1.0.2
Reporter: guowei

 such as repartition groupby join and cogroup
 for example. 
 if i want the shuffle outputs save as hadoop file ,even though  there is no 
 inputs , many emtpy file generate too.
 it's too expensive , 






[jira] [Created] (SPARK-2986) setting properties seems not effective

2014-08-12 Thread guowei (JIRA)
guowei created SPARK-2986:
-

 Summary: setting properties seems not effective
 Key: SPARK-2986
 URL: https://issues.apache.org/jira/browse/SPARK-2986
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.2
Reporter: guowei


Setting properties like `set spark.sql.shuffle.partitions=100` seems to have
no effect.







[jira] [Created] (SPARK-2364) ShuffledDStream run tasks only when dstream has partition items

2014-07-03 Thread guowei (JIRA)
guowei created SPARK-2364:
-

 Summary: ShuffledDStream run tasks only when dstream has partition 
items
 Key: SPARK-2364
 URL: https://issues.apache.org/jira/browse/SPARK-2364
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Reporter: guowei


ShuffledDStream runs tasks regardless of whether the DStream has any partition
items.


