[GitHub] spark pull request #16099: [SPARK-18665][SQL] set statement state to "ERROR"...

2018-02-04 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/16099#discussion_r165878592
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
 ---
@@ -241,6 +241,8 @@ private[hive] class SparkExecuteStatementOperation(
   dataTypes = 
result.queryExecution.analyzed.output.map(_.dataType).toArray
 } catch {
   case e: HiveSQLException =>
+HiveThriftServer2.listener.onStatementError(
+  statementId, e.getMessage, SparkUtils.exceptionString(e))
--- End diff --

Please catch the exception in ThriftServerPage.errorMessageCell. Without my 
PR, that function still throws an exception.
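
For illustration, a minimal sketch of the kind of defensive rendering being suggested; the method name comes from the comment above, but the signature and body here are assumptions, not the real ThriftServerPage implementation:
```scala
import scala.util.control.NonFatal
import scala.xml.Node

// Hypothetical shape of errorMessageCell: derive a one-line summary for the
// table cell and fall back to the raw message if anything goes wrong.
def errorMessageCell(errorMessage: String): Seq[Node] = {
  val msg = Option(errorMessage).getOrElse("")   // e.getMessage can be null
  val summary =
    try {
      val newline = msg.indexOf('\n')
      if (newline >= 0) msg.substring(0, newline) else msg
    } catch {
      case NonFatal(_) => msg
    }
  <td>{summary}</td>
}
```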


---




[GitHub] spark issue #16099: [SPARK-18665][SQL] set statement state to "ERROR" after ...

2018-02-04 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/16099
  
@gatorsmile Two years have passed... I don't know what to say.


---




[GitHub] spark pull request #16099: [SPARK-18665][SQL] set statement state to "ERROR"...

2018-02-04 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/16099#discussion_r165877094
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
 ---
@@ -241,6 +241,8 @@ private[hive] class SparkExecuteStatementOperation(
   dataTypes = 
result.queryExecution.analyzed.output.map(_.dataType).toArray
 } catch {
   case e: HiveSQLException =>
+HiveThriftServer2.listener.onStatementError(
+  statementId, e.getMessage, SparkUtils.exceptionString(e))
--- End diff --

OK, but my PR was closed by the community...


---




[GitHub] spark issue #14129: [SPARK-16280][SQL] Implement histogram_numeric SQL funct...

2017-12-08 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/14129
  
Is this PR still available?


---




[GitHub] spark pull request #19756: [SPARK-22527][SQL] Reuse coordinated exchanges if...

2017-11-16 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/19756#discussion_r151615572
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/Exchange.scala 
---
@@ -109,3 +109,67 @@ case class ReuseExchange(conf: SQLConf) extends 
Rule[SparkPlan] {
 }
   }
 }
+
+/**
+ * Find out duplicated coordinated exchanges in the spark plan, then use 
the same exchange for all
+ * the references.
+ */
+case class ReuseExchangeWithCoordinator(conf: SQLConf) extends 
Rule[SparkPlan] {
+
+  // Returns true if a SparkPlan has coordinated ShuffleExchangeExec 
children.
+  private def hasCoordinatedExchanges(plan: SparkPlan): Boolean = {
+plan.children.nonEmpty && plan.children.forall {
+  case ShuffleExchangeExec(_, _, Some(_)) => true
+  case _ => false
+}
+  }
+
+  // Returns true if two sequences of exchanges are producing the same 
results.
+  private def hasExchangesWithSameResults(
+  source: Seq[ShuffleExchangeExec],
+  target: Seq[ShuffleExchangeExec]): Boolean = {
+source.length == target.length &&
+  source.zip(target).forall(x => 
x._1.withoutCoordinator.sameResult(x._2.withoutCoordinator))
+  }
+
+  type CoordinatorSignature = (Int, Long, Option[Int])
+
+  def apply(plan: SparkPlan): SparkPlan = {
+if (!conf.exchangeReuseEnabled) {
--- End diff --

ok


---




[GitHub] spark pull request #19756: [SPARK-22527][SQL] Reuse coordinated exchanges if...

2017-11-16 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/19756#discussion_r151612678
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/Exchange.scala 
---
@@ -109,3 +109,67 @@ case class ReuseExchange(conf: SQLConf) extends 
Rule[SparkPlan] {
 }
   }
 }
+
+/**
+ * Find out duplicated coordinated exchanges in the spark plan, then use 
the same exchange for all
+ * the references.
+ */
+case class ReuseExchangeWithCoordinator(conf: SQLConf) extends 
Rule[SparkPlan] {
+
+  // Returns true if a SparkPlan has coordinated ShuffleExchangeExec 
children.
+  private def hasCoordinatedExchanges(plan: SparkPlan): Boolean = {
+plan.children.nonEmpty && plan.children.forall {
+  case ShuffleExchangeExec(_, _, Some(_)) => true
+  case _ => false
+}
+  }
+
+  // Returns true if two sequences of exchanges are producing the same 
results.
+  private def hasExchangesWithSameResults(
+  source: Seq[ShuffleExchangeExec],
+  target: Seq[ShuffleExchangeExec]): Boolean = {
+source.length == target.length &&
+  source.zip(target).forall(x => 
x._1.withoutCoordinator.sameResult(x._2.withoutCoordinator))
+  }
+
+  type CoordinatorSignature = (Int, Long, Option[Int])
+
+  def apply(plan: SparkPlan): SparkPlan = {
+if (!conf.exchangeReuseEnabled) {
--- End diff --

I don't know whether Spark 2.2 still has this bug or not; I am using Spark 2.1.


---




[GitHub] spark pull request #19756: [SPARK-22527][SQL] Reuse coordinated exchanges if...

2017-11-16 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/19756#discussion_r151611386
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/Exchange.scala 
---
@@ -109,3 +109,67 @@ case class ReuseExchange(conf: SQLConf) extends 
Rule[SparkPlan] {
 }
   }
 }
+
+/**
+ * Find out duplicated coordinated exchanges in the spark plan, then use 
the same exchange for all
+ * the references.
+ */
+case class ReuseExchangeWithCoordinator(conf: SQLConf) extends 
Rule[SparkPlan] {
+
+  // Returns true if a SparkPlan has coordinated ShuffleExchangeExec 
children.
+  private def hasCoordinatedExchanges(plan: SparkPlan): Boolean = {
+plan.children.nonEmpty && plan.children.forall {
+  case ShuffleExchangeExec(_, _, Some(_)) => true
+  case _ => false
+}
+  }
+
+  // Returns true if two sequences of exchanges are producing the same 
results.
+  private def hasExchangesWithSameResults(
+  source: Seq[ShuffleExchangeExec],
+  target: Seq[ShuffleExchangeExec]): Boolean = {
+source.length == target.length &&
+  source.zip(target).forall(x => 
x._1.withoutCoordinator.sameResult(x._2.withoutCoordinator))
+  }
+
+  type CoordinatorSignature = (Int, Long, Option[Int])
+
+  def apply(plan: SparkPlan): SparkPlan = {
+if (!conf.exchangeReuseEnabled) {
--- End diff --

exchangeReuseEnabled still has a bug (SPARK-20295); can we use a new configuration?
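
For illustration, a minimal sketch of what a separate flag could look like, assuming a hypothetical config key; the new rule would then check this flag instead of `exchangeReuseEnabled`:
```scala
// Inside object SQLConf (key name and wording are assumptions):
val EXCHANGE_REUSE_WITH_COORDINATOR_ENABLED =
  buildConf("spark.sql.exchange.reuseWithCoordinator")
    .doc("When true, reuse coordinated shuffle exchanges that produce the same results.")
    .booleanConf
    .createWithDefault(true)

// Inside class SQLConf:
def exchangeReuseWithCoordinatorEnabled: Boolean =
  getConf(EXCHANGE_REUSE_WITH_COORDINATOR_ENABLED)

// ReuseExchangeWithCoordinator.apply would then start with:
//   if (!conf.exchangeReuseWithCoordinatorEnabled) return plan
```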


---




[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-10-08 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/18270
  
@gatorsmile 


---




[GitHub] spark issue #19301: [SPARK-22084][SQL] Fix performance regression in aggrega...

2017-09-21 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/19301
  
Should `sum(mt_cnt)` and `sum(ele_cnt)` be computed again?


---




[GitHub] spark issue #19301: [SPARK-22084][SQL] Fix performance regression in aggrega...

2017-09-21 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/19301
  
I don't know whether my case can be optimized or not.


---




[GitHub] spark issue #19301: [SPARK-22084][SQL] Fix performance regression in aggrega...

2017-09-21 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/19301
  
My case:
```sql
select dt,
geohash_of_latlng,
sum(mt_cnt),
sum(ele_cnt),
round(sum(mt_cnt) * 1.0 * 100 / sum(mt_cnt_all), 2),
round(sum(ele_cnt) * 1.0 * 100 / sum(ele_cnt_all), 2)
from temp.test_geohash_match_parquet
group by dt, geohash_of_latlng
order by dt, geohash_of_latlng limit 10;
```
Before your fix:
```java
TakeOrderedAndProject(limit=10, orderBy=[dt#502 ASC NULLS 
FIRST,geohash_of_latlng#507 ASC NULLS FIRST], 
output=[dt#502,geohash_of_latlng#507,sum(mt_cnt)#521L,sum(ele_cnt)#522L,round((CAST((CAST((CAST(CAST(sum(CAST(mt_cnt
 AS BIGINT)) AS DECIMAL(20,0)) AS DECIMAL(21,1)) * CAST(1.0 AS DECIMAL(21,1))) 
AS DECIMAL(23,1)) * CAST(CAST(100 AS DECIMAL(23,1)) AS DECIMAL(23,1))) AS 
DECIMAL(38,2)) / CAST(CAST(sum(CAST(mt_cnt_all AS BIGINT)) AS DECIMAL(20,0)) AS 
DECIMAL(38,2))), 2)#523,round((CAST((CAST((CAST(CAST(sum(CAST(ele_cnt AS 
BIGINT)) AS DECIMAL(20,0)) AS DECIMAL(21,1)) * CAST(1.0 AS DECIMAL(21,1))) AS 
DECIMAL(23,1)) * CAST(CAST(100 AS DECIMAL(23,1)) AS DECIMAL(23,1))) AS 
DECIMAL(38,2)) / CAST(CAST(sum(CAST(ele_cnt_all AS BIGINT)) AS DECIMAL(20,0)) 
AS DECIMAL(38,2))), 2)#524])
+- *HashAggregate(keys=[dt#502, geohash_of_latlng#507], 
functions=[sum(cast(mt_cnt#511 as bigint)), sum(cast(ele_cnt#512 as bigint)), 
sum(cast(mt_cnt#511 as bigint)), sum(cast(mt_cnt_all#513 as bigint)), 
sum(cast(ele_cnt#512 as bigint)), sum(cast(ele_cnt_all#514 as bigint))])
   +- Exchange(coordinator id: 148401229) hashpartitioning(dt#502, 
geohash_of_latlng#507, 1000), coordinator[target post-shuffle partition size: 
200]
  +- *HashAggregate(keys=[dt#502, geohash_of_latlng#507], 
functions=[partial_sum(cast(mt_cnt#511 as bigint)), 
partial_sum(cast(ele_cnt#512 as bigint)), partial_sum(cast(mt_cnt#511 as 
bigint)), partial_sum(cast(mt_cnt_all#513 as bigint)), 
partial_sum(cast(ele_cnt#512 as bigint)), partial_sum(cast(ele_cnt_all#514 as 
bigint))])
 +- HiveTableScan [geohash_of_latlng#507, mt_cnt#511, ele_cnt#512, 
mt_cnt_all#513, ele_cnt_all#514, dt#502], MetastoreRelation temp, 
test_geohash_match_parquet
```

After your fix:
```java
TakeOrderedAndProject(limit=10, orderBy=[dt#467 ASC NULLS 
FIRST,geohash_of_latlng#472 ASC NULLS FIRST], 
output=[dt#467,geohash_of_latlng#472,sum(mt_cnt)#486L,sum(ele_cnt)#487L,round((CAST((CAST((CAST(CAST(sum(CAST(mt_cnt
 AS BIGINT)) AS DECIMAL(20,0)) AS DECIMAL(21,1)) * CAST(1.0 AS DECIMAL(21,1))) 
AS DECIMAL(23,1)) * CAST(CAST(100 AS DECIMAL(23,1)) AS DECIMAL(23,1))) AS 
DECIMAL(38,2)) / CAST(CAST(sum(CAST(mt_cnt_all AS BIGINT)) AS DECIMAL(20,0)) AS 
DECIMAL(38,2))), 2)#488,round((CAST((CAST((CAST(CAST(sum(CAST(ele_cnt AS 
BIGINT)) AS DECIMAL(20,0)) AS DECIMAL(21,1)) * CAST(1.0 AS DECIMAL(21,1))) AS 
DECIMAL(23,1)) * CAST(CAST(100 AS DECIMAL(23,1)) AS DECIMAL(23,1))) AS 
DECIMAL(38,2)) / CAST(CAST(sum(CAST(ele_cnt_all AS BIGINT)) AS DECIMAL(20,0)) 
AS DECIMAL(38,2))), 2)#489])
+- *HashAggregate(keys=[dt#467, geohash_of_latlng#472], 
functions=[sum(cast(mt_cnt#476 as bigint)), sum(cast(ele_cnt#477 as bigint)), 
sum(cast(mt_cnt#476 as bigint)), sum(cast(mt_cnt_all#478 as bigint)), 
sum(cast(ele_cnt#477 as bigint)), sum(cast(ele_cnt_all#479 as bigint))])
   +- Exchange(coordinator id: 227998366) hashpartitioning(dt#467, 
geohash_of_latlng#472, 1000), coordinator[target post-shuffle partition size: 
200]
  +- *HashAggregate(keys=[dt#467, geohash_of_latlng#472], 
functions=[partial_sum(cast(mt_cnt#476 as bigint)), 
partial_sum(cast(ele_cnt#477 as bigint)), partial_sum(cast(mt_cnt#476 as 
bigint)), partial_sum(cast(mt_cnt_all#478 as bigint)), 
partial_sum(cast(ele_cnt#477 as bigint)), partial_sum(cast(ele_cnt_all#479 as 
bigint))])
 +- HiveTableScan [geohash_of_latlng#472, mt_cnt#476, ele_cnt#477, 
mt_cnt_all#478, ele_cnt_all#479, dt#467], MetastoreRelation temp, 
test_geohash_match_parquet
```


---




[GitHub] spark pull request #19301: [SPARK-22084][SQL] Fix performance regression in ...

2017-09-21 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/19301#discussion_r140155406
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala 
---
@@ -38,7 +38,7 @@ import org.apache.spark.sql.internal.SQLConf
  * view resolution, in this way, we are able to get the correct 
view column ordering and
  * omit the extra columns that we don't require);
  *1.2. Else set the child output attributes to `queryOutput`.
- * 2. Map the `queryQutput` to view output by index, if the corresponding 
attributes don't match,
+ * 2. Map the `queryOutput` to view output by index, if the corresponding 
attributes don't match,
--- End diff --

It looks exactly the same to me?


---




[GitHub] spark pull request #18193: [SPARK-15616] [SQL] CatalogRelation should fallba...

2017-09-19 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/18193#discussion_r139861601
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -140,6 +141,62 @@ class DetermineTableStats(session: SparkSession) 
extends Rule[LogicalPlan] {
 }
 
 /**
+ *
+ * TODO: merge this with PruneFileSourcePartitions after we completely 
make hive as a data source.
+ */
+case class PruneHiveTablePartitions(
+session: SparkSession) extends Rule[LogicalPlan] with PredicateHelper {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
+case filter @ Filter(condition, relation: HiveTableRelation) if 
relation.isPartitioned =>
+  val predicates = splitConjunctivePredicates(condition)
+  val normalizedFilters = predicates.map { e =>
+e transform {
+  case a: AttributeReference =>
+a.withName(relation.output.find(_.semanticEquals(a)).get.name)
+}
+  }
+  val partitionSet = AttributeSet(relation.partitionCols)
+  val pruningPredicates = normalizedFilters.filter { predicate =>
+!predicate.references.isEmpty &&
+  predicate.references.subsetOf(partitionSet)
+  }
+  if (pruningPredicates.nonEmpty && 
session.sessionState.conf.fallBackToHdfsForStatsEnabled &&
+session.sessionState.conf.metastorePartitionPruning) {
+val prunedPartitions = 
session.sharedState.externalCatalog.listPartitionsByFilter(
+  relation.tableMeta.database,
+  relation.tableMeta.identifier.table,
+  pruningPredicates,
+  session.sessionState.conf.sessionLocalTimeZone)
+val sizeInBytes = try {
+  prunedPartitions.map { part =>
+val totalSize = 
part.parameters.get(StatsSetupConst.TOTAL_SIZE).map(_.toLong)
+val rawDataSize = 
part.parameters.get(StatsSetupConst.RAW_DATA_SIZE).map(_.toLong)
+if (totalSize.isDefined && totalSize.get > 0L) {
--- End diff --

I think we should use rawDataSize first, because a 1 MB ORC file can be equivalent to roughly 5 MB as a text file...
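
For illustration, a minimal sketch of that ordering, reusing the names from the diff above; falling back to `defaultSizeInBytes` at the end is my assumption, not part of the patch:
```scala
// Prefer rawDataSize (closer to the uncompressed data size for columnar formats
// such as ORC), then totalSize (bytes on disk), then the session default.
val sizeInBytes = prunedPartitions.map { part =>
  val rawDataSize = part.parameters.get(StatsSetupConst.RAW_DATA_SIZE).map(_.toLong)
  val totalSize = part.parameters.get(StatsSetupConst.TOTAL_SIZE).map(_.toLong)
  if (rawDataSize.isDefined && rawDataSize.get > 0L) {
    rawDataSize.get
  } else if (totalSize.isDefined && totalSize.get > 0L) {
    totalSize.get
  } else {
    session.sessionState.conf.defaultSizeInBytes
  }
}.sum
```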


---




[GitHub] spark pull request #18193: [SPARK-15616] [SQL] CatalogRelation should fallba...

2017-09-17 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/18193#discussion_r139312866
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -139,6 +138,54 @@ class DetermineTableStats(session: SparkSession) 
extends Rule[LogicalPlan] {
 }
 
 /**
+ *
+ * TODO: merge this with PruneFileSourcePartitions after we completely 
make hive as a data source.
+ */
+case class PruneHiveTablePartitions(
+session: SparkSession) extends Rule[LogicalPlan] with PredicateHelper {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
+case filter @ Filter(condition, relation: HiveTableRelation) if 
relation.isPartitioned =>
+  val predicates = splitConjunctivePredicates(condition)
+  val normalizedFilters = predicates.map { e =>
+e transform {
+  case a: AttributeReference =>
+a.withName(relation.output.find(_.semanticEquals(a)).get.name)
+}
+  }
+  val partitionSet = AttributeSet(relation.partitionCols)
+  val pruningPredicates = normalizedFilters.filter { predicate =>
+!predicate.references.isEmpty &&
+  predicate.references.subsetOf(partitionSet)
+  }
+  if (pruningPredicates.nonEmpty && 
session.sessionState.conf.fallBackToHdfsForStatsEnabled &&
+session.sessionState.conf.metastorePartitionPruning) {
+val prunedPartitions = 
session.sharedState.externalCatalog.listPartitionsByFilter(
+  relation.tableMeta.database,
+  relation.tableMeta.identifier.table,
+  pruningPredicates,
+  session.sessionState.conf.sessionLocalTimeZone)
+val sizeInBytes = try {
+  prunedPartitions.map { part =>
+CommandUtils.calculateLocationSize(
--- End diff --

I think we should first check whether partition.parameters contains StatsSetupConst.RAW_DATA_SIZE or StatsSetupConst.TOTAL_SIZE. If partition.parameters already contains the size of the partition, use it instead of calling getContentSummary on HDFS.
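
For illustration, a minimal sketch of that check; `hdfsLocationSize` is a hypothetical stand-in for the getContentSummary fallback shown in the diff above:
```scala
import org.apache.spark.sql.catalyst.catalog.CatalogTablePartition

// Use the metastore statistics when present and positive; only touch HDFS otherwise.
def hdfsLocationSize(part: CatalogTablePartition): Long = {
  ??? // stand-in for CommandUtils.calculateLocationSize / FileSystem.getContentSummary
}

val sizeInBytes = prunedPartitions.map { part =>
  part.parameters.get(StatsSetupConst.RAW_DATA_SIZE)
    .orElse(part.parameters.get(StatsSetupConst.TOTAL_SIZE))
    .map(_.toLong)
    .filter(_ > 0L)
    .getOrElse(hdfsLocationSize(part))
}.sum
```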


---




[GitHub] spark pull request #19219: [SPARK-21993][SQL][WIP] Close sessionState when f...

2017-09-16 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/19219#discussion_r139281683
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala 
---
@@ -42,7 +42,7 @@ class HiveSessionStateBuilder(session: SparkSession, 
parentState: Option[Session
* Create a Hive aware resource loader.
*/
   override protected lazy val resourceLoader: HiveSessionResourceLoader = {
-val client: HiveClient = externalCatalog.client.newSession()
+val client: HiveClient = externalCatalog.client
--- End diff --

newSession() is used to isolate SparkSessions in the Spark Thrift Server; you must check whether this is running in the Thrift Server or not.
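
For illustration, a minimal sketch of that check; `isThriftServerSession` is a hypothetical flag standing in for whatever detection would actually be used:
```scala
// Keep per-session isolation in the Thrift Server by creating a new Hive client
// session there, and only share the client elsewhere (flag name is assumed).
val client: HiveClient =
  if (isThriftServerSession) externalCatalog.client.newSession()
  else externalCatalog.client
```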


---




[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-09-03 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/17924
  
@dongjoon-hyun I have a question: does this ORC data source reader support a table that contains multiple file formats?
For example:
table/
day=2017-09-01  (RCFile)
day=2017-09-02  (ORCFile)

ParquetFileFormat doesn't support this feature.




---



[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-09-02 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/18270
  
@jinxing64 I think you could revert the changes in Spark and use the same grouping__id logic as Hive, keeping the (wrong) result consistent with what Hive produces.


---



[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-09-02 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/18270
  
@gatorsmile I had already tried to resolve grouping__id in ResolveFunctions, but ResolveFunctions runs after ResolveGroupingAnalytics, and grouping__id may already have been changed in ResolveGroupingAnalytics.


---



[GitHub] spark pull request #18270: [SPARK-21055][SQL] replace grouping__id with grou...

2017-09-02 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/18270#discussion_r136695110
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/group-analytics.sql.out ---
@@ -223,12 +223,19 @@ grouping_id() can only be used with 
GroupingSets/Cube/Rollup;
 
 
 -- !query 16
-SELECT course, year, grouping__id FROM courseSales GROUP BY CUBE(course, 
year)
+SELECT course, year, grouping__id FROM courseSales GROUP BY CUBE(course, 
year) ORDER BY grouping__id, course, year
--- End diff --

thanks for your tips


---



[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-09-02 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/18270
  
Why did it fail? Can't I add an ORDER BY?
```java
org.scalatest.exceptions.TestFailedException: Expected "...Y CUBE(course, 
year)[ ORDER BY grouping__id, course, year]", but got "...Y CUBE(course, 
year)[]" SQL query did not match for query #16 SELECT course, year, 
grouping__id FROM courseSales GROUP BY CUBE(course, year) ORDER BY 
grouping__id, course, year
```


---



[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-08-31 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/18270
  
I can't see any comment at 77d4f7c?


---



[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-08-30 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/18270
  
retest this please


---



[GitHub] spark pull request #18270: [SPARK-21055][SQL] replace grouping__id with grou...

2017-08-30 Thread cenyuhai
GitHub user cenyuhai reopened a pull request:

https://github.com/apache/spark/pull/18270

[SPARK-21055][SQL] replace grouping__id with grouping_id()

## What changes were proposed in this pull request?
Spark does not support grouping__id; it has grouping_id() instead. But that makes it inconvenient for Hive users to migrate to Spark SQL, so this PR replaces grouping__id with grouping_id() so that Hive users do not need to alter their scripts.

## How was this patch tested?

Tested with SQLQuerySuite.scala.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cenyuhai/spark SPARK-21055

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18270.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18270


commit 36ff72a44ad19efe5bcb2fe461a700d4c54f89ef
Author: CenYuhai <yuhai@ele.me>
Date:   2017-08-30T15:22:13Z

eplace grouping__id with grouping_id()




---



[GitHub] spark pull request #19087: [SPARK-21055][SQL] replace grouping__id with grou...

2017-08-30 Thread cenyuhai
Github user cenyuhai closed the pull request at:

https://github.com/apache/spark/pull/19087


---



[GitHub] spark pull request #19087: [SPARK-21055][SQL] replace grouping__id with grou...

2017-08-30 Thread cenyuhai
GitHub user cenyuhai opened a pull request:

https://github.com/apache/spark/pull/19087

[SPARK-21055][SQL] replace grouping__id with grouping_id()

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cenyuhai/spark SPARK-21055

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19087.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19087


commit 36ff72a44ad19efe5bcb2fe461a700d4c54f89ef
Author: CenYuhai <yuhai@ele.me>
Date:   2017-08-30T15:22:13Z

eplace grouping__id with grouping_id()




---



[GitHub] spark pull request #18270: [SPARK-21055][SQL] replace grouping__id with grou...

2017-08-30 Thread cenyuhai
Github user cenyuhai closed the pull request at:

https://github.com/apache/spark/pull/18270


---



[GitHub] spark pull request #18270: [SPARK-21055][SQL] replace grouping__id with grou...

2017-08-30 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/18270#discussion_r136082775
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -954,6 +951,12 @@ class Analyzer(
 try {
   expr transformUp {
 case GetColumnByOrdinal(ordinal, _) => plan.output(ordinal)
+case u @ UnresolvedAttribute(_) if resolver(u.name, 
VirtualColumn.hiveGroupingIdName) =>
--- End diff --

I don't think I can do it, because ResolveFunctions runs after ResolveGroupingAnalytics.
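
For context, a rough sketch of the substitution this rule performs, following the diff above; using `GroupingID(Nil)` (the expression behind grouping_id() with no arguments) as the rewrite target is my reading, not necessarily the exact patch:
```scala
// Rewrite the Hive-style virtual column grouping__id to grouping_id() while
// resolving attributes, since waiting for ResolveFunctions would be too late.
expr transformUp {
  case GetColumnByOrdinal(ordinal, _) => plan.output(ordinal)
  case u: UnresolvedAttribute if resolver(u.name, VirtualColumn.hiveGroupingIdName) =>
    GroupingID(Nil)
}
```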


---



[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-08-30 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/18270
  
OK, I will update it.


---



[GitHub] spark pull request #18193: [SPARK-15616] [SQL] CatalogRelation should fallba...

2017-06-25 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/18193#discussion_r123903846
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -139,6 +138,54 @@ class DetermineTableStats(session: SparkSession) 
extends Rule[LogicalPlan] {
   }
 }
 
+case class DeterminePartitionedTableStats(
+session: SparkSession) extends Rule[LogicalPlan] with PredicateHelper {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
+case filter @ Filter(condition, relation: CatalogRelation)
+  if DDLUtils.isHiveTable(relation.tableMeta) && 
relation.isPartitioned =>
+  val predicates = splitConjunctivePredicates(condition)
+  val normalizedFilters = predicates.map { e =>
+e transform {
+  case a: AttributeReference =>
+a.withName(relation.output.find(_.semanticEquals(a)).get.name)
+}
+  }
+  val partitionSet = AttributeSet(relation.partitionCols)
+  val pruningPredicates = normalizedFilters.filter { predicate =>
+!predicate.references.isEmpty &&
+  predicate.references.subsetOf(partitionSet)
+  }
+  if (pruningPredicates.nonEmpty && 
session.sessionState.conf.fallBackToHdfsForStatsEnabled &&
+session.sessionState.conf.metastorePartitionPruning) {
+val prunedPartitions = 
session.sharedState.externalCatalog.listPartitionsByFilter(
+  relation.tableMeta.database,
+  relation.tableMeta.identifier.table,
+  pruningPredicates,
+  session.sessionState.conf.sessionLocalTimeZone)
+val hiveTable = HiveClientImpl.toHiveTable(relation.tableMeta)
+val partitions = 
prunedPartitions.map(HiveClientImpl.toHivePartition(_, hiveTable))
+val sizeInBytes = try {
+  val hadoopConf = session.sessionState.newHadoopConf()
+  partitions.map { partition =>
+val fs: FileSystem = 
partition.getDataLocation.getFileSystem(hadoopConf)
+fs.getContentSummary(partition.getDataLocation).getLength
+  }.sum
--- End diff --

If there are too many partitions, this will be very slow.
Can you add a check on whether the running sum is already larger than the threshold, and break out early if it is?
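
For illustration, a minimal sketch of such an early exit, reusing the names from the diff above; comparing against `autoBroadcastJoinThreshold` is an assumption about which threshold matters here:
```scala
import org.apache.hadoop.fs.FileSystem

// Stop summing partition sizes once the running total already exceeds the
// threshold, since beyond that point the exact size no longer changes the decision.
val threshold = session.sessionState.conf.autoBroadcastJoinThreshold
var sizeInBytes = 0L
val iter = partitions.iterator
while (iter.hasNext && sizeInBytes <= threshold) {
  val partition = iter.next()
  val fs: FileSystem = partition.getDataLocation.getFileSystem(hadoopConf)
  sizeInBytes += fs.getContentSummary(partition.getDataLocation).getLength
}
```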


---



[GitHub] spark issue #16832: [SPARK-19490][SQL] ignore case sensitivity when filterin...

2017-06-16 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/16832
  
@taklwu This PR is complete; you can merge it yourself. A committer told me that another PR has fixed this bug, so my PR will not be merged.


---



[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-06-12 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/18270
  
Why did it fail?


---



[GitHub] spark pull request #18270: [SPARK-21055][SQL] replace grouping__id with grou...

2017-06-11 Thread cenyuhai
GitHub user cenyuhai opened a pull request:

https://github.com/apache/spark/pull/18270

[SPARK-21055][SQL] replace grouping__id with grouping_id()

## What changes were proposed in this pull request?
Spark does not support grouping__id; it has grouping_id() instead. But that makes it inconvenient for Hive users to migrate to Spark SQL, so this PR replaces grouping__id with grouping_id() so that Hive users do not need to alter their scripts.

## How was this patch tested?

Tested with SQLQuerySuite.scala.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cenyuhai/spark SPARK-21055

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18270.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18270


commit 6fd567ca8d2d9e302612f281f4143033efa2c156
Author: cenyuhai <261810...@qq.com>
Date:   2017-06-11T12:04:05Z

replace grouping__id with grouping_id()




---



[GitHub] spark issue #17362: [SPARK-20033][SQL] support hive permanent function

2017-03-23 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/17362
  
OK, I think @weiqingy's PR will resolve this problem.


---



[GitHub] spark pull request #17362: [SPARK-20033][SQL] support hive permanent functio...

2017-03-23 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/17362#discussion_r107828225
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala ---
@@ -135,6 +142,35 @@ private[sql] class HiveSessionCatalog(
 }
   }
 
+  override def loadFunctionResources(resources: Seq[FunctionResource]): 
Unit = {
+logDebug("loading hive permanent function resources")
+resources.foreach { resource =>
+  val resourceType = resource.resourceType match {
+case JarResource =>
+  ResourceType.JAR
+case FileResource =>
+  ResourceType.FILE
+case ArchiveResource =>
+  ResourceType.ARCHIVE
+  }
+  val uri = if (!Shell.WINDOWS) {
+new URI(resource.uri)
+  }
+  else {
+new Path(resource.uri).toUri
+  }
+  val scheme = if (uri.getScheme == null) null else 
uri.getScheme.toLowerCase
+  if (scheme == null || scheme == "file") {
+functionResourceLoader.loadResource(resource)
+  } else {
+val sessionState = SessionState.get()
+val localPath = sessionState.add_resource(resourceType, 
resource.uri)
--- End diff --

No, I just use sessionState.add_resource to download the resources.


---



[GitHub] spark issue #17362: [SPARK-20033][SQL] support hive permanent function

2017-03-23 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/17362
  
@gatorsmile Hi, Spark only supports Hive UDFs whose resources are on the local disk (or added via spark-sql --jars xxx.jar or similar). But I think Spark doesn't support Hive permanent functions whose resources are in HDFS, e.g.:
CREATE FUNCTION hdfs_udf AS 'xxx.udf' USING JAR 'hdfs:///user/udf/.jar';
My PR just downloads the HDFS resources and adds them to the classpath.


---



[GitHub] spark pull request #17362: [SPARK-20033][SQL] support hive permanent functio...

2017-03-20 Thread cenyuhai
GitHub user cenyuhai opened a pull request:

https://github.com/apache/spark/pull/17362

[SPARK-20033][SQL] support hive permanent function

## What changes were proposed in this pull request?

support hive permanent function

## How was this patch tested?


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cenyuhai/spark SPARK-20033

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17362.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17362


commit 703d23e067e2d02f69bd4ec429106a426c6bf132
Author: cenyuhai <cenyu...@didichuxing.com>
Date:   2017-03-20T12:36:22Z

support hive permanent function




---



[GitHub] spark issue #16832: [SPARK-19490][SQL] change hive column names to lower cas...

2017-02-09 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/16832
  
@rxin I think it is safe; it is only used to check whether the schema contains the columns, and Hive column names are not case-sensitive.


---



[GitHub] spark pull request #16832: [SPARK-19490][SQL] change column names to lower c...

2017-02-07 Thread cenyuhai
GitHub user cenyuhai opened a pull request:

https://github.com/apache/spark/pull/16832

[SPARK-19490][SQL] change column names to lower case

## What changes were proposed in this pull request?
change column names to lower case

## How was this patch tested?


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cenyuhai/spark SPARK-19490

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16832.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16832


commit efb0b4bfc5643b5cd513682b035f24b5fa9656e9
Author: cenyuhai <261810...@qq.com>
Date:   2017-02-07T11:33:15Z

change column names to lower case




---



[GitHub] spark pull request #16481: [SPARK-19092] [SQL] Save() API of DataFrameWriter...

2017-01-11 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/16481#discussion_r95538748
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -413,17 +413,22 @@ case class DataSource(
 relation
   }
 
-  /** Writes the given [[DataFrame]] out to this [[DataSource]]. */
+  /**
+   * Writes the given [[DataFrame]] out to this [[DataSource]].
+   *
+   * @param isForWriteOnly Whether to just write the data without 
returning a [[BaseRelation]].
+   */
   def write(
   mode: SaveMode,
-  data: DataFrame): BaseRelation = {
+  data: DataFrame,
+  isForWriteOnly: Boolean = false): Option[BaseRelation] = {
 if 
(data.schema.map(_.dataType).exists(_.isInstanceOf[CalendarIntervalType])) {
   throw new AnalysisException("Cannot save interval data type into 
external storage.")
 }
 
 providingClass.newInstance() match {
   case dataSource: CreatableRelationProvider =>
-dataSource.createRelation(sparkSession.sqlContext, mode, 
caseInsensitiveOptions, data)
+Some(dataSource.createRelation(sparkSession.sqlContext, mode, 
caseInsensitiveOptions, data))
--- End diff --

Maybe we can add a parameter here and let the user choose true or false.


---



[GitHub] spark issue #15109: [SPARK-17501][CORE] Record executor heartbeat timestamp ...

2016-12-15 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/15109
  
OK, I will close this PR; this is not a big problem.


---



[GitHub] spark pull request #15109: [SPARK-17501][CORE] Record executor heartbeat tim...

2016-12-15 Thread cenyuhai
Github user cenyuhai closed the pull request at:

https://github.com/apache/spark/pull/15109


---



[GitHub] spark pull request #16099: [SPARK-18665][SQL] set statement state to error a...

2016-12-01 Thread cenyuhai
GitHub user cenyuhai opened a pull request:

https://github.com/apache/spark/pull/16099

[SPARK-18665][SQL] set statement state to error after user canceled job

## What changes were proposed in this pull request?
set statement state to error after user canceled job



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cenyuhai/spark spark-18665

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16099.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16099


commit e3a9b0896db7371252a66cc6b84bd7db921268c1
Author: cenyuhai <261810...@qq.com>
Date:   2016-12-01T10:26:43Z

set statement state to error after user canceled job




---



[GitHub] spark pull request #16097: [SPARK-18665] set job to "ERROR" when job is canc...

2016-11-30 Thread cenyuhai
Github user cenyuhai closed the pull request at:

https://github.com/apache/spark/pull/16097


---



[GitHub] spark pull request #16097: [SPARK-18665] set job to "ERROR" when job is canc...

2016-11-30 Thread cenyuhai
GitHub user cenyuhai opened a pull request:

https://github.com/apache/spark/pull/16097

[SPARK-18665]  set job to "ERROR" when job is canceled

## What changes were proposed in this pull request?
set job to "ERROR" when job is canceled

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cenyuhai/spark SPARK-18665

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16097.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16097


commit 869eaaf23f79eefbc6a8ff7a7b9efbc4a9f8c6b7
Author: 岑玉海 <261810...@qq.com>
Date:   2016-08-21T03:55:04Z

Merge pull request #8 from apache/master

merge latest code to my fork

commit b6b0d0a41c1aa59bc97a0aa438619d903b78b108
Author: 岑玉海 <261810...@qq.com>
Date:   2016-09-06T03:03:08Z

Merge pull request #9 from apache/master

Merge latest code to my fork

commit abd7924eab25b6dfdfd78c23a78dadcb3b9fbe1e
Author: 岑玉海 <261810...@qq.com>
Date:   2016-09-08T17:10:12Z

Merge pull request #10 from apache/master

Merge latest code to my fork

commit 4b460e218244cdb0884e73c5fca29cc43b516972
Author: cenyuhai <cenyu...@didichuxing.com>
Date:   2016-09-15T09:25:24Z

Merge remote-tracking branch 'remotes/apache/master'

commit 22cb0a6f6f60ffae4a449727959cdd2940699f8e
Author: 岑玉海 <261810...@qq.com>
Date:   2016-12-01T06:09:42Z

Merge pull request #12 from apache/master

Merge latest code to my branch

commit 8b9322d1f8421a1868d8d39472d1b6f3681b4de3
Author: cenyuhai <261810...@qq.com>
Date:   2016-12-01T06:36:26Z

set statement state to error after user canceled job




---



[GitHub] spark issue #15041: [SPARK-17488][CORE] TakeAndOrder will OOM when the data ...

2016-09-23 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/15041
  
OK, I will close this PR.


---



[GitHub] spark pull request #15041: [SPARK-17488][CORE] TakeAndOrder will OOM when th...

2016-09-23 Thread cenyuhai
Github user cenyuhai closed the pull request at:

https://github.com/apache/spark/pull/15041


---



[GitHub] spark issue #15041: [SPARK-17488][CORE] TakeAndOrder will OOM when the data ...

2016-09-23 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/15041
  
The driver is OK, but the executor runs out of memory; this method is called on the executor. Our maximum memory limit is 15 GB.


---



[GitHub] spark issue #15041: [SPARK-17488][CORE] TakeAndOrder will OOM when the data ...

2016-09-23 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/15041
  
For data-security reasons, we can't return all the data; every SQL statement should be limited to 10 million records. But sometimes it still OOMs... What I need is to avoid the OOM. @srowen Do you have any ideas?


---



[GitHub] spark pull request #15041: [SPARK-17488][CORE] TakeAndOrder will OOM when th...

2016-09-20 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15041#discussion_r79754097
  
--- Diff: core/src/main/scala/org/apache/spark/util/collection/Utils.scala 
---
@@ -30,10 +34,22 @@ private[spark] object Utils {
* Returns the first K elements from the input as defined by the 
specified implicit Ordering[T]
* and maintains the ordering.
*/
-  def takeOrdered[T](input: Iterator[T], num: Int)(implicit ord: 
Ordering[T]): Iterator[T] = {
-val ordering = new GuavaOrdering[T] {
-  override def compare(l: T, r: T): Int = ord.compare(l, r)
+  def takeOrdered[T](input: Iterator[T], num: Int,
+  ser: Serializer = SparkEnv.get.serializer)(implicit ord: 
Ordering[T]): Iterator[T] = {
+val context = TaskContext.get()
+if (context == null) {
+  val ordering = new GuavaOrdering[T] {
+override def compare(l: T, r: T): Int = ord.compare(l, r)
+  }
+  ordering.leastOf(input.asJava, num).iterator.asScala
+} else {
+  val sorter =
+new ExternalSorter[T, Any, Any](context, None, None, Some(ord), 
ser)
+  sorter.insertAll(input.map(x => (x, null)))
--- End diff --

1. In my case, a user executes a SQL statement like "select * from table sort by time limit 1000" and the k is very large; it's an extreme case. I don't need to change RDD.takeOrdered, so I will limit the changes to limit.scala.
2. GuavaOrdering will sort all the data in memory and then take the top k. If there is enough memory, ExternalSorter will not spill.


---



[GitHub] spark pull request #15041: [SPARK-17488][CORE] TakeAndOrder will OOM when th...

2016-09-20 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15041#discussion_r79753174
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1384,14 +1385,15 @@ abstract class RDD[T: ClassTag](
* @param ord the implicit ordering for T
* @return an array of top elements
*/
-  def takeOrdered(num: Int)(implicit ord: Ordering[T]): Array[T] = withScope {
+  def takeOrdered(num: Int, ser: Serializer = SparkEnv.get.serializer)
--- End diff --

Yes, you are right.





[GitHub] spark issue #14969: [SPARK-17406][WEB UI] limit timeline executor events

2016-09-15 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/14969
  
OK, so it's still not certain that this will never happen again, because 
SparkQA can't detect whether a developer has added all the required excludes. 





[GitHub] spark issue #14969: [SPARK-17406][WEB UI] limit timeline executor events

2016-09-15 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/14969
  
Ah, I was confused by MimaExcludes.scala. I asked @liancheng, and he told me 
to just add these entries to MimaExcludes.scala, which is carried over from Spark 2.0. I 
see your HOTFIX; you just removed what I added. If I don't add these changes to 
MimaExcludes.scala, I can't compile the project. Do you know the right way?
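
For context, the kind of entry being discussed looks roughly like the following sketch, added alongside the existing entries in MimaExcludes.scala (which already import ProblemFilters); the filter string here is the one the MiMa check itself suggests later in this thread:

```scala
// Silences the binary-compatibility error for a member that was intentionally removed.
ProblemFilters.exclude[DirectMissingMethodProblem](
  "org.apache.spark.ui.exec.ExecutorsListener.executorIdToData")
```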





[GitHub] spark pull request #15109: [SPARK-17501][CORE] Record executor heartbeat tim...

2016-09-15 Thread cenyuhai
GitHub user cenyuhai opened a pull request:

https://github.com/apache/spark/pull/15109

[SPARK-17501][CORE] Record executor heartbeat timestamp when received  
heartbeat event.

## What changes were proposed in this pull request?
Record an executor's latest heartbeat timestamp when receiving a heartbeat event.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cenyuhai/spark SPARK-17501

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15109.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15109


commit 869eaaf23f79eefbc6a8ff7a7b9efbc4a9f8c6b7
Author: 岑玉海 <261810...@qq.com>
Date:   2016-08-21T03:55:04Z

Merge pull request #8 from apache/master

merge latest code to my fork

commit b6b0d0a41c1aa59bc97a0aa438619d903b78b108
Author: 岑玉海 <261810...@qq.com>
Date:   2016-09-06T03:03:08Z

Merge pull request #9 from apache/master

Merge latest code to my fork

commit abd7924eab25b6dfdfd78c23a78dadcb3b9fbe1e
Author: 岑玉海 <261810...@qq.com>
Date:   2016-09-08T17:10:12Z

Merge pull request #10 from apache/master

Merge latest code to my fork

commit 4b460e218244cdb0884e73c5fca29cc43b516972
Author: cenyuhai <cenyu...@didichuxing.com>
Date:   2016-09-15T09:25:24Z

Merge remote-tracking branch 'remotes/apache/master'

commit 9839193f3ce234525040adcbd91c79eeac067c0e
Author: cenyuhai <cenyu...@didichuxing.com>
Date:   2016-09-15T09:34:43Z

fix registered blockmanager again and again







[GitHub] spark issue #14969: [SPARK-17406][WEB UI] limit timeline executor events

2016-09-13 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/14969
  
OK





[GitHub] spark pull request #14969: [SPARK-17406][WEB UI] limit timeline executor eve...

2016-09-12 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14969#discussion_r78360879
  
--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsTab.scala 
---
@@ -38,47 +37,68 @@ private[ui] class ExecutorsTab(parent: SparkUI) extends 
SparkUITab(parent, "exec
   }
 }
 
+private[ui] case class ExecutorTaskSummary(
+var executorId: String,
+var totalCores: Int = 0,
+var tasksMax: Int = 0,
+var tasksActive: Int = 0,
+var tasksFailed: Int = 0,
+var tasksComplete: Int = 0,
+var duration: Long = 0L,
+var jvmGCTime: Long = 0L,
+var inputBytes: Long = 0L,
+var inputRecords: Long = 0L,
+var outputBytes: Long = 0L,
+var outputRecords: Long = 0L,
+var shuffleRead: Long = 0L,
+var shuffleWrite: Long = 0L,
+var executorLogs: Map[String, String] = Map.empty,
+var isAlive: Boolean = true
+)
+
 /**
  * :: DeveloperApi ::
  * A SparkListener that prepares information to be displayed on the 
ExecutorsTab
  */
 @DeveloperApi
 class ExecutorsListener(storageStatusListener: StorageStatusListener, 
conf: SparkConf)
 extends SparkListener {
-  val executorToTotalCores = HashMap[String, Int]()
-  val executorToTasksMax = HashMap[String, Int]()
-  val executorToTasksActive = HashMap[String, Int]()
-  val executorToTasksComplete = HashMap[String, Int]()
-  val executorToTasksFailed = HashMap[String, Int]()
-  val executorToDuration = HashMap[String, Long]()
-  val executorToJvmGCTime = HashMap[String, Long]()
-  val executorToInputBytes = HashMap[String, Long]()
-  val executorToInputRecords = HashMap[String, Long]()
-  val executorToOutputBytes = HashMap[String, Long]()
-  val executorToOutputRecords = HashMap[String, Long]()
-  val executorToShuffleRead = HashMap[String, Long]()
-  val executorToShuffleWrite = HashMap[String, Long]()
-  val executorToLogUrls = HashMap[String, Map[String, String]]()
-  val executorIdToData = HashMap[String, ExecutorUIData]()
+  var executorToTaskSummary = LinkedHashMap[String, ExecutorTaskSummary]()
+  var executorEvents = new ListBuffer[SparkListenerEvent]()
+
+  private val maxTimelineExecutors = 
conf.getInt("spark.ui.timeline.executors.maximum", 1000)
+  private val retainedDeadExecutors = 
conf.getInt("spark.ui.retainedDeadExecutors", 100)
+  private var deadExecutorCount = 0
--- End diff --

Yes, thank you for your advice.





[GitHub] spark pull request #14969: [SPARK-17406][WEB UI] limit timeline executor eve...

2016-09-12 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14969#discussion_r78358827
  
--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsTab.scala 
---
@@ -38,47 +37,68 @@ private[ui] class ExecutorsTab(parent: SparkUI) extends 
SparkUITab(parent, "exec
   }
 }
 
+private[ui] case class ExecutorTaskSummary(
+var executorId: String,
+var totalCores: Int = 0,
+var tasksMax: Int = 0,
+var tasksActive: Int = 0,
+var tasksFailed: Int = 0,
+var tasksComplete: Int = 0,
+var duration: Long = 0L,
+var jvmGCTime: Long = 0L,
+var inputBytes: Long = 0L,
+var inputRecords: Long = 0L,
+var outputBytes: Long = 0L,
+var outputRecords: Long = 0L,
+var shuffleRead: Long = 0L,
+var shuffleWrite: Long = 0L,
+var executorLogs: Map[String, String] = Map.empty,
+var isAlive: Boolean = true
+)
+
 /**
  * :: DeveloperApi ::
  * A SparkListener that prepares information to be displayed on the 
ExecutorsTab
  */
 @DeveloperApi
 class ExecutorsListener(storageStatusListener: StorageStatusListener, 
conf: SparkConf)
 extends SparkListener {
-  val executorToTotalCores = HashMap[String, Int]()
-  val executorToTasksMax = HashMap[String, Int]()
-  val executorToTasksActive = HashMap[String, Int]()
-  val executorToTasksComplete = HashMap[String, Int]()
-  val executorToTasksFailed = HashMap[String, Int]()
-  val executorToDuration = HashMap[String, Long]()
-  val executorToJvmGCTime = HashMap[String, Long]()
-  val executorToInputBytes = HashMap[String, Long]()
-  val executorToInputRecords = HashMap[String, Long]()
-  val executorToOutputBytes = HashMap[String, Long]()
-  val executorToOutputRecords = HashMap[String, Long]()
-  val executorToShuffleRead = HashMap[String, Long]()
-  val executorToShuffleWrite = HashMap[String, Long]()
-  val executorToLogUrls = HashMap[String, Map[String, String]]()
-  val executorIdToData = HashMap[String, ExecutorUIData]()
+  var executorToTaskSummary = LinkedHashMap[String, ExecutorTaskSummary]()
+  var executorEvents = new ListBuffer[SparkListenerEvent]()
+
+  private val maxTimelineExecutors = 
conf.getInt("spark.ui.timeline.executors.maximum", 1000)
+  private val retainedDeadExecutors = 
conf.getInt("spark.ui.retainedDeadExecutors", 100)
+  private var deadExecutorCount = 0
--- End diff --

If I don't remove dead executor information, there may be a memory leak. If 
I remove it immediately, we can hardly see the dead executor's logs on the executors 
page.
Do you have any ideas?
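
For illustration, a minimal sketch of the compromise this PR settles on (simplified, illustrative names, not the actual listener code): dead executors stay visible to the page, but only up to a configurable bound, and the oldest dead entry is evicted first because LinkedHashMap preserves insertion order.

```scala
import scala.collection.mutable

// Simplified stand-in for the per-executor UI state.
case class Summary(executorId: String, var isAlive: Boolean = true)

val retainedDead = 100
val summaries = mutable.LinkedHashMap[String, Summary]()
var deadCount = 0

def onExecutorRemoved(id: String): Unit = {
  summaries.get(id).foreach(s => s.isAlive = false) // keep the entry so its logs stay visible
  deadCount += 1
  if (deadCount > retainedDead) {
    // Evict the oldest dead executor first; LinkedHashMap iterates in insertion order.
    summaries.find { case (_, s) => !s.isAlive }.foreach { case (oldId, _) =>
      summaries.remove(oldId)
      deadCount -= 1
    }
  }
}
```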





[GitHub] spark pull request #14969: [SPARK-17406][WEB UI] limit timeline executor eve...

2016-09-12 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14969#discussion_r78357353
  
--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsTab.scala 
---
@@ -38,47 +37,68 @@ private[ui] class ExecutorsTab(parent: SparkUI) extends 
SparkUITab(parent, "exec
   }
 }
 
+private[ui] case class ExecutorTaskSummary(
+var executorId: String,
+var totalCores: Int = 0,
+var tasksMax: Int = 0,
+var tasksActive: Int = 0,
+var tasksFailed: Int = 0,
+var tasksComplete: Int = 0,
+var duration: Long = 0L,
+var jvmGCTime: Long = 0L,
+var inputBytes: Long = 0L,
+var inputRecords: Long = 0L,
+var outputBytes: Long = 0L,
+var outputRecords: Long = 0L,
+var shuffleRead: Long = 0L,
+var shuffleWrite: Long = 0L,
+var executorLogs: Map[String, String] = Map.empty,
+var isAlive: Boolean = true
+)
+
 /**
  * :: DeveloperApi ::
  * A SparkListener that prepares information to be displayed on the 
ExecutorsTab
  */
 @DeveloperApi
 class ExecutorsListener(storageStatusListener: StorageStatusListener, 
conf: SparkConf)
 extends SparkListener {
-  val executorToTotalCores = HashMap[String, Int]()
-  val executorToTasksMax = HashMap[String, Int]()
-  val executorToTasksActive = HashMap[String, Int]()
-  val executorToTasksComplete = HashMap[String, Int]()
-  val executorToTasksFailed = HashMap[String, Int]()
-  val executorToDuration = HashMap[String, Long]()
-  val executorToJvmGCTime = HashMap[String, Long]()
-  val executorToInputBytes = HashMap[String, Long]()
-  val executorToInputRecords = HashMap[String, Long]()
-  val executorToOutputBytes = HashMap[String, Long]()
-  val executorToOutputRecords = HashMap[String, Long]()
-  val executorToShuffleRead = HashMap[String, Long]()
-  val executorToShuffleWrite = HashMap[String, Long]()
-  val executorToLogUrls = HashMap[String, Map[String, String]]()
-  val executorIdToData = HashMap[String, ExecutorUIData]()
+  var executorToTaskSummary = LinkedHashMap[String, ExecutorTaskSummary]()
+  var executorEvents = new ListBuffer[SparkListenerEvent]()
+
+  private val maxTimelineExecutors = 
conf.getInt("spark.ui.timeline.executors.maximum", 1000)
+  private val retainedDeadExecutors = 
conf.getInt("spark.ui.retainedDeadExecutors", 100)
+  private var deadExecutorCount = 0
--- End diff --

deadExecutorCount is used to count the dead executors. It keeps compatibility with 
the behavior since Spark 2.0, where dead executors are removed immediately. 





[GitHub] spark pull request #14969: [SPARK-17406][WEB UI] limit timeline executor eve...

2016-09-12 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14969#discussion_r78355162
  
--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsTab.scala 
---
@@ -38,47 +37,68 @@ private[ui] class ExecutorsTab(parent: SparkUI) extends 
SparkUITab(parent, "exec
   }
 }
 
+private[ui] case class ExecutorTaskSummary(
+var executorId: String,
+var totalCores: Int = 0,
+var tasksMax: Int = 0,
+var tasksActive: Int = 0,
+var tasksFailed: Int = 0,
+var tasksComplete: Int = 0,
+var duration: Long = 0L,
+var jvmGCTime: Long = 0L,
+var inputBytes: Long = 0L,
+var inputRecords: Long = 0L,
+var outputBytes: Long = 0L,
+var outputRecords: Long = 0L,
+var shuffleRead: Long = 0L,
+var shuffleWrite: Long = 0L,
+var executorLogs: Map[String, String] = Map.empty,
+var isAlive: Boolean = true
+)
+
 /**
  * :: DeveloperApi ::
  * A SparkListener that prepares information to be displayed on the 
ExecutorsTab
  */
 @DeveloperApi
 class ExecutorsListener(storageStatusListener: StorageStatusListener, 
conf: SparkConf)
 extends SparkListener {
-  val executorToTotalCores = HashMap[String, Int]()
-  val executorToTasksMax = HashMap[String, Int]()
-  val executorToTasksActive = HashMap[String, Int]()
-  val executorToTasksComplete = HashMap[String, Int]()
-  val executorToTasksFailed = HashMap[String, Int]()
-  val executorToDuration = HashMap[String, Long]()
-  val executorToJvmGCTime = HashMap[String, Long]()
-  val executorToInputBytes = HashMap[String, Long]()
-  val executorToInputRecords = HashMap[String, Long]()
-  val executorToOutputBytes = HashMap[String, Long]()
-  val executorToOutputRecords = HashMap[String, Long]()
-  val executorToShuffleRead = HashMap[String, Long]()
-  val executorToShuffleWrite = HashMap[String, Long]()
-  val executorToLogUrls = HashMap[String, Map[String, String]]()
-  val executorIdToData = HashMap[String, ExecutorUIData]()
+  var executorToTaskSummary = LinkedHashMap[String, ExecutorTaskSummary]()
+  var executorEvents = new ListBuffer[SparkListenerEvent]()
+
+  private val maxTimelineExecutors = 
conf.getInt("spark.ui.timeline.executors.maximum", 1000)
--- End diff --

executorToTaskSummary is used by ExecutorsPage, and dead executors are still 
shown on that page, so I can't remove an executor's information 
immediately after it is removed.





[GitHub] spark pull request #14969: [SPARK-17406][WEB UI] limit timeline executor eve...

2016-09-12 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14969#discussion_r78352242
  
--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsTab.scala 
---
@@ -38,47 +37,68 @@ private[ui] class ExecutorsTab(parent: SparkUI) extends 
SparkUITab(parent, "exec
   }
 }
 
+private[ui] case class ExecutorTaskSummary(
+var executorId: String,
+var totalCores: Int = 0,
+var tasksMax: Int = 0,
+var tasksActive: Int = 0,
+var tasksFailed: Int = 0,
+var tasksComplete: Int = 0,
+var duration: Long = 0L,
+var jvmGCTime: Long = 0L,
+var inputBytes: Long = 0L,
+var inputRecords: Long = 0L,
+var outputBytes: Long = 0L,
+var outputRecords: Long = 0L,
+var shuffleRead: Long = 0L,
+var shuffleWrite: Long = 0L,
+var executorLogs: Map[String, String] = Map.empty,
+var isAlive: Boolean = true
+)
+
 /**
  * :: DeveloperApi ::
  * A SparkListener that prepares information to be displayed on the 
ExecutorsTab
  */
 @DeveloperApi
 class ExecutorsListener(storageStatusListener: StorageStatusListener, 
conf: SparkConf)
 extends SparkListener {
-  val executorToTotalCores = HashMap[String, Int]()
-  val executorToTasksMax = HashMap[String, Int]()
-  val executorToTasksActive = HashMap[String, Int]()
-  val executorToTasksComplete = HashMap[String, Int]()
-  val executorToTasksFailed = HashMap[String, Int]()
-  val executorToDuration = HashMap[String, Long]()
-  val executorToJvmGCTime = HashMap[String, Long]()
-  val executorToInputBytes = HashMap[String, Long]()
-  val executorToInputRecords = HashMap[String, Long]()
-  val executorToOutputBytes = HashMap[String, Long]()
-  val executorToOutputRecords = HashMap[String, Long]()
-  val executorToShuffleRead = HashMap[String, Long]()
-  val executorToShuffleWrite = HashMap[String, Long]()
-  val executorToLogUrls = HashMap[String, Map[String, String]]()
-  val executorIdToData = HashMap[String, ExecutorUIData]()
+  var executorToTaskSummary = LinkedHashMap[String, ExecutorTaskSummary]()
+  var executorEvents = new ListBuffer[SparkListenerEvent]()
+
+  private val maxTimelineExecutors = 
conf.getInt("spark.ui.timeline.executors.maximum", 1000)
--- End diff --

spark.ui.timeline.executors.maximum is similar to 
spark.ui.timeline.tasks.maximum: it is a configuration for ExecutorAdded 
and ExecutorRemoved events, so spark.ui.timeline.retainedDeadExecutors would 
not be a suitable name.
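
As a usage sketch (the values are arbitrary; both property names appear in the diffs quoted in this thread), the limits under discussion would be set like any other Spark configuration:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.ui.timeline.executors.maximum", "500") // cap ExecutorAdded/Removed events on the timeline
  .set("spark.ui.retainedDeadExecutors", "50")       // cap dead executors retained for the executors page
```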





[GitHub] spark pull request #14969: [SPARK-17406][WEB UI] limit timeline executor eve...

2016-09-12 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14969#discussion_r78351884
  
--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsTab.scala 
---
@@ -38,47 +37,68 @@ private[ui] class ExecutorsTab(parent: SparkUI) extends 
SparkUITab(parent, "exec
   }
 }
 
+private[ui] case class ExecutorTaskSummary(
+var executorId: String,
+var totalCores: Int = 0,
+var tasksMax: Int = 0,
+var tasksActive: Int = 0,
+var tasksFailed: Int = 0,
+var tasksComplete: Int = 0,
+var duration: Long = 0L,
+var jvmGCTime: Long = 0L,
+var inputBytes: Long = 0L,
+var inputRecords: Long = 0L,
+var outputBytes: Long = 0L,
+var outputRecords: Long = 0L,
+var shuffleRead: Long = 0L,
+var shuffleWrite: Long = 0L,
+var executorLogs: Map[String, String] = Map.empty,
+var isAlive: Boolean = true
+)
+
 /**
  * :: DeveloperApi ::
  * A SparkListener that prepares information to be displayed on the 
ExecutorsTab
  */
 @DeveloperApi
 class ExecutorsListener(storageStatusListener: StorageStatusListener, 
conf: SparkConf)
 extends SparkListener {
-  val executorToTotalCores = HashMap[String, Int]()
-  val executorToTasksMax = HashMap[String, Int]()
-  val executorToTasksActive = HashMap[String, Int]()
-  val executorToTasksComplete = HashMap[String, Int]()
-  val executorToTasksFailed = HashMap[String, Int]()
-  val executorToDuration = HashMap[String, Long]()
-  val executorToJvmGCTime = HashMap[String, Long]()
-  val executorToInputBytes = HashMap[String, Long]()
-  val executorToInputRecords = HashMap[String, Long]()
-  val executorToOutputBytes = HashMap[String, Long]()
-  val executorToOutputRecords = HashMap[String, Long]()
-  val executorToShuffleRead = HashMap[String, Long]()
-  val executorToShuffleWrite = HashMap[String, Long]()
-  val executorToLogUrls = HashMap[String, Map[String, String]]()
-  val executorIdToData = HashMap[String, ExecutorUIData]()
+  var executorToTaskSummary = LinkedHashMap[String, ExecutorTaskSummary]()
+  var executorEvents = new ListBuffer[SparkListenerEvent]()
+
+  private val maxTimelineExecutors = 
conf.getInt("spark.ui.timeline.executors.maximum", 1000)
+  private val retainedDeadExecutors = 
conf.getInt("spark.ui.retainedDeadExecutors", 100)
+  private var deadExecutorCount = 0
 
   def activeStorageStatusList: Seq[StorageStatus] = 
storageStatusListener.storageStatusList
 
   def deadStorageStatusList: Seq[StorageStatus] = 
storageStatusListener.deadStorageStatusList
 
   override def onExecutorAdded(executorAdded: SparkListenerExecutorAdded): 
Unit = synchronized {
 val eid = executorAdded.executorId
-executorToLogUrls(eid) = executorAdded.executorInfo.logUrlMap
-executorToTotalCores(eid) = executorAdded.executorInfo.totalCores
-executorToTasksMax(eid) = executorToTotalCores(eid) / 
conf.getInt("spark.task.cpus", 1)
-executorIdToData(eid) = new ExecutorUIData(executorAdded.time)
+val taskSummary = executorToTaskSummary.getOrElseUpdate(eid, 
ExecutorTaskSummary(eid))
+taskSummary.executorLogs = executorAdded.executorInfo.logUrlMap
+taskSummary.totalCores = executorAdded.executorInfo.totalCores
+taskSummary.tasksMax = taskSummary.totalCores / 
conf.getInt("spark.task.cpus", 1)
+executorEvents += executorAdded
+if (executorEvents.size > maxTimelineExecutors) {
+  executorEvents.remove(0)
+}
+if (deadExecutorCount > retainedDeadExecutors) {
+  val head = executorToTaskSummary.filter(e => !e._2.isAlive).head
+  executorToTaskSummary.remove(head._1)
+  deadExecutorCount -= 1
+}
   }
 
   override def onExecutorRemoved(
   executorRemoved: SparkListenerExecutorRemoved): Unit = synchronized {
-val eid = executorRemoved.executorId
-val uiData = executorIdToData(eid)
-uiData.finishTime = Some(executorRemoved.time)
-uiData.finishReason = Some(executorRemoved.reason)
+executorEvents += executorRemoved
+if (executorEvents.size > maxTimelineExecutors) {
+  executorEvents.remove(0)
+}
+deadExecutorCount += 1
+executorToTaskSummary.get(executorRemoved.executorId).map(e => 
e.isAlive = false)
--- End diff --

OK



[GitHub] spark issue #14737: [SPARK-17171][WEB UI] DAG will list all partitions in th...

2016-09-12 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/14737
  
thank you!





[GitHub] spark pull request #15041: [SPARK-17488][CORE] TakeAndOrder will OOM when th...

2016-09-10 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15041#discussion_r78272386
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -493,8 +494,7 @@ abstract class RDD[T: ClassTag](
*
* @param weights weights for splits, will be normalized if they don't 
sum to 1
* @param seed random seed
-   *
-   * @return split RDDs in an array
+* @return split RDDs in an array
--- End diff --

I am so sorry for this. I will revert it later.





[GitHub] spark pull request #15041: [SPARK-17488][CORE] TakeAndOrder will OOM when th...

2016-09-09 Thread cenyuhai
GitHub user cenyuhai opened a pull request:

https://github.com/apache/spark/pull/15041

[SPARK-17488][CORE] TakeAndOrder will OOM when the data is very large

## What changes were proposed in this pull request?
The function Utils.takeOrdered sorts all the data in memory; when the 
data is very large, it will OOM. This PR adds an external sorter to the 
takeOrdered function.
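
For context, a minimal reproduction sketch (standard RDD API; the sizes are only illustrative): with a very large num, every partition's top-num buffer, and the merged result on the driver, has to fit in memory, which is the OOM scenario this PR targets.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("takeOrdered-oom-sketch"))
// Asking for the one million smallest values keeps a large bounded buffer per partition
// and materializes the merged result on the driver.
val top = sc.parallelize(1 to 10000000).takeOrdered(1000000)
sc.stop()
```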


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cenyuhai/spark SPARK-17488

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15041.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15041


commit 869eaaf23f79eefbc6a8ff7a7b9efbc4a9f8c6b7
Author: 岑玉海 <261810...@qq.com>
Date:   2016-08-21T03:55:04Z

Merge pull request #8 from apache/master

merge latest code to my fork

commit b6b0d0a41c1aa59bc97a0aa438619d903b78b108
Author: 岑玉海 <261810...@qq.com>
Date:   2016-09-06T03:03:08Z

Merge pull request #9 from apache/master

Merge latest code to my fork

commit abd7924eab25b6dfdfd78c23a78dadcb3b9fbe1e
Author: 岑玉海 <261810...@qq.com>
Date:   2016-09-08T17:10:12Z

Merge pull request #10 from apache/master

Merge latest code to my fork

commit 07ad91b02ad2e644788a7e432472e8c5384a29c6
Author: cenyuhai <cenyu...@didichuxing.com>
Date:   2016-09-10T05:17:49Z

add exterlnal sorter for takeOrdered function







[GitHub] spark issue #15014: [SPARK-17429][SQL] use ImplicitCastInputTypes with funct...

2016-09-09 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/15014
  
In this case, we store a business type as an int (to decrease the record size). 
For example, xxx are machine error types,  are application types.





[GitHub] spark issue #15014: [SPARK-17429][SQL] use ImplicitCastInputTypes with funct...

2016-09-09 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/15014
  
@hvanhovell Why is Hive so popular? Because Hive is compatible and stable. 
From the user's point of view, Hive is easy to use; users need not care about 
types all the time. I agree that Hive compatibility is not the goal, but making 
Spark SQL easier to use is my goal. Isn't it yours?





[GitHub] spark issue #14737: [SPARK-17171][WEB UI] DAG will list all partitions in th...

2016-09-08 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/14737
  
@srowen If it is OK, can you merge this PR to master? Thank you.





[GitHub] spark issue #14969: [SPARK-17406][WEB UI] limit timeline executor events

2016-09-08 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/14969
  
@srowen I removed the parallel maps; please review the latest code. Thank you!





[GitHub] spark pull request #15014: [SPARK-17429][SQL] use ImplicitCastInputTypes wit...

2016-09-08 Thread cenyuhai
GitHub user cenyuhai opened a pull request:

https://github.com/apache/spark/pull/15014

[SPARK-17429][SQL] use ImplicitCastInputTypes with function Length

## What changes were proposed in this pull request?
select length(11);
select length(2.0);
These SQL statements return errors, but Hive accepts them. This PR supports 
implicitly casting the input types for the length function.
The correct results are:
select length(11) returns 2
select length(2.0) returns 3
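
A quick way to check the behaviour described above (a sketch assuming a local SparkSession; the expected values are the ones stated in this description):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("length-implicit-cast").getOrCreate()
spark.sql("SELECT length(11)").show()  // expected: 2 once the int is implicitly cast to a string
spark.sql("SELECT length(2.0)").show() // expected: 3
spark.stop()
```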



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cenyuhai/spark SPARK-17429

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15014.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15014


commit 869eaaf23f79eefbc6a8ff7a7b9efbc4a9f8c6b7
Author: 岑玉海 <261810...@qq.com>
Date:   2016-08-21T03:55:04Z

Merge pull request #8 from apache/master

merge latest code to my fork

commit b6b0d0a41c1aa59bc97a0aa438619d903b78b108
Author: 岑玉海 <261810...@qq.com>
Date:   2016-09-06T03:03:08Z

Merge pull request #9 from apache/master

Merge latest code to my fork

commit abd7924eab25b6dfdfd78c23a78dadcb3b9fbe1e
Author: 岑玉海 <261810...@qq.com>
Date:   2016-09-08T17:10:12Z

Merge pull request #10 from apache/master

Merge latest code to my fork

commit 51fe8a1d141f700a2b417878c1c19af25d922198
Author: cenyuhai <cenyu...@didichuxing.com>
Date:   2016-09-08T17:46:37Z

use ImplicitCastInputTypes for  Length







[GitHub] spark pull request #14969: [SPARK-17406][WEB UI] limit timeline executor eve...

2016-09-06 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14969#discussion_r77652830
  
--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsTab.scala 
---
@@ -70,15 +72,33 @@ class ExecutorsListener(storageStatusListener: 
StorageStatusListener, conf: Spar
 executorToLogUrls(eid) = executorAdded.executorInfo.logUrlMap
 executorToTotalCores(eid) = executorAdded.executorInfo.totalCores
 executorToTasksMax(eid) = executorToTotalCores(eid) / 
conf.getInt("spark.task.cpus", 1)
-executorIdToData(eid) = new ExecutorUIData(executorAdded.time)
+executorEvents += executorAdded
+if (executorEvents.size > MAX_EXECUTOR_LIMIT) {
+  executorEvents = executorEvents.drop(1)
+}
   }
 
   override def onExecutorRemoved(
   executorRemoved: SparkListenerExecutorRemoved): Unit = synchronized {
+executorEvents += executorRemoved
+if (executorEvents.size > MAX_EXECUTOR_LIMIT) {
+  executorEvents = executorEvents.drop(1)
+}
 val eid = executorRemoved.executorId
-val uiData = executorIdToData(eid)
-uiData.finishTime = Some(executorRemoved.time)
-uiData.finishReason = Some(executorRemoved.reason)
+executorToTotalCores.remove(eid)
--- End diff --

let me see.





[GitHub] spark pull request #14969: [SPARK-17406][WEB UI] limit timeline executor eve...

2016-09-06 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14969#discussion_r77652206
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
@@ -123,55 +123,55 @@ private[ui] class AllJobsPage(parent: JobsTab) 
extends WebUIPage("") {
 }
   }
 
-  private def makeExecutorEvent(executorUIDatas: HashMap[String, 
ExecutorUIData]): Seq[String] = {
+  private def makeExecutorEvent(executorUIDatas: Seq[SparkListenerEvent]):
+  Seq[String] = {
 val events = ListBuffer[String]()
 executorUIDatas.foreach {
-  case (executorId, event) =>
+  case a: SparkListenerExecutorAdded =>
 val addedEvent =
   s"""
  |{
  |  'className': 'executor added',
  |  'group': 'executors',
- |  'start': new Date(${event.startTime}),
+ |  'start': new Date(${a.time}),
  |  'content': 'Executor ${executorId} added'
+ |'data-title="Executor ${a.executorId}' +
+ |'Added at ${UIUtils.formatDate(new Date(a.time))}"' +
+ |'data-html="true">Executor ${a.executorId} added'
  |}
""".stripMargin
 events += addedEvent
+  case e: SparkListenerExecutorRemoved =>
--- End diff --

yes





[GitHub] spark pull request #14969: [SPARK-17406][WEB UI] limit timeline executor eve...

2016-09-06 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14969#discussion_r77651971
  
--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsTab.scala 
---
@@ -70,15 +72,33 @@ class ExecutorsListener(storageStatusListener: 
StorageStatusListener, conf: Spar
 executorToLogUrls(eid) = executorAdded.executorInfo.logUrlMap
 executorToTotalCores(eid) = executorAdded.executorInfo.totalCores
 executorToTasksMax(eid) = executorToTotalCores(eid) / 
conf.getInt("spark.task.cpus", 1)
-executorIdToData(eid) = new ExecutorUIData(executorAdded.time)
+executorEvents += executorAdded
+if (executorEvents.size > MAX_EXECUTOR_LIMIT) {
+  executorEvents = executorEvents.drop(1)
--- End diff --

Because the drop function doesn't really remove the element; it just returns a new 
collection.
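
A tiny illustration of that point (plain Scala, nothing Spark-specific):

```scala
import scala.collection.mutable.ListBuffer

val buf = ListBuffer(1, 2, 3)
val dropped = buf.drop(1) // returns a new collection: ListBuffer(2, 3)
println(buf)              // ListBuffer(1, 2, 3) -- the original is unchanged
buf.remove(0)             // mutates in place
println(buf)              // ListBuffer(2, 3)
```

That is why the listener code reassigns the var (executorEvents = executorEvents.drop(1)) instead of expecting drop to mutate the buffer in place.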





[GitHub] spark pull request #14969: [SPARK-17406][WEB UI] limit timeline executor eve...

2016-09-06 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14969#discussion_r77651759
  
--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsTab.scala 
---
@@ -59,7 +59,9 @@ class ExecutorsListener(storageStatusListener: 
StorageStatusListener, conf: Spar
   val executorToShuffleRead = HashMap[String, Long]()
   val executorToShuffleWrite = HashMap[String, Long]()
   val executorToLogUrls = HashMap[String, Map[String, String]]()
-  val executorIdToData = HashMap[String, ExecutorUIData]()
+  var executorEvents = new mutable.ListBuffer[SparkListenerEvent]()
+
+  val MAX_EXECUTOR_LIMIT = 
conf.getInt("spark.ui.timeline.executors.maximum", 1000)
--- End diff --

No, it is about executors (SparkListenerExecutorAdded and 
SparkListenerExecutorRemoved). 





[GitHub] spark issue #14969: [SPARK-17406][WEB-UI] limit timeline executor events

2016-09-06 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/14969
  
[error]  * method executorIdToData()scala.collection.mutable.HashMap in 
class org.apache.spark.ui.exec.ExecutorsListener does not have a correspondent 
in current version
[error]filter with: 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.ui.exec.ExecutorsListener.executorIdToData")

I have removed "executorIdToData", so why does it still fail?





[GitHub] spark pull request #14969: [SPARK-17406][WEB-UI] limit timeline executor eve...

2016-09-05 Thread cenyuhai
GitHub user cenyuhai opened a pull request:

https://github.com/apache/spark/pull/14969

[SPARK-17406][WEB-UI] limit timeline executor events

## What changes were proposed in this pull request?
The job page is too slow to open when there are thousands of executor 
events (added or removed). I found that in the ExecutorsTab file, executorIdToData 
never removes elements, so it grows all the time. Before this PR, it 
looks like 
[timeline1.png](https://issues.apache.org/jira/secure/attachment/12827112/timeline1.png).
 After this PR, it looks like 
[timeline2.png](https://issues.apache.org/jira/secure/attachment/12827113/timeline2.png)
 (we can set how many events will be displayed).



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cenyuhai/spark SPARK-17406

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14969.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14969


commit c368f885aa539da622f95093c51205af11c9d7a1
Author: cenyuhai <cenyu...@didichuxing.com>
Date:   2016-09-06T05:25:53Z

limit timeline executor events







[GitHub] spark pull request #14966: Merge pull request #8 from apache/master

2016-09-05 Thread cenyuhai
Github user cenyuhai closed the pull request at:

https://github.com/apache/spark/pull/14966





[GitHub] spark issue #14966: Merge pull request #8 from apache/master

2016-09-05 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/14966
  
Sorry, I made a mistake... I wanted to merge the pull request into my own fork.





[GitHub] spark pull request #14966: Merge pull request #8 from apache/master

2016-09-05 Thread cenyuhai
GitHub user cenyuhai opened a pull request:

https://github.com/apache/spark/pull/14966

Merge pull request #8 from apache/master

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)


## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)


(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)


merge latest code to my fork

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cenyuhai/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14966.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14966


commit 869eaaf23f79eefbc6a8ff7a7b9efbc4a9f8c6b7
Author: 岑玉海 <261810...@qq.com>
Date:   2016-08-21T03:55:04Z

Merge pull request #8 from apache/master

merge latest code to my fork







[GitHub] spark issue #14737: [SPARK-17171][WEB UI] DAG will list all partitions in th...

2016-08-25 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/14737
  
@srowen Why did I set this value to 2? Because a "JOIN" action needs 2 
elements. Users usually don't care how many partitions the graph 
has; they just want to see the relations in the DAG graph. I have changed the 
code as @markhamstra suggested, so it will not remove elements by default. Users can 
set this value as they like, but I recommend 2.





[GitHub] spark issue #14737: [SPARK-17171][WEB UI] DAG will list all partitions in th...

2016-08-25 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/14737
  
I am very sorry about that; the first picture is for the stage and the second picture 
is for the job, but they are from the same job: "select count(1) from partitionedTables".





[GitHub] spark issue #14737: [SPARK-17171][WEB UI] DAG will list all partitions in th...

2016-08-25 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/14737
  
@srowen Please review the latest code, thank you.





[GitHub] spark issue #14739: [SPARK-17176][WEB UI]set default task sort column to "St...

2016-08-23 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/14739
  
@srowen Can we make it an option that defaults to "Index", so users can choose 
"Status" or anything else? 





[GitHub] spark issue #14737: [Spark-17171][WEB UI] DAG will list all partitions in th...

2016-08-21 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/14737
  
@srowen 





[GitHub] spark pull request #14737: [Spark-17171][WEB UI] DAG will list all partition...

2016-08-21 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14737#discussion_r75593904
  
--- Diff: 
core/src/main/scala/org/apache/spark/ui/scope/RDDOperationGraph.scala ---
@@ -119,18 +119,47 @@ private[ui] object RDDOperationGraph extends Logging {
   { if (stage.attemptId == 0) "" else s" (attempt ${stage.attemptId})" 
}
 val rootCluster = new RDDOperationCluster(stageClusterId, 
stageClusterName)
 
+var rootNodeCount = 0
+val addRDDIds = new mutable.HashSet[Int]()
+val dropRDDIds = new mutable.HashSet[Int]()
+
+def isAllowed(ids: mutable.HashSet[Int], rdd: RDDInfo): Boolean = {
+  val parentIds = rdd.parentIds
+  if (parentIds.size == 0) {
+rootNodeCount < retainedNodes
+  } else {
+if (ids.size > 0) {
--- End diff --

yes, you are right...





[GitHub] spark pull request #14737: [Spark-17171][WEB UI] DAG will list all partition...

2016-08-21 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14737#discussion_r75593902
  
--- Diff: 
core/src/main/scala/org/apache/spark/ui/scope/RDDOperationGraph.scala ---
@@ -119,18 +119,47 @@ private[ui] object RDDOperationGraph extends Logging {
   { if (stage.attemptId == 0) "" else s" (attempt ${stage.attemptId})" 
}
 val rootCluster = new RDDOperationCluster(stageClusterId, 
stageClusterName)
 
+var rootNodeCount = 0
+val addRDDIds = new mutable.HashSet[Int]()
+val dropRDDIds = new mutable.HashSet[Int]()
+
+def isAllowed(ids: mutable.HashSet[Int], rdd: RDDInfo): Boolean = {
+  val parentIds = rdd.parentIds
+  if (parentIds.size == 0) {
+rootNodeCount < retainedNodes
+  } else {
+if (ids.size > 0) {
+parentIds.exists(id => ids.contains(id) || 
!dropRDDIds.contains(id))
+} else {
+true
+}
+  }
+}
+
 // Find nodes, edges, and operation scopes that belong to this stage
-stage.rddInfos.foreach { rdd =>
-  edges ++= rdd.parentIds.map { parentId => RDDOperationEdge(parentId, 
rdd.id) }
+stage.rddInfos.sortBy(_.id).foreach { rdd =>
+  val keepNode: Boolean = isAllowed(addRDDIds, rdd)
--- End diff --

OK.





[GitHub] spark issue #14739: [SPARK-17176][WEB UI]set default task sort column to "St...

2016-08-21 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/14739
  
Yes, "FAILED" will come before "RUNNING". That is what I want, because knowing 
why tasks fail matters more than sorting by ID. 
The ID is just a unique identifier for a task; in most cases we don't care 
about it. But we do care about why tasks fail and why tasks run for such 
a long time. 
When there are too many tasks, sorting by status manually is not easy; it is very 
slow.





[GitHub] spark pull request #14739: [SPARK-17176][WEB UI]set default task sort column...

2016-08-21 Thread cenyuhai
GitHub user cenyuhai opened a pull request:

https://github.com/apache/spark/pull/14739

[SPARK-17176][WEB UI]set default task sort column to "Status"

## What changes were proposed in this pull request?
Tasks are sorted by "Index" on the stage page, but users are usually concerned 
about tasks that failed (to see the error messages) or are still running (maybe the 
data is skewed). When there are too many tasks, sorting is too slow, so it is better 
to set the default sort column to "Status".



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cenyuhai/spark SPARK-17176

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14739.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14739


commit c1f3c9e90d9c465eb356b01d136a963ff1e75fc3
Author: cenyuhai <cenyu...@didichuxing.com>
Date:   2016-08-21T06:49:15Z

set default task sort column to "Status"







[GitHub] spark pull request #14737: [Spark-17171][WEB UI] DAG will list all partition...

2016-08-20 Thread cenyuhai
GitHub user cenyuhai opened a pull request:

https://github.com/apache/spark/pull/14737

[Spark-17171][WEB UI] DAG will list all partitions in the graph

## What changes were proposed in this pull request?
The DAG lists all partitions in the graph, which is too slow and makes it hard to see 
the whole graph.
Usually we don't want to see all the partitions; we just want to see the 
relations in the DAG graph.
So I just show 2 root nodes for the RDDs.

Before this PR, the DAG graph looks like 
[dag1.png](https://issues.apache.org/jira/secure/attachment/12824702/dag1.png), 
after this PR, the DAG graph looks like 
[dag2.png](https://issues.apache.org/jira/secure/attachment/12824703/dag2.png)






You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cenyuhai/spark SPARK-17171

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14737.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14737


commit 7991d7622260bc8e65ee9b934d376df2597c9a11
Author: cenyuhai <cenyu...@didichuxing.com>
Date:   2016-08-20T15:44:38Z

Just show 2 root partitions for a stage

commit 869eaaf23f79eefbc6a8ff7a7b9efbc4a9f8c6b7
Author: 岑玉海 <261810...@qq.com>
Date:   2016-08-21T03:55:04Z

Merge pull request #8 from apache/master

merge latest code to my fork

commit 595453fbb2ccdd4009821724adefb829a13890c7
Author: cenyuhai <cenyu...@didichuxing.com>
Date:   2016-08-21T04:06:06Z

Merge remote-tracking branch 'remotes/origin/master' into SPARK-17171







[GitHub] spark pull request: [SPARK-13566][CORE] Avoid deadlock between Blo...

2016-05-07 Thread cenyuhai
Github user cenyuhai closed the pull request at:

https://github.com/apache/spark/pull/11546





[GitHub] spark pull request: [SPARK-13566][CORE] Avoid deadlock between Blo...

2016-05-06 Thread cenyuhai
Github user cenyuhai commented on the pull request:

https://github.com/apache/spark/pull/11546#issuecomment-217467789
  
ok to test





[GitHub] spark pull request: [SPARK-13566][CORE] Avoid deadlock between Blo...

2016-05-06 Thread cenyuhai
Github user cenyuhai commented on the pull request:

https://github.com/apache/spark/pull/11546#issuecomment-217466944
  
@andrewor14 I changed the code as you suggested, but the test failed because 
of a timeout. The failure does not seem to be related to my change...





[GitHub] spark pull request: [SPARK-13566][CORE] Avoid deadlock between Blo...

2016-04-19 Thread cenyuhai
Github user cenyuhai commented on the pull request:

https://github.com/apache/spark/pull/11546#issuecomment-212208592
  
@jamesecahill I don't know whether @JoshRosen will provide another patch 
for this issue, but I have fixed this bug in my production environment with this 
PR.





[GitHub] spark pull request: [Spark-13772][SQL] fix data type mismatch for ...

2016-04-15 Thread cenyuhai
Github user cenyuhai closed the pull request at:

https://github.com/apache/spark/pull/11605





[GitHub] spark pull request: [SPARK-13566] Avoid deadlock between BlockMana...

2016-03-10 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/11546#discussion_r55668498
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -227,6 +228,17 @@ private[spark] class Executor(
   logError(errMsg)
 }
   }
+
+  if (releasedLocks.nonEmpty) {
+val errMsg =
+  s"${releasedLocks.size} block locks were not released by TID 
= $taskId:\n" +
--- End diff --

In my production environment, when the storage memory is full, there is a 
high probability of deadlock. This is only a temporary patch, because JoshRosen 
added a read/write lock for blocks in https://github.com/apache/spark/pull/10705 for 
Spark 2.0. 

Two threads removing the same block can deadlock each other:
the BlockManager first locks the MemoryManager and then waits to lock the BlockInfo 
in 'dropFromMemory', while the executor task locks the BlockInfo and then waits to 
lock the MemoryManager when calling 'memstore.remove(block)' in 'removeBlock' or 
'removeOldBlocks'.

So this patch just uses a ConcurrentHashMap to record the locks held by each task 
and, in case of failure, releases all of a task's locks after the task completes. 
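
A minimal sketch of that bookkeeping, with hypothetical names rather than the 
actual BlockManager/BlockInfo API:

import java.util.Collections
import java.util.concurrent.ConcurrentHashMap

// Hypothetical sketch: track the block locks held by each task so any locks
// still held when the task finishes can be force-released, instead of
// leaving another thread blocked forever.
object BlockLockRegistry {
  // taskId -> block ids whose locks the task currently holds
  private val locksByTask = new ConcurrentHashMap[Long, java.util.Set[String]]()

  def registerLock(taskId: Long, blockId: String): Unit = {
    locksByTask.putIfAbsent(
      taskId,
      Collections.newSetFromMap(new ConcurrentHashMap[String, java.lang.Boolean]()))
    locksByTask.get(taskId).add(blockId)
  }

  def releaseLock(taskId: Long, blockId: String): Unit = {
    val held = locksByTask.get(taskId)
    if (held != null) held.remove(blockId)
  }

  // Called from the executor's task-completion path, even when the task failed.
  def releaseAllForTask(taskId: Long): java.util.Set[String] = {
    val held = locksByTask.remove(taskId)
    if (held != null) held else Collections.emptySet[String]()
  }
}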






[GitHub] spark pull request: [Spark-13772] fix data type mismatch for decim...

2016-03-10 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/11605#discussion_r55665217
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercionSuite.scala
 ---
@@ -299,6 +299,19 @@ class HiveTypeCoercionSuite extends PlanTest {
 )
   }
 
+  test("test for SPARK-13772") {
+val rule = HiveTypeCoercion.IfCoercion
+ruleTest(rule,
+  If(Literal(true), Literal(1.0), Cast(Literal(1.0), DecimalType(19, 
0))),
--- End diff --

It works in Hive 1.2.1 and Spark 1.4.1. 
Test case:
select if(1=1, cast(1 as double), cast(1 as decimal)) from test
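
For reference, a quick way to exercise that test case from a Scala shell 
(hedged: the table name "test" is just the example above, and this uses the 
SparkSession API rather than the 1.6-era SQLContext):

// Hypothetical reproduction sketch: before the fix, the IF expression below
// reportedly failed analysis with a data type mismatch between the DOUBLE and
// DECIMAL branches; with the coercion fix it should resolve to a common type.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("if-coercion-check").master("local[*]").getOrCreate()
spark.range(1).createOrReplaceTempView("test")
spark.sql("select if(1=1, cast(1 as double), cast(1 as decimal)) from test").show()
spark.stop()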





[GitHub] spark pull request: [SPARK-13566] Avoid deadlock between BlockMana...

2016-03-10 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/11546#discussion_r55664945
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -227,6 +228,17 @@ private[spark] class Executor(
   logError(errMsg)
 }
   }
+
+  if (releasedLocks.nonEmpty) {
+val errMsg =
+  s"${releasedLocks.size} block locks were not released by TID 
= $taskId:\n" +
--- End diff --

This code is from https://github.com/apache/spark/pull/10705 by JoshRosen.





[GitHub] spark pull request: [Spark-13772] fix data type mismatch for decim...

2016-03-09 Thread cenyuhai
GitHub user cenyuhai opened a pull request:

https://github.com/apache/spark/pull/11605

[Spark-13772] fix data type mismatch for decimal

fix data type mismatch for decimal, patch for branch 1.6

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cenyuhai/spark SPARK-13772

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11605.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11605


commit 5236dcbc4afc31293a550d2fd87419bcc4ba7e61
Author: cenyuhai <cenyu...@didichuxing.com>
Date:   2016-03-06T11:30:43Z

temp patch for SPARK-13566

commit 8d539df190a21e7d3d93ab078866267ece2f1df0
Author: cenyuhai <cenyu...@didichuxing.com>
Date:   2016-03-09T12:52:19Z

fix data type mismatch for decimal

commit 42addd64bfb864ff59fecc5c4c11852d7cd49f60
Author: cenyuhai <cenyu...@didichuxing.com>
Date:   2016-03-09T12:57:26Z

rebase to branch 1.6







[GitHub] spark pull request: [SPARK-13566] Avoid deadlock between BlockMana...

2016-03-06 Thread cenyuhai
GitHub user cenyuhai opened a pull request:

https://github.com/apache/spark/pull/11546

[SPARK-13566] Avoid deadlock between BlockManager and Executor Thread

Temporary patch for branch 1.6 to avoid a deadlock between the BlockManager and 
an executor thread.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cenyuhai/spark SPARK-13566

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11546.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11546


commit 3f2ac8d6d977da2577c56f3bfcd51e8b053d952d
Author: cenyuhai <cenyu...@didichuxing.com>
Date:   2016-03-06T11:30:43Z

temp patch for SPARK-13566







[GitHub] spark pull request: Multi user

2015-06-14 Thread cenyuhai
Github user cenyuhai closed the pull request at:

https://github.com/apache/spark/pull/6812





[GitHub] spark pull request: Multi user

2015-06-14 Thread cenyuhai
Github user cenyuhai commented on the pull request:

https://github.com/apache/spark/pull/6812#issuecomment-111811208
  
I am so sorry, I just pushed my commits to my branch. I didn't know this would 
happen. 





[GitHub] spark pull request: Multi user

2015-06-14 Thread cenyuhai
GitHub user cenyuhai opened a pull request:

https://github.com/apache/spark/pull/6812

Multi user



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cenyuhai/spark MultiUser

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6812.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6812


commit e18d623d93505bc5fddeec0281ee3baef3638c3e
Author: Santiago M. Mola sa...@mola.io
Date:   2015-05-22T22:10:27Z

[SPARK-7724] [SQL] Support Intersect/Except in Catalyst DSL.

Author: Santiago M. Mola sa...@mola.io

Closes #6327 from smola/feature/catalyst-dsl-set-ops and squashes the 
following commits:

11db778 [Santiago M. Mola] [SPARK-7724] [SQL] Support Intersect/Except in 
Catalyst DSL.

(cherry picked from commit e4aef91fe70d6c9765d530b913a9d79103fc27ce)
Signed-off-by: Michael Armbrust mich...@databricks.com

commit d6cb0446304c5cc438e2bcabd8b39ea4c408a2da
Author: Liang-Chi Hsieh vii...@gmail.com
Date:   2015-05-22T22:39:58Z

[SPARK-7270] [SQL] Consider dynamic partition when inserting into hive table

JIRA: https://issues.apache.org/jira/browse/SPARK-7270

Author: Liang-Chi Hsieh vii...@gmail.com

Closes #5864 from viirya/dyn_partition_insert and squashes the following 
commits:

b5627df [Liang-Chi Hsieh] For comments.
3b21e4b [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' 
into dyn_partition_insert
8a4352d [Liang-Chi Hsieh] Consider dynamic partition when inserting into 
hive table.

(cherry picked from commit 126d7235de649ea5619dee6ad3a70970ee90df93)
Signed-off-by: Michael Armbrust mich...@databricks.com

commit afde4019b81b6feb57f0374f2cb097844ea4c04b
Author: Imran Rashid iras...@cloudera.com
Date:   2015-05-22T23:05:07Z

[SPARK-7760] add /json back into master & worker pages; add test

Author: Imran Rashid iras...@cloudera.com

Closes #6284 from squito/SPARK-7760 and squashes the following commits:

5e02d8a [Imran Rashid] style; increase timeout
9987399 [Imran Rashid] comment
8c7ed63 [Imran Rashid] add /json back into master & worker pages; add test

(cherry picked from commit 821254fb945c3e19540eb57fff1f656737ef484b)
Signed-off-by: Josh Rosen joshro...@databricks.com

commit d7660dc2f5c53dd6b3ffc57b05c0daa67a16f5f3
Author: Michael Armbrust mich...@databricks.com
Date:   2015-05-23T00:23:12Z

[SPARK-7834] [SQL] Better window error messages

Author: Michael Armbrust mich...@databricks.com

Closes #6363 from marmbrus/windowErrors and squashes the following commits:

516b02d [Michael Armbrust] [SPARK-7834] [SQL] Better window error messages

(cherry picked from commit 3c1305107a2d6d2de862e8b41dbad0e85585b1ef)
Signed-off-by: Michael Armbrust mich...@databricks.com

commit 0be6e3b3e60768012e2337d1cbf2967275007a11
Author: Andrew Or and...@databricks.com
Date:   2015-05-23T00:37:38Z

[SPARK-7771] [SPARK-7779] Dynamic allocation: lower default timeouts further

The default add time of 5s is still too slow for small jobs. Also, the 
current default remove time of 10 minutes seems rather high. This patch lowers 
both and rephrases a few log messages.

Author: Andrew Or and...@databricks.com

Closes #6301 from andrewor14/da-minor and squashes the following commits:

6d614a6 [Andrew Or] Lower log level
2811492 [Andrew Or] Log information when requests are canceled
5fcd3eb [Andrew Or] Fix tests
3320710 [Andrew Or] Lower timeouts + rephrase a few log messages

(cherry picked from commit 3d8760d76eae41dcaab8e9aeda19619f3d5f1596)
Signed-off-by: Andrew Or and...@databricks.com

commit 130ec219aa40cd8cebf4105053d4c92d840e127e
Author: Tathagata Das tathagata.das1...@gmail.com
Date:   2015-05-23T00:39:01Z

[SPARK-7788] Made KinesisReceiver.onStart() non-blocking

KinesisReceiver calls worker.run(), which is a blocking call (a while loop), as 
per the source code of the kinesis-client library - 
https://github.com/awslabs/amazon-kinesis-client/blob/v1.2.1/src/main/java/com/amazonaws/services/kinesis/clientlibrary/lib/worker/Worker.java.
This results in an infinite loop when calling 
sparkStreamingContext.stop(stopSparkContext = false, stopGracefully = true), 
perhaps because ReceiverTracker is never able to register the receiver (its 
receiverInfo field is an empty map), causing it to be stuck in an infinite loop 
while waiting for the running flag to be set to false.

Author: Tathagata Das tathagata.das1...@gmail.com

Closes #6348 from tdas/SPARK-7788 and squashes the following commits:

2584683 [Tathagata Das] Added receiver id in thread name
6cf1cd4 [Tathagata Das] Made KinesisReceiver.onStart non-blocking

(cherry
