[jira] [Commented] (SPARK-6910) Support for pushing predicates down to metastore for partition pruning

2016-04-11 Thread Mick Davies (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15235370#comment-15235370
 ] 

Mick Davies commented on SPARK-6910:


Hi, 

We are seeing something similar, but in our case subsequent queries are still 
expensive. Looking at HiveMetastoreCatalog.lookupRelation (we are using 1.5, 
but 1.6 looks the same) we seem to create a new MetastoreRelation for each 
query. Part of the analysis phase tries to convert this to a ParquetRelation 
using convertToParquetRelation which always calls 
metastoreRelation.getHiveQlPartitions() which gets all partition information. 
So every query incurs the cost of retrieving all partition info.

We don't understand how the code can use the cachedDataSourceTables effectively 
in the circumstances just described.

We changed HiveMetastoreCatalog.lookupRelation to use cache even if Hive table 
property "spark.sql.sources.provider" is unset which caused subsequent queries 
to use cached relation and therfore run more quickly.

Eg, changed 
{code}
if (table.properties.get("spark.sql.sources.provider").isDefined) 
{code}

to 
{code}
if (cachedDataSourceTables.getIfPresent(QualifiedTableName(databaseName, 
tblName).toLowerCase) != null ||
  table.properties.get("spark.sql.sources.provider").isDefined) 
{code}

Are we doing something wrong?




> Support for pushing predicates down to metastore for partition pruning
> --
>
> Key: SPARK-6910
> URL: https://issues.apache.org/jira/browse/SPARK-6910
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Cheolsoo Park
>Priority: Critical
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6910) Support for pushing predicates down to metastore for partition pruning

2015-10-14 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957194#comment-14957194
 ] 

Cheolsoo Park commented on SPARK-6910:
--

You're right that 2nd query is faster because the table/partition metadata is 
cached. Particularly, if you set {{spark.sql.hive.metastorePartitionPruning}} 
false (by default, false), Spark will cache metadata for all the partitions and 
any query against the same table will run faster even with a different 
predicate. See relevant code 
[here|https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L830-L839].

> Support for pushing predicates down to metastore for partition pruning
> --
>
> Key: SPARK-6910
> URL: https://issues.apache.org/jira/browse/SPARK-6910
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Cheolsoo Park
>Priority: Critical
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6910) Support for pushing predicates down to metastore for partition pruning

2015-10-14 Thread qian, chen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956508#comment-14956508
 ] 

qian, chen commented on SPARK-6910:
---

I'm using spark-sql (spark version 1.5.1 && hadoop 2.4.0) and found a very 
interesting thing:
in spark-sql shell:
at first I ran this, it took about 3 minutes
select * from table1 where date='20151010' and hour='12' and name='x' limit 5;
Time taken: 164.502 seconds

then I ran this, it only took 10s. date, hour and name are partition columns in 
this hive table. this table has >4000 partitions
select * from table1 where date='20151010' and hour='13' limit 5;
Time taken: 10.881 seconds
is it because that the first time I need to download all partition information 
from hive metastore? the second query is faster because all partitions are 
cached in memory now?

> Support for pushing predicates down to metastore for partition pruning
> --
>
> Key: SPARK-6910
> URL: https://issues.apache.org/jira/browse/SPARK-6910
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Cheolsoo Park
>Priority: Critical
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6910) Support for pushing predicates down to metastore for partition pruning

2015-07-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632373#comment-14632373
 ] 

Apache Spark commented on SPARK-6910:
-

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/7492

 Support for pushing predicates down to metastore for partition pruning
 --

 Key: SPARK-6910
 URL: https://issues.apache.org/jira/browse/SPARK-6910
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Michael Armbrust
Assignee: Cheolsoo Park
Priority: Critical
 Fix For: 1.5.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6910) Support for pushing predicates down to metastore for partition pruning

2015-07-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628273#comment-14628273
 ] 

Apache Spark commented on SPARK-6910:
-

User 'piaozhexiu' has created a pull request for this issue:
https://github.com/apache/spark/pull/7421

 Support for pushing predicates down to metastore for partition pruning
 --

 Key: SPARK-6910
 URL: https://issues.apache.org/jira/browse/SPARK-6910
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Michael Armbrust
Assignee: Cheolsoo Park
Priority: Critical
 Fix For: 1.5.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6910) Support for pushing predicates down to metastore for partition pruning

2015-07-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627409#comment-14627409
 ] 

Apache Spark commented on SPARK-6910:
-

User 'marmbrus' has created a pull request for this issue:
https://github.com/apache/spark/pull/7409

 Support for pushing predicates down to metastore for partition pruning
 --

 Key: SPARK-6910
 URL: https://issues.apache.org/jira/browse/SPARK-6910
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Michael Armbrust
Assignee: Cheolsoo Park
Priority: Critical
 Fix For: 1.5.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6910) Support for pushing predicates down to metastore for partition pruning

2015-07-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613468#comment-14613468
 ] 

Apache Spark commented on SPARK-6910:
-

User 'piaozhexiu' has created a pull request for this issue:
https://github.com/apache/spark/pull/7216

 Support for pushing predicates down to metastore for partition pruning
 --

 Key: SPARK-6910
 URL: https://issues.apache.org/jira/browse/SPARK-6910
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Michael Armbrust
Assignee: Cheolsoo Park
Priority: Critical





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6910) Support for pushing predicates down to metastore for partition pruning

2015-06-26 Thread Olivier Toupin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603625#comment-14603625
 ] 

Olivier Toupin commented on SPARK-6910:
---

Anyone interested. I posted a workaround to this. It doesn't actually prune 
anything, but fix the latency issues. Tested in pre-production.

 Support for pushing predicates down to metastore for partition pruning
 --

 Key: SPARK-6910
 URL: https://issues.apache.org/jira/browse/SPARK-6910
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Michael Armbrust
Assignee: Cheolsoo Park
Priority: Critical





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6910) Support for pushing predicates down to metastore for partition pruning

2015-06-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603606#comment-14603606
 ] 

Apache Spark commented on SPARK-6910:
-

User 'oliviertoupin' has created a pull request for this issue:
https://github.com/apache/spark/pull/7049

 Support for pushing predicates down to metastore for partition pruning
 --

 Key: SPARK-6910
 URL: https://issues.apache.org/jira/browse/SPARK-6910
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Michael Armbrust
Assignee: Cheolsoo Park
Priority: Critical





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6910) Support for pushing predicates down to metastore for partition pruning

2015-06-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594706#comment-14594706
 ] 

Apache Spark commented on SPARK-6910:
-

User 'piaozhexiu' has created a pull request for this issue:
https://github.com/apache/spark/pull/6921

 Support for pushing predicates down to metastore for partition pruning
 --

 Key: SPARK-6910
 URL: https://issues.apache.org/jira/browse/SPARK-6910
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Michael Armbrust
Priority: Critical





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6910) Support for pushing predicates down to metastore for partition pruning

2015-05-23 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557628#comment-14557628
 ] 

Cheolsoo Park commented on SPARK-6910:
--

I don't know what's the best way to fix this, but here is how I work around it 
at work.

*Fortunately*, I have an internal metadata service (similar to HCat) at work 
that takes a string representation of filter expression and returns a list of 
partitions. So I can simply push down predicates in Analyzer, convert them to a 
string, and call my metadata service. This approach only involves isolated 
changes to Analyzer and Catalog.

*Unfortunately*, I couldn't make my solution fully work with Hive metastore. 
Even though Hive metastore client api has a similar method called 
{{getPartitionsByFilter()}}, it only works with string type partition keys, 
which is not so practical in real world.

Although this issue is no longer blocking me, I am curious how it can be fixed 
with Hive metastore. The following are code snippets that are not specific to 
my internal metastore service. Feel free to use it if it helps anyhow-
# I added predicate pushdown in Analyzer.ReloveRelations.
{code}
object ResolveRelations extends Rule[LogicalPlan] {
  def getTable(u: UnresolvedRelation, predicates: Seq[Expression] = Nil): 
LogicalPlan = {
try {
  catalog.lookupRelation(u.tableIdentifier, u.alias, predicates) // push 
down predicates into catalog
} catch {
  case _: NoSuchTableException =
u.failAnalysis(sno such table ${u.tableName})
}
  }

  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case i @ InsertIntoTable(u: UnresolvedRelation, _, _, _, _) =
  i.copy(table = EliminateSubQueries(getTable(u)))
case f @ Filter(condition, u: UnresolvedRelation) = { // find any filter 
operator that has a unresolved relation child.
  f match {
case FilteredOperation(predicates, _) = // extract conditions from 
filter operator
  f.copy(condition, getTable(u, predicates))
  }
}
case u: UnresolvedRelation =
  getTable(u)
  }
}
{code}
# In HiveMetastoreCatalog, I extract partition columns from predicates and 
build a string that represents the filter expression.
{code}
val partitionFilter: String =
  if (predicates.nonEmpty) {
val partitionKeys: Set[String] = franklinTable.partition_keys().toSet
val (pruningPredicates, _) = predicates.partition { predicate =
val refs: Set[String] = predicate.references.map(_.name).toSet
refs.nonEmpty  refs.subsetOf(partitionKeys) 
predicate.isInstanceOf[BinaryComparison]
}

pruningPredicates.foldLeft() { (str, expr) =
  toFranklinExpression(str, expr, fieldType) // convert predicates to a 
string representation
}
  } else {

  }
{code}


 Support for pushing predicates down to metastore for partition pruning
 --

 Key: SPARK-6910
 URL: https://issues.apache.org/jira/browse/SPARK-6910
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Michael Armbrust





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6910) Support for pushing predicates down to metastore for partition pruning

2015-05-06 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14531325#comment-14531325
 ] 

Ashwin Shankar commented on SPARK-6910:
---

+1, this is causing pretty bad user experience for us too. Hope we can get this 
in soon, thanks !

 Support for pushing predicates down to metastore for partition pruning
 --

 Key: SPARK-6910
 URL: https://issues.apache.org/jira/browse/SPARK-6910
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Michael Armbrust





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6910) Support for pushing predicates down to metastore for partition pruning

2015-05-05 Thread Yana Kadiyska (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529639#comment-14529639
 ] 

Yana Kadiyska commented on SPARK-6910:
--

Can you please provide an update on this -- status/maybe a target release? 
Spark-6904 was closed as a duplicate of this but this seems like a critical 
bug? We have a pretty large metastore (partitions are per month per customer 
with a few years of data) and Shark works OK but I cannot take advantage of the 
new cool versions of Spark until the Metastore interaction improves.

Any advice on a workaround would also be great...
I opened https://issues.apache.org/jira/browse/SPARK-6984 which is probably a 
dup of this.

 Support for pushing predicates down to metastore for partition pruning
 --

 Key: SPARK-6910
 URL: https://issues.apache.org/jira/browse/SPARK-6910
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Michael Armbrust





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6910) Support for pushing predicates down to metastore for partition pruning

2015-05-05 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529666#comment-14529666
 ] 

Michael Armbrust commented on SPARK-6910:
-

Target is Spark 1.5.  In Spark 1.3 a workaround is to do something like: 
{{sqlContext.table(tableName).registerTempTable(...)}} which caches the list 
of partitions in memory on the driver.  The initial pull is expensive but it is 
much faster after that.

 Support for pushing predicates down to metastore for partition pruning
 --

 Key: SPARK-6910
 URL: https://issues.apache.org/jira/browse/SPARK-6910
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Michael Armbrust





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org