[GitHub] spark pull request: SPARK-1487 [SQL] Support record filtering via ...

marmbrus Fri, 25 Apr 2014 12:07:51 -0700

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/511#issuecomment-41428607
  
    Cool, thanks for renaming it.
    
    @mateiz I don't think we should even include these hints in the docs 
(unless we find particularly useful ones) as I agree presenting too much 
complexity to users is a bad idea.  However, even just for our own 
benchmarking, recompiling to change these settings is just not feasible and 
it's really hard to predict performance without actually running things.  Also, 
when I've talked about building Catalyst with experienced database people, 
basically everyone said, "No matter how good you think your optimizer is, 
always make sure you have knobs to control it because it is going to be wrong."
    
    Having these hints in the language could maybe be nice, but I really don't 
think that is worth the engineering effort of not only changing the parser, but 
also making sure they get threaded through analysis, optimization and planning 
correctly.  Language-based hints would also differ depending on whether you are 
using `sql`, `hql`, or the DSL.
    
    Having a special conf mechanism that lets you set them on a per-query basis 
would be nice.  I'm not sure how flexible the SparkConf infrastructure is in 
this regard, but it might be something to consider.  I can imagine cases where 
this might even be useful for standard Spark jobs.
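
    As a rough illustration of the per-query idea, here is a minimal sketch in 
Python.  It is not based on SparkConf or any existing Spark API: the 
`QueryConf` class, its `for_query` method, and the setting name 
`optimizer.filter_pushdown` are all hypothetical and exist only for this 
example.  It layers temporary overrides on top of session-wide defaults and 
drops them when the query finishes.

```python
from contextlib import contextmanager


class QueryConf:
    """Hypothetical settings holder: session defaults plus per-query overrides."""

    def __init__(self, defaults):
        self._defaults = dict(defaults)
        self._layers = []  # stack of per-query override dicts

    def get(self, key):
        # The innermost active override wins; otherwise use the session default.
        for layer in reversed(self._layers):
            if key in layer:
                return layer[key]
        return self._defaults[key]

    @contextmanager
    def for_query(self, overrides):
        # Push the overrides for the duration of one query, then remove them,
        # even if the query raises.
        self._layers.append(dict(overrides))
        try:
            yield self
        finally:
            self._layers.pop()


# Hypothetical setting name, used only to show the mechanism.
conf = QueryConf({"optimizer.filter_pushdown": True})

with conf.for_query({"optimizer.filter_pushdown": False}):
    pushdown_inside = conf.get("optimizer.filter_pushdown")

pushdown_after = conf.get("optimizer.filter_pushdown")
```

    With something along these lines, a benchmark could flip one knob for a 
single query without recompiling and without touching the defaults that other 
queries see.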



