On 28 Jan 2013, at 14:29, Edward Capriolo wrote:

IIRC, hive.mapred.mode=strict should prevent this. If not, we should add it.

Hi Edward,

Yes, that's indeed what the book claims (quoting):

  hive> SELECT * FROM fracture_act JOIN fracture_ads
      > WHERE fracture_act.planner_id = fracture_ads.planner_id;
  FAILED: Error in semantic analysis: In strict mode, cartesian product is not allowed. If you really want to perform the operation,
  +set hive.mapred.mode=nonstrict+

I am about to re-enable this setting on my cluster (after fixing all the
queries that it broke, especially the ORDER BY ones :-), but I was hoping
the cartesian product would be visible right there in the query plan, or
in some other way. If Hive can detect it, it should be visible somewhere,
right?

Thanks!

david


On Monday, January 28, 2013, David Morel <dmore...@gmail.com> wrote:
Hi everyone,

I had to kill some queries that were taking forever, and it turns out
they were doing cartesian products (missing ON clause on a JOIN).

I wonder how I could see that in the EXPLAIN output (which I still find a bit cryptic). Specifically, the stage that it was stuck in was this:

Stage: Stage-7
Map Reduce
Alias -> Map Operator Tree:
  $INTNAME
      Reduce Output Operator
        sort order:
        tag: 1
        value expressions:
              expr: _col1
              type: int
  $INTNAME1
      Reduce Output Operator
        sort order:
        tag: 0
        value expressions:
              expr: _col0
              type: bigint
              expr: _col1
              type: string
Reduce Operator Tree:
  Join Operator
    condition map:
         Inner Join 0 to 1
    condition expressions:
      0 {VALUE._col0} {VALUE._col1}
      1 {VALUE._col1}
    handleSkewJoin: false
    outputColumnNames: _col0, _col1, _col3
    File Output Operator
      compressed: true
      GlobalTableId: 0
      table:
          input format: org.apache.hadoop.mapred.SequenceFileInputFormat
          output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

Is there anything in there that should have alerted me?

I found out by looking at the query, but I wonder if the query plan (if
I could read it) would have given me that information.
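
One possible hint in the Stage-7 plan above: neither Reduce Output Operator lists a "key expressions:" entry, and both show an empty "sort order:". The sketch below assumes that a keyed join would normally show "key expressions:" on the map side, so that a stage containing a Join Operator with no such entries is worth a second look. That is a heuristic for plain-text EXPLAIN output of this shape, not something Hive documents as a cartesian-product marker. The function names and the SAMPLE text (a trimmed copy of the plan above) are made up for illustration.

```python
import re

# Matches the "Stage: Stage-N" header that opens each stage in the plan text.
STAGE_RE = re.compile(r"^[ \t]*Stage:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def split_stages(explain_text):
    """Yield (stage_name, stage_body) pairs from plain-text EXPLAIN output."""
    matches = list(STAGE_RE.finditer(explain_text))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(explain_text)
        yield m.group(1), explain_text[m.end():end]


def keyless_join_stages(explain_text):
    """Return stages that have a Join Operator but no join keys on the map side.

    Heuristic (an assumption, see the lead-in): a stage is flagged when its
    map-side Reduce Output Operators contain no 'key expressions:' line.
    """
    suspicious = []
    for name, body in split_stages(explain_text):
        if "Join Operator" not in body:
            continue
        # Only look at the map side, i.e. the text before the reduce tree.
        map_side = body.split("Reduce Operator Tree:")[0]
        # Each chunk after the first follows a "Reduce Output Operator" header.
        blocks = map_side.split("Reduce Output Operator")[1:]
        if blocks and not any("key expressions:" in b for b in blocks):
            suspicious.append(name)
    return suspicious


# Trimmed copy of the Stage-7 plan from this message, used as sample input.
SAMPLE = """\
Stage: Stage-7
Map Reduce
Alias -> Map Operator Tree:
  $INTNAME
      Reduce Output Operator
        sort order:
        tag: 1
        value expressions:
              expr: _col1
              type: int
  $INTNAME1
      Reduce Output Operator
        sort order:
        tag: 0
        value expressions:
              expr: _col0
              type: bigint
Reduce Operator Tree:
  Join Operator
    condition map:
         Inner Join 0 to 1
"""

if __name__ == "__main__":
    print(keyless_join_stages(SAMPLE))
```

Run against the full output of EXPLAIN for a query, this lists the stages to inspect by hand; an empty list does not prove the query is free of cartesian products.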

Thanks a lot

David Morel
