turboFei removed a comment on issue #24992: [SPARK-28194][SQL] Judge whether to reorder joinKeys to prevent None.get in EnsureRequirements
URL: https://github.com/apache/spark/pull/24992#issuecomment-507584403

@maropu A brief query:

```
SELECT
  TO_DATE(goodsid2buname.day) day,
  add_cart.goods_id,
  goodsid2buname.bu_name,
  add_cart.add_cart_uv
FROM (
  SELECT DISTINCT
    day,
    CAST(goods_id AS STRING) goods_id,
    bu_name
  FROM db.tbla
) goodsid2buname
JOIN (
  SELECT
    day,
    TO_DATE(cart_datetime) add_cart_day,
    goods_id,
    COUNT(DISTINCT account_id) add_cart_uv
  FROM db.tblb
  WHERE day = DATE_ADD(CURRENT_DATE(), -1)
  GROUP BY day, goods_id, TO_DATE(cart_datetime)
) add_cart
ON goodsid2buname.day = add_cart.day
  AND goodsid2buname.goods_id = add_cart.goods_id
  AND goodsid2buname.day = add_cart.add_cart_day;
```

The related tables `db.tbla` and `db.tblb` are both Parquet tables created with `stored as parquet`. Their column information is shown below.

```
> describe db.tbla;
19/07/02 16:45:29 INFO CodeGenerator: Code generated in 207.305477 ms
load_time               bigint      NULL
product_id              bigint      NULL
goods_id                bigint      NULL
bu_id                   bigint      NULL
bu_name                 string      NULL
is_virtual_bu           bigint      NULL
day                     string      NULL
bu_type                 bigint      NULL
# Partition Information
# col_name              data_type   comment
day                     string      NULL
bu_type                 bigint      NULL
Time taken: 0.593 seconds, Fetched 12 row(s)
19/07/02 16:45:29 INFO SparkSQLCLIDriver: Time taken: 0.593 seconds, Fetched 12 row(s)

> describe db.tblb;
account_id              string      NULL
sku_id                  string      NULL
goods_id                string      NULL
backend_sku_id          string      NULL
backend_goods_id        string      NULL
bu_cat_id               string      NULL
goods_num               int         NULL
backend_goods_num       int         NULL
price                   double      NULL
backend_price           double      NULL
cart_alg_price          double      NULL
cart_datetime           timestamp   NULL
id                      string      NULL
day                     string      NULL
# Partition Information
# col_name              data_type   comment
day                     string      NULL
Time taken: 0.246 seconds, Fetched 17 row(s)
```
