Spark also reads ORC data very slowly, and if the Hive table is
partitioned, it just hangs.


Regards,
Gourav

On Thu, Aug 11, 2016 at 6:02 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

>
>
> CREATE TABLE with a CLUSTERED BY clause does not work in Spark 2 now!
>
> CREATE TABLE test.dummy2
>  (
>      ID INT
>    , CLUSTERED INT
>    , SCATTERED INT
>    , RANDOMISED INT
>    , RANDOM_STRING VARCHAR(50)
>    , SMALL_VC VARCHAR(10)
>    , PADDING  VARCHAR(10)
> )
> CLUSTERED BY (ID) INTO 256 BUCKETS
> STORED AS ORC
> TBLPROPERTIES ( "orc.compress"="SNAPPY",
> "orc.create.index"="true",
> "orc.bloom.filter.columns"="ID",
> "orc.bloom.filter.fpp"="0.05",
> "orc.stripe.size"="268435456",
> "orc.row.index.stride"="10000" )
>
> scala> HiveContext.sql(sqltext)
> org.apache.spark.sql.catalyst.parser.ParseException:
> Operation not allowed: CREATE TABLE ... CLUSTERED BY(line 2, pos 0)
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
