And Spark even reads ORC data very slowly. If the Hive table is partitioned, it just hangs.

Regards,
Gourav

On Thu, Aug 11, 2016 at 6:02 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> This does not work with CLUSTERED BY clause in Spark 2 now!
>
> CREATE TABLE test.dummy2
>  (
>      ID INT
>    , CLUSTERED INT
>    , SCATTERED INT
>    , RANDOMISED INT
>    , RANDOM_STRING VARCHAR(50)
>    , SMALL_VC VARCHAR(10)
>    , PADDING VARCHAR(10)
>  )
> CLUSTERED BY (ID) INTO 256 BUCKETS
> STORED AS ORC
> TBLPROPERTIES ( "orc.compress"="SNAPPY",
> "orc.create.index"="true",
> "orc.bloom.filter.columns"="ID",
> "orc.bloom.filter.fpp"="0.05",
> "orc.stripe.size"="268435456",
> "orc.row.index.stride"="10000" )
>
> scala> HiveContext.sql(sqltext)
> org.apache.spark.sql.catalyst.parser.ParseException:
> *Operation not allowed: CREATE TABLE ... CLUSTERED BY(line 2, pos 0)*
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.