Hi (new here), I plan to use Drill to provide a SQL abstraction layer (as an alternative to Hive). I like what I see so far, but I am a bit in the dark on Avro support. Whilst support for Avro is mentioned (almost in passing) in the documentation, there is very little detail on its use in practice, compared with the many references to Parquet. I am using Apache NiFi to move data around, and its final resting place is Avro data on HDFS (as NiFi supports this nicely out of the box). I therefore want to use Drill to query this data, but the tests I have done so far seem very slow when querying any substantial amount of Avro data directly with Drill.
I am looking for some pointers on how best to do this. My idea was to keep my data in Avro (with a well-defined schema), partitioned into HDFS directories and subdirectories, but even a simple select * from `/location` limit 100 takes forever (many minutes). Am I to assume that I need to create tables or views on top of the raw data for Drill to optimise its queries, and if so, would I then need to re-run these as batch jobs to keep them up to date? Any pointers, documentation, or blog links would be welcome.

Thanks
Conrad
