Hi (new here),
I plan to use Drill to provide a SQL abstraction layer (as an 
alternative to Hive). I like what I see so far, but I am a bit in the dark on 
Avro support. Whilst support for Avro is mentioned (almost in passing) in the 
documentation, there is very little detail on its use in practice compared 
with Parquet. I am using Apache NiFi to move data around, with Avro files on 
HDFS as the final resting place (NiFi supports this nicely out of the box). 
I therefore want to use Drill to query this data, but the tests I have done so 
far seem very slow when querying any substantial amount of Avro data directly 
with Drill.

I am looking for some pointers on how best to do this. My idea was to keep my 
data in Avro (with a well-defined schema), partitioned into HDFS directories 
and subdirectories, but a simple select * from `/location` limit 100 takes 
forever (many minutes). Am I to assume that I need to create tables or views 
on top of the raw data for Drill to optimise its queries? If so, would I need 
to re-run these as batch jobs to keep them up to date?

Any pointers, documentation, or blog links would be welcome.

Thanks
Conrad

