I don’t know how the Impala external DataSource works. However, I am familiar
with how Netezza works with EXTERNAL TABLE and REMOTESOURCE.

Efficient and easy data ingestion into Impala (via ODBC) is something our R,
Python and SAS users miss.
You can get an idea of how this works in Netezza at:
https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.load.doc/t_load_loading_data_remote_client_sys.html
 
To ingest data today, users need to manage, among other things, Kerberos
authentication, copying data into HDFS, HDFS access rights and Sentry URL
privileges.

From the subsequent discussion on this thread, I understand that what was done
will not help here. However, I wanted to highlight this use case so you can
still consider it in the future (in addition to access to the “information
schema”) and not close the “external source” case forever.


> On 7 Feb 2018, at 05:47, Jim Apple <[email protected]> wrote:
> 
> Is there an argument for documenting it and keeping it? Did it not meet the
> need it was added for in the first place, or has that need decreased in
> importance?
> 
> On Tue, Feb 6, 2018 at 7:29 PM Philip Zeyliger <[email protected]> wrote:
> Hi folks,
> 
> I want to bring your attention to http://gerrit.cloudera.org:8080/9192,
> "IMPALA-6204: Remove external
> DataSource". This is functionality that was never publicly documented and, to 
> my knowledge, is not in use by anyone. We'd like to remove it to reduce 
> complexity.
> 
> Please let me know if you've got concerns!
> 
> Thanks,
> 
> -- Philip
