Thank you for the clarification. What are the performance implications for search queries of using HDFS vs. Kudu when storing large datasets (~10,000 records) per table? Does storing large datasets in Kudu improve search performance?
Thanks again,
-V
On Fri, Dec 6, 2019 at 12:51 PM Thomas Tauber-Marshall <tmarsh...@cloudera.com> wrote:

> Yes, you can use Impala to run queries against data in HDFS. Kudu is not
> required.
>
> By default, new tables will be created in HDFS. To create Kudu tables, or
> to control the file format in which the table's data is saved in HDFS, you
> can use the "STORED AS" clause with CREATE TABLE. To control where in HDFS
> the data is stored, you can use the LOCATION clause with CREATE TABLE. To
> query data that is already in HDFS (rather than creating a new, empty
> table), you can use EXTERNAL and LOCATION with CREATE TABLE.
>
> There are a bunch more details in the documentation:
> http://impala.apache.org/docs/build/html/topics/impala_create_table.html
>
>
> On Fri, Dec 6, 2019 at 9:43 AM l vic <lvic4...@gmail.com> wrote:
>
>> After a first look at the documentation and tutorial, I am still confused
>> about how to use/configure the storage backend for Impala... Can I use
>> Impala SQL to run queries against data in HDFS, or do I need a backend
>> data server like "Kudu"? How do I specify data storage in a "create table"
>> statement?
>> Thank you,
>> -V
>>
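The CREATE TABLE options described in the quoted reply can be sketched roughly as follows. This is a minimal, illustrative sketch in Impala SQL: the table names, columns, and HDFS paths are hypothetical placeholders, and clause order and supported options should be checked against the linked CREATE TABLE documentation for your Impala version.

```sql
-- 1. Default: a new managed table stored in HDFS, with the file format chosen via STORED AS.
CREATE TABLE events_parquet (
  id   BIGINT,
  name STRING
)
STORED AS PARQUET;

-- 2. Managed HDFS table at an explicit location (hypothetical path).
CREATE TABLE events_text (
  id   BIGINT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/example/events_text';

-- 3. Query files that already exist in HDFS: EXTERNAL + LOCATION.
--    Assumes comma-separated text files already sit under this hypothetical directory.
CREATE EXTERNAL TABLE events_existing (
  id   BIGINT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/example/existing_csv';

-- 4. Kudu table: requires a primary key; hash partitioning shown as one option.
CREATE TABLE events_kudu (
  id   BIGINT,
  name STRING,
  PRIMARY KEY (id)
)
PARTITION BY HASH (id) PARTITIONS 4
STORED AS KUDU;
```

One practical difference: dropping an EXTERNAL table removes only the table metadata and leaves the underlying HDFS files in place, which is why it suits data that already exists in HDFS.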