We use a large Impala/Kudu cluster for our analytic reporting. HDFS is not on the critical path for this, and after experimenting with a cluster configuration without it, we simply added a co-located HDFS cluster. It turns out we use HDFS for our dimension staging, and I'm guessing Impala uses it for things like exporting to Parquet. All in all, the HDFS component is stable, a seldom-used but still essential part of the cluster. My advice? Don't fight it, and you will appreciate it later.
Cliff

On Tue, Sep 10, 2019, 12:42 AM Dinushka <dinushk...@yahoo.com> wrote:
> Hi,
> I'm using Impala in a clustered environment.
> In my testing, I found that Impala only needs HDFS on startup.
> What kind of dependency does Impala have on HDFS?
> Thanks
>
> On Mon, Sep 9, 2019 at 9:20 PM, Jeszy <jes...@gmail.com> wrote:
>> Hey,
>>
>> Impala right now needs HDFS out of the box. It would probably be a lot
>> of work to remove that dependency.
>> Kudu doesn't have an HDFS dependency; you could use Spark standalone or
>> Kudu's API to query it.
>>
>> HTH
>>
>> On 2019. Sep 5., Thu at 6:31, Dinushka <dinushk...@yahoo.com> wrote:
>>> Hi,
>>> I'm trying to use only Impala and Kudu, without HDFS, but I get an
>>> error saying "Currently configured default filesystem:
>>> ProxyLocalFileSystem. fs.defaultFS (file:///) is not supported" that
>>> only goes away when I install and start HDFS. Can Impala and Kudu
>>> work without HDFS?
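
P.S. For anyone who wants to try the route Jeszy suggests above, here is a
minimal sketch of scanning a Kudu table directly through the Kudu Java
client, with no HDFS daemon involved anywhere. The master address
("kudu-master:7051") and table name ("my_table") are placeholders for your
own deployment, not values from this thread.

import org.apache.kudu.client.*;

public class KuduScanSketch {
    public static void main(String[] args) throws KuduException {
        // Connect straight to the Kudu master; fs.defaultFS plays no role here.
        // "kudu-master:7051" is a placeholder for your master address(es).
        KuduClient client =
            new KuduClient.KuduClientBuilder("kudu-master:7051").build();
        try {
            // "my_table" is a placeholder table name.
            KuduTable table = client.openTable("my_table");
            // Scan every row; predicates and column projections can be
            // added on the scanner builder before build().
            KuduScanner scanner = client.newScannerBuilder(table).build();
            while (scanner.hasMoreRows()) {
                RowResultIterator rows = scanner.nextRows();
                while (rows.hasNext()) {
                    System.out.println(rows.next().rowToString());
                }
            }
        } finally {
            client.shutdown();
        }
    }
}

The same kind of scan can also be expressed through the kudu-spark
connector if you'd rather stay in DataFrames; neither path needs HDFS
running. Only Impala itself has the fs.defaultFS requirement at startup.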