We use a large Impala/Kudu cluster for our analytic reporting. HDFS is not
on the critical path for this, and after experimenting with a cluster
configuration without it, we simply added a co-located HDFS cluster. It
turns out we use HDFS for our dimension staging, and I'm guessing Impala
uses it for things like exporting to Parquet. In all, the HDFS component
is stable, a seldom-used but still essential part of the cluster. My
advice? Don't fight it, and you will appreciate it later.

Cliff

On Tue, Sep 10, 2019, 12:42 AM Dinushka <dinushk...@yahoo.com> wrote:

> Hi,
> I'm using Impala in a clustered environment.
> In my testing, I found that Impala only needs HDFS on startup.
> What kind of dependency does Impala have on HDFS?
> Thanks
>
> On Mon, Sep 9, 2019 at 9:20 PM, Jeszy
> <jes...@gmail.com> wrote:
> Hey,
>
> Impala right now needs HDFS out of the box; it would probably be a lot
> of work to remove that dependency.
> Kudu doesn't have an HDFS dependency, so you could query it with Spark
> standalone or through Kudu's API (a minimal sketch follows the quoted
> thread below).
>
> HTH
>
> On Thu, Sep 5, 2019 at 6:31, Dinushka <dinushk...@yahoo.com> wrote:
>
> Hi,
> I'm trying to use only Impala and Kudu, without HDFS, but I get an error
> saying "Currently configured default filesystem: ProxyLocalFileSystem.
> fs.defaultFS (file:///) is not supported" that only goes away when I
> install and start HDFS. Can Impala and Kudu work without HDFS?
>
>
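For the Spark route Jeszy mentions, here is a minimal sketch of reading
a Kudu table with the kudu-spark connector, no HDFS involved. The master
address and table name below are placeholders, and it assumes the
kudu-spark artifact matching your Spark and Scala versions is on the
classpath:

    import org.apache.spark.sql.SparkSession

    object KuduWithoutHdfs {
      def main(args: Array[String]): Unit = {
        // Spark standalone (or local) session; Kudu reads don't touch HDFS.
        val spark = SparkSession.builder()
          .appName("kudu-without-hdfs")
          .master("local[*]") // or spark://host:7077 for standalone mode
          .getOrCreate()

        // Placeholder master address and table name; Impala-created Kudu
        // tables are registered under the "impala::db.table" naming scheme.
        val df = spark.read
          .format("kudu")
          .option("kudu.master", "kudu-master:7051")
          .option("kudu.table", "impala::default.my_table")
          .load()

        df.createOrReplaceTempView("my_table")
        spark.sql("SELECT COUNT(*) FROM my_table").show()

        spark.stop()
      }
    }

For the direct-API alternative, Kudu's own Java client (usable from
Scala via KuduClient.KuduClientBuilder) can scan the same table without
Spark, again with no HDFS in the picture.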
