Re: Setting up HDFS access on EMR

2016-10-13 Thread David Kincaid
Yes. That is the page I found it on. I am working with it tonight and am
able to get the dfs plugin working on EMR using the setup you described.
Thanks again for the help. We will be deploying Drill on both our always-on
MapR cluster and on-demand AWS EMR clusters, so we have both use cases.

- Dave



Re: Setting up HDFS access on EMR

2016-10-13 Thread Ted Dunning
David,

It isn't misleading on a MapR cluster. To limit the number of problems with
programs that hard-code hdfs: at the beginning of file names, the default
is to assign "hdfs" to the MapR FS driver. On such a system you
don't need a namenode address in the URL. My guess is that a MapR user
wrote (and possibly even tested) that sentence.

A clarification is clearly needed.

Btw... is this the page you found that info on?

https://drill.apache.org/docs/file-system-storage-plugin/






Re: Setting up HDFS access on EMR

2016-10-13 Thread Andries Engelbrecht
See

http://drill.apache.org/docs/file-system-storage-plugin/


"connection": "hdfs://<IP Address>:<Port>/"


As Ted stated, you need to point to the name node for HDFS.
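
As a rough illustration of filling in that setting, here is a small Java sketch that builds the "connection" value from a name node host and port and refuses to produce one without a host. The class name, method names, and the host namenode.example.com are made up for the example, and port 8020 is only an assumption about the name node RPC port; check the value your cluster actually uses.

```java
import java.net.URI;

// Hypothetical helper, not part of Drill: builds the dfs plugin "connection" value.
final class DfsConnection {

    // Returns "hdfs://<host>:<port>/" or throws if the host is missing or unusable.
    static String hdfsConnection(String nameNodeHost, int nameNodePort) {
        if (nameNodeHost == null || nameNodeHost.isEmpty()) {
            throw new IllegalArgumentException("a name node host is required for hdfs://");
        }
        // URI.create throws IllegalArgumentException if the string is not a valid URI.
        URI uri = URI.create("hdfs://" + nameNodeHost + ":" + nameNodePort + "/");
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("not a usable host name: " + nameNodeHost);
        }
        return uri.toString();
    }

    // Formats the value as the JSON fragment used in the storage plugin definition.
    static String pluginLine(String connection) {
        return "\"connection\": \"" + connection + "\"";
    }

    public static void main(String[] args) {
        // Assumed host and port, for illustration only.
        System.out.println(pluginLine(hdfsConnection("namenode.example.com", 8020)));
    }
}
```

The printed line is what would go into the dfs storage plugin in place of "file:///".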


--Andries





Re: Setting up HDFS access on EMR

2016-10-13 Thread David Kincaid
I understood what Ted said and saw that in the documentation. I was
pointing out that right after that it says on the local cluster you simply
need to change file:/// to hdfs://. That is a bit misleading.



Re: Setting up HDFS access on EMR

2016-10-13 Thread David Kincaid
Thanks, Ted. The full URL I was using was hdfs://. I'll give your
suggestion a try when I'm able to work on this again tonight. I guess I
took the documentation too literally when it said "To query a file on HDFS
from a node on the cluster, you can simply change the connection from
file:/// to hdfs:// in the dfs storage plugin."
Thanks again,
Dave




Re: Setting up HDFS access on EMR

2016-10-12 Thread Ted Dunning
What is the full URL you used?

With hdfs://, you need to supply a name node address.

With file://, you don't.

With maprfs://, you also don't need an address, since it is implied in
the client connection.
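
For what it's worth, the error in the original post has the same wording that java.net.URI produces when a scheme is followed by "//" and nothing else. Below is a minimal Java sketch of that behavior. It assumes nothing about Drill's internals, and the class name, method name, and the host namenode.example.com with port 8020 are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative only: shows how java.net.URI treats the authority part of each URL.
final class UriAuthorityDemo {

    // Parses the string and reports either its authority or the parse error message.
    static String describe(String connection) {
        try {
            URI uri = new URI(connection);
            return connection + " -> authority=" + uri.getAuthority();
        } catch (URISyntaxException e) {
            return connection + " -> " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("hdfs://"));   // no authority and no path: parse error
        System.out.println(describe("file:///"));  // empty authority followed by a path: accepted
        System.out.println(describe("hdfs://namenode.example.com:8020/")); // host:port authority
    }
}
```

file:/// parses because an empty authority is tolerated when a path follows it. A bare hdfs:// has neither an authority nor a path, so parsing stops at index 7, right after the two slashes.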



On Wed, Oct 12, 2016 at 6:29 PM, David Kincaid wrote:

> I have an Amazon EMR cluster launched with Drill loaded. I'm trying to
> configure the dfs storage plugin to use HDFS. The docs say that I should
> simply need to change the "connection" setting from "file:///" to "hdfs://"
> in order to use HDFS on the cluster that Drill is running on. However, when
> I do this and try to run a query I get an error that says
> "org.apache.drill.common.exceptions.UserRemoteException:
> SYSTEM ERROR: URISyntaxException: Expected authority at index 7: hdfs://
> [Error Id: f9e6c674-4dd7-4c5d-b9a8-95b64b9dbaa3"
>
> Am I doing something wrong or is there an issue here?
>
> Thanks,
>
> Dave
>