Re: Apache Drill querying IGFS-accelerated (H)DFS?

2016-02-07 Thread Vladimir Ozerov
Yep, will do.

On Fri, Feb 5, 2016 at 11:17 PM, Dmitriy Setrakyan <dsetrak...@apache.org>
wrote:

> Great to hear. Would be nice to document this process. Vladimir, do you
> mind taking a first crack at it and posting it here for review?
>
> On Fri, Feb 5, 2016 at 12:02 PM, pshomov <pe...@activitystream.com> wrote:
>
>> Hi Jason,
>>
>> adding
>>
>> <property>
>>     <name>fs.igfs.impl</name>
>>     <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
>> </property>
>>
>> <property>
>>     <name>fs.AbstractFileSystem.igfs.impl</name>
>>     <value>org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem</value>
>> </property>
>>
>> to the core-site.xml in the Drill conf/ folder tied the whole thing
>> together ;)
>> Now Drill is able to use the igfs:// scheme.
>>
>> Thank you guys so much for the tremendous help, I am amazed by this
>> community!
>> Keep it up!
>>
>> Best regards,
>>
>> Petar
>>
>> --
>> View this message in context: Re: Apache Drill querying IGFS-accelerated
>> (H)DFS?
>> <http://apache-ignite-users.70518.x6.nabble.com/Apache-Drill-querying-IGFS-accelerated-H-DFS-tp2840p2866.html>
>> Sent from the Apache Ignite Users mailing list archive
>> <http://apache-ignite-users.70518.x6.nabble.com/> at Nabble.com.
>>
>
>


Re: Apache Drill querying IGFS-accelerated (H)DFS?

2016-02-05 Thread Vladimir Ozerov
> On Fri, Feb 5, 2016 at 4:18 PM, pshomov <pe...@activitystream.com> wrote:
>
>> Hi Vladimir,
>>
>> My bad about that ifgs://, fixed it but it changed nothing.
>>
>> 2016-02-05 13:14:03,507 [294b5fe3-8f63-2134-67e0-42f7111ead44:foreman]
>> ERROR o.a.d.exec.util.ImpersonationUtil - Failed to create DrillFileSystem
>> for proxy user: No FileSystem for scheme: igfs
>> java.io.IOException: No FileSystem for scheme: igfs

Re: Apache Drill querying IGFS-accelerated (H)DFS?

2016-02-05 Thread Jason Altekruse
> > ERROR o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR: IOException: No
> > FileSystem for scheme: igfs
> >
> > I copied the same ignite jars that go into Hadoop to Drill just in case
> > but that did not help either.
> > I think the only way is to write a Drill storage plugin for Ignite. Or
> > somehow make the Ignite caching happen inside Hadoop and be totally
> > transparent to Drill.
> >
> > Best regards,
> >
> > Petar


Re: Apache Drill querying IGFS-accelerated (H)DFS?

2016-02-05 Thread Vladimir Ozerov
> java.io.IOException: No FileSystem for scheme: igfs
>
> I copied the same ignite jars that go into Hadoop to Drill just in case
> but that did not help either.
> I think the only way is to write a Drill storage plugin for Ignite. Or
> somehow make the Ignite caching happen inside Hadoop and be totally
> transparent to Drill.
>
> Best regards,
>
> Petar


Re: Apache Drill querying IGFS-accelerated (H)DFS?

2016-02-05 Thread pshomov
Hi Jason,

adding

<property>
    <name>fs.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
</property>

<property>
    <name>fs.AbstractFileSystem.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem</value>
</property>

to the core-site.xml in the Drill conf/ folder tied the whole thing
together ;)
Now Drill is able to use the igfs:// scheme.
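For reference, a minimal sketch of what the whole conf/core-site.xml might
look like after this change, assuming nothing else needs to go into that
file (the enclosing <configuration> element is just the standard Hadoop
wrapper):

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- Map the igfs:// scheme to Ignite's Hadoop FileSystem implementations. -->
    <property>
        <name>fs.igfs.impl</name>
        <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
    </property>
    <property>
        <name>fs.AbstractFileSystem.igfs.impl</name>
        <value>org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem</value>
    </property>
</configuration>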

Thank you guys so much for the tremendous help, I am amazed by this
community!
Keep it up!

Best regards,

Petar




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Apache-Drill-querying-IGFS-accelerated-H-DFS-tp2840p2866.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Apache Drill querying IGFS-accelerated (H)DFS?

2016-02-05 Thread Vladimir Ozerov
Petar,

IGFS configuration consists of two steps: starting Ignite node and
adjusting Hadoop configuration.

*1) Starting Ignite node:*

   - Download Apache Ignite Hadoop Accelerator (
   http://ignite.apache.org/download.cgi#binaries) and unpack it.
   - If you want to link IGFS and HDFS, please add the following property
   to the bean *org.apache.ignite.configuration.FileSystemConfiguration*
   inside *config/default-config.xml*:

<property name="secondaryFileSystem">
    <bean class="org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem">
        <property name="fileSystemFactory">
            <bean class="org.apache.ignite.hadoop.fs.CachingHadoopFileSystemFactory">
                <property name="configPaths">
                    <list>
                        <value>/path/to/your/core-site.xml</value>
                    </list>
                </property>
            </bean>
        </property>
    </bean>
</property>

   - Run bin/ignite.sh

At this point you will have an Ignite node which is able to accept IGFS
requests at port 10500 (see *config/default-config.xml*, section
*ipcEndpointConfiguration*).

*2) Configuring Hadoop:*

   - Copy the following 3 Ignite JARs to
   *${HADOOP_HOME}/share/hadoop/common/lib* so that Hadoop is able to
   instantiate the IGFS client:
  - libs/ignite-core-[VERSION].jar
  - libs/ignite-shmem-1.0.0.jar
  - libs/ignite-hadoop/ignite-hadoop-[VERSION].jar


   - Go to your *core-site.xml* and map the "igfs" scheme to the IGFS client
   class:

<property>
    <name>fs.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
</property>

   - At this point you should be able to start querying IGFS, e.g. "hadoop
   fs -ls igfs://igfs@/" (the shell sketch below pulls step 2 together).
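A sketch of step 2 as shell commands, assuming IGNITE_HOME points at the
unpacked Hadoop Accelerator directory and HADOOP_HOME at the Hadoop 2.7.1
install used in this thread:

# Let Hadoop instantiate the IGFS client.
cp $IGNITE_HOME/libs/ignite-core-*.jar \
   $IGNITE_HOME/libs/ignite-shmem-1.0.0.jar \
   $IGNITE_HOME/libs/ignite-hadoop/ignite-hadoop-*.jar \
   $HADOOP_HOME/share/hadoop/common/lib/

# After adding fs.igfs.impl to core-site.xml, verify that the scheme resolves.
$HADOOP_HOME/bin/hadoop fs -ls igfs://igfs@/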

Once the link between IGFS and HDFS is set, you can add the IGFS URL
"igfs://igfs@/" to the Apache Drill configuration and try querying data.
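A sketch of what that might look like in Drill's "dfs" storage plugin
(edited at http://localhost:8047/storage/dfs), assuming the default "igfs"
authority and the 10500 endpoint port mentioned above; the workspace name is
only an example:

{
  "type": "file",
  "enabled": true,
  "connection": "igfs://igfs@localhost:10500",
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    }
  }
}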

Please let me know if you have any further problems during setup.

Vladimir.


On Fri, Feb 5, 2016 at 12:34 PM, pshomov <pe...@activitystream.com> wrote:

> Hi Vladimir,
>
> Changed back fs.defaultFS (in my case to hdfs://localhost:9000) and the
> HDFS service is back on its feet ;)
> My problem is that Apache Drill does not accept the igfs:// scheme ("No
> FileSystem for scheme: igfs"). It knows the hdfs:// scheme pretty well,
> though. I am guessing that accessing the data over hdfs:// bypasses the
> IGFS acceleration altogether, right?
>
> Best regards,
>
> Petar
>
>
>
> ----------
> View this message in context: Re: Apache Drill querying IGFS-accelerated
> (H)DFS?
> <http://apache-ignite-users.70518.x6.nabble.com/Apache-Drill-querying-IGFS-accelerated-H-DFS-tp2840p2850.html>
>
> Sent from the Apache Ignite Users mailing list archive
> <http://apache-ignite-users.70518.x6.nabble.com/> at Nabble.com.
>


Re: Apache Drill querying IGFS-accelerated (H)DFS?

2016-02-05 Thread pshomov
Hi Vladimir,

Changed back fs.defaultFS (in my case to hdfs://localhost:9000) and the
HDFS service is back on its feet ;)
My problem is that Apache Drill does not accept the igfs:// scheme ("No
FileSystem for scheme: igfs"). It knows the hdfs:// scheme pretty well,
though. I am guessing that accessing the data over hdfs:// bypasses the IGFS
acceleration altogether, right?

Best regards,

Petar




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Apache-Drill-querying-IGFS-accelerated-H-DFS-tp2840p2850.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Apache Drill querying IGFS-accelerated (H)DFS?

2016-02-05 Thread pshomov
​Hi Vladimir,

Thank you for keeping very speedy responses to my questions! Much, much
appreciated!

I apparently missed the point where I am supposed to run an Ignite node
outside of Hadoop; I thought it would spin one up in-process. Anyway, I
followed your instructions and set up a secondaryFileSystem (btw, your
sample was for code that is not released yet, so I used this instead

<property name="secondaryFileSystem">
    <bean class="org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem">
        <constructor-arg value="hdfs://localhost:9000"/>
        <constructor-arg value="/Users/petar/src/as/igfs/hadoop-2.7.1/etc/hadoop/core-site.xml"/>
    </bean>
</property>

I think this should be fine, right?
), ran an Ignite node and then started Hadoop, and was able to list my files
using

bin/hadoop fs -ls igfs://igfs@localhost:10500/


However the last part did not happen:

>Once the link between IGFS and HDFS is set, you can add the IGFS URL
"igfs://igfs@/" to the Apache Drill configuration and try querying data.

Drill keeps insisting it knows nothing about the igfs:// scheme. When you say
Apache Drill configuration, do you mean opening
http://localhost:8047/storage/dfs and modifying it to be like this

{
  "type": "file",
  "enabled": true,
  "connection": "ifgs://igfs@localhost:10500",
  "workspaces": {
"petar": {
  "location": "/",
  "writable": false,
  "defaultInputFormat": null
}
  },
 ….
}


Or do you mean some .conf file in the config folder of Drill?
Thank you for your tremendous help once again!

Best regards,

Petar




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Apache-Drill-querying-IGFS-accelerated-H-DFS-tp2840p2855.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Apache Drill querying IGFS-accelerated (H)DFS?

2016-02-05 Thread Vladimir Ozerov
Petar,

Yes, I mean setting igfs://igfs@localhost:10500 in Drill's config. I see in
your email that you typed "ifgs" instead of "igfs". Is it a typo in the email
or in the Drill configuration as well? Please try changing it to "igfs" and
possibly restart the Drill instance, because maybe it simply didn't pick up
the latest changes yet.

Anyway, the general rule is that you need to register the IGFS file system in
some Hadoop configuration file (usually this is core-site.xml):


<property>
    <name>fs.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
</property>

... and then "feed" this configuration file to Drill somehow. As you were able
to execute a command over IGFS and it worked fine, it means that *Hadoop is
already configured correctly*. The main question is why Drill does not
respect the Hadoop settings.

I do not have much experience with Drill, so could you please try looking
for any Drill properties which point to Hadoop configuration file(s) and
then check whether these files really contain the mentioned "fs.igfs.impl"
property?
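One way to "feed" the configuration to Drill, and the approach that
eventually worked for Petar, is to drop the fs.igfs.impl /
fs.AbstractFileSystem.igfs.impl properties into a core-site.xml inside
Drill's conf/ directory. A rough sketch, assuming a default Drill 1.4
tarball layout and that the Ignite client jars also need to be on Drill's
classpath (jars/3rdparty is an assumption here):

# Make the IGFS client classes visible to Drill.
cp $IGNITE_HOME/libs/ignite-core-*.jar \
   $IGNITE_HOME/libs/ignite-shmem-1.0.0.jar \
   $IGNITE_HOME/libs/ignite-hadoop/ignite-hadoop-*.jar \
   $DRILL_HOME/jars/3rdparty/

# Register the igfs scheme where Drill's embedded Hadoop client can read it.
cp /path/to/core-site.xml $DRILL_HOME/conf/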

Vladimir.

On Fri, Feb 5, 2016 at 3:14 PM, pshomov <pe...@activitystream.com> wrote:

>
> ​Hi Vladimir,
>
> Thank you for keeping very speedy responses to my questions! Much, much
> appreciated!
>
> I apparently missed the point where I am supposed to run an Ignite node
> outside of Hadoop; I thought it would spin one up in-process. Anyway, I
> followed your instructions and set up a secondaryFileSystem (btw, your
> sample was for code that is not released yet, so I used this instead
> <property name="secondaryFileSystem">
>     <bean class="org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem">
>         <constructor-arg value="hdfs://localhost:9000"/>
>         <constructor-arg value="/Users/petar/src/as/igfs/hadoop-2.7.1/etc/hadoop/core-site.xml"/>
>     </bean>
> </property>
>
> I think this should be fine, right?
> ), ran an Ignite node and then started Hadoop, and was able to list my
> files using
>
> bin/hadoop fs -ls igfs://igfs@localhost:10500/
>
>
> However the last part did not happen:
>
> >Once the link between IGFS and HDFS is set, you can add the IGFS URL
> "igfs://igfs@/" to the Apache Drill configuration and try querying data.
>
> Drill keeps insisting it knows nothing about the igfs:// scheme. When you
> say Apache Drill configuration, do you mean opening
> http://localhost:8047/storage/dfs and modifying it to be like this
>
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "ifgs://igfs@localhost:10500",
>   "workspaces": {
> "petar": {
>   "location": "/",
>   "writable": false,
>   "defaultInputFormat": null
> }
>   },
>  ….
> }
>
>
> Or do you mean some .conf file in the config folder of Drill?
> Thank you for your tremendous help once again!
>
> Best regards,
>
> Petar
>
> --
> View this message in context: Re: Apache Drill querying IGFS-accelerated
> (H)DFS?
> <http://apache-ignite-users.70518.x6.nabble.com/Apache-Drill-querying-IGFS-accelerated-H-DFS-tp2840p2855.html>
> Sent from the Apache Ignite Users mailing list archive
> <http://apache-ignite-users.70518.x6.nabble.com/> at Nabble.com.
>


Re: Apache Drill querying IGFS-accelerated (H)DFS?

2016-02-05 Thread Vladimir Ozerov
Petar,

Looks like this could be what we need - Storage Plugin -
https://drill.apache.org/docs/plugin-configuration-basics/
Could you please try configuring a new plugin for IGFS?
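For illustration, such a plugin could be registered under its own name (for
example "igfs") via the Storage tab of the Drill web UI; this is only a
sketch reusing the endpoint from elsewhere in this thread, with a minimal
JSON format entry and an example workspace:

{
  "type": "file",
  "enabled": true,
  "connection": "igfs://igfs@localhost:10500",
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "json": {
      "type": "json",
      "extensions": ["json"]
    }
  }
}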

If the problem is still there, could you please provide a detailed error
description and possibly the logs?

Vladimir.

On Fri, Feb 5, 2016 at 3:40 PM, Vladimir Ozerov <voze...@gridgain.com>
wrote:

> Petar,
>
> Yes, I mean setting igfs://igfs@localhost:10500 in Drill's config. I see
> in your email that you typed "ifgs" instead of "igfs". Is it a typo in the
> email or in the Drill configuration as well? Please try changing it to
> "igfs" and possibly restart the Drill instance, because maybe it simply
> didn't pick up the latest changes yet.
>


Re: Apache Drill querying IGFS-accelerated (H)DFS?

2016-02-05 Thread pshomov
​Hi Vladimir,

My bad about that ifgs://, fixed it but it changed nothing.

I don’t think Drill cares much about Hadoop settings. It never asked me to
point it to an installation or configuration of Hadoop. I believe they have
their own storage plugin mechanism and one of their built-in plugins
happens to be the HDFS one.

Here is (part of) the Drill log

2016-02-05 13:14:03,507 [294b5fe3-8f63-2134-67e0-42f7111ead44:foreman]
ERROR o.a.d.exec.util.ImpersonationUtil - Failed to create DrillFileSystem
for proxy user: No FileSystem for scheme: igfs
java.io.IOException: No FileSystem for scheme: igfs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2644)
~[hadoop-common-2.7.1.jar:na]
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
~[hadoop-common-2.7.1.jar:na]
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
~[hadoop-common-2.7.1.jar:na]
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
~[hadoop-common-2.7.1.jar:na]
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
~[hadoop-common-2.7.1.jar:na]
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
~[hadoop-common-2.7.1.jar:na]
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
~[hadoop-common-2.7.1.jar:na]
at
org.apache.drill.exec.store.dfs.DrillFileSystem.<init>(DrillFileSystem.java:92)
~[drill-java-exec-1.4.0.jar:1.4.0]
at
org.apache.drill.exec.util.ImpersonationUtil$2.run(ImpersonationUtil.java:213)
~[drill-java-exec-1.4.0.jar:1.4.0]
at
org.apache.drill.exec.util.ImpersonationUtil$2.run(ImpersonationUtil.java:210)
~[drill-java-exec-1.4.0.jar:1.4.0]
at java.security.AccessController.doPrivileged(Native Method)
~[na:1.8.0_40-ea]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_40-ea]
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
~[hadoop-common-2.7.1.jar:na]
at
org.apache.drill.exec.util.ImpersonationUtil.createFileSystem(ImpersonationUtil.java:210)
[drill-java-exec-1.4.0.jar:1.4.0]
at
org.apache.drill.exec.util.ImpersonationUtil.createFileSystem(ImpersonationUtil.java:202)
[drill-java-exec-1.4.0.jar:1.4.0]
at
org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory.accessible(WorkspaceSchemaFactory.java:150)
[drill-java-exec-1.4.0.jar:1.4.0]
at
org.apache.drill.exec.store.dfs.FileSystemSchemaFactory$FileSystemSchema.<init>(FileSystemSchemaFactory.java:78)
[drill-java-exec-1.4.0.jar:1.4.0]
at
org.apache.drill.exec.store.dfs.FileSystemSchemaFactory.registerSchemas(FileSystemSchemaFactory.java:65)
[drill-java-exec-1.4.0.jar:1.4.0]
at
org.apache.drill.exec.store.dfs.FileSystemPlugin.registerSchemas(FileSystemPlugin.java:131)
[drill-java-exec-1.4.0.jar:1.4.0]
at
org.apache.drill.exec.store.StoragePluginRegistry$DrillSchemaFactory.registerSchemas(StoragePluginRegistry.java:403)
[drill-java-exec-1.4.0.jar:1.4.0]
at
org.apache.drill.exec.ops.QueryContext.getRootSchema(QueryContext.java:166)
[drill-java-exec-1.4.0.jar:1.4.0]
at
org.apache.drill.exec.ops.QueryContext.getRootSchema(QueryContext.java:155)
[drill-java-exec-1.4.0.jar:1.4.0]
at
org.apache.drill.exec.ops.QueryContext.getRootSchema(QueryContext.java:143)
[drill-java-exec-1.4.0.jar:1.4.0]
at
org.apache.drill.exec.ops.QueryContext.getNewDefaultSchema(QueryContext.java:129)
[drill-java-exec-1.4.0.jar:1.4.0]
at
org.apache.drill.exec.planner.sql.DrillSqlWorker.<init>(DrillSqlWorker.java:93)
[drill-java-exec-1.4.0.jar:1.4.0]
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:907)
[drill-java-exec-1.4.0.jar:1.4.0]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:244)
[drill-java-exec-1.4.0.jar:1.4.0]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[na:1.8.0_40-ea]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_40-ea]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40-ea]
2016-02-05 13:14:03,556 [294b5fe3-8f63-2134-67e0-42f7111ead44:foreman]
ERROR o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR: IOException: No
FileSystem for scheme: igfs


[Error Id: 6c95179a-6d26-498c-905f-dc18509c1651 on 192.168.1.42:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
IOException: No FileSystem for scheme: igfs


I copied the same ignite jars that go into Hadoop to Drill just in case but
that did not help either.
I think the only way is to write a Drill storage plugin for Ignite. Or
somehow make the Ignite caching happen inside Hadoop and be totally
transparent to Drill.

Thank you for detailed help, any further ideas are as always welcome ;)

Best regards,

Petar




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Apache-Drill-querying-IGFS-accelerated-H-DFS-tp2840p2859.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Apache Drill querying IGFS-accelerated (H)DFS?

2016-02-04 Thread pshomov
Hi Val,

Thank you for your quick response!

>> I think you should download Hadoop Accelerator edition [1] and refer to
>> [2] for instructions on how to install it. It will plug into your
>> existing Hadoop installation and switch it to IGFS and Ignite's
>> map-reduce engine.

That is exactly what I tried to do. I ran sbin/start-dfs.sh and got this:

Exception in thread "main" java.lang.IllegalArgumentException: Invalid URI
for NameNode address (check fs.defaultFS): igfs://igfs@localhost is not of
scheme 'hdfs'. 

Found this - 
http://apache-ignite-users.70518.x6.nabble.com/IllegalArgumentException-Invalid-URI-for-NameNode-address-check-fs-defaultFS-igfs-igfs-localhost-is--td1978.html

  
where I see again that the HDFS service stops being available (which is all
we care about).
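(For what it is worth, that exception indicates fs.defaultFS had been
switched to the igfs:// URL, which the NameNode itself cannot use. A sketch
of the intended split, assuming the hdfs://localhost:9000 address used
elsewhere in this thread: keep HDFS as the default file system and register
igfs only as an additional scheme.)

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
</property>
<property>
    <name>fs.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
</property>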

>> You can also configure HDFS as a secondary file system for IGFS [3], so
>> you don't need to preload the data into IGFS - it will act as a caching
>> layer between your application and your data. As a result, the Drill
>> application should be able to work with in-memory data without any code
>> changes.

Are you implying that I might leave Hadoop as-is and instead integrate IGFS
into Apache Drill?


Thank you once again for taking the time to share with me! Much appreciated!

Best regards,

Petar




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Apache-Drill-querying-IGFS-accelerated-H-DFS-tp2840p2842.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.