Re: EMC ECS Configuration with Apache Drill

Paul Rogers Wed, 21 Aug 2019 17:15:47 -0700

Hi Prabu & Ted,

Ted is right: the next step in tracking this down is debugging. As large 
projects go, Drill is actually easier to debug than most. (Hats off to the 
team for achieving this valuable goal!)

1. Fork, clone and build Drill: [1]
2. Load the project into your IDE (both Eclipse and IntelliJ work). We used to 
have instructions for this, but I can't find them now; [2] gives an overview. 
I think you can just import drill/pom.xml as a Maven project (in Eclipse).
3. Find the test TestCsvWithHeaders.java [3]. Run it to verify that things work.
4. Create an ad-hoc test in this same package. You really just need a setup 
method and a test method:

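  // Assumes the test class extends ClusterTest, which provides
  // startCluster(), dirTestWatcher and client.

  // Starts the in-process Drill cluster once for the test class.
  @BeforeClass
  public static void setup() throws Exception {
    startCluster(
        ClusterFixture.builder(dirTestWatcher)
            .maxParallelization(1));
  }

  // Runs one query and prints the result set.
  @Test
  public void adHocTest() throws IOException {
    String sql = "SELECT * FROM ...";
    RowSet actual = client.queryBuilder().sql(sql).rowSet();
    actual.print();
    actual.clear();  // release the result set's memory
  }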

The setup method starts your cluster. The test just runs a query and prints 
the results. Replace the placeholder SELECT with your own SQL. This works best 
if the file is small.

You'll need to configure your data source; the test does not hit ZooKeeper, 
where we store the definitions you set in the Drill web UI. My tests tend to do 
the setup in code, but this gets pretty messy.

Anyone know how to do the storage plugin setup in some file so it works for a 
unit test? Maybe edit bootstrap-storage-plugins.json [4] for a quick & dirty 
solution?
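
If the bootstrap-file route does work, the entry might look something like the 
sketch below. This is an untested guess modeled on the stock "s3" entry in that 
file, not a verified ECS configuration. The plugin name "ecs", the bucket, the 
endpoint and both keys are placeholders to replace with your own values. 
fs.s3a.endpoint and fs.s3a.path.style.access are standard Hadoop S3A properties 
that a non-AWS, S3-compatible store would plausibly need; whether ECS needs 
them in your setup is an assumption.

```json
{
  "storage": {
    "ecs": {
      "type": "file",
      "connection": "s3a://PLACEHOLDER_BUCKET",
      "config": {
        "fs.s3a.access.key": "PLACEHOLDER_ACCESS_KEY",
        "fs.s3a.secret.key": "PLACEHOLDER_SECRET_KEY",
        "fs.s3a.endpoint": "PLACEHOLDER_ECS_HOST:PORT",
        "fs.s3a.path.style.access": "true"
      },
      "workspaces": {
        "root": {
          "location": "/",
          "writable": false,
          "defaultInputFormat": null
        }
      },
      "formats": {
        "csv": {
          "type": "text",
          "extensions": ["csv"],
          "delimiter": ","
        }
      },
      "enabled": true
    }
  }
}
```

With an entry like this in place, the ad-hoc test's SQL would refer to the 
plugin by name, for example a table path under ecs.root. Keeping the values 
identical to the ones that fail in the web UI makes the debugger session 
reproduce the same problem.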

Once you get past this, run the test. It will fail and print a big nasty stack 
dump. If you look carefully (ignore the first few stacks, they are on the 
client), you should see a stack trace on the server (which is running in the 
same process) where Drill is trying to open your file. You can set a breakpoint 
here and start poking around to see what's what.

Quite a bit to get right, so feel free to ask here (or on dev) to get help. 
Note also that there is detailed info in the "Learning Apache Drill" book for 
setting up your development environment.

Thanks,
- Paul

[1] http://drill.apache.org/docs/compiling-drill-from-source/

[2] https://github.com/apache/drill/tree/master/docs/dev

[3] https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/store/easy/text/compliant/TestCsvWithHeaders.java

[4] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/resources/bootstrap-storage-plugins.json

    On Wednesday, August 21, 2019, 02:10:17 PM PDT, Ted Dunning 
<[email protected]> wrote:  

 Prabu,

Yes. You can debug the code. It is a large codebase so that can be a bit of
a trick to get started.

I think that one of the most stable approaches is to build a test case that
accesses the data you want (this doesn't have to become a public test case,
it just makes debugging easier by being very repeatable).

I am not up to speed on how to do this, however.

Is there somebody else on the list who could advise on this?

On Wed, Aug 21, 2019 at 1:08 PM Prabu Mohan <[email protected]> wrote:

> Thanks Ted.
>
> This is getting complex now, I thought that I might be missing something
> simple while configuring drill, but this seems to be far beyond that.
>
> I'm not sure whether I can get a proxy and also just in case if any other
> issues occur as well, is there a way I can debug the code to understand
> what values are being passed ?
>
> On Tue, Aug 20, 2019 at 12:22 AM Ted Dunning <[email protected]>
> wrote:
>
> > On Mon, Aug 19, 2019 at 11:33 AM Prabu Mohan <[email protected]>
> > wrote:
> >
> > > but i am able to connect to ECS via python using boto3 libraries without
> > > any issues, I am able to write files to the bucket and read them back ..
> > >
> > > not sure why i am facing issues with drill though with the same credentials
> > >
> >
> >
> > The key here is your assumption that the same credentials are being passed
> > through Drill to AWS and that there isn't some other consideration that
> > keeps S3 from believing whatever credentials it is getting.
> >
> > That assumption has to be attacked by figuring out experiments that can
> > prove or disprove aspects of it. For instance, if you can get a proxy in
> > the middle of the connection, you should be able to see *exactly* what is
> > on the wire. Likewise if you can get better logging out of Drill.
> >
>
