Hi Prabu & Ted,
Ted is right: the next step in tracking this down is debugging. As large
projects go, Drill is actually easier to debug than most. (Hats off to the
team for achieving this valuable goal!)
1. Fork, clone and build Drill: [1]
2. Load the project into your IDE (both Eclipse and IntelliJ work). We used to
have instructions for this, but I can't find them now; [2] gives an overview.
I think you can just import drill/pom.xml as a Maven project (in Eclipse).
3. Find the test TestCsvWithHeaders.java [3]. Run it to verify things work.
4. Create an ad-hoc test in this same package. You really just need a setup and
a test method:
  // Assumes the enclosing test class extends ClusterTest, which supplies
  // startCluster(), dirTestWatcher and client.
  @BeforeClass
  public static void setup() throws Exception {
    startCluster(
        ClusterFixture.builder(dirTestWatcher)
            .maxParallelization(1));
  }

  @Test
  public void adHocTest() throws IOException {
    String sql = "SELECT * FROM ...";
    RowSet actual = client.queryBuilder().sql(sql).rowSet();
    actual.print();
    actual.clear();
  }
The setup method starts your cluster. The test just runs a query and will print
the results. Put your SQL here. Works best if the file is small.
You'll need to configure your data source; the test does not hit ZooKeeper,
which is where Drill stores the definitions you set in the Drill web UI. My
tests tend to do the setup in code, but this gets pretty messy.
Anyone know how to do the storage plugin setup in some file so it works for a
unit test? Maybe edit bootstrap-storage-plugins.json [4] for a quick & dirty
solution?
Once you get past this, run the test. It will fail and print a big nasty stack
dump. If you look carefully (ignore the first few stacks, they are on the
client), you should see a stack trace on the server (which is running in the
same process) where Drill is trying to open your file. You can set a breakpoint
here and start poking around to see what's what.
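To make it easier to skip past the client-side stacks described above, one
option is to list the exception chain one line at a time before reading the
full dump. Below is a minimal sketch using only the JDK; CauseChain and
describe() are hypothetical names, not part of Drill. It assumes the failure
you catch carries nested causes. The server-side trace may instead appear only
in the log output, in which case this helper adds nothing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Drill class): summarizes an exception's cause chain.
final class CauseChain {

    /** Returns one line per exception in the chain, outermost first. */
    static List<String> describe(Throwable top) {
        List<String> lines = new ArrayList<>();
        int depth = 0;
        // Cap the depth to guard against a cyclic cause chain.
        for (Throwable t = top; t != null && depth < 50; t = t.getCause(), depth++) {
            StackTraceElement[] frames = t.getStackTrace();
            // The top frame is where this particular exception was created.
            String where = frames.length > 0 ? frames[0].toString() : "(no frames)";
            lines.add(depth + ": " + t.getClass().getName()
                + ": " + t.getMessage() + " at " + where);
        }
        return lines;
    }

    public static void main(String[] args) {
        Throwable sample = new RuntimeException(
            "client side", new java.io.IOException("server side"));
        describe(sample).forEach(System.out::println);
    }
}
```

In the ad-hoc test you could wrap the query call in a try/catch, print
CauseChain.describe(e), and then rethrow so the test still fails.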
Quite a bit to get right, so feel free to ask here (or on dev) to get help.
Note also that there is detailed info in the "Learning Apache Drill" book for
setting up your development environment.
Thanks,
- Paul
[1] http://drill.apache.org/docs/compiling-drill-from-source/
[2] https://github.com/apache/drill/tree/master/docs/dev
[3] https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/store/easy/text/compliant/TestCsvWithHeaders.java
[4] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/resources/bootstrap-storage-plugins.json
On Wednesday, August 21, 2019, 02:10:17 PM PDT, Ted Dunning
<[email protected]> wrote:
Prabu,
Yes. You can debug the code. It is a large codebase so that can be a bit of
a trick to get started.
I think that one of the most stable approaches is to build a test case that
accesses the data you want (this doesn't have to become a public test case,
it just makes debugging easier by being very repeatable).
I am not up to speed on how to do this, however.
Is there somebody else on the list who could advise on this?
On Wed, Aug 21, 2019 at 1:08 PM Prabu Mohan <[email protected]> wrote:
> Thanks Ted.
>
> This is getting complex now, I thought that I might be missing something
> simple while configuring drill, but this seems to be far beyond that.
>
> I'm not sure whether I can get a proxy and also just in case if any other
> issues occur as well, is there a way I can debug the code to understand
> what values are being passed ?
>
> On Tue, Aug 20, 2019 at 12:22 AM Ted Dunning <[email protected]>
> wrote:
>
> > On Mon, Aug 19, 2019 at 11:33 AM Prabu Mohan <[email protected]>
> > wrote:
> >
> > > but i am able to connect to ECS via python using boto3 libraries
> > > without any issues, I am able to write files to the bucket and read
> > > them back ..
> > >
> > > not sure why i am facing issues with drill though with the same
> > > credentials
> > >
> >
> >
> > The key here is your assumption that the same credentials are being
> > passed through Drill to AWS and that there isn't some other
> > consideration that keeps S3 from believing whatever credentials it is
> > getting.
> >
> > That assumption has to be attacked by figuring out experiments that can
> > prove or disprove aspects of it. For instance, if you can get a proxy in
> > the middle of the connection, you should be able to see *exactly* what is
> > on the wire. Likewise if you can get better logging out of Drill.
> >
>