Re: REPEATED_CONTAINS

2016-03-30 Thread Jean-Claude Cote
the methodology that works. If we don't see the performance to satisfy your use case, we can see if we can suggest some things. (For example, supporting operation pushdowns that push through FLATTEN would probably be very helpful.) -- Jacques Nadeau, CTO and

Jira DRILL-4573

2016-04-08 Thread Jean-Claude Cote
Hi, I've created a pull request for issue DRILL-4573. I'm wondering if it's in the queue of pull requests to be reviewed? Thanks Jean-Claude

detecting corrupted parquet files

2016-03-23 Thread Jean-Claude Cote
Whenever Drill encounters a corrupted Parquet file it will stop processing the query. To work around this issue I'm trying to write a simple tool to detect corrupted Parquet files so that we can remove them from the pool of files Drill will query on. I'm basically doing a HEAD command like was
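A minimal sketch of the kind of check such a tool could do, assuming the Hadoop FileSystem API (the class name and main() wrapper here are made up). It only verifies the Parquet framing, i.e. the PAR1 magic bytes at both ends of the file, so it flags truncated files but will not catch corruption inside pages:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParquetMagicCheck {
  private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

  // True when the file starts and ends with the Parquet magic bytes.
  public static boolean looksLikeParquet(FileSystem fs, Path file) throws Exception {
    long len = fs.getFileStatus(file).getLen();
    if (len < 2L * MAGIC.length) {
      return false;  // too small to hold both the header and footer magic
    }
    byte[] head = new byte[MAGIC.length];
    byte[] tail = new byte[MAGIC.length];
    try (FSDataInputStream in = fs.open(file)) {
      in.readFully(0, head);
      in.readFully(len - MAGIC.length, tail);
    }
    return Arrays.equals(head, MAGIC) && Arrays.equals(tail, MAGIC);
  }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    for (String arg : args) {
      Path p = new Path(arg);
      System.out.println(p + " -> " + (looksLikeParquet(fs, p) ? "ok" : "suspect"));
    }
  }
}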

Re: REST API with Basic Auth

2016-03-24 Thread Jean-Claude Cote
Works great. Thanks John. On Wed, 23 Mar 2016 at 20:53 Jean-Claude Cote <jcc...@gmail.com> wrote: > Hey John, I looked at the Drill code and it does use the Jetty FormAuthenticator and not the BasicAuthenticator. So what I was trying will not work. I'll do a

Re: detecting corrupted parquet files

2016-03-23 Thread Jean-Claude Cote
found in drill? On Wed, Mar 23, 2016 at 7:21 AM, Jean-Claude Cote <jcc...@gmail.com> wrote: > Whenever drill encounters a corrupted parquet file it will stop processing a query. To work around this issue I'm trying to write a simple tool to detect corrupted par

Re: [GitHub] drill pull request: DRILL-4573: Zero copy LIKE, REGEXP_MATCHES, SU...

2016-04-04 Thread Jean-Claude Cote
573 > Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/458.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This close

REPEATED_CONTAINS

2016-03-29 Thread Jean-Claude Cote
I've noticed Drill offers a REPEATED_CONTAINS function which can be applied to fields that are arrays: https://drill.apache.org/docs/repeated-contains/ I have a schema, stored in Parquet files, that contains a repeated field holding a key and a value. However, such structures can't be queried using the
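As the reply elsewhere in this thread suggests (pushdown through FLATTEN), one workaround is to FLATTEN the repeated key/value field and filter the flattened rows instead of calling REPEATED_CONTAINS. A hedged sketch, written as a query constant; the table path and the field names kvs, key and value are hypothetical:

public class RepeatedKeyValueFilter {
  // Hypothetical schema: each row carries a repeated map `kvs` whose entries
  // have `key` and `value` members. FLATTEN turns each entry into its own row,
  // so an ordinary WHERE clause can match on the key.
  public static final String QUERY =
      "SELECT t.kv.`value` AS v "
    + "FROM (SELECT FLATTEN(kvs) AS kv FROM dfs.`/data/events.parquet`) t "
    + "WHERE t.kv.`key` = 'some-key'";
}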

Re: Code Review Requested for DRILL-4858

2016-09-06 Thread Jean-Claude Cote
OK, let me know when you do so. On Fri, Sep 2, 2016 at 12:03 AM, Jinfeng Ni <j...@apache.org> wrote: > Thanks for submitting the PR for that issue. I'll take a look at the PR. On Thu, Sep 1, 2016 at 6:36 PM, Jean-Claude Cote <jcc...@gmail.com> wrote: > Please

Code Review Requested for DRILL-4858

2016-09-01 Thread Jean-Claude Cote
Please

Re: Code Review Requested for DRILL-4858

2016-09-14 Thread Jean-Claude Cote
Please review DRILL-4858. On Tue, Sep 6, 2016 at 6:49 PM, Jean-Claude Cote <jcc...@gmail.com> wrote: > ok let me know when you do so. On Fri, Sep 2, 2016 at 12:03 AM, Jinfeng Ni <j...@apache.org> wrote: >> Thanks for submitting the PR for that issu

Re: configure logback to trace level in junit tests

2018-10-12 Thread Jean-Claude Cote
> You should use logback-test.xml for tests. Paul has documented it very well in [1] and [2]. [1] https://github.com/apache/drill/blob/master/docs/dev/TestLogging.md#default-test-log-levels [2] https://github.com/apache/drill/blob/master/docs/dev/ClusterF

Re: msgpack format reader with schema learning feature

2018-10-11 Thread Jean-Claude Cote
works against a memory limit (20 MB, say) and automatically limits records per batch to that memory limit. Thanks for doing the PR. Will be great to see what you've created. Thanks, - Paul On Wednesday, October 10, 2018, 7:59:06 PM PDT, Jean-

Unsupported type LIST when CTAS arrayOfArray (JSON or Msgpack) into Parquet

2018-10-11 Thread Jean-Claude Cote
I'm trying to write the following JSON file into a Parquet file. However my CTAS query returns the error "Unsupported type LIST". Any ideas why? I'm pretty sure Parquet supports arrays of arrays. Thanks jc cat /tmp/json1/0_0_0.json { "arrayOfArray" : [ [ 1, 1, 1 ], [ 1, 1, 1, 1, 1 ] ] } 0:
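If the nested structure itself does not need to survive in the Parquet output, one way around the error is to FLATTEN the outer array before the CTAS so each output row holds a plain repeated BIGINT rather than a list of lists. A hedged sketch; the target table name is made up:

public class ArrayOfArrayCtas {
  // Flattening `arrayOfArray` once emits one inner array per row, a shape the
  // Parquet writer accepts; the original list-of-lists nesting is not preserved.
  public static final String CTAS =
      "CREATE TABLE dfs.tmp.`arrays_flattened` AS "
    + "SELECT FLATTEN(arrayOfArray) AS arr FROM dfs.`/tmp/json1/0_0_0.json`";
}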

msgpack read batch size larger than 4096 causes assertion error

2018-10-12 Thread Jean-Claude Cote
I've changed my record reader's batch size to be larger. All my test cases still work as I would expect, except for one, and I have no idea why. I've turned on tracing in the hope of getting a hint. I now see the failure is in a generated projection class but I'm not sure why. Can anyone speculate why a

Re: msgpack read batch size larger than 4096 causes assertion error

2018-10-14 Thread Jean-Claude Cote
est if you include a few columns in each of several files, rather than one big file with all column types.) This will give you a record batch with what was read. Then, use the RowSet mechanisms to build up an expected record batch, then compare the expected value with your act
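A hedged sketch of what that RowSet-style comparison can look like inside a ClusterTest (where client is the ClientFixture); the query, file, column names and values are made up, and the exact package and method names of the RowSet helpers have moved around between Drill versions:

@Test
public void testExpectedBatch() throws Exception {
  // SchemaBuilder/RowSetBuilder/RowSetUtilities come from Drill's test
  // framework; MinorType is org.apache.drill.common.types.TypeProtos.MinorType.
  TupleMetadata schema = new SchemaBuilder()
      .add("name", MinorType.VARCHAR)
      .add("count", MinorType.BIGINT)
      .buildSchema();

  // Run the reader under test and capture the resulting batch as a RowSet.
  RowSet actual = client.queryBuilder()
      .sql("SELECT name, `count` FROM dfs.`/tmp/test.mp`")  // hypothetical file
      .rowSet();

  // Build the batch we expect by hand, then compare the two value by value.
  RowSet expected = new RowSetBuilder(client.allocator(), schema)
      .addRow("a", 10L)
      .addRow("b", 20L)
      .build();
  RowSetUtilities.verify(expected, actual);
}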

Re: msgpack read batch size larger than 4096 causes assertion error

2018-10-15 Thread Jean-Claude Cote
smaller batch size. Thanks, - Paul On Sunday, October 14, 2018, 6:14:46 PM PDT, Jean-Claude Cote <jcc...@gmail.com> wrote: > Hey Paul, we think alike ;-) that's exactly what I was doing the past couple of days. I was simplifying my test cas

msgpack test case fails same as with json, problem with testing framework?

2018-10-21 Thread Jean-Claude Cote
I'm trying to write a test case for a repeated map scenario. However, the testing framework is unable to infer the schema of the result set. Am I using the API correctly? Thanks jc // create a test file try (OutputStreamWriter w = new OutputStreamWriter(new FileOutputStream(new File(testDir,
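If the framework's schema inference is what fails here, one alternative is to state the expected baseline explicitly with the TestBuilder map/list helpers instead of relying on inference. A hedged sketch assuming a BaseTestQuery/ClusterTest context; the file, column name and values are made up:

testBuilder()
    .sqlQuery("select rmap from cp.`msgpack/testRepeatedMap.mp`")  // hypothetical file
    .unOrdered()
    .baselineColumns("rmap")
    // listOf()/mapOf() (static helpers on TestBuilder) build the structures the
    // comparator expects for a repeated map: one mapOf() per repeated entry.
    .baselineValues(TestBuilder.listOf(
        TestBuilder.mapOf("key", "a", "value", 1L),
        TestBuilder.mapOf("key", "b", "value", 2L)))
    .go();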

Re: msgpack handling lists with elements of different types

2018-10-17 Thread Jean-Claude Cote
which is why some code will be needed (in Spark/Hive or in a Drill plugin of some kind.) Charles Givre makes a very good point: he suggests that Drill's unique opportunity is to handle such odd files clearly, avoiding the need for ETL. That is, rather than thinking of Drill as

configure logback to trace level in junit tests

2018-10-12 Thread Jean-Claude Cote
I'm trying to output trace information in my JUnit test cases. I'm using the ClusterFixture: startCluster(ClusterFixture.builder(dirTestWatcher).maxParallelization(1)); I've put my logback.xml in src/test/resources and I feed in these environment variables on the mvn command line: mvn --offline

msgpack handling lists with elements of different types

2018-10-17 Thread Jean-Claude Cote
I'm writing a msgpack reader and have encountered datasets where an array contains different types, for example a VARCHAR and a BINARY. It turns out the BINARY is actually a string. I know this is probably just not modeled correctly in the first place, but I'm still going to modify the reading of list

msgpack reading schema files checksum error

2018-10-30 Thread Jean-Claude Cote
I'm writing a msgpack reader which supports schema validation. The msgpack reader is able to discover the schema and store the result in a file named .schema.proto alongside the data files. There is also an additional ..schema.proto.crc file created by the Hadoop file system, I believe. However
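If the stale .crc sidecar is what trips the read, one workaround (a sketch, assuming the schema file is written through Hadoop's checksum-wrapping LocalFileSystem, and a hypothetical dataDir Path) is to turn checksum writing off on that FileSystem instance, or to delete the old sidecar before rewriting the schema file:

// fs is the org.apache.hadoop.fs.FileSystem used to write .schema.proto.
// ChecksumFileSystem wrappers (e.g. LocalFileSystem) create a "." + name + ".crc"
// sidecar; skip it so a rewritten schema file can't mismatch a stale checksum.
fs.setWriteChecksum(false);

// Alternatively, remove the stale sidecar before rewriting the schema file.
Path crcFile = new Path(dataDir, "..schema.proto.crc");  // checksum of ".schema.proto"
if (fs.exists(crcFile)) {
  fs.delete(crcFile, false);
}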

Re: msgpack reading schema files checksum error

2018-10-30 Thread Jean-Claude Cote
ks.com/questions/19449/hadoop-localfilesystem-checksum-calculation.html On Tuesday, October 30, 2018, 5:45:11 PM PDT, Jean-Claude Cote <jcc...@gmail.com> wrote: > I'm writing a msgpack reader which supports schema validation. The msgpack reader is able

logging in test cases produces two outputs

2018-11-03 Thread Jean-Claude Cote
I'm using the LogFixture: LogFixtureBuilder logBuilder = LogFixture.builder() // Log to the console for debugging convenience .toConsole().logger("org.apache.drill.exec.store.msgpack", Level.DEBUG); try (LogFixture logs = logBuilder.build()) { Basic logback.xml file is However when I
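For reference, a cleaned-up version of that LogFixture usage (a sketch; the logger name is the one from the message, and the rest follows the builder pattern the snippet already shows):

LogFixtureBuilder logBuilder = LogFixture.builder()
    // Log to the console for debugging convenience.
    .toConsole()
    // Raise just this package to DEBUG; everything else keeps its default level.
    .logger("org.apache.drill.exec.store.msgpack", Level.DEBUG);

try (LogFixture logs = logBuilder.build()) {
  // Run the queries or tests that should produce the debug output here.
}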

msgpack pull request

2018-11-09 Thread Jean-Claude Cote
Hey Paul, In my pull request you mentioned handling splits. I put a comment in the pull request, but essentially msgpack files are a list of records, so technically they can be split. However, I'm not sure if that's beneficial because I'm not sure how the splitting process works. The other thing

Re: logging in test cases produces two outputs

2018-11-04 Thread Jean-Claude Cote
perhaps modify your default file to turn off console logging completely, so that only the LogFixture controls the console. Thanks, - Paul On Saturday, November 3, 2018, 12:17:28 PM PDT, Jean-Claude Cote <jcc...@gmail.com> wrote:

show files in

2018-11-02 Thread Jean-Claude Cote
Hi, I'm running the query show files in dfs.root.`subdir1/subdir2` and got the error "To SHOW FILES in specific directory, enable option storage.list_files_recursively". I've turned that on with alter session set storage.list_files_recursively=true; However, when I now run the query it seems like

Re: How to use alter session to configure contributed format plugins

2018-10-08 Thread Jean-Claude Cote
global: they must be defined in the one big file you discovered, and default values must be listed in the master drill-module.conf file. It would be a handy feature to modify this to allow modules to add options. Easy to def

Re: Possible way to specify column types in query

2018-10-02 Thread Jean-Claude Cote
connected. Arina or Aman, do you know how to connect up a Drill table with DESCRIBE? Thanks, - Paul On Tuesday, October 2, 2018, 6:12:30 PM PDT, Jean-Claude Cote <jcc...@gmail.com> wrote: > I've been looking at the pcap reader htt

Re: Possible way to specify column types in query

2018-10-02 Thread Jean-Claude Cote
SELECT all/some/none projection. Another is using "column state" classes to perform type-specific handling instead of a big (slow) switch statement. That new JSON reader I mentioned takes the idea a step further and has the column state classes handle data transla

Re: msgpack format reader with schema learning feature

2018-10-10 Thread Jean-Claude Cote
query on that same directory. Drill has no good synchronization solution. Since this seems to not be a problem for views, perhaps things will work for schemas. (Both are a single file.) We have had problems with metadata because refreshing that updates

Re: Possible way to specify column types in query

2018-10-01 Thread Jean-Claude Cote
DRILL-6552 [2] https://github.com/msgpack/msgpack/blob/master/spec.md [3] JSON Schema [4] https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/log [5] https://github.com/paul-rogers/drill/tree/RowSetRev4/exec/

Re: Possible way to specify column types in query

2018-10-01 Thread Jean-Claude Cote
I'm implementing a msgpack reader and am using the JSON reader as inspiration. I've noticed that the JSON reader has code to detect if rows were written but with no columns; in that case it will actually add one row with columns of type INT. The comment in the code is "if we had no columns, create one empty

How to use alter session to configure contributed format plugins

2018-10-07 Thread Jean-Claude Cote
Hi, I'm writing a new msgpack data source for Drill. I would like to be able to configure the reader through the alter session mechanism in sqlline, something like "alter session set `store.msgpack.reader.learnschema` = true". However, I'm unable to make this work. In my format plugin I have

Re: storage plugin test case

2018-09-26 Thread Jean-Claude Cote
Vitalii Diravka wrote: > Hi Jean-Claude, BaseTestQuery is deprecated. Please use ClusterTest instead. See TestCsv.java for an example. You can find more info about the Drill Cluster-Fixture-Framework here: https://github.com/paul-roger
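A hedged skeleton of the ClusterTest-based test Vitalii describes (class, file and column names are made up; see TestCsv.java in the Drill source for a real example, and note that helper method names can differ slightly between Drill versions):

import org.apache.drill.test.ClusterFixture;
import org.apache.drill.test.ClusterTest;
import org.junit.BeforeClass;
import org.junit.Test;

public class TestMsgpackFormat extends ClusterTest {

  @BeforeClass
  public static void setup() throws Exception {
    // Start a single embedded Drillbit shared by all tests in this class.
    startCluster(ClusterFixture.builder(dirTestWatcher).maxParallelization(1));
  }

  @Test
  public void testBasic() throws Exception {
    client.testBuilder()
        .sqlQuery("select * from cp.`msgpack/testBasic.mp`")
        .ordered()
        .baselineColumns("a", "b")  // hypothetical columns
        .baselineValues(1L, "x")
        .go();
  }
}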

Re: storage plugin test case

2018-09-26 Thread Jean-Claude Cote
> bootstrap-storage-plugins.json does not make any sense. If you need to add your own format in unit tests, as Vitalii pointed out, TestCsv is a good example for this. Kind regards, Arina On Wed, Sep 26, 2018 at 7:07 PM Jean-Claude Cote wrote:

storage plugin test case

2018-09-25 Thread Jean-Claude Cote
I have written a msgpack storage plugin for Drill: https://github.com/jcmcote/drill/tree/master/contrib/storage-msgpack I'm now trying to write test cases like testBuilder() .sqlQuery("select * from cp.`msgpack/testBasic.mp`") .ordered()

Re: storage plugin test case

2018-09-28 Thread Jean-Claude Cote
-override.conf for configuring plugin configs during start-up [1]. [1] https://drill.apache.org/docs/configuring-storage-plugins/#configuring-storage-plugins-with-the-storage-plugins-override.conf-file On Wed

Re: Oracle JDBC storage plugin

2019-01-10 Thread Jean-Claude Cote
link the PR for the fix you have made. Thanks, Pritesh On Mon, Jan 7, 2019 at 12:44 PM Jean-Claude Cote wrote: > I have made some fixes to the JDBC storage plugin in order to make it work with an Oracle database. I can now obtain the table names. As stat

Re: Oracle JDBC storage plugin

2019-01-07 Thread Jean-Claude Cote
understand how to construct the internal objects Drill uses to keep track of the schemas and tables. Should I report this as a bug? Thank you jc On Wed, Dec 19, 2018 at 9:50 AM Jean-Claude Cote wrote: > I've configured Drill to use a JDBC storage plugin. My connection string is for an

Oracle JDBC storage plugin

2018-12-19 Thread Jean-Claude Cote
I've configured Drill to use a JDBC storage plugin. My connection string is for an Oracle database. I have included the Oracle JDBC driver in my Drill deployment. The connection is established correctly. However, the storage plugin fails to retrieve the schema of the database. The JDBC API