the methodology that works. If we don't see the
> performance to satisfy your use case, we can see if we can suggest some
> things. (For example, supporting operation pushdowns that push through
> FLATTEN would probably be very helpful.)
>
>
>
> --
> Jacques Nadeau
> CTO and
Hi,
I've created a pull request for issue DRILL-4573. I'm wondering if it's in
the queue of pull requests to be reviewed?
Thanks
Jean-Claude
Whenever drill encounters a corrupted parquet file it will stop processing
a query.
To work around this issue I'm trying to write a simple tool to detect
corrupted parquet files so that we can remove them from the pool of files
drill will query on.
I'm basically doing a HEAD command like was
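A minimal footer probe along those lines, assuming local files and plain
Java I/O (a stricter tool would also parse the footer metadata):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ParquetCheck {
  // A well-formed parquet file starts and ends with the 4-byte magic
  // "PAR1"; the last 8 bytes are the footer length plus the magic.
  // This catches truncated files, not every kind of corruption.
  public static boolean looksLikeValidParquet(String path) throws IOException {
    byte[] magic = "PAR1".getBytes(StandardCharsets.US_ASCII);
    try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
      if (f.length() < 12) {     // head magic + footer length + tail magic
        return false;
      }
      byte[] head = new byte[4];
      byte[] tail = new byte[4];
      f.readFully(head);
      f.seek(f.length() - 4);
      f.readFully(tail);
      return Arrays.equals(head, magic) && Arrays.equals(tail, magic);
    }
  }
}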
Works great. Thanks John.
On Wed, 23 Mar 2016 at 20:53 Jean-Claude Cote <jcc...@gmail.com> wrote:
> Hey John,
>
> I looked at the Drill code and it does use the Jetty FormAuthenticator and
> not the BasicAuthenticator. So what I was trying will not work.
>
> I'll do a
found in drill?
On Wed, Mar 23, 2016 at 7:21 AM, Jean-Claude Cote <jcc...@gmail.com> wrote:
> Whenever drill encounters a corrupted parquet file it will stop processing
> a query.
>
> To work around this issue I'm trying to write a simple tool to detect
> corrupted par
DRILL-4573
>
> Alternatively you can review and apply these changes as the patch at:
>
> https://github.com/apache/drill/pull/458.patch
>
> To close this pull request, make a commit to your master/trunk branch
> with (at least) the following in the commit message:
>
> This close
I've noticed Drill offers a REPEATED_CONTAINS function that can be applied
to fields which are arrays.
https://drill.apache.org/docs/repeated-contains/
I have a schema stored in parquet files which contain a repeated field
containing a key and a value. However such structures can't be queried
using the
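For context, REPEATED_CONTAINS works on a repeated scalar column, not on a
repeated map, which is likely why the key/value structure can't be queried
that way. A hedged sketch of the working case, in the testBuilder style
used later in this thread (table, column, and values invented):

// REPEATED_CONTAINS applies to an array of scalars such as a repeated
// VARCHAR column `tags`; a repeated key/value map needs another route,
// e.g. FLATTEN plus an ordinary filter.
testBuilder()
    .sqlQuery("select id from dfs.`data.parquet` where repeated_contains(tags, 'drill')")
    .unOrdered()
    .baselineColumns("id")
    .baselineValues(1L)
    .go();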
ok let me know when you do so.
On Fri, Sep 2, 2016 at 12:03 AM, Jinfeng Ni <j...@apache.org> wrote:
> Thanks for submitting the PR for that issue. I'll take a look at the PR.
>
> On Thu, Sep 1, 2016 at 6:36 PM, Jean-Claude Cote <jcc...@gmail.com> wrote:
> > Please
>
Please
Please review DRILL-4858
On Tue, Sep 6, 2016 at 6:49 PM, Jean-Claude Cote <jcc...@gmail.com> wrote:
> ok let me know when you do so.
>
> On Fri, Sep 2, 2016 at 12:03 AM, Jinfeng Ni <j...@apache.org> wrote:
>
>> Thanks for submitting the PR for that issu
> You should use logback-test.xml for tests.
> Paul has documented it very well in [1] and [2].
>
> [1]
>
> https://github.com/apache/drill/blob/master/docs/dev/TestLogging.md#default-test-log-levels
> [2]
>
> https://github.com/apache/drill/blob/master/docs/dev/ClusterF
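For reference, a minimal logback-test.xml of the kind those pages describe
might look like this (the logger name matches the msgpack package used
elsewhere in this thread; the levels are assumptions):

<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger - %msg%n</pattern>
    </encoder>
  </appender>
  <!-- quiet by default; verbose only for the package under test -->
  <logger name="org.apache.drill.exec.store.msgpack" level="debug"/>
  <root level="error">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>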
works against a memory
> limit (20 MB, say) and automatically limits records per batch to that
> memory limit.
>
>
> Thanks for doing the PR. Will be great to see what you've created.
>
> Thanks,
> - Paul
>
>
>
> On Wednesday, October 10, 2018, 7:59:06 PM PDT, Jean-
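The arithmetic behind that memory-limited batch idea is simple enough to
sketch (the constant and names are illustrations, not Drill's actual code):

// Cap the records per batch so the whole batch stays under a fixed
// memory budget; the 20 MB figure comes from the message above.
static final int BATCH_MEMORY_LIMIT = 20 * 1024 * 1024;

static int maxRecordsPerBatch(int estimatedRowWidthBytes) {
  return Math.max(1, BATCH_MEMORY_LIMIT / estimatedRowWidthBytes);
}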
I'm trying to write the following JSON file into a parquet file. However my
CTAS query returns the error "Unsupported type LIST". Any ideas why? I'm
pretty sure Parquet supports arrays of arrays.
Thanks
jc
cat /tmp/json1/0_0_0.json
{
"arrayOfArray" : [ [ 1, 1, 1 ], [ 1, 1, 1, 1, 1 ] ]
}
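For reference, a CTAS of the shape that hits the error (output table name
invented): the array-of-array column becomes a nested Parquet LIST, which
Drill's Parquet writer apparently rejects even though the format itself
allows it.

// Hypothetical repro of the "Unsupported type LIST" failure:
String ctas = "create table dfs.tmp.`aoa` as "
    + "select arrayOfArray from dfs.`/tmp/json1/0_0_0.json`";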
I've changed my record reader's batch size to be larger. All my test cases
still work as I would expect, except for one, and I have no idea why. I've
turned on tracing in the hopes of getting a hint. I now see it is in a
generated projection class, but I'm not sure why. Can anyone speculate why
a
est if you include a few columns
> in each of several files, rather than one big file with all column types.)
> This will give you a record batch with what was read.
>
> Then, use the RowSet mechanisms to build up an expected record batch, then
> compare the expected value with your act
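A compressed sketch of that expected-versus-actual pattern, using the
RowSet test tools (the schema and values are placeholders):

// Build the expected batch, then compare it with the batch the reader
// produced; `client` is the usual ClusterFixture client.
TupleMetadata schema = new SchemaBuilder()
    .add("a", TypeProtos.MinorType.INT)
    .addNullable("b", TypeProtos.MinorType.VARCHAR)
    .buildSchema();
RowSet expected = new RowSetBuilder(client.allocator(), schema)
    .addRow(1, "one")
    .addRow(2, "two")
    .build();
RowSetUtilities.verify(expected, actual);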
smaller batch size.
>
> Thanks,
> - Paul
>
>
>
> On Sunday, October 14, 2018, 6:14:46 PM PDT, Jean-Claude Cote <
> jcc...@gmail.com> wrote:
>
> Hey Paul, we think alike ;-) that's exactly what I was doing the past
> couple of days. I was simplifying my test cas
I'm trying to write a test case for a repeated map scenario. However, the
testing framework is unable to infer the schema of the result set. Am I
using the API correctly?
Thanks
jc
// create a test file
try (OutputStreamWriter w = new OutputStreamWriter(new
FileOutputStream(new File(testDir,
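A complete hedged version of that setup, for anyone following along (the
file name and JSON contents are invented):

// Write a small JSON file containing a repeated map, then query it.
File file = new File(testDir, "repeatedMap.json");
try (OutputStreamWriter w = new OutputStreamWriter(
    new FileOutputStream(file), StandardCharsets.UTF_8)) {
  w.write("{\"m\": [{\"k\": \"a\", \"v\": 1}, {\"k\": \"b\", \"v\": 2}]}\n");
}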
which is why
> some code will be needed (in Spark/Hive or in a Drill plugin of some kind).
>
> Charles Givre makes a very good point: he suggests that Drill's unique
> opportunity is to handle such odd files clearly, avoiding the need for ETL.
> That is, rather than thinking of Drill as
I'm trying to output trace information in my junit test cases. I'm using
the ClusterFixture
startCluster(ClusterFixture.builder(dirTestWatcher).maxParallelization(1));
I've put my logback.xml in src/test/resources
and feed in these environment variables when launching mvn from the command
line.
mvn --offline
I'm writing a msgpack reader and have encountered datasets where an array
contains different types for example a VARCHAR and a BINARY. Turns out the
BINARY is actually a string. I know this is probably just not modeled
correctly in the first place, but I'm still going to modify the reading of
list
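A hedged sketch of that coercion, assuming the reader goes through
msgpack-java's Value API: when a list element arrives as BINARY but is
known to hold UTF-8 text, decode it as a string rather than raw bytes.

Value v = unpacker.unpackValue();
String s = v.isBinaryValue()
    ? new String(v.asBinaryValue().asByteArray(), StandardCharsets.UTF_8)
    : v.asStringValue().asString();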
I'm writing a msgpack reader which supports schema validation. The msgpack
reader is able to discover the schema and store the result in a file
named .schema.proto alongside the data files. There is also an additional
..schema.proto.crc file, created by the Hadoop file system I believe.
However
ks.com/questions/19449/hadoop-localfilesystem-checksum-calculation.html
>
>
>
> On Tuesday, October 30, 2018, 5:45:11 PM PDT, Jean-Claude Cote <
> jcc...@gmail.com> wrote:
>
> I'm writing a msgpack reader which supports schema validation. The msgpack
> reader is able
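The companion .crc file comes from Hadoop's ChecksumFileSystem wrapper
around the local file system. A hedged way to avoid it when writing the
schema file:

// Turn checksums off on the local FS before writing .schema.proto,
// or write through the raw FS, which never creates .crc files.
LocalFileSystem fs = FileSystem.getLocal(new Configuration());
fs.setWriteChecksum(false);
FileSystem raw = fs.getRawFileSystem();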
I'm using the LogFixture
LogFixtureBuilder logBuilder = LogFixture.builder()
// Log to the console for debugging convenience
.toConsole().logger("org.apache.drill.exec.store.msgpack", Level.DEBUG);
try (LogFixture logs = logBuilder.build()) {
  // test body runs here; log settings revert when the fixture closes
}
Basic logback.xml file is
However when I
Hey Paul,
In my pull request you mentioned handling splits. I put a comment in the
pull request, but essentially msgpack files are a list of records, so
technically they can be split. However, I'm not sure if that's beneficial
because I'm not sure how the splitting process works.
The other thing
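On the splitting question, for Easy-format plugins the decision is usually
just a constructor flag (hedged; the exact signature varies across Drill
versions):

// Passing blockSplittable = false keeps one reader per file, the safe
// default when records can span HDFS block boundaries.
super(name, context, fsConf, storageConfig, formatConfig,
    true,    // readable
    false,   // writable
    false,   // blockSplittable
    false,   // compressible
    extensions, "msgpack");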
perhaps modify your default
> file has to turn off console logging completely, so that only the
> LogFixture controls the console.
>
> Thanks,
> - Paul
>
>
>
> On Saturday, November 3, 2018, 12:17:28 PM PDT, Jean-Claude Cote <
> jcc...@gmail.com> wrote:
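Concretely, that advice means a root logger with no console appender in
the default file (a sketch, not a full config):

<root level="error">
  <!-- no <appender-ref> here: the console stays quiet until the
       LogFixture installs its own appender -->
</root>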
Hi,
I'm running the query show files in dfs.root.`subdir1/subdir2`
and got the error "To SHOW FILES in specific directory, enable option
storage.list_files_recursively".
I've turned that on with alter session set
storage.list_files_recursively = true;
However when I now run the query it seems like
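For anyone reproducing this, the equivalent two steps from a test client
in the ClusterFixture style used elsewhere in this thread:

// Enable the option for the session, then run the SHOW FILES query.
client.alterSession("storage.list_files_recursively", true);
client.queryBuilder()
    .sql("show files in dfs.root.`subdir1/subdir2`")
    .run();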
global: they must be defined in the one big file you
> > > discovered, and default values must be listed in the master
> > > drill-module.conf file.
> > >
> > > It would be a handy feature to modify this to allow modules to add
> > > options. Easy to def
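A hedged illustration of what "global" means in practice (API details vary
by Drill version; the option name follows the msgpack thread): a new option
needs a validator in the central definition list plus a default in the
master drill-module.conf.

// Registered alongside the other system options, not per-module:
new OptionDefinition(new TypeValidators.BooleanValidator(
    "store.msgpack.reader.learnschema",
    new OptionDescription("Learn the msgpack schema while reading")));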
connected. Arina or Aman, do you know
> how to connect up a Drill table with DESCRIBE?
>
> Thanks,
> - Paul
>
>
>
> On Tuesday, October 2, 2018, 6:12:30 PM PDT, Jean-Claude Cote <
> jcc...@gmail.com> wrote:
>
> I've been looking at the pcap reader
>
> htt
SELECT all/some/none projection. Another is
> using "column state" classes to perform type-specific handling instead of a
> big (slow) switch statement. That new JSON reader I mentioned takes the
> idea a step further and has the column state classes handle data
> transla
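The "column state" idea in miniature (the types below are hypothetical):
give each column a small type-specific object so the type dispatch happens
once, at setup, instead of in a switch on every value.

interface ColumnState {
  void load(Object value);          // type-specific write, no switch
}
final class IntColumnState implements ColumnState {
  @Override public void load(Object value) {
    writeInt((Integer) value);      // assumed value-vector call
  }
  private void writeInt(int v) { /* write to the INT vector */ }
}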
query on that same directory.
> > Drill has no good synchronization solution. Since this seems to not be a
> > problem for views, perhaps things will work for schemas. (Both are a
> single
> > file.) We have had problem with metadata because refreshing that updates
> >
DRILL-6552
> [2] https://github.com/msgpack/msgpack/blob/master/spec.md
>
> [3] JSON Schema
>
> [4]
> https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/log
>
> [5]
> https://github.com/paul-rogers/drill/tree/RowSetRev4/exec/
I'm implementing a msgpack reader and am using the JSON reader as
inspiration. I've noticed that the JSON reader has code to detect whether
rows were written with no columns; if so, it will actually add one row with
columns of type INT. The comment in the code is
"if we had no columns, create one empty
Hi
I'm writing a new msgpack data source for Drill. I would like to be able to
configure the reader through the alter session mechanism in sqlline,
something like "alter session set `store.msgpack.reader.learnschema` =
true".
However I'm unable to make this work. In my format plugin I have
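Assuming the option has been registered globally (see the options
discussion above), reading it inside the reader would look roughly like
this (the accessor name is an assumption):

boolean learnSchema = context.getOptions()
    .getBoolean("store.msgpack.reader.learnschema");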
Vitalii Diravka
> wrote:
>
> > Hi Jean-Claude
> >
> > BaseTestQuery is deprecated. Please use ClusterTest instead.
> > See TestCsv.java for example.
> >
> > You can find more info about Drill Cluster-Fixture-Framework here:
> > https://github.com/paul-roger
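A skeleton of a ClusterTest-based suite patterned on TestCsv (class, file,
and method names invented):

public class TestMsgpackFormat extends ClusterTest {
  @BeforeClass
  public static void setup() throws Exception {
    startCluster(ClusterFixture.builder(dirTestWatcher).maxParallelization(1));
  }

  @Test
  public void testBasic() throws Exception {
    // Read a batch and hold it as a RowSet for verification.
    RowSet actual = client.queryBuilder()
        .sql("select * from cp.`msgpack/testBasic.mp`")
        .rowSet();
    // ... compare against a RowSetBuilder-built expected set, then:
    actual.clear();
  }
}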
> bootstrap-storage-plugins.json does not make any sense.
> If you need to add your own format in unit tests, as Vitalii pointed out,
> TestCsv is a good example for this.
>
> Kind regards,
> Arina
>
> On Wed, Sep 26, 2018 at 7:07 PM Jean-Claude Cote wrote:
>
>
I have written a msgpack storage plugin for Drill.
https://github.com/jcmcote/drill/tree/master/contrib/storage-msgpack
I'm now trying to write test cases like
testBuilder()
.sqlQuery("select * from cp.`msgpack/testBasic.mp`")
.ordered()
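A complete hedged version of that chain, for reference (the baseline
column names and values are invented; the msgpack test file defines the
real ones):

testBuilder()
    .sqlQuery("select * from cp.`msgpack/testBasic.mp`")
    .ordered()
    .baselineColumns("apple", "banana")
    .baselineValues(1L, "a")
    .go();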
storage-plugins-override.conf for configuring plugins
> > configs during start-up [1].
> >
> > [1]
> >
> > https://drill.apache.org/docs/configuring-storage-plugins/#
> configuring-storage-plugins-with-the-storage-plugins-override.conf-file
> >
> >
> > On Wed
link the PR for the fix you have made.
>
> Thanks,
> Pritesh
>
> On Mon, Jan 7, 2019 at 12:44 PM Jean-Claude Cote wrote:
>
> > I have made some fixes to the JDBC storage plugin in order to make it
> work
> > with an Oracle database. I can now obtain the table names. As stat
understand how to
construct the internal objects drill uses to keep track of the schemas and
tables. Should I report this as a bug?
Thank you
jc
On Wed, Dec 19, 2018 at 9:50 AM Jean-Claude Cote wrote:
> I've configured drill to use a JDBC storage plugin. My connection string
> is for an
I've configured drill to use a JDBC storage plugin. My connection string
is for an Oracle database. I have included the Oracle JDBC driver in my
drill deployment.
The connection is established correctly. However the storage plugin fails
to retrieve the schema of the database.
The JDBC API
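At the JDBC level, schema discovery boils down to the standard metadata
call; running it directly against Oracle is a quick way to see what the
driver actually returns (connection details assumed):

// Drill's plugin goes through Calcite, but this is the underlying call.
try (Connection con = DriverManager.getConnection(url, user, password);
     ResultSet schemas = con.getMetaData().getSchemas()) {
  while (schemas.next()) {
    System.out.println(schemas.getString("TABLE_SCHEM"));
  }
}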