Hi Weijie,
Thanks much for the update on your Gandiva work. It is great work.
Can you say more about how you are doing the integration?
As you mentioned the memory layout of Arrow's null vector differs from the "is
set" vector in Drill. How did you work around that?
The Project operator is
On Thu, Apr 11, 2019 at 6:37 AM Charles Givre wrote:
>
> > That’s a good idea. I’ll work on an equivalent ZIP() function and submit
> > as a separate PR.
> > — C
> >
> > > On Apr 10, 2019, at 20:44, Paul Rogers
> > wrote:
> > >
> > > Hi Charles,
Hi Charles,
In Python [1], the "zip" function does this task:
zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
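A quick runnable illustration (Python 3, where zip returns a lazy iterator that must be materialized with list()):

```python
# Python 3: zip pairs up elements positionally; wrap the lazy iterator
# in list() to see the pairs.
pairs = list(zip([1, 2, 3], [4, 5, 6]))
print(pairs)  # [(1, 4), (2, 5), (3, 6)]

# A SQL-style ZIP() over two repeated (array) columns could plausibly
# behave the same way, truncating to the shorter input as zip does:
print(list(zip([1, 2], [4, 5, 6])))  # [(1, 4), (2, 5)]
```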
When you gathered the list of functions for the Drill book, did you come across
anything like this in Drill? I presume you didn't, hence the question. I did a
quick
Congratulations Sorabh, well deserved!
- Paul
On Friday, April 5, 2019, 9:06:37 AM PDT, Arina Ielchiieva
wrote:
I am pleased to announce that Drill PMC invited Sorabh Hamirwasia to
the PMC and
he has accepted the invitation.
Congratulations Sorabh and welcome!
- Arina
(on behalf
Hi All,
Note that Hive 3 has introduced Hive ACID: an innovative way to handle
transactional data on a traditional big data warehouse. Some distros appear to
be talking about enabling ACID by default for all Hive-managed tables. In order
for Drill to continue to work with such tables, Drill
Hi All,
Daffodil is an interesting project as is the DFDLSchemas project. Thanks for
sharing!
An interesting challenge is how these libraries load data: what is their
internal format, or what API do they use for the application to consume data?
Found this for Daffodil, it will "parse data
Hi,
The queue documentation can be a bit hard to find, but it is available at [1].
However, it appears that either a) this information is out of date, or b) the
feature has changed. About 18 months ago we added additional options to make it
easier to tune the queues, but that information is
Paul Rogers created DRILL-7143:
--
Summary: Enforce column-level constraints when using a schema
Key: DRILL-7143
URL: https://issues.apache.org/jira/browse/DRILL-7143
Project: Apache Drill
Issue
Paul Rogers created DRILL-7086:
--
Summary: Enhance row-set scan framework to use external schema
Key: DRILL-7086
URL: https://issues.apache.org/jira/browse/DRILL-7086
Project: Apache Drill
[
https://issues.apache.org/jira/browse/DRILL-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers resolved DRILL-5954.
Resolution: Fixed
Fixed in a prior commit.
> ListVector shadows "offse
Paul Rogers created DRILL-7083:
--
Summary: Wrong data type for explicit partition column beyond file
depth
Key: DRILL-7083
URL: https://issues.apache.org/jira/browse/DRILL-7083
Project: Apache Drill
Paul Rogers created DRILL-7082:
--
Summary: Inconsistent results with implicit partition columns,
multi scans
Key: DRILL-7082
URL: https://issues.apache.org/jira/browse/DRILL-7082
Project: Apache Drill
Paul Rogers created DRILL-7080:
--
Summary: Inconsistent behavior with wildcard and partition columns
Key: DRILL-7080
URL: https://issues.apache.org/jira/browse/DRILL-7080
Project: Apache Drill
Hi Charles,
As someone who struggled through learning these topics over the last few years,
I'd point out that there is no one right way to do this stuff. You can use the Git
command line tools, or you can use a UI. You can keep branches locally, or publish
everything to GitHub. As Parth wisely noted
Paul Rogers created DRILL-7074:
--
Summary: Fixes and improvements to the scan framework for CSV
Key: DRILL-7074
URL: https://issues.apache.org/jira/browse/DRILL-7074
Project: Apache Drill
Issue
[
https://issues.apache.org/jira/browse/DRILL-5265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers resolved DRILL-5265.
Resolution: Fixed
> External Sort consumes more memory than alloca
[
https://issues.apache.org/jira/browse/DRILL-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers resolved DRILL-5805.
Resolution: Fixed
> External Sort runs out of mem
Paul Rogers created DRILL-7055:
--
Summary: Project operator cannot handle wildcard + implicit cols
Key: DRILL-7055
URL: https://issues.apache.org/jira/browse/DRILL-7055
Project: Apache Drill
Paul Rogers created DRILL-7053:
--
Summary: Benign, but unexpected, failure in CsvTest
Key: DRILL-7053
URL: https://issues.apache.org/jira/browse/DRILL-7053
Project: Apache Drill
Issue Type: Bug
+1
Moving forward, we'd like to evolve the format plugin API to use the new scan
framework based on the result set loader. Doing so will abstract away all the
vector-twiddling headaches that several people have had fun with over the last
couple of years. The framework will enable integration
Hi Igor,
Hive complex type integration will be a valuable addition to Drill. You
mentioned running into issues with the List vector. I believe you'll encounter
four separate issues.
First, the List vector is "experimental": the core functionality exists, but
there are holes.
Paul Rogers created DRILL-7024:
--
Summary: Refactor ColumnWriter to simplify type-conversion shim
Key: DRILL-7024
URL: https://issues.apache.org/jira/browse/DRILL-7024
Project: Apache Drill
this is well isolated and not hard if you take it step-by-step.
That's why it seemed a good Summer of Code project for an enterprising student
interested in networking and data munging.
Thanks,
- Paul
[1] https://github.com/paul-rogers/drill-jig
On Wednesday, January 30, 2019, 10:18:47 AM PST
/17I2jZq2HdDwUDXFOIg1Vecry8yGTDWhn
Aman
On Tue, Jan 29, 2019 at 12:08 AM Paul Rogers
wrote:
> Hi Charles,
> I didn't see anything on this on the public mailing list. Haven't seen any
> commits related to it either. My guess is that this kind of interface is
> not important for the kind of data warehou
018, at 13:51, Paul Rogers wrote:
>
> Hi Ted,
>
> We may be confusing two very different ideas. The one is a Drill-to-Arrow
> adapter on Drill's periphery, this is the "crude-but-effective" integration
> suggestion. On the periphery we are not changing existing code,
Paul Rogers created DRILL-7007:
--
Summary: Revise row-set based tests to use simplified verify method
Key: DRILL-7007
URL: https://issues.apache.org/jira/browse/DRILL-7007
Project: Apache Drill
Paul Rogers created DRILL-7006:
--
Summary: Support type conversion shims in RowSetWriter
Key: DRILL-7006
URL: https://issues.apache.org/jira/browse/DRILL-7006
Project: Apache Drill
Issue Type
Hi Charles,
A managed buffer is just a DrillBuf that the execution framework will free for
you when the query fragment shuts down.
However, nothing can determine when you write past the end of the buffer and
automatically resize it. You still must do the reallocation yourself.
You probably
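A minimal sketch of that grow-it-yourself pattern, using a plain Python bytearray as a stand-in for DrillBuf (the helper name ensure_capacity is hypothetical, not a Drill API):

```python
def ensure_capacity(buf: bytearray, needed: int) -> bytearray:
    """Return a buffer with at least `needed` bytes of capacity,
    doubling from the current size and copying the old contents --
    the manual regrow-and-reassign step the caller must do itself."""
    if len(buf) >= needed:
        return buf
    new_size = max(len(buf), 1)
    while new_size < needed:
        new_size *= 2
    new_buf = bytearray(new_size)
    new_buf[:len(buf)] = buf  # copy existing contents across
    return new_buf

buf = bytearray(8)
buf = ensure_capacity(buf, 100)  # caller reassigns; nothing is automatic
print(len(buf))  # 128
```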
Hi All,
Wanted to pass along some good foundational material about databases. We find
ourselves immersed day-to-day in the details of Drill's implementation. It is
helpful to occasionally step back and look at the larger DB tradition in which
Drill resides. This material is especially good for
:
https://github.com/apache/drill/tree/master/docs/dev
https://github.com/paul-rogers/drill/wiki
Kind regards
Vitalii
On Fri, Jan 18, 2019 at 12:31 PM srungarapu vamsi
wrote:
> Hi,
>
> I find the Apache Drill project interesting and I want to contribute to the
> project. I have cloned the
Paul Rogers created DRILL-6953:
--
Summary: Merge row set-based JSON reader
Key: DRILL-6953
URL: https://issues.apache.org/jira/browse/DRILL-6953
Project: Apache Drill
Issue Type: Improvement
Paul Rogers created DRILL-6951:
--
Summary: Merge row set based mock data source
Key: DRILL-6951
URL: https://issues.apache.org/jira/browse/DRILL-6951
Project: Apache Drill
Issue Type
Paul Rogers created DRILL-6952:
--
Summary: Merge row set based "compliant" text reader
Key: DRILL-6952
URL: https://issues.apache.org/jira/browse/DRILL-6952
Project: Apache Drill
Paul Rogers created DRILL-6950:
--
Summary: Pull request for row set-based scan framework
Key: DRILL-6950
URL: https://issues.apache.org/jira/browse/DRILL-6950
Project: Apache Drill
Issue Type
Hi Charles,
I'm not quite sure what "dynamic queue allocation" means: all YARN containers
are allocated dynamically through YARN via queues.
It may be helpful to review how Drill-on-YARN (DoY) works. DoY does NOT attempt
to use YARN for each query. Impala tried that with Llama and discovered
Paul Rogers created DRILL-6901:
--
Summary: Move SchemaBuilder from test to main for use outside tests
Key: DRILL-6901
URL: https://issues.apache.org/jira/browse/DRILL-6901
Project: Apache Drill
Thanks! Glad you found the book useful.
- Paul
On Friday, December 7, 2018, 8:00:29 PM PST, 王亮
wrote:
Thanks, I have one "Learning Apache Drill:Query and Analyze Distributed
Data Sources with SQL" , wonderful book: )
Congrats Karthik!
- Paul
Sent from my iPhone
> On Dec 7, 2018, at 11:12 AM, Abhishek Girish wrote:
>
> Congratulations Karthik!
>
>> On Fri, Dec 7, 2018 at 11:11 AM Arina Ielchiieva wrote:
>>
>> The Project Management Committee (PMC) for Apache Drill has invited
>> Karthikeyan
>>
java:89)
Results :
Tests in error:
TestSyslogFormat>ClusterTest.shutdown:89 » Runtime Exception while closing
> On Nov 9, 2018, at 16:09, Paul Rogers wrote:
>
> Hi Charles,
>
> Thanks for the PR. Two suggestions for your test. First, use TupleSchema:
>
> TupleSchema sch
Hi Gautam,
You touched on the key issue: storage. You mention that the Drill stats
implementation learned from Oracle. Very wise: Oracle is the clear expert in
this space.
There is a very important difference, however, between Drill and Oracle. Oracle
is a complete database including both
Hi JC,
Thanks much for the updates. I’ll take another look over the weekend.
- Paul
Sent from my iPhone
> On Nov 9, 2018, at 2:02 PM, Jean-Claude Cote wrote:
>
> Hey Paul,
>
> In my pull request you mentioned handling splits.. I put a comment in the
> pull request but essentially msgpack
Hi Charles,
Thanks for the PR. Two suggestions for your test. First, use TupleSchema:
TupleSchema schema = new SchemaBuilder() ... .buildSchema().
BatchSchema has some limitations that TupleSchema overcomes.
Second, when I did a PR that added unions, I normalized the "buildFoo()"
methods.
, would be good to get the existing version into the
code base so folks can play with it.
Thanks,
- Paul
On Thursday, November 8, 2018, 3:57:35 PM PST, Paul Rogers
wrote:
Hi Gautam,
Thanks much for the explanations. You raise some interesting points. I noticed
that Boaz has just filed
Hi Gautam,
Thanks much for the explanations. You raise some interesting points. I noticed
that Boaz has just filed a JIRA ticket to tackle the inefficient count distinct
case.
To take a step back, recall that Arina is working on a metadata proposal. A key
aspect of that proposal is that it
possibility for the Hash operators is to have some hash function
compatibility, like HashFunc( INT 567 ) == HashFunc( BIGINT 567 ), to
simplify (and avoid rehashing).
Thanks,
Boaz
On 11/6/18 12:25 PM, Paul Rogers wrote:
> HI Aman,
>
> I would completely agree with the
Paul Rogers created DRILL-6832:
--
Summary: Remove old "unmanaged" sort implementation
Key: DRILL-6832
URL: https://issues.apache.org/jira/browse/DRILL-6832
Project: Apache Drill
s it means that as soon as the schema changes, it
emits the previous Record Batch and starts a new output batch. For the
blocking operators, there's more things to take care of and I created
DRILL-6829 <https://issues.apache.org/jira/browse/DRILL-6829> to capture
that.
Aman
On Mon, Nov 5, 2018 at 8:50
Hi All,
Stats would be a great addition. Here are a couple of issues that came up in
the earlier code review, revisited in light of recent proposed work.
First, the code to gather the stats is rather complex; it is the evolution of
some work an intern did way back when. We'd be advised to find
Hi Aman,
Thanks much for the write-up. My two cents, FWIW.
As the history of this list has shown, I've fought with the schema change issue
multiple times: in sort, in JSON, in the row set loader framework, and in
writing the "Data Engineering" chapter in the Learning Drill book.
What I have
ore logback.xml.
https://github.com/apache/drill/blob/7b0c9034753a8c5035fd1c0f1f84a37b376e6748/common/src/test/resources/logback-test.xml
Should I be using a logback-test.xml in my personal project or should that
common logback-test.xml be removed ?
Thanks Paul
jc
On Sat, Nov 3, 2018 at 3:39 PM Paul
Hi JC,
Your code looks fine. I usually start with the default log level (ERROR), then
turn on DEBUG for specific modules, as you do. I then see my INFO or DEBUG
messages. My code looks like yours, so I'm not sure why you are seeing two
messages. Perhaps you are logging ERROR level messages?
Looks like Google found a couple of hits: [1] and [2]
I'm not an expert here, but I wonder if you can just remove the file. Never had
Drill or HDFS complain when asking it to read a local file without the .crc
file...
Thanks,
- Paul
[1]
-storage-plugin/
[3] https://drill.apache.org/docs/connect-a-data-source-introduction/
[4] https://github.com/apache/drill/tree/master/contrib
[5]
https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store
[6] https://github.com/paul-rogers/drill/wiki
Kind
Congrats Guatam!
- Paul
Sent from my iPhone
> On Oct 22, 2018, at 8:46 AM, salim achouche wrote:
>
> Congrats Gautam!
>
>> On Mon, Oct 22, 2018 at 7:25 AM Arina Ielchiieva wrote:
>>
>> The Project Management Committee (PMC) for Apache Drill has invited Gautam
>> Parai to become a
May be a bug in my code. Please create a JIRA ticket and attach your input file
and test code so I can reproduce the problem.
Thanks,
- Paul
On Sunday, October 21, 2018, 6:24:18 AM PDT, Jean-Claude Cote
wrote:
I'm trying to write a test case for a repeated map scenario. However
Hi JC,
Bingo, you just hit the core problem with schema-on-read: there is no "right"
rule for how to handle ambiguous or inconsistent schemas. Take your
string/binary example. You determined that the binary fields were actually
strings (encoded in what, UTF-8? ASCII? Host's native codeset?)
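The ambiguity is easy to demonstrate: the very same bytes decode to different strings depending on which codeset you assume:

```python
raw = b"caf\xc3\xa9"  # bytes a reader might see in a "binary" field

as_utf8 = raw.decode("utf-8")      # 'café'  -- 4 characters
as_latin1 = raw.decode("latin-1")  # 'cafÃ©' -- 5 characters

# Both decodes "succeed"; only external knowledge says which is right.
print(len(as_utf8), len(as_latin1))  # 4 5
```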
}
The test passes. Then I change
public static final long DEFAULT_ROWS_PER_BATCH =
BaseValueVector.INITIAL_VALUE_ALLOCATION ;
to be
public static final long DEFAULT_ROWS_PER_BATCH =
BaseValueVector.INITIAL_VALUE_ALLOCATION + 1;
and the test case fails.
I can attach the whole trace outpu
, 2018, 6:22:40 PM PDT, Paul Rogers
wrote:
Drill enforces two hard limits:
1. The maximum number of rows in a batch is 64K.
2. The maximum size of any vector is 4 GB.
We have found, however, that fragmentation occurs in our memory allocator for
any vector larger than 16 MB. (This is, in fact
Drill enforces two hard limits:
1. The maximum number of rows in a batch is 64K.
2. The maximum size of any vector is 4 GB.
We have found, however, that fragmentation occurs in our memory allocator for
any vector larger than 16 MB. (This is, in fact, the original reason for the
result set loader
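The arithmetic behind those limits is worth making concrete: with the 64K row cap, keeping a single vector under the 16 MB fragmentation threshold bounds the average value width per row.

```python
MAX_ROWS = 64 * 1024                 # 65,536 rows per batch (hard limit)
SAFE_VECTOR_BYTES = 16 * 1024 * 1024 # 16 MB fragmentation threshold
MAX_VECTOR_BYTES = 4 * 1024 ** 3     # 4 GB hard vector limit

# With a full 64K-row batch, the widest average value that still keeps
# one vector under the 16 MB threshold:
print(SAFE_VECTOR_BYTES // MAX_ROWS)  # 256 bytes per value
```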
Hi JC,
You are asking how to use logs with unit tests. Let's talk about the two ways
you might be using logging, because each has a different answer.
In general, a unit test should use JUnit assert calls to verify that behavior
is as expected. No-one ever looks at output from tests unless a
ofit other readers as the need arises.
The entire mechanism, and the design goals behind it, are documented in [1].
Thanks,
- Paul
[1] https://github.com/paul-rogers/drill/wiki/Batch-Handling-Upgrades
On Thursday, October 11, 2018, 2:51:22 AM PDT, Arina Yelchiyeva
wrote:
Paul,
Paul Rogers created DRILL-6791:
--
Summary: Merge scan projection framework into master
Key: DRILL-6791
URL: https://issues.apache.org/jira/browse/DRILL-6791
Project: Apache Drill
Issue Type
I don't believe Parquet supports 2D arrays, does it?
Thanks,
- Paul
On Thursday, October 11, 2018, 7:52:38 PM PDT, Jean-Claude Cote
wrote:
I'm trying to write the following JSON file into a parquet file. However my
CTAS query returns an error Unsupported type LIST. Any ideas why,
any ETA when you will be able to submit the PRs? Maybe
> also do some presentation? Can you please share Jira number(-s) as well?
>
> Kind regards,
> Arina
>
> On Wed, Oct 10, 2018 at 7:31 AM Paul Rogers
> wrote:
>
> > Hi JC,
> >
> > Very cool indeed. You
Maybe
also do some presentation? Can you please share Jira number(-s) as well?
Kind regards,
Arina
On Wed, Oct 10, 2018 at 7:31 AM Paul Rogers
wrote:
> Hi JC,
>
> Very cool indeed. You are the man!
>
> Ted's been advocating for this approach for as long as I can remember (2+
>
Hi JC,
Very cool indeed. You are the man!
Ted's been advocating for this approach for as long as I can remember (2+
years). You're well on your way to solving the JSON problems that I documented
a while back in DRILL-4710 and summarized as "Drill can't predict the future."
Basically, without a
Hi JC,
Unless something has changed recently, it turns out that system/session options
are global: they must be defined in the one big file you discovered, and
default values must be listed in the master drill-module.conf file.
It would be a handy feature to modify this to allow modules to add
DESCRIBE to work I need to implement it at
the planner level?
Thanks Paul
jc
On Tue, Oct 2, 2018 at 12:54 PM Paul Rogers
wrote:
> Hi JC,
>
> Now that you have a working reader, sounds like your next task is to pass
> column schema to the reader. There are two ways to do that. There a
ea.
How would I best leverage such a file.
Thank you very much
jc
On Mon, Oct 1, 2018 at 9:51 PM Paul Rogers
wrote:
> Hi JC,
>
> One of Drill's challenges is that it cannot predict the future: it can't
> know what type your column will be in later records or in another file. Al
l/exec/store/log
[5]
https://github.com/paul-rogers/drill/tree/RowSetRev4/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json
On Monday, October 1, 2018, 6:03:38 PM PDT, Jean-Claude Cote
wrote:
I'm implementing a msgpack reader and use the JSON reader as inspi
Congrats Chunhui!
Thanks,
- Paul
On Friday, September 28, 2018, 2:17:42 AM PDT, Arina Ielchiieva
wrote:
The Project Management Committee (PMC) for Apache Drill has invited Chunhui
Shi to become a committer, and we are pleased to announce that he has
accepted.
Chunhui Shi has
Paul Rogers created DRILL-6759:
--
Summary: CSV 'columns' array is incorrectly case sensitive
Key: DRILL-6759
URL: https://issues.apache.org/jira/browse/DRILL-6759
Project: Apache Drill
Issue
Hi All,
I'm hoping someone can explain a mystery in the root pom.xml file. We have a
list of modules:
tools
protocol
common
logical
exec
drill-yarn
distribution
Note that contrib is not part of this list. The result is that, in a normal
build, the contrib
018 at 10:21 PM Paul Rogers
wrote:
> Hi All,
>
> Been reading up on distributed DB papers of late, including those passed
> along by this group. Got me thinking about Arina's question about where
> Drill might go in the long term.
>
> One thing I've noticed is that t
Hi All,
Been reading up on distributed DB papers of late, including those passed along
by this group. Got me thinking about Arina's question about where Drill might
go in the long term.
One thing I've noticed is that there are now quite a few distributed compute
frameworks, many of which
ee section 6.3. Also need to declare
column datatype before the query.
[1] http://www.vldb.org/pvldb/vol11/p1835-samwel.pdf
On Fri, Sep 7, 2018 at 9:47 AM Paul Rogers
wrote:
> Hi All,
>
> We've discussed quite a few times whether Drill should or should not
> support or require schema
Hi All,
We've discussed quite a few times whether Drill should or should not support or
require schemas, and if so, how the user might express the schema.
I came across a paper [1] that suggests a simple, elegant SQL extension:
EXTRACT <column>[:<type>] {, <column>[:<type>]}
FROM <source>
Paraphrasing into Drill's SQL:
SELECT
I've been helping Charles with this. He's got a branch that works sometimes,
but not others.
* If I run his unit test from Eclipse, it works.
* If I run his unit test from the command line with Maven, it works.
* If he runs his unit test using the mechanism he is using, Drill can't find
his
Congratulations Charles! I look forward to your continued strong voice as an
expert Drill user in your new role.
- Paul
Sent from my iPhone
> On Sep 3, 2018, at 10:22 AM, Vitalii Diravka wrote:
>
> Congrats Charles!
> And thank you for your enthusiasm and work on Drill
>
>> On Mon, Sep 3,
Congratulations Weijie, thanks for your contributions to Drill.
Thanks,
- Paul
On Friday, August 31, 2018, 8:51:30 AM PDT, Arina Ielchiieva
wrote:
The Project Management Committee (PMC) for Apache Drill has invited Weijie
Tong to become a committer, and we are pleased to announce
Hi Sri,
The fact that each line can be converted, but multiple throw an error suggests
that you may have conflicting types. Drill tries to handle such cases, but
there are many holes; it sounds like you are hitting one of them.
The error message mentions "SingleListWriter". The single list
>
>
> Kind regards
> Vitalii
>
>
> On Thu, Aug 23, 2018 at 3:02 AM Paul Rogers
> wrote:
>
> > Hi Tim,
> >
> > I don't have an answer. But, I can point out some factors to consider.
> >
> > H
Congratulations Volodymyr!
Thanks,
- Paul
On Friday, August 24, 2018, 5:53:25 AM PDT, Arina Ielchiieva
wrote:
I am pleased to announce that Drill PMC invited Volodymyr Vysotskyi to the
PMC and he has accepted the invitation.
Congratulations Vova and thanks for your contributions!
Hi Tim,
I don't have an answer. But, I can point out some factors to consider.
Hive describes a set of data in a specific file system. Would make sense to
associate that file system with the Hive configuration. Else, I could use a
Hive metastore for FS A, with a DFS configured for FS B, and
d of
> me.
>
> On Tue, Aug 21, 2018 at 4:43 PM Paul Rogers
> wrote:
>
>> Hi Chris,
>>
>
>
>> Later, when Drill sees the first Varchar, it can change the type from,
>> say, batch 3 onwards. But, JDBC and ODBC generally require the schema be
>> know
Hi All,
There is a cool new distributed framework coming out of UC Berkeley: Ray [1].
This is part of the RISE project which is the successor to the AmpLab project
that produced Spark. The Ray paper [2] provides a great overview.
(quote)
Ray is a high-performance distributed execution
ld not listen to today's
hangouts session unfortunately, sorry for possible ignorance)
Thanks,
Best Regards,
Alex
On Thu, Aug 9, 2018 at 7:51 PM Paul Rogers
wrote:
> Hi Alex,
>
> Perhaps Parth can jump in here as he has deeper knowledge of Parquet.
>
> My understanding is
Hi All,
My two cents...
The gist of the discussion is that 1) using Objects.requireNonNull() reduces the
Guava import footprint, vs. 2) we are not removing the Guava dependency, so
switching to Objects.requireNonNull() is unnecessary technically and is instead a
personal preference.
We make
:55 PM PDT, Chris Cunningham
wrote:
Hi. Mostly off topic, but reading about this issue has finally prompted a
response.
On Wed, Aug 15, 2018 at 5:46 PM Paul Rogers
wrote:
> If we provide schema hints ("field x, when it appears, will be a Double"),
> then Drill need
or Drill's internals? That's really the question the group will
want to answer.
More details below.
Thanks,
- Paul
On Monday, August 20, 2018, 9:41:49 AM PDT, Ted Dunning
wrote:
Inline.
On Mon, Aug 20, 2018 at 9:20 AM Paul Rogers
wrote:
> ...
> By contrast, migrating Drill in
ould be easier than was thought.
On Sat, Aug 18, 2018, 16:44 Paul Rogers wrote:
> Hi All,
>
> Charles recently suggested why Arrow integration could be helpful. (See
> quote below.) When we've looked at reworking Drill's internals to use
> Arrow, we found the project to be cost
we could avoid major
work to Drill.
I was concerned in reading about the ideas for Arrow integration, that it would
complicate existing UDFs and/or Format-plugins. How much of this do you
envision would be included with Drill?
—C
> On Aug 18, 2018, at 19:44, Paul Rogers wrote:
>
>
Hi All,
Charles recently suggested why Arrow integration could be helpful. (See quote
below.) When we've looked at reworking Drill's internals to use Arrow, we
found the project to be costly with little direct benefit in terms of
performance or stability. But, Charles points out that the real
; do.
>
> We also need some evangelists to broadcast the Drill project to adopt
> more contributors.
> It’s rarely to see Drill’s tech show to expand its community influence.
>
> On Wed, Aug 15, 2018 at 4:26 AM Paul Rogers
> wrote:
>
> > I wonder if we should pop th
Congratulations Boaz!
- Paul
On Friday, August 17, 2018, 2:56:27 AM PDT, Vitalii Diravka
wrote:
Congrats Boaz!
Kind regards
Vitalii
On Fri, Aug 17, 2018 at 12:51 PM Arina Ielchiieva wrote:
> I am pleased to announce that Drill PMC invited Boaz Ben-Zvi to the PMC and
> he has
; hope we move the mess schema solving logic out of Drill to let the code
> cleaner by defining the schema firstly with DDL statements. If we agree on
> this, the work should be a sub work of DRILL-6552.
>
> On Thu, Aug 16, 2018 at 8:51 AM Paul Rogers
> wrote:
>
> > Hi Ted,
&
Hi Tim,
IIRC, you have to do an initial allocation. There was a bug that, if you
didn't, the setSafe would try to double your vector from 0 items to 0 items.
This would be too small, so it would double again, forever.
In general, you don't want to start with an empty vector (or the default
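A sketch of why that bug loops: doubling a zero capacity never reaches the target, which is why a nonzero initial allocation is required (names here are illustrative, not Drill's actual vector API):

```python
def grow_until(capacity: int, needed: int, max_iters: int = 64):
    """Double `capacity` until it covers `needed`.
    Starting from 0, doubling yields 0 forever -- the bug described above."""
    iters = 0
    while capacity < needed:
        capacity *= 2
        iters += 1
        if iters >= max_iters:
            return None  # stuck: 0 * 2 is still 0
    return capacity

print(grow_until(0, 100))   # None -- never terminates without the guard
print(grow_until(16, 100))  # 128 -- a nonzero initial allocation works
```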
Hi Ted,
I like the "schema auto-detect" idea.
As we discussed in a prior thread, caching of schema is a nice-add on once we
have defined the schema-on-read mechanism. Maybe we first get it to work with a
user-provided schema. Then, as an enhancement, we offer to infer the schema by
scanning
Hi Weijie,
Thanks for raising this topic. I think you've got a great suggestion.
My two cents: there is no harm in reading all manner of ugly data. But, rather
than try to process the mess throughout Drill (as we do today with schema
changes, just-in-time code generation, union vectors and the
lities like that are really needed. I’d like to see a
> generic HTTP storage plugin, a storage plugin for Google Sheets, If I can
> figure out how storage plugins work, I’ll gladly work on some of these.
>
> Just my .02.
> — C
>
>
>
>
>
> > On Aug 13, 2018, at 2