Dictionary batches simply wrap a record batch with one “column”. There
should be no difference (e.g. buffer layouts are the same) between the
code handling the data in a dictionary and in a normal record batch. In
general, a dictionary may contain a null.
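To illustrate the point that a dictionary is handled like any other column, here is a minimal sketch in C. The `Int32Column` type and the `is_valid` and `lookup` helpers are hypothetical names made up for this example and are not part of any Arrow API. The sketch assumes a validity bitmap where a set bit means "valid" and bits are numbered from the least-significant bit of each byte.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified int32 column (illustration only, not an Arrow API):
   a validity bitmap plus a values buffer. */
typedef struct {
    const uint8_t *validity; /* bit i set => slot i is non-null */
    const int32_t *values;
    int64_t length;
} Int32Column;

/* Same null check whether the column is a dictionary or ordinary data. */
static int is_valid(const Int32Column *col, int64_t i) {
    assert(i >= 0 && i < col->length);
    return (col->validity[i / 8] >> (i % 8)) & 1;
}

/* Resolve a dictionary index. Returns 1 and writes *out if the
   dictionary slot holds a value, or 0 if the slot itself is null. */
static int lookup(const Int32Column *dictionary, int32_t index, int32_t *out) {
    if (!is_valid(dictionary, index)) {
        return 0;
    }
    *out = dictionary->values[index];
    return 1;
}
```

Because the dictionary reuses the ordinary column layout, a null dictionary entry needs no special code path: the lookup just reports it as null.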
On Wed, Nov 8, 2017 at 4:05 PM
My previous analysis (if I recall, I did it on the similar Parquet PR)
was that no system truly supported N fields for all operations
(Postgres was closest, I believe). Some basic operations would
maintain them but they would quickly not behave differently (e.g. 24 hours
is
Agreed, that sounds like a great solution to this problem - the layout
information is redundant and it doesn't make sense to include it in
every schema.
Although I would argue we should write down exactly what buffers are
supposed to go on the wire in the dictionary batches (i.e. value
I have argued before on this list, and still believe, that you should
represent an interval as you would a number. If intervals are 64 bit
signed, then sure, use the 64 bit integer representation; if you were
to allow intervals with fixed precision and scale, then use the same
representation as
Makes sense. The key question is whether the data is represented as a
single 64-bit integer or as effectively a C struct
struct {
  int32_t days;
  int32_t milliseconds;
}
The struct representation cannot accommodate higher resolution units
like microseconds and nanoseconds. From my perspective,
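A minimal sketch of the two representations discussed here, written in C. The type name `DayTimeInterval` and the helpers below are hypothetical and only for illustration. The conversion assumes every day is exactly 24 hours, which is itself one of the points under debate in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* The struct layout quoted above: whole days plus milliseconds. */
typedef struct {
    int32_t days;
    int32_t milliseconds;
} DayTimeInterval;

#define MILLIS_PER_DAY INT64_C(86400000)
#define MICROS_PER_DAY INT64_C(86400000000)

/* Collapse both fields into one 64-bit millisecond count.
   Assumption: a day is always exactly 24 hours. */
static int64_t day_time_to_millis(DayTimeInterval iv) {
    return (int64_t)iv.days * MILLIS_PER_DAY + (int64_t)iv.milliseconds;
}

/* Split a 64-bit millisecond count back into the two-field struct. */
static DayTimeInterval millis_to_day_time(int64_t millis) {
    int64_t days = millis / MILLIS_PER_DAY;
    assert(days >= INT32_MIN && days <= INT32_MAX);
    DayTimeInterval iv = { (int32_t)days, (int32_t)(millis % MILLIS_PER_DAY) };
    return iv;
}

/* Can the largest sub-day remainder in a given unit fit in an int32 field? */
static int fits_in_int32(int64_t units_per_day) {
    return units_per_day - 1 <= INT32_MAX;
}
```

One day is 86,400,000 milliseconds, which fits in a 32-bit field, but 86,400,000,000 microseconds does not. That is one concrete way to see why the second field cannot simply be switched to a finer unit, whereas a single 64-bit count can.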
I don't know many examples of interval being used in the real world.
But here's the kind of thing: the policy is that an offer is open for
60 hours, so if the offer is made to a particular customer at 12:34pm
on Sunday, you want to compute that it ends at 12:34am on Wednesday.
The interval "60
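The arithmetic in the offer example can be sketched as follows, in C. The `WeekTime` type and `add_hours` helper are hypothetical names for this illustration. The sketch treats "60 hours" as pure elapsed time and ignores time zones and daylight-saving shifts.

```c
#include <assert.h>
#include <stdint.h>

enum { MINUTES_PER_DAY = 24 * 60, DAYS_PER_WEEK = 7 };

/* A point within a week: day 0 = Sunday ... 6 = Saturday,
   plus minutes since midnight. */
typedef struct {
    int day_of_week;
    int minute_of_day;
} WeekTime;

/* Add a non-negative number of hours as elapsed time.
   Assumption: no DST or time zone changes in between. */
static WeekTime add_hours(WeekTime start, int64_t hours) {
    assert(hours >= 0);
    int64_t total = (int64_t)start.day_of_week * MINUTES_PER_DAY
                  + start.minute_of_day + hours * 60;
    WeekTime end;
    end.day_of_week = (int)((total / MINUTES_PER_DAY) % DAYS_PER_WEEK);
    end.minute_of_day = (int)(total % MINUTES_PER_DAY);
    return end;
}
```

Starting from Sunday at 12:34pm (day 0, minute 754), adding 60 hours gives day 3, minute 34, i.e. Wednesday at 12:34am, matching the example.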
Totally awesome. Nice job Li and everyone else!
On Mon, Oct 30, 2017 at 2:22 PM, Phillip Cloud wrote:
> Congrats Li! This is awesome.
>
> On Mon, Oct 30, 2017 at 2:05 PM Wes McKinney wrote:
>
> > hi all,
> >
> > One of our newest committers, Li Jin, has
Per Jacques' comment in ARROW-1693
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244812#comment-16244812,
I think we should remove the buffer layout from the metadata. It would
be a good idea to do this for 0.8.0 since
Pleading ignorance on use of the SQL interval type, my prior would be
that many algorithms would first convert the interval components into
an absolute timedelta. Is that not the case?
My preference right now would be to have a single Interval type, where
the DAY_TIME type actually contains an
We spent a bunch of time trying to figure it out and as far as I can tell,
there is no way of supporting a larger number of users AND accepting all
entrants.
On Fri, Nov 3, 2017 at 2:58 PM, Wes McKinney wrote:
> @Jacques, is there a way the meeting can be configured so that
I'm all for moving interval to the new definition. I think we should avoid
introducing a timedelta type until it is really important. We need several
users demanding a type before we should implement it. Otherwise, we have
huge amounts of type bloat (which means nothing will fully implement the
Phillip Cloud created ARROW-1781:
Summary: [CI] OSX Builds on Travis-CI time out often
Key: ARROW-1781
URL: https://issues.apache.org/jira/browse/ARROW-1781
Project: Apache Arrow
Issue Type:
Atul Dambalkar created ARROW-1780:
Summary: JDBC Adapter for Apache Arrow
Key: ARROW-1780
URL: https://issues.apache.org/jira/browse/ARROW-1780
Project: Apache Arrow
Issue Type: New Feature