Since I'm reading a JSON file, I will try changing
JSONRecordReader.DEFAULT_ROWS_PER_BATCH. Thanks for the advice!
Eric
On Wed, Jul 6, 2016 at 12:42 AM, Abdel Hakim Deneche wrote:
> It depends on the data you are querying; for .json you could change the
> value of
It depends on the data you are querying; for .json you could change the
value of JSONRecordReader.DEFAULT_ROWS_PER_BATCH, which is set by default
to 4096. But this will only affect the size of the batches produced by the
reader; other operators may still alter the batch size.
On Tue, Jul 5, 2016
Thanks Abdel. Looking at the code, it looks like the maximum number of
records in a batch is 64K. I suspect the reason I'm getting only 4K is
that the buffer in the batch reached its capacity. Is there a way to
relieve this capacity restriction? It doesn't have to be a configuration
option. I
Hi Neeraja,
Thanks for the guide!
I see a strange error here.
The first two queries report the same error, but the third one works.
The only difference between #2 and #3 is "LIMIT X", and it does not matter
what the value of X is; it can be larger than the row count of the table, and the
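For illustration, a hypothetical pair of queries matching that pattern
(the table path and column names are placeholders):

-- Query #2: reports the error
SELECT id, name FROM dfs.`/data/mytable`;
-- Query #3: identical except for LIMIT, but works,
-- even when X exceeds the table's row count
SELECT id, name FROM dfs.`/data/mytable` LIMIT 100000000;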
Unfortunately I don't think there is a way to do it.
On Tue, Jul 5, 2016 at 3:58 PM, Eric Fukuda wrote:
> I'm trying to see how performance differs with different batch sizes. My
> table has 13 integer fields and 1 string field, and has 8M records.
> Following the code with
I'm trying to see how performance differs with different batch sizes. My
table has 13 integer fields and 1 string field, and has 8M records.
Following the code with a debugger, there seem to be 4096 records in a
batch. Can this be 8192 or larger?
On Tue, Jul 5, 2016 at 6:47 PM, Abdel Hakim
hey Eric,
Can you give more information about what you are trying to achieve?
Thanks
On Tue, Jul 5, 2016 at 3:41 PM, Eric Fukuda wrote:
> Hi,
>
> Does anyone know if there is a way to increase or specify the number of
> records per batch manually?
>
> Thanks,
> Eric
>
Hi,
Does anyone know if there is a way to increase or specify the number of
records per batch manually?
Thanks,
Eric
Yes, impersonation is enabled:
drill.exec: {
  cluster-id: "hhe",
  zk.connect: "zk1:2181,zk2:2181,zk3:2181",
  impersonation: {
    enabled: true,
    max_chained_user_hops: 3
  }
}
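As a quick sanity check, boot-time options like impersonation show up in
the sys.boot system table (a sketch; adjust the filter as needed):

SELECT * FROM sys.boot WHERE name LIKE '%impersonation%';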
On Mon, Jun 20, 2016 at 6:22 PM, Chun Chang wrote:
> Did you enable
Adding to Rahul,
You can also take a look at Calcite [2], (which Drill uses for SQL parsing
+ query optimization) and a relevant thread I found on the mailing list
[3].
[2] https://calcite.apache.org/
[3]
For a start, below is the relevant piece from the documentation [1]. You
can also prepend any query with "explain plan for" to view the exact plan
generated by Drill's Optimizer.
• Optimizer: Drill uses various standard database optimizations such as
rule-based/cost-based optimization, as well as data
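For example, to see the plan Drill generates for a query over a JSON file
(the path is a placeholder):

EXPLAIN PLAN FOR
SELECT * FROM dfs.`/data/sample.json` LIMIT 10;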
You might have run the two queries while the cache was still being built.
There is no concurrency control for the metadata cache at the moment (one
of the many improvements we need to make).
For metadata caching, the best practice with the current implementation is
to run a manual metadata refresh
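The command itself is a one-liner (the table path is a placeholder):

REFRESH TABLE METADATA dfs.`/data/large_table`;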
I like the concept of logs in the web UI; however, at this time it assumes
that there will only be one directory for log files. The way I've set mine
up is to have different directories for logs, dcplogs, profiles, etc. That
way I can organize them a bit, and for those logs that are in json
Hello,
We want to test connecting Oracle BI (OBI) to Apache Drill. We saw the
JDBC/ODBC drivers option and have the following questions:
- Do you know if someone has already tested connecting OBI to Apache
Drill?
- Do you know if there is any specific requirement to properly use the
As John hinted, a session is not maintained by the UI/REST API unless
impersonation is enabled, so your ALTER SESSION commands will have no
effect on the query.
That does not explain why you are not getting full results, though. Is it
possible that the query is getting an error because your session
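To make that concrete: a statement like the one below persists for a JDBC
or sqlline session, but over the REST API each request gets a fresh
session, so it will not carry over to the next query
(store.json.all_text_mode is just an example option):

ALTER SESSION SET `store.json.all_text_mode` = true;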
Answers inline.
On Tue, Jul 5, 2016 at 8:39 AM, John Omernik wrote:
> Working with the 1.7.0, the feature that I was very interested in was the
> fixing of the Metadata Caching while using user impersonation.
>
> I have a large table, with a day directory that can contain up
Working with the 1.7.0, the feature that I was very interested in was the
fixing of the Metadata Caching while using user impersonation.
I have a large table, with a day directory that can contain up to 1000
parquet files each.
Planning was getting terrible on this table as I added new data,
Hello,
I'm new to data virtualization and I'm trying to understand how Apache
Drill works.
I browsed the documentation, but I didn't understand how the optimizer
works.
I learned that it's a cost-based optimizer, but nothing else.
I want to know how the
Did you get this resolved? I have the same issue.
On Sun, Jul 3, 2016 at 10:49 PM, Santosh Kulkarni <santoshskulkarn...@gmail.com> wrote:
> Hello,
>
> For a simple query select count(*) from table_name, Drill gives an error.
> Error: SYSTEM ERROR: IOException: Can't get Master Kerberos