Re: Number of records per batch

2016-07-05 Thread Eric Fukuda
Since I'm reading a JSON file, I will try changing JSONRecordReader.DEFAULT_ROWS_PER_BATCH. Thanks for the advice! Eric On Wed, Jul 6, 2016 at 12:42 AM, Abdel Hakim Deneche wrote: > It depends on the data you are querying, for .json you could change the > value of

Re: Number of records per batch

2016-07-05 Thread Abdel Hakim Deneche
It depends on the data you are querying. For .json, you could change the value of JSONRecordReader.DEFAULT_ROWS_PER_BATCH, which is set to 4096 by default. This only affects the size of the batches produced by the reader; other operators may still alter the batch size. On Tue, Jul 5, 2016

Re: Number of records per batch

2016-07-05 Thread Eric Fukuda
Thanks Abdel. Looking at the code, it looks like the maximum number of records in a batch is 64k. I suspect the reason I'm having only 4k is that it reached the capacity of the buffer in the batch. Is there a way to relieve this capacity restriction? It doesn't have to be a configuration option. I

Re: How to query array in Hbase

2016-07-05 Thread GameboyNO1
Hi Neeraja, Thanks for the guide! Some strange error I see here. The first two queries report the same error, but the third one works. The only difference between #2 and #3 is "LIMIT X". And it does not matter what's the number of X, it can be larger than row number of the table, and the

Re: Number of records per batch

2016-07-05 Thread Abdel Hakim Deneche
Unfortunately I don't think there is a way to do it. On Tue, Jul 5, 2016 at 3:58 PM, Eric Fukuda wrote: > I'm trying to see how performance differs with different batch sizes. My > table has 13 integer fields and 1 string field, and has 8M records. > Following the code with

Re: Number of records per batch

2016-07-05 Thread Eric Fukuda
I'm trying to see how performance differs with different batch sizes. My table has 13 integer fields and 1 string field, and has 8M records. Following the code with a debugger, there seem to be 4096 records in a batch. Can this be 8192 or larger? On Tue, Jul 5, 2016 at 6:47 PM, Abdel Hakim
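
For a sense of scale, the batch counts implied by the figures in this thread can be worked out directly. This is a minimal sketch, assuming "8M" means exactly 8,000,000 records, "64k" means 65,536, and every batch is filled to the given row count (the thread notes that operators other than the reader may still alter batch size, so real counts may differ). `batches_needed` is a hypothetical helper for illustration, not a Drill API.

```python
import math

# Assumption: "8M records" is taken as exactly 8,000,000.
TOTAL_RECORDS = 8_000_000


def batches_needed(total_records: int, rows_per_batch: int) -> int:
    """Batches required if each batch holds rows_per_batch rows.

    The final batch may be only partially filled, hence the ceiling.
    """
    return math.ceil(total_records / rows_per_batch)


# Row counts mentioned in the thread: 4096 (observed), 8192 (requested),
# and 65536 (assumed meaning of the "64k" maximum).
for rows in (4096, 8192, 65536):
    print(rows, batches_needed(TOTAL_RECORDS, rows))
```

Under these assumptions, doubling the rows per batch roughly halves the number of batches the reader emits. Whether that changes query performance is the open question in the thread.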

Re: Number of records per batch

2016-07-05 Thread Abdel Hakim Deneche
Hey Eric, can you give more information about what you are trying to achieve? Thanks On Tue, Jul 5, 2016 at 3:41 PM, Eric Fukuda wrote: > Hi, > > Does anyone know if there is a way to increase or specify the number of > records per batch manually? > > Thanks, > Eric >

Number of records per batch

2016-07-05 Thread Eric Fukuda
Hi, Does anyone know if there is a way to increase or specify the number of records per batch manually? Thanks, Eric

Re: Drill - Hive - Kerberos

2016-07-05 Thread Joseph Swingle
Yes, impersonation is enabled: drill.exec: { cluster-id: "hhe", zk.connect: "zk1:2181,zk22181,zk3:2181" impersonation: { enabled: true, max_chained_user_hops: 3 } } On Mon, Jun 20, 2016 at 6:22 PM, Chun Chang wrote: > Did you enable

Re: Help with the Optimizer of Apache Drill

2016-07-05 Thread Abhishek Girish
Adding to Rahul, You can also take a look at Calcite [2], (which Drill uses for SQL parsing + query optimization) and a relevant thread I found on the mailing list [3]. [2] https://calcite.apache.org/ [3]

Re: Help with the Optimizer of Apache Drill

2016-07-05 Thread rahul challapalli
For a start, below is the relevant piece from the documentation [1]. You can also prepend any query with "explain plan for" to view the exact plan generated by Drill's Optimizer. • Optimizer: Drill uses various standard database optimizations such as rule based/cost based, as well as data
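
The "explain plan for" prefix mentioned above can be applied mechanically to any query string before it is submitted. This is a minimal sketch: `explain_query` is a hypothetical helper, and the sample query against the `cp.`employee.json`` classpath file is only an illustrative assumption, not something taken from this thread.

```python
def explain_query(sql: str) -> str:
    """Return the statement prefixed with 'explain plan for'.

    Hypothetical helper: it only builds the string and does not contact
    Drill. Surrounding whitespace and a trailing semicolon are dropped so
    the result can be passed to whichever client submits the query.
    """
    return "explain plan for " + sql.strip().rstrip(";")


# Illustrative query (an assumption, not from the thread).
print(explain_query("SELECT * FROM cp.`employee.json` LIMIT 5;"))
```

Submitting the resulting statement through any Drill client should return the plan instead of the query results.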

Re: Initial Feed Back on 1.7.0 Release

2016-07-05 Thread Parth Chandra
You might have run the two queries while the cache was still being built. There is no concurrency control for the metadata cache at the moment (one of the many improvements we need to make). For metadata caching, the best practice with the current implementation is to run a manual refresh metadata

Drill1.7 Feedback - Logs in Web UI

2016-07-05 Thread John Omernik
I like the concept of logs in the web UI, however at this time, it assumes that there will only be one directory for logfiles. The way I've set mine up is to have different directories for logs, dcplogs, profiles, etc. That way, I can organize them out a bit, and for those logs that are in json

What are the JDBC/ODBC requirements to connect to Drill?

2016-07-05 Thread Juan Diego Ruiz Perea
Hello, We want to test connecting Oracle BI (OBI) to Apache Drill. We saw the JDBC/ODBC drivers option and have the following questions: - Do you know if someone has already tested connecting OBI to Apache Drill? - Do you know if there is any specific requirement to use properly the

Re: missing data in json structure when using web / api

2016-07-05 Thread Parth Chandra
As John hinted, a session is not maintained by the UI/REST api unless impersonation is enabled. So your alter session commands will have no effect on the query. That does not explain why you are not getting full results though. Is it possible that the query is getting an error because your session

Re: Initial Feed Back on 1.7.0 Release

2016-07-05 Thread Abdel Hakim Deneche
answers inline. On Tue, Jul 5, 2016 at 8:39 AM, John Omernik wrote: > Working with the 1.7.0, the feature that I was very interested in was the > fixing of the Metadata Caching while using user impersonation. > > I have a large table, with a day directory that can contain up

Initial Feed Back on 1.7.0 Release

2016-07-05 Thread John Omernik
Working with the 1.7.0, the feature that I was very interested in was the fixing of the Metadata Caching while using user impersonation. I have a large table, with a day directory that can contain up to 1000 parquet files each. Planning was getting terrible on this table as I added new data,

Help with the Optimizer of Apache Drill

2016-07-05 Thread Benamor, Adel
Hello, I'm new to data virtualization and am trying to understand how Apache Drill works. I browsed the documentation but didn't understand how the optimizer works. I learned that it's a cost-based optimizer, but nothing else. I want to know how the

Re: Drill on Hive table

2016-07-05 Thread Joseph Swingle
Did you get this resolved? I have the same issue On Sun, Jul 3, 2016 at 10:49 PM, Santosh Kulkarni < santoshskulkarn...@gmail.com> wrote: > Hello, > > For a simple query select count(*) from table_name, Drill gives an error. > Error: SYSTEM ERROR: IOException: Can't get Master Kerberos