Re: Avro - Let's talk Avro again

2017-08-19 Thread Stefán Baxter
may not sound nice to the ears but is exactly the kind of feedback > that will make this project truly successful. > > Best, > Saurabh > > > > I > > On Fri, Aug 18, 2017 at 1:42 PM, Stefán Baxter <ste...@activitystream.com> > wrote: > > > Hi Joh

Re: Drill schema handling [Was: Avro - Let's talk Avro again]

2017-08-19 Thread Stefán Baxter
efinitely trying to prioritize > > given what we have. But we do not have to feel constrained. We can get > more > > developers to participate in this and help out. And I am very positive > > about that approach-I know that I helped a user here to get help on using > > Apach

Re: Avro - Let's talk Avro again

2017-08-18 Thread Stefán Baxter
;j...@omernik.com> wrote: > > > > I was guessing you would chime in with a response ;) > > > > Are you still using Drill w/ Avro how has things been lately? > > > > On Thu, Aug 17, 2017 at 8:00 AM, Stefán Baxter < > ste...@activitystream.com> > > wro

Re: Avro - Let's talk Avro again

2017-08-17 Thread Stefán Baxter
woha!!! (sorry, I just had to) Best of luck with that! Regards, -Stefán On Thu, Aug 17, 2017 at 12:37 PM, John Omernik wrote: > I know Avro is the unwanted child of the Drill world. (I know others have > tried to mature the Avro support and that has been something that

Re: Parquet filter pushdown and string fields that use dictionary encoding

2017-06-01 Thread Stefán Baxter
r a filter condition's value > > is present in the subsequent data pages. This would (most likely) be done > > during execution time, and I don't believe Drill does that as yet. > > > > > > > > <http://www.mapr.com/> > > > > ___

Re: Parquet filter pushdown and string fields that use dictionary encoding

2017-05-31 Thread Stefán Baxter
The issue of correctness for comparison is what makes the dependency on > min-max statistics by the Parquet library be unreliable. > > > ____ > From: Stefán Baxter <ste...@activitystream.com> > Sent: Monday, May 29, 2017 1:41:30 PM > To:

Parquet filter pushdown and string fields that use dictionary encoding

2017-05-29 Thread Stefán Baxter
of those unique values to facilitate "segment pruning" when looking for data belonging to individual sessions/customers. Best regards, -Stefán Baxter
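For readers skimming this thread, the "segment pruning" idea can be sketched roughly as follows. This is a minimal illustration in plain Java, assuming each segment (a file or row group, say) exposes the set of distinct values of the filtered column; the class and method names are hypothetical and this is not how Drill or the Parquet library implements it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical names throughout; this is not Drill or Parquet code.
final class SegmentPruningSketch {

    /** A data segment plus the distinct values of one column (e.g. its dictionary). */
    static final class Segment {
        final String name;
        final Set<String> distinctValues;

        Segment(String name, Set<String> distinctValues) {
            this.name = name;
            this.distinctValues = distinctValues;
        }
    }

    /** Returns the names of the segments that could contain rows where column = wanted. */
    static List<String> segmentsToScan(List<Segment> segments, String wanted) {
        List<String> toScan = new ArrayList<>();
        for (Segment segment : segments) {
            // A segment whose distinct-value set lacks the value cannot match the filter.
            if (segment.distinctValues.contains(wanted)) {
                toScan.add(segment.name);
            }
        }
        return toScan;
    }
}
```

Only the segments returned by segmentsToScan would need to be read; every other segment is skipped without touching its data pages.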

Re: Batch load of unstructured data in Drill

2016-12-07 Thread Stefán Baxter
Hi Alexander, Drill allows you to both a) query the data directly in JSON format and b) convert it to Parquet (have a look at the CTAS statement). Hope that helps, -Stefán On Wed, Dec 7, 2016 at 1:08 PM, Alexander Reshetov < alexander.v.reshe...@gmail.com> wrote: > Hello, > > I want to load

Re: Reading Avro Arrays

2016-04-12 Thread Stefán Baxter
ot; : "field1", > >> > > "type" : "int" > >> > > } ] > >> > > }, > >> > > "java-class" : "java.util.List" > >> > > } > >> > >

Re: Continued Avro Frustration

2016-04-01 Thread Stefán Baxter
ant > > issues that you need to solve. From there, you can continue to expect > that > > people will help you--as they can. There are no guarantees in open > source. > > Everything comes through the kindness and shared goals of those in the > > community. > > > > t

Re: Continued Avro Frustration

2016-04-01 Thread Stefán Baxter
s of those in the > community. > > thanks, > Jacques > > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Fri, Apr 1, 2016 at 5:43 AM, Stefán Baxter <ste...@activitystream.com> > wrote: > > > Hi, > > > > Is it at all possible th

Re: Continued Avro Frustration

2016-04-01 Thread Stefán Baxter
ue to expect that > people will help you--as they can. There are no guarantees in open source. > Everything comes through the kindness and shared goals of those in the > community. > > thanks, > Jacques > > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > &

Continued Avro Frustration

2016-04-01 Thread Stefán Baxter
Hi, Is it at all possible that we are the only company trying to use Avro with Drill to some serious extent? We continue to come across all sorts of embarrassing shortcomings like the one we are dealing with now, where a schema change exception is thrown even when working with a single Avro file

Re: Drills' support for Avro Union types and

2016-03-28 Thread Stefán Baxter
It's quite plausible that this has nothing to do with union types. I just assumed that a simple Avro schema must be fully supported, but that should not be taken for granted. On Mon, Mar 28, 2016 at 3:40 PM, Stefán Baxter <ste...@activitystream.com> wrote: > Hi, > > I have reworked/refa

Drills' support for Avro Union types and

2016-03-28 Thread Stefán Baxter
Hi, I have reworked/refactored our Avro based logging system trying to make the whole Drill + Avro->Parquet experience a bit more agreeable. Long story short, I'm getting this error when selecting from multiple Avro files even though these files share the exact same schema: Error:

Re: Reading Avro Arrays

2016-03-25 Thread Stefán Baxter
; > accessing nested rich objects with drill; I somehow got that wrong from > the > > documentation... > > > > Cheers, > > Johannes > > > > On Thu, Mar 24, 2016 at 2:14 PM, Stefán Baxter < > ste...@activitystream.com> > > wrote: > > > > > FYI:

Re: Reading Avro Arrays

2016-03-24 Thread Stefán Baxter
reference. I tried drill 1.6, the data is an array of complex objects > though. I will try to setup a drill dev environment and see if i can modify > the tests to fail. > > Johannes > > On Wed, Mar 23, 2016 at 8:13 PM, Stefán Baxter <ste...@activitystream.com> > wrote: > &g

Professional services for Drill

2016-03-23 Thread Stefán Baxter
Hi, Is there anyone here who provides professional services for Drill? We are trying to optimize our system in order to speed up smaller queries and are aiming for sub-second response times when dealing with < 100 million records from Parquet. We are, for example, looking at profiles where

Re: Reading Avro Arrays

2016-03-23 Thread Stefán Baxter
FYI. this seems to be working in 1.6, at least on the Avro data that we have. On Wed, Mar 23, 2016 at 6:59 PM, Stefán Baxter <ste...@activitystream.com> wrote: > Hi again, > > What version of Drill are you using? > > Regards, > - Stefán > > On Wed, Mar 23, 2016

Re: Reading Avro Arrays

2016-03-23 Thread Stefán Baxter
Hi again, What version of Drill are you using? Regards, - Stefán On Wed, Mar 23, 2016 at 4:49 PM, Stefán Baxter <ste...@activitystream.com> wrote: > Hi Johannes, > > As great as Drill is the Avro plugin has been a source of frustration for > us @activitystream. > > We

Re: Reading Avro Arrays

2016-03-23 Thread Stefán Baxter
Hi Johannes, As great as Drill is, the Avro plugin has been a source of frustration for us @activitystream. We have a small UDF library [1] (Apache licensed) which contains a function that can return an array (List) from Avro as a CSV list. You could use that to roll your own or provide me with a

Re: Avro storage strategy?

2016-03-08 Thread Stefán Baxter
Hi, We use Avro to store/accumulate/batch streaming data and then we migrate it to Parquet. We then use union queries to merge fresh and historical data (Avro + Parquet). Things to keep in mind (AFAIK): - Avro is a lot slower and less efficient, storage space and performance wise, than

Avro no longer selects data correctly from a sub-structure :: 1.6-SNAPSHOT

2016-03-05 Thread Stefán Baxter
Hi, This used to work in 1.5 and I think it must be a regression. Parquet: 0: jdbc:drill:zk=local> select s.client_ip.ip from dfs.asa.`/processed/<>/transactions` as s limit 1; ++ | EXPR$0 | ++ | 87.55.171.210 | ++ 1 row selected (1.184

Re: Is using an aggregate value in a where clause not supported?

2016-03-04 Thread Stefán Baxter
; [1] > http://stackoverflow.com/questions/2068682/why-cant-i-use-alias-in-a-count-column-and-reference-it-in-a-having-clause > > On Fri, Mar 4, 2016 at 12:40 PM, Stefán Baxter > <ste...@activitystream.com> wrote: > > Having fails as well > > > > On Fri, Mar 4, 2016 at

Re: Is using an aggregate value in a where clause not supported?

2016-03-04 Thread Stefán Baxter
ving trans_count > 70; > > On Fri, Mar 4, 2016 at 11:53 AM, Stefán Baxter <ste...@activitystream.com> > wrote: > > > Hi, > > > > Having adds to the trouble and claims that the field needs to be grouped > > and then fails t

Re: Is using an aggregate value in a where clause not supported?

2016-03-04 Thread Stefán Baxter
rum...@maprtech.com> wrote: > Try using the HAVING clause. The WHERE clause cannot constrain the results > of aggregate functions. > http://drill.apache.org/docs/having-clause/ > > On Fri, Mar 4, 2016 at 11:34 AM, Stefán Baxter <ste...@activitystream.com> > wrote: > > >

Is using an aggregate value in a where clause not supported?

2016-03-04 Thread Stefán Baxter
Hi, I'm using parquet+drill and the following statement works just fine: select sold_to, count(*) as trans_count from dfs.asa.`/processed/venuepoint/transactions` group by sold_to; When adding this where clause nothing is returned: select sold_to, count(*) as trans_count from

Avro support in Drill - Missing support for the IN operator and other frustrating things

2016-02-25 Thread Stefán Baxter
Drill website that the Avro support is experimental, at best - Stefán Baxter

Issue with compression :: Using Drill 1.5 and parquet-mr/parquet-avro 1.8.1

2016-02-08 Thread Stefán Baxter
Hi, I'm using the following Avro Parquet writer to convert our Avro to Parquet: writer = AvroParquetWriter .builder( new Path( parquetFile.getPath() )) .withSchema(schema) .enableDictionaryEncoding() .withDataModel(ReflectData.get())

Re: Parquet drill date fields

2016-02-04 Thread Stefán Baxter
> - Jason > > On Thu, Feb 4, 2016 at 3:40 PM, Stefán Baxter <ste...@activitystream.com> > wrote: > > > OK, the automatic handling and encoding options improve a lot in Parquet > > 2.0. (Manual override is not an option) > > > > I'm using parquet-mr/parquet-a

Re: Parquet drill date fields

2016-02-04 Thread Stefán Baxter
Regards, -Stefán On Thu, Feb 4, 2016 at 4:51 PM, Stefán Baxter <ste...@activitystream.com> wrote: > Hi again, > > I did a little test and ~5 million fairly wide records take 791 MB in > parquet without dictionary encoding and 550MB with dictionary encoding > enabled (The n

Re: Parquet drill date fields

2016-02-04 Thread Stefán Baxter
in ~20% less time than the one that uses dictionary encoding. Regards, -Stefán On Thu, Feb 4, 2016 at 3:48 PM, Stefán Baxter <ste...@activitystream.com> wrote: > Hi Jason, > > Thank you for the explanation. > > I have several *low* cardinality fields that con

Re: Parquet drill date fields

2016-02-04 Thread Stefán Baxter
s it obviously is not a good choice, and so we actually take a > performance hit re-materializing the data out of the dictionary upon read. > > If you would be interested in trying to contribute such an enhancement I > would be willing to help you get started with it. > > - Jason > &g
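As a rough illustration of what "re-materializing the data out of the dictionary" means, here is a minimal Java sketch of dictionary encoding for a low-cardinality string column. It only assumes the general idea (store each distinct value once, plus one small integer id per row); the class is hypothetical and is not parquet-mr's or Drill's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of dictionary encoding; not parquet-mr or Drill code.
final class DictionaryEncodingSketch {
    /** Each distinct value, stored once, in order of first appearance. */
    final List<String> dictionary = new ArrayList<>();
    /** One dictionary id per row. */
    final int[] ids;

    DictionaryEncodingSketch(List<String> values) {
        Map<String, Integer> index = new HashMap<>();
        ids = new int[values.size()];
        for (int row = 0; row < values.size(); row++) {
            String value = values.get(row);
            Integer id = index.get(value);
            if (id == null) {
                // First time this value is seen: append it to the dictionary.
                id = dictionary.size();
                dictionary.add(value);
                index.put(value, id);
            }
            ids[row] = id;
        }
    }

    /** Expands the ids back into full values; this is the work done again on every read. */
    List<String> materialize() {
        List<String> out = new ArrayList<>(ids.length);
        for (int id : ids) {
            out.add(dictionary.get(id));
        }
        return out;
    }
}
```

With few distinct values the ids array plus the dictionary is much smaller than the raw column, which is where the space saving comes from; the materialize() step is the read-side cost discussed in this thread.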

Writing Drill compatible Parquet in Java using parquet-mr

2016-02-04 Thread Stefán Baxter
Hi, What things do I need to know if I want to write Drill compatible Parquet in Java using Parquet-MR? - The latest stable version of Parquet-MR is 1.8.1; is that too new? - Will the standard Parquet work? - Is any specific footer information required? - Are there any dos and don'ts? I

Re: Convert ISO 8601 string to timestamp

2016-02-03 Thread Stefán Baxter
Hi, This small UDF project contains an asTimestamp function that you may find useful: https://github.com/activitystream/asdrill It accepts multiple value types and returns them as timestamps. Feel free to do with it what you will :) Regards, -Stefán On Wed, Feb 3, 2016 at 6:22 PM, Jason
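For context, the core of an ISO 8601 to timestamp conversion can be sketched with the standard java.time API. This is a minimal, hedged example and not the asTimestamp function from the repository above (whose code is not shown in this thread); the class and method names are hypothetical, and it only handles strings that carry an explicit offset or "Z".

```java
import java.time.Instant;
import java.time.OffsetDateTime;

// Hypothetical helper; not the asTimestamp UDF mentioned in the thread.
final class IsoToTimestampSketch {

    /** Parses an ISO 8601 date-time with an offset, e.g. 2016-02-03T18:22:00Z. */
    static Instant parse(String iso8601) {
        // OffsetDateTime.parse uses ISO_OFFSET_DATE_TIME and throws
        // DateTimeParseException if the offset is missing.
        return OffsetDateTime.parse(iso8601).toInstant();
    }

    /** Same value as milliseconds since the Unix epoch (UTC). */
    static long toEpochMillis(String iso8601) {
        return parse(iso8601).toEpochMilli();
    }
}
```

Strings with different offsets that denote the same moment map to the same epoch value, which is usually what you want before storing the result as a timestamp.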

Re: Avro reader - Possible regression in 1.5-SNAPSHOT

2016-02-02 Thread Stefán Baxter
L73 > > On Tue, Feb 2, 2016 at 9:07 AM, Abdel Hakim Deneche > <adene...@maprtech.com> wrote: > > Thanks > > > > On Tue, Feb 2, 2016 at 9:03 AM, Stefán Baxter <ste...@activitystream.com > > > > wrote: > > > >> https://issues.apache.

Re: Avro reader - Possible regression in 1.5-SNAPSHOT

2016-02-02 Thread Stefán Baxter
https://issues.apache.org/jira/browse/DRILL-4339 On Tue, Feb 2, 2016 at 4:46 PM, Abdel Hakim Deneche <adene...@maprtech.com> wrote: > Hi Stefán, > > Can you open a JIRA for this, please ? > > Thanks > > On Tue, Feb 2, 2016 at 6:21 AM, Stefán Baxter <ste...@activit

Re: Hangout Starting

2016-01-29 Thread Stefán Baxter
Hi Scott, Could you please tell me a bit more about Drill+Ignite? Regards, -Stefan On Fri, Jan 29, 2016 at 2:57 PM, scott cote wrote: > Anyone working on a DrillBit that can poke into an ignite grid? > > SCott > > > On Jan 26, 2016, at 11:58 AM, Jacques Nadeau

Re: How to keep s3 data in memory with apache drill?

2016-01-26 Thread Stefán Baxter
Hi, I think the latest version of Tachyon uses a transparent storage structure. Regards, -Stefán On Tue, Jan 26, 2016 at 10:05 AM, Stephan Kölle wrote: > Querying JSON data stored on aws s3 with apache drill works awesome, but > drill fetches the data fresh from s3 for

Re: Re: How to keep s3 data in memory with apache drill?

2016-01-26 Thread Stefán Baxter
Hi, I got an email from the Tachyon team a while back where they informed me of this change. I think you should visit their Google group and check the status of this change. Regards, -Stefán On Tue, Jan 26, 2016 at 9:28 PM, Stephan Kölle wrote: > I'm working with

Re: Efficient joins in Drill - avoiding the massive overhead of scan based joins

2016-01-17 Thread Stefán Baxter
ows. > > When applying the join condition, only 10k rows are needed from the right > > side. > > > > How long does it take to read a few million records from Lucene? > (Recently > > with Elastic we've been seeing ~50-100k/second per thread when only > > retrie

Re: Efficient joins in Drill - avoiding the massive overhead of scan based joins

2016-01-17 Thread Stefán Baxter
ve been seeing ~50-100k/second per thread when only > retrieving a single stored field.) > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Sat, Jan 16, 2016 at 12:11 PM, Stefán Baxter <ste...@activitystream.com > > > wrote: > > > Hi Jacqu

Efficient joins in Drill - avoiding the massive overhead of scan based joins

2016-01-16 Thread Stefán Baxter
Hi, Can anyone point me to an implementation where joins are implemented with full support for filters and efficient handling of joins based on indexes? The only code I have come across seems to rely on a complete scan of the related table, and that is not acceptable for the use case we are

Re: Efficient joins in Drill - avoiding the massive overhead of scan based joins

2016-01-16 Thread Stefán Baxter
ter idea of how to point you in > the right direction. > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Sat, Jan 16, 2016 at 5:18 AM, Stefán Baxter <ste...@activitystream.com> > wrote: > > > Hi, > > > > Can anyone point me to an i

Re: 1.5-SNAPSHOT and UDFs

2016-01-14 Thread Stefán Baxter
4f3fd-c1fa-4a45-84e7-ae31ae095427 on Lightning:31010] Please note that the construct parameters and the expected classes match 100% Regards, -Stefan On Tue, Jan 12, 2016 at 3:35 PM, Stefán Baxter <ste...@activitystream.com> wrote: > > OK, I came across this in some UDF s

Re: 1.5-SNAPSHOT and UDFs

2016-01-14 Thread Stefán Baxter
Hi again, I modified this slightly and now it works (it's accepting ValueHolder and resolving that function correctly) regards, -Stefan On Thu, Jan 14, 2016 at 8:25 AM, Stefán Baxter <ste...@activitystream.com> wrote: > Hi Jacques, > > I'm still struggling with this. The fu

Lucene Plugin :: Join Filter and pushdown

2016-01-14 Thread Stefán Baxter
Hi, I'm working on the Lucene plugin (see previous email) and the focus now is support for joins with filter push-down to avoid the table scan that is used by default. I'm fairly new to Drill and in over my head, to be honest, but this is fun and with this addition the Lucene plugin

Re: Community Drill UDFs

2016-01-13 Thread Stefán Baxter
Hi, This is my playground: https://github.com/activitystream/asdrill Please feel free to do with it what you will. I'd be more than happy to participate in a community driven UDF project. Regards, Stefán On Wed, Jan 13, 2016 at 10:28 PM, Ted Dunning wrote: > There is

Re: 1.5-SNAPSHOT and UDFs

2016-01-12 Thread Stefán Baxter
g bufAsString = ByteBufUtil.decodeString(myBuf.nioBuffer(), > Charsets.UTF_8); > > > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Fri, Jan 8, 2016 at 12:04 PM, Stefán Baxter <ste...@activitystream.com> > wrote: > > > So, >
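The point of the suggestion quoted above is to decode the bytes of the value rather than rely on the buffer's toString(). A minimal sketch of the same idea with only the JDK follows, assuming a java.nio.ByteBuffer stands in for the Drill buffer and that the value occupies the byte range [start, end); the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in: a plain ByteBuffer plays the role of the UDF's buffer.
final class BufferDecodeSketch {

    /** Decodes the bytes in [start, end) as UTF-8 without disturbing the caller's buffer. */
    static String decodeUtf8(ByteBuffer buffer, int start, int end) {
        // duplicate() shares the content but has its own position and limit.
        ByteBuffer view = buffer.duplicate();
        view.limit(end);
        view.position(start);
        return StandardCharsets.UTF_8.decode(view).toString();
    }
}
```

Calling toString() on the buffer itself describes the buffer object, not its content, which matches the "{DrillBuf[...], udle identityHashCode ...}" text reported elsewhere in this thread.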

Minor assistance with a Lucene plugin (reader for now)

2016-01-11 Thread Stefán Baxter
Hi, Rahul Challapalli started to implement a Lucene reader a while back and I'm trying to pitch in. (#1/#2) I have made some progress but I could benefit from talking to someone who knows their way around the implementation of a reader/writer before I continue. Discussion points: - Pest

1.5-SNAPSHOT and UDFs

2016-01-08 Thread Stefán Baxter
Hi, My UDFs have stopped working with the latest version of 1.5-SNAPSHOT (pulled just now). The error is: SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: "{DrillBuf[74], udle identityHash..."

Re: 1.5-SNAPSHOT and UDFs

2016-01-08 Thread Stefán Baxter
value but now it returns: {DrillBuf[77], udle identityHashCode == 1660956802, identityHashCode == 343154168} PT1H The value is there in the second line (Seems to include a newline character) Any ideas? Regards, -Stefan On Fri, Jan 8, 2016 at 7:24 PM, Stefán Baxter <ste...@activitystream.

Re: 1.5-SNAPSHOT and UDFs

2016-01-08 Thread Stefán Baxter
ocIfNeeded(someValue.length()); for (Byte aByte : someValue.toString().getBytes()) output.buffer.setByte(output.end ++, aByte); } } On Fri, Jan 8, 2016 at 7:43 PM, Stefán Baxter <ste...@activitystream.com> wrote: > Hi, > > This seems to have something to do with r

Re: 1.5-SNAPSHOT and UDFs

2016-01-08 Thread Stefán Baxter
It's also interesting that this signature is formatted/created every time a value is fetched. Regards, -Stefán On Fri, Jan 8, 2016 at 7:48 PM, Stefán Baxter <ste...@activitystream.com> wrote: > Hi again, > > This code can be used to reproduce this behavior: > > @Function

Re: Does drill recognize new line correctly?

2016-01-06 Thread Stefán Baxter
"test \n value"} > > => No result found. > # This is valid JSON, I believe isn't it? > > Target JSON: > Same file with copied row and edited. > > {"a": "test value"} > {"a": "test \n value"} > > =&

Re: Does drill recognize new line correctly?

2016-01-05 Thread Stefán Baxter
Hi, I'm not the right person to give you an answer from the Drill perspective but is it possible that your JSON serializer is not escaping characters that should be escaped? please see: -

Re: Avro - Schema is good - Schema validation is bad

2015-12-25 Thread Stefán Baxter
igAvroStorage ><https://cwiki.apache.org/confluence/display/PIG/AvroStorage>. >- If the schema validation flag is set, then we can consider the union >schema of all the files in a directory recursively. > > > On Fri, Dec 18, 2015 at 9:17 AM, Stefán Baxter <ste...@

Re: Join with empty table

2015-12-18 Thread Stefán Baxter
Hi Andries, Where does it say that a query for a non-existent file in unions/joins should fail? I ask because I'm interested in the "basic rules of Drill". Regards, -Stefán On Fri, Dec 18, 2015 at 4:37 PM, Andries Engelbrecht < aengelbre...@maprtech.com> wrote: > 1) When the table/file

Re: Avro - Schema is good - Schema validation is bad

2015-12-17 Thread Stefán Baxter
e JIRA. I feel, there is > already JIRA for this. > > https://issues.apache.org/jira/browse/DRILL-4120?focusedCommentId=15048070=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15048070 > > > On Thu, Dec 17, 2015 at 1:28 AM, Stefán Baxter <ste...@activit

Avro - Schema is good - Schema validation is bad

2015-12-14 Thread Stefán Baxter
Hi, I'm getting the following error when querying Avro files: Error: VALIDATION ERROR: From line 1, column 48 to line 1, column 57: Column 'some_col' not found in any table It's true that the field is in none of the tables I'm targeting, in that particular query, but that does not mean that it

Re: Avro - Schema is good - Schema validation is bad

2015-12-14 Thread Stefán Baxter
ue, Dec 15, 2015 at 1:10 AM, Ted Dunning <ted.dunn...@gmail.com> wrote: > Sigh of relief is premature. Nobody has committed to carrying this > interpretation forward. > > > > On Mon, Dec 14, 2015 at 11:44 AM, Stefán Baxter <ste...@activitystream.com > > > wrote: &g

Re: Avro - Schema is good - Schema validation is bad

2015-12-14 Thread Stefán Baxter
ppen, and at run time, it constructs the schema for them and hence nulls > for invalid fields. > > > On Mon, Dec 14, 2015 at 2:36 PM, Stefán Baxter <ste...@activitystream.com> > wrote: > > > Hi, > > > > I'm getting the following error when querying Avro files: &g

Re: Avro - Schema is good - Schema validation is bad

2015-12-14 Thread Stefán Baxter
use the partitioning to limit the files that I see could > include or exclude more recent files that have added a new field. > > That means that a query would succeed or fail according to which date range > I use for the query. > > That seems pretty radically bad. > > > &g

Re: 'dir0' not found in any table

2015-11-21 Thread Stefán Baxter
n Sat, Nov 21, 2015 at 1:44 PM, Stefán Baxter <ste...@activitystream.com> > wrote: > > > Hi, > > > > I just created this: https://issues.apache.org/jira/browse/DRILL-4120 > > > > Regards, > > -Stefan > > > > On Sat, Nov 21, 2015 at 9:28 PM,

Re: 'dir0' not found in any table

2015-11-21 Thread Stefán Baxter
Hi, I just created this: https://issues.apache.org/jira/browse/DRILL-4120 Regards, -Stefan On Sat, Nov 21, 2015 at 9:28 PM, Stefán Baxter <ste...@activitystream.com> wrote: > > After some digging around there is an explanation. > > This all works fine when the directory

CTAS - Converting Avro files to parquet - Missing timestamp datatype

2015-11-21 Thread Stefán Baxter
Hi, We are using Avro files for all our logging and they contain long timestamp_mills values. When they are converted to Parquet using CTAS we either need a hint (or something) to ensure that these columns become Timestamp values in Parquet - or - we need to create a complex select with casting.

Re: 'dir0' not found in any table

2015-11-21 Thread Stefán Baxter
select l_partkey from dfs.root.`/src/data/stuff` where dir0 = 's1' limit > 1; > ++ > | l_partkey | > ++ > | 1552 | > ++ > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Sat, Nov 21, 2015 at 3:47 AM, Stefán Baxter <ste.

Searching in Arrays - Parquet - REPEATED_CONTAINS

2015-11-20 Thread Stefán Baxter
Hi, I'm trying to use an array in Parquet to store a list of IDs (1:* scenario) as opposed to putting each ID in a separate field. (The array contains 1-10 values.) This requires me to use REPEATED_CONTAINS to search for these values. I was expecting a performance penalty but it turns out that searching

Re: UDFs, RepeatedVarCharHolder and null values

2015-11-10 Thread Stefán Baxter
ray values and it just passes the null back. Can't say I like the approach but it works. Regards, -Stefan On Tue, Nov 10, 2015 at 1:24 PM, Stefán Baxter <ste...@activitystream.com> wrote: > Hi, > > I have a UDF that deals with arrays and concatenates their value. It's > working fin

Parquet and dictionary based encoding in Drill 1.3

2015-11-10 Thread Stefán Baxter
Hi, Is it safe to switch on store.parquet.enable_dictionary_encoding, and is the scanning of dictionary based columns optimized? Regards, -Stefán

Re: UDFs and 1.3

2015-11-10 Thread Stefán Baxter
However, we did get much more strict about making sure your >> drill-module.conf points at the right package names. >> >> Note here, specifically the addition of package names: >> >> >> https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main

Re: Avro deserialization bug - 1.3-SNAPSHOT

2015-11-10 Thread Stefán Baxter
d you please raise a Jira with sample schema and sample input to > reproduce it. I will look into this. > > On Tue, Nov 10, 2015 at 7:55 PM, Stefán Baxter <ste...@activitystream.com> > wrote: > > > Hi, > > > > I have an Avro file that support the following da

Avro deserialization bug - 1.3-SNAPSHOT

2015-11-10 Thread Stefán Baxter
Hi, I have an Avro file that support the following data/schema: {"field":"some", "classification":{"variant":"Gæst"}} When I select 10 rows from this file I get: +-+ | EXPR$0| +-+ | Gæst| | Voksen | | Voksen

UDFs, RepeatedVarCharHolder and null values

2015-11-10 Thread Stefán Baxter
Hi, I have a UDF that deals with arrays and concatenates their values. It's working fine with JSON but when working with Avro it returns an error. The error seems a bit misleading as it claims to be both a schema change exception and a missing function exception. *The error is:* Error: SYSTEM

UDFs and 1.3

2015-11-09 Thread Stefán Baxter
Hi, I have a small set of UDFs that I have been running with Drill 1.1/1.2 which I'm trying to get working with 1.3 to no avail. It's as if the library is not picked up correctly even though the error I get indicates a missing function signature (variant): Error: VALIDATION ERROR: From line 1,

Re: UDFs and 1.3

2015-11-09 Thread Stefán Baxter
Hi, Now they are but the outcome remains the same. Any additional pointers? Regards, -Stefan On Mon, Nov 9, 2015 at 7:21 PM, Stefán Baxter <ste...@activitystream.com> wrote: > Hi Nathan, > > thank you for a prompt reply. > > I thought they were but they are in fact

Re: UDFs and 1.3

2015-11-09 Thread Stefán Baxter
rudimentary, but it's always > nice to start with some sanity checks :) > > Best, > Nathan > > On Mon, Nov 9, 2015 at 10:56 AM, Stefán Baxter > <ste...@activitystream.com> wrote: > > Hi, > > > > I have a small set of UDFs that I have been ru

Re: Drill and Parquet - Best practices - part 1

2015-11-05 Thread Stefán Baxter
ictionary encoded files, > because we just go ahead and materialize all of the dictionary values into > the full dataset right away at the reader, so we don't currently do any > dictionary based filtering right now. > > Looking back in this thread seems like there are a lot o

Re: Drill and Parquet - Best practices - part 1

2015-11-03 Thread Stefán Baxter
Hi again, Are incremental timestamp values (long) being encoded in Parquet as incremental values? (There is an option in Parquet to refrain from storing complete numbers and store only the delta between numbers to save space.) Regards, -Stefan On Mon, Nov 2, 2015 at 5:54 PM, Stefán Baxter <
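To make the question concrete, the delta idea can be sketched in a few lines of Java: keep the first value and then only the difference to the previous value. This is an illustration of the principle only, with hypothetical names; Parquet's own delta encoding (DELTA_BINARY_PACKED) additionally packs the deltas into blocks of few bits, which is not shown here.

```java
// Illustration of the delta principle only; not Parquet's on-disk format.
final class DeltaEncodingSketch {

    /** out[0] is the first value; every later entry is the difference to the previous value. */
    static long[] encode(long[] values) {
        long[] deltas = new long[values.length];
        long previous = 0L;
        for (int i = 0; i < values.length; i++) {
            deltas[i] = values[i] - previous;
            previous = values[i];
        }
        return deltas;
    }

    /** Rebuilds the original values with a running sum over the deltas. */
    static long[] decode(long[] deltas) {
        long[] values = new long[deltas.length];
        long running = 0L;
        for (int i = 0; i < deltas.length; i++) {
            running += deltas[i];
            values[i] = running;
        }
        return values;
    }
}
```

For millisecond timestamps that arrive roughly in order, the deltas are small numbers, so they need far fewer bits than the full 64-bit values once they are bit-packed.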

Re: Drill and Parquet - Best practices - part 1

2015-11-02 Thread Stefán Baxter
itive, but have involved a good amount of > > trial and error (as you can see from some of my user posts) (also, the > user > > group has been great, but to my point about user stories, my education > has > > come from posting stories and getting feedback from the community, it &g

Re: Drill and Parquet - Best practices - part 1

2015-11-01 Thread Stefán Baxter
So we are off to a flying start :) On Thu, Oct 29, 2015 at 9:50 PM, Stefán Baxter <ste...@activitystream.com> wrote: > Hi, > > We are using Avro, JSON and Parquet for collecting various types of data > for analytical processing. > > I have not used Parquet before we

Drill and Parquet - Best practices - part 1

2015-10-29 Thread Stefán Baxter
Hi, We are using Avro, JSON and Parquet for collecting various types of data for analytical processing. I had not used Parquet before we started to play around with Drill and now I'm wondering if we are planning our data structures correctly and if we will be able to get the most out of

Re: directory structure containing multiple file types

2015-10-19 Thread Stefán Baxter
Hi Ted, Your approach only works for a single directory, not a directory structure. I will create an improvement request later today. I would welcome a session on "What's needed in Drill to truly eliminate ETL" (Just an idea) Regards, -Stefan On Sun, Oct 18, 2015 at 10:30 PM, Ste

Re: directory structure containing multiple file types

2015-10-18 Thread Stefán Baxter
ld be addressed. > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Sat, Oct 17, 2015 at 10:33 AM, Stefán Baxter <ste...@activitystream.com > > > wrote: > > > Thanks Abhishek, > > > > I think Drill is still quite far from eliminating ETL and

Re: directory structure containing multiple file types

2015-10-17 Thread Stefán Baxter
Thanks Abhishek, I think Drill is still quite far from eliminating ETL and the list of obstacles on the way there seems to be growing. (yeah, disappointment got me for a bit) Regards, -Stefan

directory structure containing multiple file types

2015-10-17 Thread Stefán Baxter
Hi, I have a single directory structure containing both .avro and .json files. Their content is the same and they use the same schema (Avro files explicitly and JSON files implicitly). When I query the directory Drill returns an error informing me that the Avro files cannot be read as JSON

Re: Avro support

2015-10-16 Thread Stefán Baxter
tefan On Fri, Oct 16, 2015 at 4:31 PM, Stefán Baxter <ste...@activitystream.com> wrote: > Hi Kamesh, > > I must be lagging behind then, let me double check. > > Regards, > -Stefan > > On Fri, Oct 16, 2015 at 4:23 PM, Kamesh <kamesh.had...@gmail.com> wrote:

Re: Dates -> Avro -> Parquet

2015-10-15 Thread Stefán Baxter
Jacques Nadeau > CTO and Co-Founder, Dremio > > On Thu, Oct 15, 2015 at 7:31 AM, Stefán Baxter <ste...@activitystream.com> > wrote: > > > Hi Chris, > > > > I understand now, thank you. > > > > What threw me off was that, in our standard use-case, we

Re: Dates -> Avro -> Parquet

2015-10-15 Thread Stefán Baxter
icely to the SQL type TIMESTAMP (which is why we > cast). > > Again, hope this helps. > > Cheers -- Chris > > > On 15 Oct 2015, at 14:46, Stefán Baxter <ste...@activitystream.com> > wrote: > > > > Thank you Chris, this clarifies a whole lot :). > >

Re: Review for UDFs

2015-10-06 Thread Stefán Baxter
to put them in Drill happens? > > I can add you as a collaborator as well. > > > > On Thu, Oct 1, 2015 at 2:02 PM, Stefán Baxter <ste...@activitystream.com> > wrote: > > > Added a license and moved it here: > > https://github.com/activitystream/asdrill >

Status of the JDBC Storage Plugin

2015-10-04 Thread Stefán Baxter
Hi, I have been reading up on the JDBC storage plugin here and in Jira and see it's still planned for 1.2. Can someone please tell me the status of that, and whether the limitations that seem to be there regarding push-down will be addressed before 1.2 or have been addressed already?

Re: repeated_contains - intended behaviour?

2015-10-04 Thread Stefán Baxter
ains() should be used for exact matching. I'm > > inclined to suggest something like repeated_contains_regex_matching() for > > the other, but that is a bit long. > > > > On Mon, Sep 21, 2015 at 2:41 AM, Stefán Baxter < > ste...@activitystream.com> > > wrote: > >

Re: Review for UDFs

2015-10-01 Thread Stefán Baxter
Added a license and moved it here: https://github.com/activitystream/asdrill Regards, -Stefan On Thu, Oct 1, 2015 at 8:56 PM, Stefán Baxter <ste...@activitystream.com> wrote: > Hi, > > I have some minor UDFs here: https://github.com/acmeguy/asdrill > > I'm not well-versed

Review for UDFs

2015-10-01 Thread Stefán Baxter
Hi, I have some minor UDFs here: https://github.com/acmeguy/asdrill I'm not well-versed in writing Drill UDFs and working with streams/buffers so I would welcome a review from someone here :). (ListUtils in particular) These have no license attached to them but I will add an Apache license. Regards,

Re: directory pruning and UDFs

2015-09-24 Thread Stefán Baxter
we are adding the filename virtual > attribute. > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Tue, Sep 22, 2015 at 1:51 PM, Stefán Baxter <ste...@activitystream.com> > wrote: > > > Jacques, > > > > Is this something you think makes sense a

Re: null values while querying json files

2015-09-18 Thread Stefán Baxter
On Fri, Sep 18, 2015 at 3:10 PM, Stefán Baxter <ste...@activitystream.com> > wrote: > > > Hi, > > > > I have nothing meaningful to add but I had to share that this BigInt > > assumption has caused more grief than any other single feature in Drill. > > >

Re: null values while querying json files

2015-09-18 Thread Stefán Baxter
Hi, I have nothing meaningful to add but I had to share that this BigInt assumption has caused more grief than any other single feature in Drill. I would go so far as to say that the "type intolerance" and lack of "sensible conversion" is the biggest hurdle on the way to fulfilling the "eliminate

CTAS exception

2015-09-18 Thread Stefán Baxter
Hi, I have some json files that I want to transform to parquet. We have been doing this without any issues but this time around I get this exception: Error: SYSTEM ERROR: IllegalStateException: Failure while reading vector. Expected vector class of

Re: CTAS exception

2015-09-18 Thread Stefán Baxter
ce the error > with just a select statement. Can you share both queries you tried (the > failing CTAS and the successful SELECT *) ? > > Thanks > > On Fri, Sep 18, 2015 at 5:38 AM, Stefán Baxter <ste...@activitystream.com> > wrote: > > > Hi, > > > > I have som

Re: CTAS exception

2015-09-18 Thread Stefán Baxter
*occurred_at?**. *In your second query, since the only > column being read is *occurred_at, *you may not be hitting the issue. First > query being a select * would read all columns and may hit this schema > change error. > > > On Fri, Sep 18, 2015 at 9:16 AM, Stefán Baxter <s
