Re: [DRILL with ALTERYX]

2019-12-10 Thread Paul Rogers
Hi Thiago, Just wanted to follow up with a bit more detail. The use case you describe is what is sometimes called "query integration": having a single tool accept a query, then turn around and issue other queries to other data sources. Finally, the query integrator combines the resulting

Re: Integrating Arrow with Drill

2019-12-10 Thread Paul Rogers
Hi Nai Yan, You posted this same question a few days ago and we responded with some questions and discussion. Perhaps the dev list e-mail is going into your spam folder? You can find the discussion in the e-mail archives [1]. We would still like to learn more about how you might use Arrow with

[jira] [Created] (DRILL-7480) Revisit parameterized type design for Metadata API

2019-12-10 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7480: -- Summary: Revisit parameterized type design for Metadata API Key: DRILL-7480 URL: https://issues.apache.org/jira/browse/DRILL-7480 Project: Apache Drill Issue

[jira] [Created] (DRILL-7479) Short-term fixes for metadata API parameterized type issues

2019-12-10 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7479: -- Summary: Short-term fixes for metadata API parameterized type issues Key: DRILL-7479 URL: https://issues.apache.org/jira/browse/DRILL-7479 Project: Apache Drill

Re: About integration of drill and arrow

2019-12-09 Thread Paul Rogers
AM, Igor Guzenko wrote: > > Hello Nai and Paul, > > I would like to contribute full Apache Arrow integration. > > Thanks, > Igor > > On Mon, Dec 9, 2019 at 8:56 AM Paul Rogers > wrote: > >> Hi Nai Yan, >> >> Integration is still in the discu

[jira] [Resolved] (DRILL-7303) Filter record batch does not handle zero-length batches

2019-11-29 Thread Paul Rogers (Jira)
[ https://issues.apache.org/jira/browse/DRILL-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers resolved DRILL-7303. Resolution: Duplicate > Filter record batch does not handle zero-length batc

[jira] [Resolved] (DRILL-7311) Partial fixes for empty batch bugs

2019-11-29 Thread Paul Rogers (Jira)
[ https://issues.apache.org/jira/browse/DRILL-7311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers resolved DRILL-7311. Resolution: Duplicate > Partial fixes for empty batch b

[jira] [Resolved] (DRILL-7305) Multiple operators do not handle empty batches

2019-11-29 Thread Paul Rogers (Jira)
[ https://issues.apache.org/jira/browse/DRILL-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers resolved DRILL-7305. Resolution: Duplicate > Multiple operators do not handle empty batc

[jira] [Created] (DRILL-7458) Base storage plugin framework

2019-11-26 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7458: -- Summary: Base storage plugin framework Key: DRILL-7458 URL: https://issues.apache.org/jira/browse/DRILL-7458 Project: Apache Drill Issue Type: Improvement

[jira] [Created] (DRILL-7457) Join assignment is random when table costs are identical

2019-11-22 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7457: -- Summary: Join assignment is random when table costs are identical Key: DRILL-7457 URL: https://issues.apache.org/jira/browse/DRILL-7457 Project: Apache Drill

[jira] [Created] (DRILL-7456) Batch count fixes for 12 additional operators

2019-11-22 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7456: -- Summary: Batch count fixes for 12 additional operators Key: DRILL-7456 URL: https://issues.apache.org/jira/browse/DRILL-7456 Project: Apache Drill Issue Type

[jira] [Created] (DRILL-7455) "Renaming" projection operator to avoid physical copies

2019-11-22 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7455: -- Summary: "Renaming" projection operator to avoid physical copies Key: DRILL-7455 URL: https://issues.apache.org/jira/browse/DRILL-7455 Project: Ap

[jira] [Created] (DRILL-7451) Planner inserts project node even if scan handles project push-down

2019-11-19 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7451: -- Summary: Planner inserts project node even if scan handles project push-down Key: DRILL-7451 URL: https://issues.apache.org/jira/browse/DRILL-7451 Project: Apache Drill

[jira] [Created] (DRILL-7447) Simplify the Mock reader

2019-11-16 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7447: -- Summary: Simplify the Mock reader Key: DRILL-7447 URL: https://issues.apache.org/jira/browse/DRILL-7447 Project: Apache Drill Issue Type: Improvement

[jira] [Created] (DRILL-7446) Eclipse compilation issue in AbstractParquetGroupScan

2019-11-16 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7446: -- Summary: Eclipse compilation issue in AbstractParquetGroupScan Key: DRILL-7446 URL: https://issues.apache.org/jira/browse/DRILL-7446 Project: Apache Drill Issue

Re: Storage Plugin Assistance

2019-11-15 Thread Paul Rogers
, 2019, 12:40:22 PM PST, Charles Givre wrote: > On Nov 15, 2019, at 1:39 PM, Paul Rogers wrote: > > Hi Charles, > > A thought on debugging deserialization is to not do it in a query. Capture > the JSON returned from a rest call. Write a simple unit test that > deseri

Re: Storage Plugin Assistance

2019-11-15 Thread Paul Rogers
mats which are generic (name/value pairs) rather than expressed in the structure of JSON objects (as required by Jackson and Retrofit.) That is a topic for later, but is why the Sumo plugin has to be custom to Sumo's API for now. Thanks, - Paul [1] https://github.com/paul-rogers/drill/wiki/Create

[jira] [Created] (DRILL-7445) Create batch copier based on result set framework

2019-11-14 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7445: -- Summary: Create batch copier based on result set framework Key: DRILL-7445 URL: https://issues.apache.org/jira/browse/DRILL-7445 Project: Apache Drill Issue

[jira] [Created] (DRILL-7442) Create multi-batch row set reader

2019-11-10 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7442: -- Summary: Create multi-batch row set reader Key: DRILL-7442 URL: https://issues.apache.org/jira/browse/DRILL-7442 Project: Apache Drill Issue Type: Improvement

[jira] [Created] (DRILL-7441) Fix issues with fillEmpties, offset vectors

2019-11-10 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7441: -- Summary: Fix issues with fillEmpties, offset vectors Key: DRILL-7441 URL: https://issues.apache.org/jira/browse/DRILL-7441 Project: Apache Drill Issue Type: Bug

Re: Use cases for DFDL

2019-11-07 Thread Paul Rogers
"xml"] } I was envisioning this working in much the same way as other format plugins that use an external parser. -- C > On Nov 7, 2019, at 1:35 PM, Paul Rogers wrote: > > Hi All, > > One thought to add is that if DFDL defines the file schema, then it would be > ideal to u

Re: Use cases for DFDL

2019-11-07 Thread Paul Rogers
Hi All, One thought to add is that if DFDL defines the file schema, then it would be ideal to use that schema at plan time as well as run time. Drill's Calcite integration provides means to do this, though I am personally a bit hazy on the details. Certainly getting the reader to work is the

Re: Help for DRILL-3609

2019-11-06 Thread Paul Rogers
Hi Nitin, As it turns out, I just had to fix a bug in the windowing operator. I'm not an expert on this operator, but perhaps I can offer a suggestion or two. We have a few existing unit tests for window functions in TestWindowFrame. They are a bit hard to follow, however. Take a look at

[jira] [Created] (DRILL-7439) Batch count fixes for six additional operators

2019-11-05 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7439: -- Summary: Batch count fixes for six additional operators Key: DRILL-7439 URL: https://issues.apache.org/jira/browse/DRILL-7439 Project: Apache Drill Issue Type

Re: [DISCUSS] Drill Storage Plugins

2019-11-05 Thread Paul Rogers
Hi Charles, Storage plugins are a bit complex because they integrate not just with the runtime engine, but also with the Calcite planning engine. Format plugins are simpler because they are mostly runtime-only. The "Easy" framework hides much of the planner integration, and the EVF "Easier"

[jira] [Created] (DRILL-7436) Fix record count, vector structure issues in several operators

2019-11-03 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7436: -- Summary: Fix record count, vector structure issues in several operators Key: DRILL-7436 URL: https://issues.apache.org/jira/browse/DRILL-7436 Project: Apache Drill

[jira] [Created] (DRILL-7435) JSON reader incorrectly adds a LATE type to union vector

2019-11-03 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7435: -- Summary: JSON reader incorrectly adds a LATE type to union vector Key: DRILL-7435 URL: https://issues.apache.org/jira/browse/DRILL-7435 Project: Apache Drill

[jira] [Created] (DRILL-7434) TopNBatch constructs Union vector incorrectly

2019-11-03 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7434: -- Summary: TopNBatch constructs Union vector incorrectly Key: DRILL-7434 URL: https://issues.apache.org/jira/browse/DRILL-7434 Project: Apache Drill Issue Type

[jira] [Created] (DRILL-7428) Drill incorrectly allows a repeated map field to be projected to top level

2019-10-29 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7428: -- Summary: Drill incorrectly allows a repeated map field to be projected to top level Key: DRILL-7428 URL: https://issues.apache.org/jira/browse/DRILL-7428 Project: Apache

[jira] [Created] (DRILL-7425) Remove redundant record count field from operators

2019-10-27 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7425: -- Summary: Remove redundant record count field from operators Key: DRILL-7425 URL: https://issues.apache.org/jira/browse/DRILL-7425 Project: Apache Drill Issue

[jira] [Created] (DRILL-7424) Project operator fails to set the container row count

2019-10-27 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7424: -- Summary: Project operator fails to set the container row count Key: DRILL-7424 URL: https://issues.apache.org/jira/browse/DRILL-7424 Project: Apache Drill Issue

[jira] [Resolved] (DRILL-7333) Batch of container count fixes

2019-10-27 Thread Paul Rogers (Jira)
[ https://issues.apache.org/jira/browse/DRILL-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers resolved DRILL-7333. Resolution: Incomplete Abandoned this one; too many changes in one go. Will submit the work

[jira] [Created] (DRILL-7414) EVF incorrectly sets buffer writer index after rollover

2019-10-20 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7414: -- Summary: EVF incorrectly sets buffer writer index after rollover Key: DRILL-7414 URL: https://issues.apache.org/jira/browse/DRILL-7414 Project: Apache Drill

[jira] [Created] (DRILL-7413) Scan operator does not set the container record count

2019-10-20 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7413: -- Summary: Scan operator does not set the container record count Key: DRILL-7413 URL: https://issues.apache.org/jira/browse/DRILL-7413 Project: Apache Drill Issue

[jira] [Created] (DRILL-7412) Minor unit test improvements

2019-10-20 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7412: -- Summary: Minor unit test improvements Key: DRILL-7412 URL: https://issues.apache.org/jira/browse/DRILL-7412 Project: Apache Drill Issue Type: Improvement

[jira] [Created] (DRILL-7403) Validate batch checks, vector integrity, in unit tests

2019-10-13 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7403: -- Summary: Validate batch checks, vector integrity, in unit tests Key: DRILL-7403 URL: https://issues.apache.org/jira/browse/DRILL-7403 Project: Apache Drill

[jira] [Created] (DRILL-7402) Suppress batch dumps for expected failures in tests

2019-10-13 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7402: -- Summary: Suppress batch dumps for expected failures in tests Key: DRILL-7402 URL: https://issues.apache.org/jira/browse/DRILL-7402 Project: Apache Drill Issue

[jira] [Resolved] (DRILL-5914) CSV (text) reader fails to parse quoted newlines in trailing fields

2019-10-05 Thread Paul Rogers (Jira)
[ https://issues.apache.org/jira/browse/DRILL-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers resolved DRILL-5914. Resolution: Fixed This issue was fixed as part of the "Compliant text reader V3"

Re: MongoDB Question

2019-09-26 Thread Paul Rogers
Hi Charles, This kind of data-specific setting really should be associated with a schema so that it is DB-specific, but consistent across queries. That is almost the definition of a schema... In the future, once the recently-added schema system is more widely used, one might be able to set

Re: EVF Question: FULL BATCH?

2019-09-24 Thread Paul Rogers
rd a row: you can load it, check it, and decide to skip it. This will, eventually, allow us to push filtering down to the reader. For now, you just need to call save(). Thanks, - Paul On Tuesday, September 24, 2019, 09:56:34 AM PDT, Paul Rogers wrote: So the usual pattern is: while

Re: EVF Question: FULL BATCH?

2019-09-24 Thread Paul Rogers
So the usual pattern is: while (! rowWriter.isFull()) {  // Load the row  rowWriter.save();} Is it the case that PCAP is trying to force the row count to, say 4K or 8K or whatever? If so, ignore that count. The error is telling you that at least one vector has reached 16 MB in size (or you've
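The loop described in this message can be sketched as below. This is only an illustration under stated assumptions: `SketchRowWriter`, `ToyRowWriter`, `setValue()` and the byte budget are hypothetical stand-ins, not Drill's EVF classes; only `isFull()` and `save()` are named in the thread. The point is that the writer, not the reader, decides when the batch is full.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class BatchLoopSketch {

    // Hypothetical stand-in for the EVF row writer. Only isFull() and save()
    // appear in the thread; setValue() is an assumed helper for illustration.
    interface SketchRowWriter {
        boolean isFull();
        void setValue(String value);
        void save();
    }

    // Toy writer that reports "full" once a byte budget is reached, loosely
    // mimicking a vector size limit. Assumes one byte per character.
    static final class ToyRowWriter implements SketchRowWriter {
        private final int byteLimit;
        private final List<String> rows = new ArrayList<>();
        private int bytesUsed;
        private String pending;

        ToyRowWriter(int byteLimit) { this.byteLimit = byteLimit; }

        @Override public boolean isFull() { return bytesUsed >= byteLimit; }
        @Override public void setValue(String value) { pending = value; }
        @Override public void save() {
            rows.add(pending);
            bytesUsed += pending.length();
            pending = null;
        }
        int rowCount() { return rows.size(); }
    }

    // Loads rows until the writer reports full or the input runs out.
    // The reader never enforces its own row count. Returns rows written.
    static int fillBatch(SketchRowWriter writer, Iterator<String> input) {
        int written = 0;
        while (!writer.isFull() && input.hasNext()) {
            writer.setValue(input.next()); // load the row
            writer.save();                 // commit the row to the batch
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        ToyRowWriter writer = new ToyRowWriter(10);
        Iterator<String> input = List.of("aaaa", "bbbb", "cccc", "dddd").iterator();
        int written = fillBatch(writer, input);
        System.out.println("rows written: " + written + ", input left: " + input.hasNext());
    }
}
```

Any rows left in the input when the loop exits would go into the next batch.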

Re: UDFs not Working in Unit Tests

2019-09-20 Thread Paul Rogers
, and this really doesn't have anything to do with the functionality of the code. -- C > On Sep 20, 2019, at 12:40 PM, Paul Rogers wrote: > > Hi Charles, > > I seem to recall fighting with something similar in the past. The problem is > not with your setup; it is with how Drill

Re: UDFs not Working in Unit Tests

2019-09-20 Thread Paul Rogers
Hi Charles, I seem to recall fighting with something similar in the past. The problem is not with your setup; it is with how Drill finds your (custom?) UDF on the classpath. My memory is hazy; but I think it had to do with the way that Drill uses the drill-override.conf file to extend class

Re: [DISCUSS]: Changes to Formatting Rules

2019-08-16 Thread Paul Rogers
Hi Charles, Agree. I am reviewing a PR in which formatting was changed. Both the original and revised formatting are fine, and each is often seen in open source projects. But we really should settle on one style so we don't have each of us reformatting code to our favorite styles. Drill does

Re: WebUI is Vulnerable to CSRF?

2019-08-15 Thread Paul Rogers
Hi Don, The one saving grace is that no one should ever host the Drill web UI on a public-facing web site. The UI provides lots of admin operations that one would not really want to expose openly. A much better solution would be to wrap Drill in a custom-made web app that controls what

Re: complex data structure aggregators?

2019-08-12 Thread Paul Rogers
tc.) FWIW: There is a pile of information on UDF internals on my GitHub Wiki. [1] Aggregate UDFS are covered in [2]. Once we learn the answers to your specific questions, we can add the info to the Wiki. Thanks, - Paul [1] https://github.com/paul-rogers/drill/wiki/UDFs-Background-Information

Re: complex data structure aggregators?

2019-08-12 Thread Paul Rogers
org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm> which allows good numerical stability while computing this on-line. On Mon, Aug 12, 2019 at 9:01 AM Paul Rogers wrote: > Hi Ted, > > Last I checked (when we wrote the book chapter on the subject), aggregate >
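For readers unfamiliar with Welford's online algorithm mentioned in this message, here is a minimal sketch in plain Java. It is not a Drill UDF, and the class and method names are made up for illustration. The running state is just three scalars (a count, a mean, and a sum of squared deviations).

```java
final class WelfordSketch {
    private long count;   // number of values seen so far
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    // Folds one value into the running state (Welford's update step).
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean); // uses the updated mean
    }

    double mean() { return mean; }

    // Variance over all values seen (divides by n).
    double populationVariance() {
        return count > 0 ? m2 / count : Double.NaN;
    }

    // Unbiased sample variance (divides by n - 1).
    double sampleVariance() {
        return count > 1 ? m2 / (count - 1) : Double.NaN;
    }

    public static void main(String[] args) {
        WelfordSketch w = new WelfordSketch();
        for (double x : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            w.add(x);
        }
        System.out.println("mean=" + w.mean() + " popVar=" + w.populationVariance());
    }
}
```

Each value is visited once and no list of inputs is kept, which is what makes the method usable on-line.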

Re: complex data structure aggregators?

2019-08-12 Thread Paul Rogers
Hi Ted, Last I checked (when we wrote the book chapter on the subject), aggregate state is limited to scalars and Drill-defined types. There is no support to spill aggregate state, so that state will be lost if spilling is required to handle large aggregate batches. The current solution works

Re: [DISCUSS]: Drill after MapR

2019-08-08 Thread Paul Rogers
Hi Charles, Thanks for raising this issue. The short answer is probably "the cloud." The longer answer should include: * Who will manage the AWS/GCE instances? * Who will pay for the instances? * The MapR infrastructure uses the MapR file system and Hadoop distro. Probably should use Hadoop

Re: [QUESTION]: Caching UDFs

2019-08-08 Thread Paul Rogers
Hi Charles, In general, we cannot know if a function is deterministic. Your function might be rand(seed, max). It might do a JDBC lookup or a REST call. Drill can't know (unless we add some way to know that a function is deterministic: maybe a @Deterministic annotation.) That said, you can

[jira] [Created] (DRILL-7333) Batch of

2019-07-27 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7333: -- Summary: Batch of Key: DRILL-7333 URL: https://issues.apache.org/jira/browse/DRILL-7333 Project: Apache Drill Issue Type: Bug Reporter: Paul Rogers

Re: [ANNOUNCE] New Committer: Igor Guzenko

2019-07-23 Thread Paul Rogers
Congrats Igor! Thanks, - Paul On Monday, July 22, 2019, 07:02:44 AM PDT, Arina Ielchiieva wrote: The Project Management Committee (PMC) for Apache Drill has invited Igor Guzenko to become a committer, and we are pleased to announce that he has accepted. Igor has been contributing

Re: drill.exec.grace_period_ms' Errors

2019-07-20 Thread Paul Rogers
Hi Charles, I just ran some unit tests, using master, and did not see the drill.exec.grace_period_ms error that you saw. drill.exec.grace_period_ms is defined in ExecConstants.java, is used in Drillbit startup in Drillbit.java, and has a value defined in src/main/resources/drill-module.conf.

Re: EVF Log Regex Errors

2019-07-20 Thread Paul Rogers
", "regex": "(\\w{3}\\s\\d{1,2}\\s\\d{4}\\s\\d{2}:\\d{2}:\\d{2})\\s+(\\w+)\\[(\\d+)\\]:\\s(.*?(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}).*?)", "extension": "ssdlog", "maxErrors": 10, "schema": [{"fieldName"
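One way to sanity-check a regex like the one in this config outside Drill is to run it through `java.util.regex` directly. The sketch below copies the pattern from the snippet; the sample log line is invented, and whole-line matching with `matches()` is an assumption rather than a statement about how the log plugin applies the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogRegexSketch {
    // Pattern copied from the config snippet; JSON escaping and Java string
    // escaping happen to coincide here.
    static final Pattern LOG_LINE = Pattern.compile(
        "(\\w{3}\\s\\d{1,2}\\s\\d{4}\\s\\d{2}:\\d{2}:\\d{2})\\s+(\\w+)\\[(\\d+)\\]:\\s"
        + "(.*?(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}).*?)");

    // Returns the capture groups in order, or null if the line does not match.
    // Whole-line matching is an assumption made for this sketch.
    static String[] parse(String line) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical log line, not taken from the thread.
        String line = "Jul 16 2019 19:08:12 sshd[4242]: Failed password from 10.1.2.3 port 22";
        String[] fields = parse(line);
        for (String f : fields) {
            System.out.println(f);
        }
    }
}
```

The pattern has five capture groups, the last nested inside the fourth, so the number of groups is worth comparing against the number of fields named in the schema.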

[jira] [Resolved] (DRILL-7327) Log Regex Plugin Won't Recognize Schema

2019-07-20 Thread Paul Rogers (JIRA)
[ https://issues.apache.org/jira/browse/DRILL-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers resolved DRILL-7327. Resolution: Not A Bug > Log Regex Plugin Won't Recognize Sch

Re: EVF Log Regex Errors

2019-07-16 Thread Paul Rogers
ension": "ssdlog",       "maxErrors": 10,       "schema": [{"fieldName": "test"}]     }, --C > On Jul 16, 2019, at 7:08 PM, Paul Rogers wrote: > > Hi Charles, > > Thanks much for the feedback. I'll take a look. > > A

Re: EVF Log Regex Errors

2019-07-16 Thread Paul Rogers
Hi Charles, Thanks much for the feedback. I'll take a look. A quick look at your config suggests that the timestamp might be the issue. As I recall, there were no such tests in the unit test class. So, perhaps something slipped through. (We should add a test for this case.) In EVF, we use

[jira] [Created] (DRILL-7325) Scan, Project, Hash Join do not set container record count

2019-07-14 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7325: -- Summary: Scan, Project, Hash Join do not set container record count Key: DRILL-7325 URL: https://issues.apache.org/jira/browse/DRILL-7325 Project: Apache Drill

[jira] [Created] (DRILL-7324) Many vector-validity errors from unit tests

2019-07-14 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7324: -- Summary: Many vector-validity errors from unit tests Key: DRILL-7324 URL: https://issues.apache.org/jira/browse/DRILL-7324 Project: Apache Drill Issue Type: Bug

Re: Drill storage plugin for IPFS, any suggestion is welcome :)

2019-07-08 Thread Paul Rogers
王亮 你好, Very creative use of Drill! We usually think of Drill as a tool for "big data" distributed file systems such as HDFS, MFS and S3. IPFS seems to be for storing web content. I like how you've shown that IPFS is, in fact, a distributed file system, and made Drill work in this context.

[jira] [Created] (DRILL-7318) Unify type-to-string implementations

2019-07-06 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7318: -- Summary: Unify type-to-string implementations Key: DRILL-7318 URL: https://issues.apache.org/jira/browse/DRILL-7318 Project: Apache Drill Issue Type

[jira] [Created] (DRILL-7311) Partial fixes for empty batch bugs

2019-06-30 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7311: -- Summary: Partial fixes for empty batch bugs Key: DRILL-7311 URL: https://issues.apache.org/jira/browse/DRILL-7311 Project: Apache Drill Issue Type: Bug

[jira] [Created] (DRILL-7309) Improve documentation for table functions

2019-06-25 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7309: -- Summary: Improve documentation for table functions Key: DRILL-7309 URL: https://issues.apache.org/jira/browse/DRILL-7309 Project: Apache Drill Issue Type: Bug

Re: Strange metadata from Text Reader

2019-06-24 Thread Paul Rogers
Hi All, To close the loop on this, see the detailed comments in DRILL-7308 which Charles kindly filed. There is a code bug in the REST metadata feature itself which causes the schema to repeat for every returned record batch, and which causes it to display precision and scale for VARCHAR

Re: Strange metadata from Text Reader

2019-06-24 Thread Paul Rogers
Hi Charles, Latest master? Please file a JIRA with repro steps. I’ll take a look. - Paul Sent from my iPhone > On Jun 24, 2019, at 11:38 AM, Charles Givre wrote: > > Hello Drill Devs, > I'm noticing some strange behavior with the newest version of Drill. If you > query a CSV file, you get

Re: Multi char csv delimiter

2019-06-24 Thread Paul Rogers
Hi Matthias, Field delimiters, quotes and quote escapes can be only one character. The line delimiter can be multi. Are you setting the line delimiter? - Paul Sent from my iPhone > On Jun 24, 2019, at 12:10 PM, Arina Yelchiyeva > wrote: > > Hi Matthias, > > Attachments are not supported

[jira] [Created] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-06-23 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7306: -- Summary: Disable "fast schema" batch for new scan framework Key: DRILL-7306 URL: https://issues.apache.org/jira/browse/DRILL-7306 Project: Apache Drill

[jira] [Created] (DRILL-7305) Multiple operators do not handle empty batches

2019-06-23 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7305: -- Summary: Multiple operators do not handle empty batches Key: DRILL-7305 URL: https://issues.apache.org/jira/browse/DRILL-7305 Project: Apache Drill Issue Type

[jira] [Created] (DRILL-7304) Filter record batch misses schema changes within maps

2019-06-22 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7304: -- Summary: Filter record batch misses schema changes within maps Key: DRILL-7304 URL: https://issues.apache.org/jira/browse/DRILL-7304 Project: Apache Drill Issue

[jira] [Created] (DRILL-7303) Filter record batch does not handle zero-length batches

2019-06-21 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7303: -- Summary: Filter record batch does not handle zero-length batches Key: DRILL-7303 URL: https://issues.apache.org/jira/browse/DRILL-7303 Project: Apache Drill

[jira] [Created] (DRILL-7301) Assertion failure in HashAgg with mem prediction off

2019-06-18 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7301: -- Summary: Assertion failure in HashAgg with mem prediction off Key: DRILL-7301 URL: https://issues.apache.org/jira/browse/DRILL-7301 Project: Apache Drill Issue

[jira] [Created] (DRILL-7300) Drill console, query output no longer has "Edit Query" button

2019-06-18 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7300: -- Summary: Drill console, query output no longer has "Edit Query" button Key: DRILL-7300 URL: https://issues.apache.org/jira/browse/DRILL-7300 Project: Ap

[jira] [Created] (DRILL-7299) Infinite exception loop in Sqlline after kill process

2019-06-18 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7299: -- Summary: Infinite exception loop in Sqlline after kill process Key: DRILL-7299 URL: https://issues.apache.org/jira/browse/DRILL-7299 Project: Apache Drill Issue

[jira] [Created] (DRILL-7298) Revise log regex plugin to work with table functions

2019-06-18 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7298: -- Summary: Revise log regex plugin to work with table functions Key: DRILL-7298 URL: https://issues.apache.org/jira/browse/DRILL-7298 Project: Apache Drill Issue

Extended Vector Framework ready for use

2019-06-13 Thread Paul Rogers
recent EVF work. [2] Please use this mailing list to share questions, comments and suggestions as you tackle your own plugins. Each plugin has its own unique quirks and issues which we can discuss here. Thanks, - Paul [1] https://drill.apache.org/docs/create-or-replace-schema/ [2] https:

[jira] [Created] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-12 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7293: -- Summary: Convert the regex ("log") plugin to use EVF Key: DRILL-7293 URL: https://issues.apache.org/jira/browse/DRILL-7293 Project: Apache Drill

[jira] [Created] (DRILL-7292) Remove V1, V2 text readers

2019-06-12 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7292: -- Summary: Remove V1, V2 text readers Key: DRILL-7292 URL: https://issues.apache.org/jira/browse/DRILL-7292 Project: Apache Drill Issue Type: Improvement

Re: [DISCUSSION] DRILL-7097 Rename MapVector to StructVector

2019-06-04 Thread Paul Rogers
/hive/blob/master/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/MapColumnVector.java#L30 > [3]https://github.com/apache/arrow/pull/ > > On Tue, Jun 4, 2019 at 2:59 AM Paul Rogers > wrote: > >> Hi Igor, >> >> Glad the community was able to provide a b

Re: [DISCUSSION] DRILL-7097 Rename MapVector to StructVector

2019-06-03 Thread Paul Rogers
Hi Igor, Glad the community was able to provide a bit of help. Let's talk about another topic. You said: "And main purpose will be hiding of repeated map meta keys ("key","value") and simulation of real map functionality." On the one hand, we are all accustomed to thinking of a Java (or

Re: [DISCUSSION] DRILL-7097 Rename MapVector to StructVector

2019-06-01 Thread Paul Rogers
Hi All, TLDR; Drill already provides a number of powerful features that give us 80-90% of what we need for DICT type. Much time could be saved by using them, focusing efforts on adding the remaining bits specific to DICT. We divide the DICT problem down into two categories: 1. Internal

Re: How to implement AbstractRecordWriter

2019-05-31 Thread Paul Rogers
someone can help me out with an initial test run? On Fri, May 31, 2019 at 19:38 Paul Rogers wrote:

Re: How to implement AbstractRecordWriter

2019-05-31 Thread Paul Rogers
core code. Maybe it is time to decouple the test framework from drill itself, too. On Fri, May 31, 2019 at 18:38 Paul Rogers wrote: > Hi Nicolas, > > Charles outlined the choices quite well. > > Let's talk about your observation that you find it annoying to deal with > the full D

Re: [DISCUSSION] DRILL-7097 Rename MapVector to StructVector

2019-05-31 Thread Paul Rogers
On Friday, May 31, 2019, 10:54:29 AM PDT, Ted Dunning wrote: Would it be possible to call the new structure a Dict (following Python's inspiration)? That would avoid the large disruption of renaming Map*. On Fri, May 31, 2019 at 10:10 AM Paul Rogers wrote: > Hi Igor, > >

Re: How to implement AbstractRecordWriter

2019-05-31 Thread Paul Rogers
ve another problem, you cannot easily test your new modules unless they are within drill core code. Maybe it is time to decouple the test framework from drill itself, too. On Fri, May 31, 2019 at 18:38 Paul Rogers wrote: > Hi Nicolas, > > Charles outlined the choices quite well

Re: [DISCUSSION] DRILL-7097 Rename MapVector to StructVector

2019-05-31 Thread Paul Rogers
Hi Igor, Thank you for finally addressing a long-running irritation: that the Drill Map type is not a map, it is a tuple. Perhaps you can divide the discussion into three parts. 1. Renaming classes, enums and other items internal to the Drill source code. 2. Renaming classes that are part of

Re: How to implement AbstractRecordWriter

2019-05-31 Thread Paul Rogers
o? I just find annoying to deal with the > full drill code in order to develop a plugin. At the same time, I might > want to detach the development of plugins from the drill life cycle itself. > > Please advise. > > Best Regards, > > Nicolas A Perez > > On Thu, May 30, 2019 at 9

Re: How to implement AbstractRecordWriter

2019-05-30 Thread Paul Rogers
Hi Nicolas, A quick check of the code suggests that AbstractWriter is a JSON-serialized description of the physical plan. It represents the information sent from the planner to the execution engine, and is interpreted by the scan operator. That is, it is the "physical plan." The question is,

Re: Questions about bushy join

2019-05-29 Thread Paul Rogers
> = RecordType(ANY r_regionkey, ANY r_name): rowcount = 5.0, cumulative > > cost = {5.0 rows, 10.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = > > 10931 > > > > > > On Mon, May 27, 2019 at 8:23 PM weijie tong > > wrote: > > > > > Thanks for the

Re: adding insert

2019-05-27 Thread Paul Rogers
Hi Ted, Drill can do a CTAS today, which uses a writer provided by the format plugin. One would think this same structure could work for an INSERT operation, with a writer provided by the storage plugin. The devil, of course, is always in the details. And in finding resources to do the work...

Schema support in storage and format plugins

2019-05-27 Thread Paul Rogers
Hi All, Drill 1.16 introduced the "provided schema" mechanism to help you query the kind of messy files found in the real world. Arina and Bridget created nice documentation [1] for the feature. Sorabh presented the feature at the recent Drill Meetup. If you are a plugin developer, we need

Re: Questions about bushy join

2019-05-27 Thread Paul Rogers
Hi All, Weijie, do you have some example plans that would appear to be sub-optimal, and would be improved with a bushy join plan? What characteristic of the query or schema causes the need for a bushy plan? FWIW, Impala uses a compromise approach: it evaluates left-deep plans, then will "flip"

Re: adding insert

2019-05-27 Thread Paul Rogers
Hi Ted, From item 3, it sounds like you are focusing on using Drill to front a DB system, rather than proposing to use Drill to update files in a distributed file system (DFS). Turns out that, for the DFS case, the former HortonWorks put quite a bit into working out viable insert/update

[jira] [Created] (DRILL-7279) Support provided schema for CSV without headers

2019-05-26 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7279: -- Summary: Support provided schema for CSV without headers Key: DRILL-7279 URL: https://issues.apache.org/jira/browse/DRILL-7279 Project: Apache Drill Issue Type

[jira] [Created] (DRILL-7278) Refactor result set loader projection mechanism

2019-05-25 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7278: -- Summary: Refactor result set loader projection mechanism Key: DRILL-7278 URL: https://issues.apache.org/jira/browse/DRILL-7278 Project: Apache Drill Issue Type

[jira] [Created] (DRILL-7261) Rollup of CSV V3 fixes

2019-05-15 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7261: -- Summary: Rollup of CSV V3 fixes Key: DRILL-7261 URL: https://issues.apache.org/jira/browse/DRILL-7261 Project: Apache Drill Issue Type: Improvement Affects

Re: [ANNOUNCE] New Committer: Jyothsna Donapati

2019-05-09 Thread Paul Rogers
Congratulations! Well deserved. Thanks, - Paul On Thursday, May 9, 2019, 2:28:09 PM PDT, Aman Sinha wrote: The Project Management Committee (PMC) for Apache Drill has invited Jyothsna Donapati to become a committer, and we are pleased to announce that she has accepted. Jyothsna

[jira] [Created] (DRILL-7224) Update example row set test

2019-04-28 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7224: -- Summary: Update example row set test Key: DRILL-7224 URL: https://issues.apache.org/jira/browse/DRILL-7224 Project: Apache Drill Issue Type: Improvement

Re: Hangout Discussion Topics for 04-16-2019

2019-04-24 Thread Paul Rogers
Hi Igor, Thanks for the recap. You asked about vector allocation. Here is where I think things stand. Others can fill in details that I may miss. We have several ways to size value vectors; but no single standard. As you note, the most common way is simply to accept the cost of letting the

Re: QUESTION: Packet Parser for PCAP Plugin

2019-04-23 Thread Paul Rogers
Hi Charles, Two comments. First, Drill "maps" are actually structs (nested tuples): every record must have the same set of columns within the "map." That is, though the Drill type is called a "map", and you might assume that, given that name, it would act like a JSON, Python or Java map, the

Re: [Discuss] Integrate Arrow gandiva into Drill

2019-04-20 Thread Paul Rogers
case problem which needs to be considered, since the Gandiva jar is platform dependent. On Fri, Apr 19, 2019 at 8:43 AM Paul Rogers wrote: > Hi Weijie, > > Thanks much for the update on your Gandiva work. It is great work. > > Can you say more about how you are doing the integrat
