[GitHub] [drill] luocooong merged pull request #2203: DRILL-7908: Fix GitHub Actions CI

2021-04-25 Thread GitBox


luocooong merged pull request #2203:
URL: https://github.com/apache/drill/pull/2203


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [drill] luocooong commented on pull request #2192: DRILL-7828: Refactor Pcap and Pcapng format plugin

2021-04-25 Thread GitBox


luocooong commented on pull request #2192:
URL: https://github.com/apache/drill/pull/2192#issuecomment-826403486


   @paul-rogers Thanks for your review and thanks for your time.






[jira] [Resolved] (DRILL-7325) Many operators do not set container record count

2021-04-25 Thread Paul Rogers (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers resolved DRILL-7325.

Resolution: Fixed

A number of individual commits fixed problems found in each operator. This 
overall task is now complete.

> Many operators do not set container record count
> 
>
> Key: DRILL-7325
> URL: https://issues.apache.org/jira/browse/DRILL-7325
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.16.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.19.0
>
>
> See DRILL-7324. The following are problems found because some operators fail 
> to set the record count for their containers.
> h4. Scan
> TestComplexTypeReader, on cluster setup, using the PojoRecordReader:
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from ScanBatch
> ScanBatch: Container record count not set
> Reason: ScanBatch never sets the record count of its container (this is a 
> generic issue, not specific to the PojoRecordReader).
> h4. Filter
> {{TestComplexTypeReader.testNonExistentFieldConverting()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from FilterRecordBatch
> FilterRecordBatch: Container record count not set
> {noformat}
> h4. Hash Join
> {{TestComplexTypeReader.test_array()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from HashJoinBatch
> HashJoinBatch: Container record count not set
> {noformat}
> Occurs on the first batch in which the hash join returns {{OK_NEW_SCHEMA}} 
> with no records.
> h4. Project
> {{TestCsvWithHeaders.testEmptyFile()}} (when the text reader returned empty, 
> schema-only batches):
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from ProjectRecordBatch
> ProjectRecordBatch: Container record count not set
> {noformat}
> Occurs in {{ProjectRecordBatch.handleNullInput()}}: it sets up the schema but 
> does not set the value count to 0.
> h4. Unordered Receiver
> {{TestCsvWithSchema.testMultiFileSchema()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from UnorderedReceiverBatch
> UnorderedReceiverBatch: Container record count not set
> {noformat}
> The problem is that {{RecordBatchLoader.load()}} does not set the container 
> record count.
> h4. Streaming Aggregate
> {{TestJsonReader.testSumWithTypeCase()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from StreamingAggBatch
> StreamingAggBatch: Container record count not set
> {noformat}
> The problem is that {{StreamingAggBatch.buildSchema()}} does not set the 
> container record count to 0.
> h4. Limit
> {{TestJsonReader.testDrill_1419()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from LimitRecordBatch
> LimitRecordBatch: Container record count not set
> {noformat}
> None of the paths in {{LimitRecordBatch.innerNext()}} set the container 
> record count.
> h4. Union All
> {{TestJsonReader.testKvgenWithUnionAll()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from UnionAllRecordBatch
> UnionAllRecordBatch: Container record count not set
> {noformat}
> When {{UnionAllRecordBatch}} calls 
> {{VectorAccessibleUtilities.setValueCount()}}, it does not also set the 
> container count.
> h4. Hash Aggregate
> {{TestJsonReader.drill_4479()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
> from HashAggBatch
> HashAggBatch: Container record count not set
> {noformat}
> The problem is that {{HashAggBatch.buildSchema()}} does not set the container 
> record count to 0 for the first, empty, batch sent for {{OK_NEW_SCHEMA}}.
> h4. And Many More
> It turns out that most operators fail to set one of the many row count 
> variables somewhere in their code path: maybe in the schema setup path, maybe 
> when building a batch along one of the many paths that operators follow. 
> Further, we have multiple row counts that must be set:
> * Values in each vector ({{setValueCount()}}).
> * Row count in the container ({{setRecordCount()}}), which must be the same 
> as the vector value count.
> * Row count in the operator (batch), which is the (possibly filtered) count 
> of records presented to downstream operators. It must be less than or equal 
> to the container row count (except for an SV4.)
> * The SV2 record count, which is the number of entries in the SV2 and must be 
> the same as the batch row count (and less than or equal to the container row 
> count).
> * The SV2 actual batch record count, which must be the same as the container 
> row count.
> * The SV4 record 
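
The row-count relationships listed above can be sketched as a small consistency check. This is only an illustration under stated assumptions: `RowCountCheck` and `validate` are hypothetical names, not Drill's `BatchValidator` API, a negative container count stands in for "not set", and the SV4 case and the SV2 actual batch count are left out.

```java
import java.util.List;

// Hypothetical model, not Drill code: it only encodes the row-count
// relationships described in the issue text.
class RowCountCheck {

    /** Returns a description of the first violated rule, or null if all hold. */
    public static String validate(List<Integer> vectorValueCounts,
                                  int containerRowCount,
                                  int batchRowCount,
                                  Integer sv2Count) { // null when there is no SV2
        // Assumption: a negative value means the count was never set.
        if (containerRowCount < 0) {
            return "Container record count not set";
        }
        // Every vector's value count must equal the container row count.
        for (int count : vectorValueCounts) {
            if (count != containerRowCount) {
                return "Vector value count " + count
                        + " differs from container row count " + containerRowCount;
            }
        }
        // The (possibly filtered) batch count cannot exceed the container count.
        if (batchRowCount > containerRowCount) {
            return "Batch row count exceeds container row count";
        }
        // With an SV2, its entry count must equal the batch row count.
        if (sv2Count != null && sv2Count != batchRowCount) {
            return "SV2 record count differs from batch row count";
        }
        return null;
    }
}
```

For example, `RowCountCheck.validate(List.of(3, 3), 3, 2, 2)` describes a filtered batch that keeps two of three rows and reports no violation.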

[jira] [Resolved] (DRILL-6953) Merge row set-based JSON reader

2021-04-25 Thread Paul Rogers (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers resolved DRILL-6953.

Resolution: Fixed

Resolved via a series of individual tickets.

> Merge row set-based JSON reader
> ---
>
> Key: DRILL-6953
> URL: https://issues.apache.org/jira/browse/DRILL-6953
> Project: Apache Drill
> Issue Type: Sub-task
> Affects Versions: 1.15.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Labels: doc-impacting
> Fix For: 1.19.0
>
>
> The final step in the ongoing "result set loader" saga is to merge the 
> revised JSON reader into master. This reader does three key things:
> * Demonstrates the prototypical "late schema" style of data reading (discover 
> schema while reading).
> * Implements many tricks and hacks to handle schema changes while loading.
> * Shows that, even with all these tricks, the only true solution is to 
> actually have a schema.
> The new JSON reader:
> * Uses an expanded state machine when parsing rather than the complex set of 
> if-statements in the current version.
> * Handles reading a run of nulls before seeing the first data value (as long 
> as the data value shows up in the first record batch).
> * Uses the result-set loader to generate fixed-size batches regardless of the 
> complexity, depth of structure, or width of variable-length fields.
> While the JSON reader itself is helpful, the key contribution is that it 
> shows how to use the entire kit of parts: result set loader, projection 
> framework, and so on. Since the projection framework can handle an external 
> schema, it is also a handy foundation for the ongoing schema project.
> Key work to complete after this merger will be to reconcile actual data with 
> the external schema. For example, if we know a column is supposed to be a 
> VarChar, then read the column as a VarChar regardless of the type JSON itself 
> picks. Or, if a column is supposed to be a Double, then convert Int and 
> String JSON values into Doubles.
> The Row Set framework was designed to allow inserting custom column writers. 
> This would be a great opportunity to do the work needed to create them. Then, 
> use the new JSON framework to allow parsing a JSON field as a specified Drill 
> type.
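
The schema reconciliation described above (read a column as VarChar regardless of the JSON type, or convert Int and String values into Doubles) can be sketched roughly as follows. `SchemaCoercion`, `ColumnType`, and `coerce` are hypothetical names for illustration only; they are not part of Drill's row set or column writer framework.

```java
// Hypothetical helper, not Drill code: coerce an already-parsed JSON scalar
// to the type declared by an external schema.
class SchemaCoercion {

    enum ColumnType { VARCHAR, DOUBLE }

    public static Object coerce(Object jsonValue, ColumnType declared) {
        if (jsonValue == null) {
            return null; // nulls pass through unchanged
        }
        switch (declared) {
            case VARCHAR:
                // Read the column as text regardless of the type JSON picked.
                return String.valueOf(jsonValue);
            case DOUBLE:
                // Convert integer and string JSON values into doubles.
                if (jsonValue instanceof Number) {
                    return ((Number) jsonValue).doubleValue();
                }
                if (jsonValue instanceof String) {
                    return Double.parseDouble((String) jsonValue);
                }
                throw new IllegalArgumentException(
                        "Cannot convert " + jsonValue.getClass().getSimpleName() + " to DOUBLE");
            default:
                throw new IllegalStateException("Unhandled type: " + declared);
        }
    }
}
```

In this sketch a string that is not a valid number makes `Double.parseDouble` throw `NumberFormatException`; how a real reader should report such a value is left open here.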



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [drill] cgivre commented on a change in pull request #2204: [DOC UPDATE] Apache-drill-contribution-ideas.md And 050-value-vectors.md Update

2021-04-25 Thread GitBox


cgivre commented on a change in pull request #2204:
URL: https://github.com/apache/drill/pull/2204#discussion_r619835445



##
File path: 
_docs/developer-information/contribute-to-drill/020-apache-drill-contribution-ideas.md
##
@@ -84,8 +77,8 @@ Initially, concentrate on basics:
 
 Implement custom storage plugins for the following non-Hadoop data sources:
 
-  * NoSQL databases (such as Mongo, Cassandra, Couch etc)
-  * Search engines (such as Solr, Lucidworks, Elastic Search etc)
+  * NoSQL databases (such as Mongo, Couch etc)
+  * Search engines (such as Solr, Lucidworks etc)
   * SQL databases (MySQL< PostGres etc)
   * Generic JDBC/ODBC data sources
   * HTTP URL

Review comment:
   Since this was written, most of these are done... You can query most 
relational DBs via the JDBC plugin.  We don't have a generic ODBC plugin, so 
maybe leave that.  We do have an HTTP URL plugin. 

##
File path: 
_docs/developer-information/contribute-to-drill/020-apache-drill-contribution-ideas.md
##
@@ -12,9 +12,6 @@ parent: "Contribute to Drill"
 * BI Tool testing
   * General CLI improvements 
   * Eco system integrations
-* MapReduce
-* Hive views
-* YARN
 * Spark
 * Hue

Review comment:
   Hue?  

##
File path: 
_docs/developer-information/contribute-to-drill/020-apache-drill-contribution-ideas.md
##
@@ -84,8 +77,8 @@ Initially, concentrate on basics:
 
 Implement custom storage plugins for the following non-Hadoop data sources:
 
-  * NoSQL databases (such as Mongo, Cassandra, Couch etc)
-  * Search engines (such as Solr, Lucidworks, Elastic Search etc)
+  * NoSQL databases (such as Mongo, Couch etc)

Review comment:
   Mongo is implemented, so maybe remove it?  There is actually a Couchbase 
plugin out there that someone implemented.  I've been trying to get them 
to commit that. 








[GitHub] [drill] kingswanwho opened a new pull request #2204: [DOC UPDATE] Apache-drill-contribution-ideas.md And 050-value-vectors.md Update

2021-04-25 Thread GitBox


kingswanwho opened a new pull request #2204:
URL: https://github.com/apache/drill/pull/2204


   # [DOC UPDATE] Apache-drill-contribution-ideas.md and 050-value-vectors.md 
update  
   This is a doc update for apache-drill-contribution-ideas.md and 
050-value-vectors.md. No JIRA issue was filed.
   
   ## Description  
   For apache-drill-contribution-ideas.md:  
   - Removed Cassandra and Elasticsearch from the storage plugin ideas, since plugins for both have been developed.
   
   For 050-value-vectors.md:
   - Removed bad links for Operators and Record Batch.
   
   ## Documentation
   This PR updates Apache-drill-contribution-ideas.md and 050-value-vectors.md.
   
   ## Testing
   None, this is a doc update.
   






[GitHub] [drill] vdiravka opened a new pull request #2203: DRILL-7908: Fix GitHub Actions CI

2021-04-25 Thread GitBox


vdiravka opened a new pull request #2203:
URL: https://github.com/apache/drill/pull/2203


   # [DRILL-7908](https://issues.apache.org/jira/browse/DRILL-7908): Fix GitHub 
Actions CI
   
   ## Description
   
   Update GitHub Actions to v2 and replace the `zulu` JDK with `adopt`. Change 
the memory properties to `-DdirectMemoryMb=4500` and `-DmemoryMb=1500`.
   
   ## Documentation
   NA
   
   ## Testing
   The build was run several times. The reviewer can run the build again to 
check that it passes.
   






[GitHub] [drill] vdiravka commented on a change in pull request #2202: DRILL-7904: Update to 30-jre Guava version

2021-04-25 Thread GitBox


vdiravka commented on a change in pull request #2202:
URL: https://github.com/apache/drill/pull/2202#discussion_r619815040



##
File path: docs/dev/ArtidfactsPublishing.md
##
@@ -1,22 +1,8 @@
-# How to upgrade Guava in Drill

Review comment:
   I thought of doing that too. But this info can be useful for shading any 
other library, can't it?








[GitHub] [drill] vdiravka commented on a change in pull request #2202: DRILL-7904: Update to 30-jre Guava version

2021-04-25 Thread GitBox


vdiravka commented on a change in pull request #2202:
URL: https://github.com/apache/drill/pull/2202#discussion_r619815679



##
File path: distribution/src/assemble/component.xml
##
@@ -90,9 +90,6 @@
   jars
   false
   false
-  
-org.apache.drill:drill-shaded-guava:jar
-  
 
 
 

Review comment:
   right, thanks!

##
File path: docs/dev/ArtidfactsPublishing.md
##
@@ -1,22 +1,8 @@
-# How to upgrade Guava in Drill

Review comment:
   I thought of doing that too. But this info can be useful for shading any 
other library, can't it?



