[GitHub] [drill] luocooong opened a new pull request #2501: [MINOR UPDATE] Add license to CodeQL YAML

2022-03-24 Thread GitBox


luocooong opened a new pull request #2501:
URL: https://github.com/apache/drill/pull/2501


   [MINOR UPDATE] Add license to CodeQL YAML
   
   ## Description
   
   Fixing the failure of the `checkstyle and generate protobufs` on the CI.
   
   ## Documentation
   
   N/A
   
   ## Testing
   
   Use the CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [drill] cgivre commented on pull request #2483: DRILL-8149: large xlsx configs

2022-03-24 Thread GitBox


cgivre commented on pull request #2483:
URL: https://github.com/apache/drill/pull/2483#issuecomment-1077725853


   > 
   
   No worries!  It looks good to me other than adding some doc updates.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [drill] pjfanning commented on pull request #2483: DRILL-8149: large xlsx configs

2022-03-24 Thread GitBox


pjfanning commented on pull request #2483:
URL: https://github.com/apache/drill/pull/2483#issuecomment-1077722983


   @cgivre @luocooong thanks for the feedback. I've been busy on other stuff 
and am not prioritising completing this. Before it is ready to merge, it will 
need doc updates and test coverage.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [drill] cgivre commented on a change in pull request #2483: DRILL-8149: large xlsx configs

2022-03-24 Thread GitBox


cgivre commented on a change in pull request #2483:
URL: https://github.com/apache/drill/pull/2483#discussion_r834400430



##
File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
##
@@ -183,6 +189,9 @@ public String getFieldName() {
   lastColumn = plugin.getConfig().getLastColumn();
   allTextMode = plugin.getConfig().getAllTextMode();
   sheetName = plugin.getConfig().getSheetName();
+  maxArraySize = plugin.getConfig().getMaxArraySize();

Review comment:
   @pjfanning Thanks for the update!   Can you update the `README` file in 
this folder to explain what these options do?  Do you have recommended values?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [drill] cgivre merged pull request #2485: DRILL-8086: Convert the CSV (AKA "compliant text") reader to EVF V2

2022-03-24 Thread GitBox


cgivre merged pull request #2485:
URL: https://github.com/apache/drill/pull/2485


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [drill] cgivre commented on pull request #2485: DRILL-8086: Convert the CSV (AKA "compliant text") reader to EVF V2

2022-03-24 Thread GitBox


cgivre commented on pull request #2485:
URL: https://github.com/apache/drill/pull/2485#issuecomment-1077578456


   @jnturton I added a codeql check to the repo.  CodeQL suggested CPP, Java, 
and JavaScript as languages.  I thought we'd try it and see if it adds any 
value or not.  In any event, I removed cpp.  I'm fine with merging at this 
point.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [drill] luocooong commented on a change in pull request #2493: DRILL-8164: Upgrade metadata-extractor because of CVE-2022-24613

2022-03-24 Thread GitBox


luocooong commented on a change in pull request #2493:
URL: https://github.com/apache/drill/pull/2493#discussion_r834253110



##
File path: 
metastore/rdbms-metastore/src/main/java/org/apache/drill/metastore/rdbms/RdbmsMetastore.java
##
@@ -117,6 +117,8 @@ private HikariDataSource dataSource(DrillConfig config) {
   private void initTables(DataSource dataSource) {
 try (Connection connection = dataSource.getConnection()) {
   JdbcConnection jdbcConnection = new JdbcConnection(connection);
+  // TODO Replace the following steps with new function once this issue is 
resolved.

Review comment:
   Thanks, let's hold-off the merge progress. Actually, I think this is a 
security issue with only a 50% probability, but in the latest responses from 
the reporting group, they point out that it can be exploited and attacked.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (DRILL-8174) Convert Avro format to EVF2

2022-03-24 Thread Cong Luo (Jira)
Cong Luo created DRILL-8174:
---

 Summary: Convert Avro format to EVF2
 Key: DRILL-8174
 URL: https://issues.apache.org/jira/browse/DRILL-8174
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Cong Luo
Assignee: Cong Luo


Second refactor of the Avro format.

We may need to resolve multiple bugs related to the Map Vector in this pull 
request.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [drill] luocooong commented on pull request #2485: DRILL-8086: Convert the CSV (AKA "compliant text") reader to EVF V2

2022-03-24 Thread GitBox


luocooong commented on pull request #2485:
URL: https://github.com/apache/drill/pull/2485#issuecomment-1077346954


   @paul-rogers @jnturton Don't worry, I think Charles accidentally pushed the 
local file: Pushed a CodeQL commit. Let's wait for him for a few hours.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [drill] jnturton commented on pull request #2485: DRILL-8086: Convert the CSV (AKA "compliant text") reader to EVF V2

2022-03-24 Thread GitBox


jnturton commented on pull request #2485:
URL: https://github.com/apache/drill/pull/2485#issuecomment-1077330375


   > Rebased on master. It is now failing on the C++ code check, which is a bit 
odd because I don't recall seeing much C++ code in Drill...
   
   What are these new CodeQL actions here, a rebranding of LGTM?  And what 
prompted this new analysis of our C++, we previously only analysed Java, JS and 
Python IIRC.  
   
   (These are more questions for us, @cgivre @luocooong @vdiravka, than for you 
@paul-rogers)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [drill] lgtm-com[bot] commented on pull request #2485: DRILL-8086: Convert the CSV (AKA "compliant text") reader to EVF V2

2022-03-24 Thread GitBox


lgtm-com[bot] commented on pull request #2485:
URL: https://github.com/apache/drill/pull/2485#issuecomment-1077317155


   This pull request **introduces 3 alerts** and **fixes 3** when merging 
946692198b84c7828781401fb120b51e552898b1 into 
ed3b1d395787534532e6fa42510c30b7a399befd - [view on 
LGTM.com](https://lgtm.com/projects/g/apache/drill/rev/pr-fa822892d7133370d7e14a4f4a6d046d755d9c94)
   
   **new alerts:**
   
   * 3 for Dereferenced variable may be null
   
   **fixed alerts:**
   
   * 3 for Dereferenced variable may be null


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [drill] paul-rogers commented on pull request #2485: DRILL-8086: Convert the CSV (AKA "compliant text") reader to EVF V2

2022-03-24 Thread GitBox


paul-rogers commented on pull request #2485:
URL: https://github.com/apache/drill/pull/2485#issuecomment-1077311337


   Rebased on master. It is now failing on the C++ code check, which is a bit 
odd because I don't recall seeing much C++ code in Drill...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [drill] paul-rogers commented on a change in pull request #2499: DRILL-8117: Clean up deprecated Apache code in Drill

2022-03-24 Thread GitBox


paul-rogers commented on a change in pull request #2499:
URL: https://github.com/apache/drill/pull/2499#discussion_r833963288



##
File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/impersonation/TestInboundImpersonation.java
##
@@ -156,22 +159,25 @@ public void unauthorizedTarget() throws Exception {
 
   @Test
   public void invalidPolicy() throws Exception {
-thrownException.expect(new 
UserExceptionMatcher(UserBitShared.DrillPBError.ErrorType.VALIDATION,
-"Invalid impersonation policies."));
+String query = "ALTER SYSTEM SET `%s`='%s'";

Review comment:
   Just picking one test file at random, here's an example of the idea: 
https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/compile/TestLargeFileCompilation.java#L273
   
   It is not about the exception handling, which is fine. It's just about the 
busy-work of formatting the `ALTER SYSTEM SET` commands over and over.
   
   Again, without looking closely, I don't know if this test happens to use one 
of the two clients that have these methods. If not, then you have to do things 
the tedious way.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [drill] paul-rogers commented on a change in pull request #2485: DRILL-8086: Convert the CSV (AKA "compliant text") reader to EVF V2

2022-03-24 Thread GitBox


paul-rogers commented on a change in pull request #2485:
URL: https://github.com/apache/drill/pull/2485#discussion_r833958591



##
File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/TextFormatPlugin.java
##
@@ -81,9 +68,8 @@
  * to allow tight control of the size of produced batches (as well
  * as to support provided schema.)
  */
-public class TextFormatPlugin extends 
EasyFormatPlugin {
-
-  private final static String PLUGIN_NAME = "text";
+public class TextFormatPlugin extends EasyFormatPlugin {
+  final static String PLUGIN_NAME = "text";

Review comment:
   On the `field_x` vs `columns[]` issue, that part is possible in the 
current code, as an option. There is a "columns" class in EVF that handles the 
`columns[]` column. I suspect that a parallel "fields" version could be created 
that generates the numbered fields instead of array indexes. An option on the 
plugin would choose one or the other for the no-headers case.
   
   In fact, with this approach, the individual `field_n` columns could be 
typed, based on first-row type inference: something not possible with 
`columns[]`.
   
   Anyone up for giving it a shot? I'd be happy to offer pointers along the way.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [drill] paul-rogers commented on pull request #2485: DRILL-8086: Convert the CSV (AKA "compliant text") reader to EVF V2

2022-03-24 Thread GitBox


paul-rogers commented on pull request #2485:
URL: https://github.com/apache/drill/pull/2485#issuecomment-1077279388


   @jnturton you make good points. I somewhat worry that folks who build large 
distributed systems are a bit under-represented in our current wonderful group 
of contributors, so I do see a drift toward small-scale concerns. (For example, 
I've never seen any shop query 1000s of Excel files, but folks do that every 
day for JSON, Parquet and CSV.)
   
   Similarly, the data science folks want Drill do to it all in SQL. But, at 
scale, people build out data pipelines so that the files which Drill queries 
are the result of a process to produce a query-optimized format such as 
well-partitioned Parquet.
   
   The idea of having two forks (or modes) was an attempt to preserve Drill's 
distributed system heritage, while also allowing the current contributors to go 
wild on the data science use cases.
   
   Maybe a simpler solution is to just have different "bootstrap" files: the 
at-scale edition, the data science edition, etc. Each addition includes plugins 
and defaults optimized for the target use case.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [drill] paul-rogers commented on a change in pull request #2485: DRILL-8086: Convert the CSV (AKA "compliant text") reader to EVF V2

2022-03-24 Thread GitBox


paul-rogers commented on a change in pull request #2485:
URL: https://github.com/apache/drill/pull/2485#discussion_r833941830



##
File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/TextFormatPlugin.java
##
@@ -81,9 +68,8 @@
  * to allow tight control of the size of produced batches (as well
  * as to support provided schema.)
  */
-public class TextFormatPlugin extends 
EasyFormatPlugin {
-
-  private final static String PLUGIN_NAME = "text";
+public class TextFormatPlugin extends EasyFormatPlugin {
+  final static String PLUGIN_NAME = "text";

Review comment:
   The team can certainly many things in the 2.0 branch. But, this is not a 
2.0 PR. The goal of this PR is to preserve functionality and simply upgrade the 
underlying reader mechanism to use EVF 2.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org