[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-07-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885052#comment-16885052
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Convert the regex ("log") plugin to use EVF
> ---
>
> Key: DRILL-7293
> URL: https://issues.apache.org/jira/browse/DRILL-7293
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.17.0
>
>
> The "log" plugin (which uses a regex to define the row format) is the subject 
> of Chapter 12 of the Learning Apache Drill book (though the version in the 
> book is simpler than the one in the master branch).
> The recently completed "Enhanced Vector Framework" (EVF, AKA the "row set 
> framework") gives Drill control over the size of batches created by readers, 
> and allows readers to use the recently added provided schema mechanism.
> We wish to use the log reader as an example of how to convert a Drill format 
> plugin to use the EVF so that other developers can convert their own plugins.
> This PR provides the first set of log plugin changes to enable us to publish 
> a tutorial on the EVF.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-07-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885051#comment-16885051
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on issue #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#issuecomment-511349056
 
 
   +1
 



[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-07-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884788#comment-16884788
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

paul-rogers commented on issue #1807: DRILL-7293: Convert the regex ("log") 
plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#issuecomment-511243796
 
 
   Rebased on master. Addressed review comments. Squashed commits.
 



[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-07-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881342#comment-16881342
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r301666390
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/log/TestLogReader.java
 ##
 @@ -607,4 +611,65 @@ public void testSchemaOnlyWithMissingCols() throws Exception {
   client.resetSession(ExecConstants.STORE_TABLE_USE_SCHEMA_FILE);
 }
   }
+
+  @Test
+  public void testEmptyPattern() throws Exception {
+String tablePath = buildTable(tableFuncDir, "tf", "emptyRegex",
+"sample.logf", "/regex/simple.log1");
+try {
+ String sql = "SELECT * FROM %s";
+ client.queryBuilder().sql(sql, tablePath).run();
+} catch (Exception e) {
+  assertTrue(e.getMessage().contains("Regex property is required"));
+}
+  }
+
+  /**
+   * Test the ability to use table functions to specify the regex.
+   */
+
+  @Test
+  public void testTableFunction() throws Exception {
+String tablePath = buildTable(tableFuncDir, "tf", "table1",
+"sample.logf", "/regex/simple.log1");
+
+// Run a query using a table function.
+
+String escaped = DATE_ONLY_PATTERN.replace("\\", "");
+String sql = "SELECT * FROM table(%s(type => '%s', regex => '%s', maxErrors => 10))";
+// String sql = "SELECT * FROM %s";
+RowSet results = client.queryBuilder().sql(sql, tablePath, LogFormatPlugin.PLUGIN_NAME, escaped).rowSet();
+
+// Verify that the returned data used the schema.
+
+BatchSchema expectedSchema = new SchemaBuilder()
+.addNullable("field_0", MinorType.VARCHAR)
+.addNullable("field_1", MinorType.VARCHAR)
+.addNullable("field_2", MinorType.VARCHAR)
+.build();
+
+RowSet expected = client.rowSetBuilder(expectedSchema)
+.addRow("2017", "12", "17")
+.addRow("2017", "12", "18")
+.addRow("2017", "12", "19")
+.build();
+
+RowSetUtilities.verify(expected, results);
+  }
+
+  @Test
+  public void testTableFunctionNoGroups() throws Exception {
+String tablePath = buildTable(tableFuncDir, "tf", "noGroups",
+"sample.logf", "/regex/simple.log1");
+
+// Use a table function to pass in a regex without a group.
+
+try {
+  String sql = "SELECT * FROM table(%s(type => '%s', regex => '''foo'''))";
+  client.queryBuilder().sql(sql, tablePath, LogFormatPlugin.PLUGIN_NAME).run();
 
 Review comment:
   Same here.
 



[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-07-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881341#comment-16881341
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r301667671
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/log/LogBatchReader.java
 ##
 @@ -43,60 +44,135 @@
   public static final String RAW_LINE_COL_NAME = "_raw";
   public static final String UNMATCHED_LINE_COL_NAME = "_unmatched_rows";
 
-  private final LogFormatConfig formatConfig;
-  private final Pattern pattern;
-  private final TupleMetadata schema;
-  private final int maxErrors;
+  public static class LogReaderConfig {
+protected final LogFormatPlugin plugin;
+protected final Pattern pattern;
+protected final TupleMetadata schema;
+protected final boolean asArray;
+protected final int groupCount;
+protected final int maxErrors;
+
+public LogReaderConfig(LogFormatPlugin plugin, Pattern pattern,
+TupleMetadata schema, boolean asArray,
+int groupCount, int maxErrors) {
+  this.plugin = plugin;
+  this.pattern = pattern;
+  this.schema = schema;
+  this.asArray = asArray;
+  this.groupCount = groupCount;
+  this.maxErrors = maxErrors;
+}
+  }
+
+  /**
+   * Write group values to value vectors.
+   */
+
+  private interface VectorWriter {
+void loadVectors(Matcher m);
+  }
+
+  /**
+   * Write group values to individual scalar columns.
+   */
+
+  private static class ScalarGroupWriter implements VectorWriter {
+
+private final TupleWriter rowWriter;
+
+public ScalarGroupWriter(TupleWriter rowWriter) {
+  this.rowWriter = rowWriter;
+}
+
+@Override
+public void loadVectors(Matcher m) {
+  for (int i = 0; i < m.groupCount(); i++) {
+String value = m.group(i + 1);
+if (value != null) {
+  rowWriter.scalar(i).setString(value);
+}
+  }
+}
+  }
+
+  /**
+   * Write group values to the columns[] array.
+   */
+
+  private static class ColumnsArrayWriter implements VectorWriter {
+
+private final ScalarWriter elementWriter;
+
+public ColumnsArrayWriter(TupleWriter rowWriter) {
+  elementWriter = rowWriter.array(0).scalar();
+}
+
+   @Override
+public void loadVectors(Matcher m) {
+  for (int i = 0; i < m.groupCount(); i++) {
+String value = m.group(i + 1);
+elementWriter.setString(value == null ? "" : value);
+  }
+}
+  }
+
+  private final LogReaderConfig config;
   private FileSplit split;
   private BufferedReader reader;
-  private int capturingGroups;
   private ResultSetLoader loader;
+  private VectorWriter vectorWriter;
   private ScalarWriter rawColWriter;
   private ScalarWriter unmatchedColWriter;
   private boolean saveMatchedRows;
   private int lineNumber;
   private int errorCount;
 
-  public LogBatchReader(LogFormatConfig formatConfig, Pattern pattern,
-  TupleMetadata schema, int maxErrors) {
-this.formatConfig = formatConfig;
-this.pattern = pattern;
-this.schema = schema;
-this.maxErrors = maxErrors;
+  public LogBatchReader(LogReaderConfig config) {
+this.config = config;
   }
 
   @Override
   public boolean open(FileSchemaNegotiator negotiator) {
 split = negotiator.split();
-setupPattern();
-negotiator.setTableSchema(schema, true);
+negotiator.setTableSchema(config.schema, true);
 loader = negotiator.build();
 bindColumns(loader.writer());
 openFile(negotiator);
 return true;
   }
 
-  private void setupPattern() {
-// Turns out the only way to learn the capturing group count
-// is to create a matcher. We do so with a dummy string.
-
-Matcher m = pattern.matcher("dummy");
-capturingGroups = m.groupCount();
-  }
-
   private void bindColumns(RowSetLoader writer) {
-for (int i = 0; i < capturingGroups; i++) {
-  saveMatchedRows |= writer.scalar(i).isProjected();
-}
 rawColWriter = writer.scalar(RAW_LINE_COL_NAME);
-saveMatchedRows |= rawColWriter.isProjected();
 unmatchedColWriter = writer.scalar(UNMATCHED_LINE_COL_NAME);
+saveMatchedRows = rawColWriter.isProjected();
 
 // If no match-case columns are projected, and the unmatched
 // columns is unprojected, then we want to count (matched)
 // rows.
 
 saveMatchedRows |= !unmatchedColWriter.isProjected();
+
+// This reader is unusual: it can save only unmatched rows,
 
 Review comment:
   I'm not quite sure I understand the meaning of such a reader.
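The two writer strategies in the diff above differ mainly in how they treat an optional regex group that did not take part in the match: `ScalarGroupWriter` skips a null group (leaving the column null), while `ColumnsArrayWriter` writes an empty string. The removed `setupPattern()` also relied on the fact that a `Matcher` reports its group count without any successful match. A minimal, self-contained sketch of both points, using only `java.util.regex` and plain lists in place of Drill's `TupleWriter`/`ScalarWriter`; the class and method names are hypothetical, not Drill APIs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupWriterSketch {

  // As in the removed setupPattern(): groupCount() describes the pattern,
  // so a matcher over any dummy string reports it, match or no match.
  static int groupCount(Pattern pattern) {
    return pattern.matcher("dummy").groupCount();
  }

  // Scalar mode: one column per group. A group that did not participate
  // in the match stays null, mirroring ScalarGroupWriter skipping it.
  static List<String> scalarRow(Matcher m) {
    List<String> row = new ArrayList<>();
    for (int i = 0; i < m.groupCount(); i++) {
      row.add(m.group(i + 1)); // may be null
    }
    return row;
  }

  // Array mode: all groups go into one array; a null group becomes "",
  // mirroring ColumnsArrayWriter.
  static List<String> columnsArrayRow(Matcher m) {
    List<String> columns = new ArrayList<>();
    for (int i = 0; i < m.groupCount(); i++) {
      String value = m.group(i + 1);
      columns.add(value == null ? "" : value);
    }
    return columns;
  }

  public static void main(String[] args) {
    // Three capturing groups; the day group is optional.
    Pattern p = Pattern.compile("(\\d{4})-(\\d{2})(?:-(\\d{2}))?");
    Matcher m = p.matcher("2017-12");
    if (m.matches()) {
      System.out.println(groupCount(p));      // 3
      System.out.println(scalarRow(m));       // [2017, 12, null]
      System.out.println(columnsArrayRow(m)); // [2017, 12, ]
    }
  }
}
```

The empty-string choice in array mode is presumably needed because an array element cannot be left unset the way a nullable scalar column can; the sketch only models the observable difference.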
 


[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-07-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881340#comment-16881340
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r301666308
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/log/TestLogReader.java
 ##
 @@ -607,4 +611,65 @@ public void testSchemaOnlyWithMissingCols() throws Exception {
   client.resetSession(ExecConstants.STORE_TABLE_USE_SCHEMA_FILE);
 }
   }
+
+  @Test
+  public void testEmptyPattern() throws Exception {
+String tablePath = buildTable(tableFuncDir, "tf", "emptyRegex",
+"sample.logf", "/regex/simple.log1");
+try {
+ String sql = "SELECT * FROM %s";
+ client.queryBuilder().sql(sql, tablePath).run();
 
 Review comment:
   Please add `fail()`
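Without `fail()`, a query that unexpectedly succeeds simply leaves the `try` block and the test passes. The pattern works because JUnit's `Assert.fail` throws `AssertionError`, which is an `Error` rather than an `Exception`, so the test's `catch (Exception e)` does not swallow it. A self-contained sketch of the pattern, with a local `fail` standing in for JUnit's and a hypothetical `runQuery` standing in for `client.queryBuilder().sql(...).run()`:

```java
public class ExpectFailureSketch {

  // Stand-in for JUnit's Assert.fail(): throws AssertionError, which
  // catch (Exception e) below does not intercept.
  static void fail(String message) {
    throw new AssertionError(message);
  }

  // Hypothetical stand-in for running the query: rejects an empty regex.
  static void runQuery(String regex) throws Exception {
    if (regex == null || regex.isEmpty()) {
      throw new Exception("Regex property is required");
    }
  }

  static void testEmptyPattern(String regex) throws Exception {
    try {
      runQuery(regex);
      fail("Query should have failed"); // reached only if no exception
    } catch (Exception e) {
      if (!e.getMessage().contains("Regex property is required")) {
        throw new AssertionError("Unexpected message: " + e.getMessage());
      }
    }
  }

  public static void main(String[] args) throws Exception {
    testEmptyPattern("");           // passes: the expected exception is thrown
    try {
      testEmptyPattern("(\\d+)");   // the query "succeeds", so fail() fires
      System.out.println("not reached");
    } catch (AssertionError expected) {
      System.out.println("fail() caught a missing exception");
    }
  }
}
```

Where JUnit 4.13+ or JUnit 5 is available, `assertThrows` expresses the same intent more directly.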
 



[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875900#comment-16875900
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

paul-rogers commented on issue #1807: DRILL-7293: Convert the regex ("log") 
plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#issuecomment-507084448
 
 
   Rebased on latest master. One unrelated test fails:
   
   ```
   [ERROR] Errors: 
   [ERROR]   TestDynamicUDFSupport.testDropFunction » UserRemote VALIDATION ERROR: From lin...
   ```
   
   This test fails about 50% of the time, so it is probably unrelated to this 
change. The failure prevents subsequent tests from running, but mock data source 
support is likely not used by the other packages.
 



[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873852#comment-16873852
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

paul-rogers commented on issue #1807: DRILL-7293: Convert the regex ("log") 
plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#issuecomment-506209593
 
 
   @arina-ielchiieva, added a unit test to show that the schema-only table 
function works.
   
   Tried to create a test that combined a "plugin" table function with the 
"schema" attribute. This failed due to the unfortunate use of "schema" as a 
plugin property name. You've pointed out this issue all along; I finally 
understand why it is a problem. Still, I'm reluctant to change the config 
property name for fear of breaking compatibility.
   
   As it turns out, this limitation is only a minor nuisance since the only 
reason to combine the two kinds of table functions is to specify the regex 
property. A unit test shows that the regex can be specified as a table property 
instead.
   
   Also, went ahead and added support for the `columns` column. If no schema is 
provided (not in the plugin config, not in a table function, not in a provided 
schema), then rather than creating a set of dummy fields `field_0`, `field_1`, 
etc., the plugin now follows the text format plugin and puts the fields into 
the `columns` array. The dummy fields are still used if the user specifies at 
least one column schema but the regex has more groups than specified columns.
   
   This means that, if the user uses a table function to specify just the 
regex, the user gets a reasonable result: the fields come back in the `columns` 
array.
   
   Unit tests show the new `columns` array support.  
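The schema fallback described in this comment can be summarized as a small decision function. This is a hedged sketch of the rule as stated, not the plugin's actual code: the class and method names are hypothetical, the special `_raw` and `_unmatched_rows` columns are ignored, and the assumption that extra groups are named by their zero-based group index (so a one-column schema yields `field_1`, `field_2`, ...) is an illustration, not something the comment confirms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LogColumnNamingSketch {

  // Returns output column names for a regex with groupCount groups.
  // providedNames: column names from any schema source (plugin config,
  // table function, or provided schema); empty if there is none.
  static List<String> columnNames(List<String> providedNames, int groupCount) {
    List<String> names = new ArrayList<>();
    if (providedNames.isEmpty()) {
      // No schema anywhere: every group goes into the `columns` array,
      // as the text format plugin does.
      names.add("columns");
      return names;
    }
    for (int i = 0; i < groupCount; i++) {
      // Use the provided name if there is one; otherwise fall back to a
      // dummy name. ASSUMPTION: the suffix is the zero-based group index.
      // Schema columns beyond groupCount are ignored in this sketch.
      names.add(i < providedNames.size() ? providedNames.get(i) : "field_" + i);
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println(columnNames(Collections.<String>emptyList(), 3)); // [columns]
    System.out.println(columnNames(Arrays.asList("year"), 3)); // [year, field_1, field_2]
  }
}
```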
 



[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872983#comment-16872983
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on issue #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#issuecomment-505737730
 
 
   @paul-rogers this was introduced in 
https://issues.apache.org/jira/browse/DRILL-6965 as part of the schema provisioning 
project. The Jira has the description and the doc-impacting label, so hopefully it 
will be documented some day.
   Yes, you are right: plugin configs are initialized the usual way, and once a 
`DrillTable` instance is created, it is enriched with the schema from the schema 
parameter. All the magic happens in the `org.apache.drill.exec.store.AbstractSchema` 
class.
 



[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872979#comment-16872979
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

paul-rogers commented on issue #1807: DRILL-7293: Convert the regex ("log") 
plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#issuecomment-505735209
 
 
   @arina-ielchiieva, thanks, I didn't realize we'd extended the table 
functions. Out of curiosity, how does the schema form know which plugin config 
to use? Or, does this form create a schema object and use the normal path for 
the plugin config?
   
   Might we have this documented somewhere? 
 



[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872963#comment-16872963
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on issue #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#issuecomment-505733281
 
 
   @paul-rogers thanks for making the changes. Regarding the List parameter in 
table functions: yes, it does not work, and it's a known issue; format plugins 
should not use lists.
   Though in my previous comments I asked you to try the schema parameter as a 
String, not a List.
   Once again, here are examples of the queries I expect to work:
   1. SELECT * FROM table(dfs.tf.noGroups(
 type => 'logRegex',
 regex => '(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d) .*',
 schema=>'inline=(month varchar)'))
   2. select * from table(t(schema=>'inline=(col1 varchar)'))
   
   Examples of the tests can be found in 
`org.apache.drill.TestSchemaWithTableFunction`.
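Both example queries embed arbitrary text (a regex, an inline schema) in SQL string literals, and the `regex => '''foo'''` test in the diff relies on the standard SQL rule that a single quote inside a literal is written as two single quotes. A minimal sketch that builds the first example query with that escaping; the class and helper names are hypothetical, and the sketch handles only quote doubling, making no attempt to account for how the SQL parser or the plugin treats backslashes in the regex.

```java
public class TableFunctionQuerySketch {

  // Wraps text in single quotes, doubling any embedded single quotes
  // (standard SQL string-literal escaping).
  static String sqlLiteral(String text) {
    return "'" + text.replace("'", "''") + "'";
  }

  // Builds a query with the shape of example 1: a table function that
  // passes the plugin type, the regex, and an inline schema.
  static String logRegexQuery(String table, String regex, String inlineSchema) {
    return String.format(
        "SELECT * FROM table(%s(type => %s, regex => %s, schema => %s))",
        table,
        sqlLiteral("logRegex"),
        sqlLiteral(regex),
        sqlLiteral("inline=(" + inlineSchema + ")"));
  }

  public static void main(String[] args) {
    // The regex 'foo' (quotes included) becomes the literal '''foo'''.
    System.out.println(sqlLiteral("'foo'"));
    System.out.println(logRegexQuery("dfs.tf.noGroups",
        "(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d) .*", "month varchar"));
  }
}
```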
 



[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872962#comment-16872962
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on issue #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#issuecomment-505733281
 
 
   @paul-rogers thanks for making the changes. Regarding List parameter in 
table function, yes, it does not work and its a known issue, format plugins 
should not use list.
   Though in my previous comments I have asked to try schema parameter in 
String not in List.
   Adding example of the queries I expect should work once again:
   1. SELECT * FROM table(dfs.tf.noGroups(
 type => 'logRegex',
 regex => '(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d) .*',
 schema=>'inline=(month varchar)'))
   2. select * from table(t(schema=>'inline=(col1 varchar)'))
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Convert the regex ("log") plugin to use EVF
> ---
>
> Key: DRILL-7293
> URL: https://issues.apache.org/jira/browse/DRILL-7293
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.17.0
>
>
> The "log" plugin (which uses a regex to define the row format) is the subject 
> of Chapter 12 of the Learning Apache Drill book (though the version in the 
> book is simpler than the one in the master branch.)
> The recently-completed "Enhanced Vector Framework" (EVF, AKA the "row set 
> framework") gives Drill control over the size of batches created by readers, 
> and allows readers to use the recently-added provided schema mechanism.
> We wish to use the log reader as an example for how to convert a Drill format 
> plugin to use the EVF so that other developers can convert their own plugins.
> This PR provides the first set of log plugin changes to enable us to publish 
> a tutorial on the EVF.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872961#comment-16872961
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on issue #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#issuecomment-505733281
 
 
   @paul-rogers thanks for making the changes. Regarding List parameter in 
table function, yes, it does not work and its a known issue, format plugins 
should not use list.
   Though in my previous comments I have asked to try schema parameter in 
String not in List.
   Adding example of the queries I expect should work once again:
   1. SELECT * FROM table(dfs.tf.noGroups(
 type => 'logRegex',
 regex => '(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d) .*',
 schema=>'inline(month varchar)'))
   2. select * from table(t(schema=>'inline=(col1 varchar)'))
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872959#comment-16872959
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on issue #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#issuecomment-505733281
 
 
   @paul-rogers thanks for making the changes. Regarding the List parameter in 
table functions: yes, it does not work and it's a known issue; format plugins 
should not use lists.
   Though in my previous comments I asked you to try the schema parameter as a 
String, not a List.
   Adding examples of the queries I expect to work once again:
   1. `SELECT * FROM table(dfs.tf.noGroups(
 type => 'logRegex',
 regex => '(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d) .*',
 `schema`=>inline(`month` varchar)))
   2. select * from table(t(schema=>'inline=(col1 varchar)'))
 

[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872960#comment-16872960
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on issue #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#issuecomment-505733281
 
 
   @paul-rogers thanks for making the changes. Regarding the List parameter in 
table functions: yes, it does not work and it's a known issue; format plugins 
should not use lists.
   Though in my previous comments I asked you to try the schema parameter as a 
String, not a List.
   Adding examples of the queries I expect to work once again:
   1. SELECT * FROM table(dfs.tf.noGroups(
 type => 'logRegex',
 regex => '(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d) .*',
 schema=>inline(month varchar)))
   2. select * from table(t(schema=>'inline=(col1 varchar)'))
 

[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872948#comment-16872948
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

paul-rogers commented on issue #1807: DRILL-7293: Convert the regex ("log") 
plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#issuecomment-505730216
 
 
   One solution to the schema issue for table functions is to use the `columns` 
trick from the text reader. If no schema is provided, then instead of creating 
a set of `field_n` columns, create a single `columns` array column. 
Specifically, if there is no schema defined for the table, and no schema in the 
plugin config (perhaps because the plugin config was created via a table 
function), then just use `columns`.
   
   If I get some time, I'll try this out. With the EVF, this might actually be 
pretty simple. Might be best to add such a feature via another PR.
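   A minimal sketch of that fallback, assuming the reader already has the 
provided-schema column names and the plugin-config column names as plain lists. 
`ColumnLayout` and `chooseColumns` are hypothetical names, not part of Drill's 
EVF API, and letting a provided schema take precedence over config names is an 
assumption of this sketch.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the proposed `columns` fallback; not Drill's EVF API.
final class ColumnLayout {
  private ColumnLayout() {}

  /** Name of the single repeated VARCHAR column used when no schema exists. */
  static final String COLUMNS_ARRAY = "columns";

  /**
   * Picks the output column names for a regex reader.
   *
   * @param providedNames names from a provided (CREATE SCHEMA) schema, or null
   * @param configNames   names from the plugin config, or null
   */
  static List<String> chooseColumns(List<String> providedNames, List<String> configNames) {
    if (providedNames != null && !providedNames.isEmpty()) {
      return providedNames; // assumption: the table schema defines the columns
    }
    if (configNames != null && !configNames.isEmpty()) {
      return configNames;   // fall back to names from the plugin config
    }
    // Neither source defines columns (e.g. the config came from a table
    // function): expose one `columns` array instead of field_1, field_2, ...
    return Collections.singletonList(COLUMNS_ARRAY);
  }
}
```

   With no schema anywhere, every regex group would land in one array, so a 
query could still address groups positionally, as with the text reader.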
 

[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872907#comment-16872907
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

paul-rogers commented on issue #1807: DRILL-7293: Convert the regex ("log") 
plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#issuecomment-505714454
 
 
   @arina-ielchiieva, I was able to get the plugin to work for this query:
   
   ```
   SELECT * FROM table(dfs.tf.table1(
 type => 'logRegex',
 regex => '(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d) .*',
 maxErrors => 10))
   ```
   
   To do this, I had to fix some of the issues described in DRILL-7298. In 
particular, DRILL-6672 notes that table functions are not able to call 
{{setFoo()}} methods as Jackson can, so table functions only work if the format 
plugin config fields are {{public}}. They were not public for the log format 
plugin, so I changed them to {{public}} to get the above query to work.
   
   If we look at the code in 
[`FormatPluginOptionsDescriptor.createConfigForTable()`](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FormatPluginOptionsDescriptor.java#L123),
 we'll see that there is nothing that would handle the `values` syntax 
suggested in your note. The only supported types are Java primitives.
   
   When I tried this query:
   
   ```
   SELECT * FROM table(dfs.tf.noGroups(
 type => 'logRegex',
 regex => '(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d) .*',
 `schema`=>values('month', 'VARCHAR')))
   ```
   
   I got this result:
   
   ```
   PARSE ERROR: Encountered "values" at line 1, column 115.
   
   SQL Query: SELECT * FROM table(dfs.tf.noGroups(type => 'logRegex', regex => 
'(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d) .*', `schema`=>values('month', 'VARCHAR')))

^
   ```
   
   So, looks like the {{values}} trick does not work. Even if it did, the code 
to produce the values argument would use some kind of Java collection which 
would not match the {{List}} of the {{schema}} field.
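   To make the limitation concrete, here is a hypothetical sketch (not Drill's 
`FormatPluginOptionsDescriptor`) of a binder that, like the table-function path 
described above, assigns only public fields and only of scalar types such as 
`String` and Java primitives. `LogConfigSketch` and `TableFunctionBinder` are 
invented for illustration; the real config class and field set may differ.

```java
import java.lang.reflect.Field;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a format plugin config; field names are
// illustrative, not the log plugin's actual config class.
class LogConfigSketch {
  public String regex;
  public int maxErrors;
  public List<String> schema;   // public, but not a scalar type
  private String extension;     // not public: getField() cannot see it
}

// Hypothetical binder: fills public scalar fields from named table-function
// arguments. It never calls setters, mirroring the limitation above.
final class TableFunctionBinder {
  private TableFunctionBinder() {}

  static <T> T bind(T config, Map<String, Object> args)
      throws ReflectiveOperationException {
    for (Map.Entry<String, Object> arg : args.entrySet()) {
      // getField() only returns public fields (NoSuchFieldException otherwise).
      Field field = config.getClass().getField(arg.getKey());
      Class<?> type = field.getType();
      boolean scalar = type == String.class || type.isPrimitive();
      if (!scalar) {
        throw new IllegalArgumentException(
            "Cannot set " + arg.getKey() + " of type " + type.getSimpleName());
      }
      field.set(config, arg.getValue()); // unboxes Integer -> int, etc.
    }
    return config;
  }
}
```

   In this sketch, binding `regex` and `maxErrors` succeeds, a `schema` argument 
is rejected because the field is a `List`, and a non-public field is not found 
at all.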
 

[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870914#comment-16870914
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on issue #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#issuecomment-504905705
 
 
   @paul-rogers I am still unclear if you have tried the following query for 
log plugin data: `select * from table(t(schema=>'inline=(col1 varchar)'))` 
where `t` is table with log plugin data. Did you try it? I suppose it should 
work.
 

[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16867012#comment-16867012
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

paul-rogers commented on issue #1807: DRILL-7293: Convert the regex ("log") 
plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#issuecomment-503297370
 
 
   Rebased on latest master.
 

[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866352#comment-16866352
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r294674117
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/log/README.md
 ##
 @@ -143,6 +143,14 @@ cardinality.
 You may find it helpful to specify the regex and column names via the plugin
 config, types via the `CREATE SCHEMA` command.
 
+## Table Functions
+
+Log files come in many forms. It would be very convenient to use Drill table
 
 Review comment:
   I guess the initial choice of a list property did not take into account that 
it does not work with table functions. I don't think you can avoid breaking 
backward compatibility for configs stored in ZK, but since this plugin is a role 
model for others, I think it should have a proper configuration, so changing the 
`schema` property to be a `String` instead of a `List` might be reasonable. Or 
can we have both properties, one a `String`, another a `List`? We can indicate 
in the release notes that the log plugin has changed and its config in ZK must 
be updated.
 

[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866348#comment-16866348
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r294670491
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/log/README.md
 ##
 @@ -129,19 +129,62 @@ Drill 1.16 introduced the `CREATE SCHEMA` command to 
allow you to define the
 schema for your table. This plugin was created earlier. Here is how the two 
schema
 systems interact.
 
+### Plugin Config Provides Regex and Field Names
+
+The first way to use the provided schema is just to define column types.
+In this use case, the plugin config provides the physical layout (pattern
+and column names), the provided schema provides data types and default
+values (for missing columns.)
+
+In this case:
+
 * The plugin config must provide the regex.
-* The plugin config should provide the list of column names. (If not provided,
+* The plugin config provides the list of column names. (If not provided,
 the names will be `field_1`, `field_2`, etc.)
-* The plugin config can provide a type for each field. Text data from the regex
-is converted to a nullable column of the specified type.
-* The table can provide a schema via `CREATE SCHEMA`. If so, the column names
-in the schema must match those in the plugin config. The types in the provided
-schema are used instead of those specified in the plugin config. The schema
+* The plugin config should not provide column types.
+* The table provides a schema via `CREATE SCHEMA`. Column names
+in the schema must match those in the plugin config by name. The types in the
+provided schema are used instead of those specified in the plugin config. The 
schema
 allows you to specify the data type, and either nullable or `not null`
 cardinality.
 
-You may find it helpful to specify the regex and column names via the plugin
-config, types via the `CREATE SCHEMA` command.
+### Provided Schema Provides The Regex
+
+Another way to use the provided schema is to define an empty plugin config; 
don't
+even provide the regex. Use table properties to define the regex (and the 
maximum
+error count, if desired.)
+
+In this case:
+
+* Set the table property `drill.regex.regex` to the desired pattern.
 
 Review comment:
   I think using `drill.logRegex.regex` will be fine.
 

[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866249#comment-16866249
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

paul-rogers commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r294615119
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/log/README.md
 ##
 @@ -129,19 +129,62 @@ Drill 1.16 introduced the `CREATE SCHEMA` command to 
allow you to define the
 schema for your table. This plugin was created earlier. Here is how the two 
schema
 systems interact.
 
+### Plugin Config Provides Regex and Field Names
+
+The first way to use the provided schema is just to define column types.
+In this use case, the plugin config provides the physical layout (pattern
+and column names), the provided schema provides data types and default
+values (for missing columns.)
+
+In this case:
+
 * The plugin config must provide the regex.
-* The plugin config should provide the list of column names. (If not provided,
+* The plugin config provides the list of column names. (If not provided,
 the names will be `field_1`, `field_2`, etc.)
-* The plugin config can provide a type for each field. Text data from the regex
-is converted to a nullable column of the specified type.
-* The table can provide a schema via `CREATE SCHEMA`. If so, the column names
-in the schema must match those in the plugin config. The types in the provided
-schema are used instead of those specified in the plugin config. The schema
+* The plugin config should not provide column types.
+* The table provides a schema via `CREATE SCHEMA`. Column names
+in the schema must match those in the plugin config by name. The types in the
+provided schema are used instead of those specified in the plugin config. The 
schema
 allows you to specify the data type, and either nullable or `not null`
 cardinality.
 
-You may find it helpful to specify the regex and column names via the plugin
-config, types via the `CREATE SCHEMA` command.
+### Provided Schema Provides The Regex
+
+Another way to use the provided schema is to define an empty plugin config; 
don't
+even provide the regex. Use table properties to define the regex (and the 
maximum
+error count, if desired.)
+
+In this case:
+
+* Set the table property `drill.regex.regex` to the desired pattern.
 
 Review comment:
   Agree, it is pretty awkward. The saving grace is that I did, I believe, 
change "regex" to "logRegex" as you suggested. That is, the second item is the 
plugin "type" name.
   
   When we worked on the text reader, I had first tried to choose good names 
for the third item. You rightly pointed out that it might be easier to remember 
if we simply use the existing config field names, which is what I did here.
   
   So, even if the names are awkward, the pattern we've evolved is:
   
   ```
   drill.<plugin type>.<config field>
   ```
   
   That said, I'm open to suggestions if there is a better way to handle these 
names; now is the time to make improvements before folks deploy schema files 
with the names.
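   A small hypothetical helper showing how a reader might build and look up 
keys in that `drill.<plugin type>.<config field>` pattern, e.g. 
`drill.logRegex.regex`. `TablePropertyKeys` is invented for illustration, and 
the rule that a table property overrides the plugin config value is an 
assumption; the thread does not settle precedence.

```java
import java.util.Map;

// Hypothetical helper for the drill.<plugin type>.<config field> naming
// pattern; not Drill's actual property-handling code.
final class TablePropertyKeys {
  private TablePropertyKeys() {}

  /** Builds a key such as "drill.logRegex.regex". */
  static String key(String pluginType, String configField) {
    return "drill." + pluginType + "." + configField;
  }

  /**
   * Returns the table property for the field if set, else the config value.
   * Assumption: a table property overrides the plugin config.
   */
  static String resolve(Map<String, String> tableProps, String pluginType,
                        String configField, String configValue) {
    String fromTable = tableProps.get(key(pluginType, configField));
    return fromTable != null ? fromTable : configValue;
  }
}
```

   Because the third item reuses the existing config field name, the key for 
the error limit would follow the same pattern (`drill.logRegex.maxErrors`).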
 

[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866246#comment-16866246
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

paul-rogers commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r294614546
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/log/README.md
 ##
 @@ -143,6 +143,14 @@ cardinality.
 You may find it helpful to specify the regex and column names via the plugin
 config, types via the `CREATE SCHEMA` command.
 
+## Table Functions
+
+Log files come in many forms. It would be very convenient to use Drill table
 
 Review comment:
   As I recall, Drill does not have a good way to deal with changes to the 
schema of a storage plugin. Some time back, I remember struggling to understand 
why my server would not start, only to eventually learn that some plugin or 
other changed its config and so Drill failed when trying to load the existing 
config from ZK. Has this been fixed?
   
   If we change schema to a string, we'd need to run code to convert old 
configs. Also, we'd have the problem of what to do with the type property. We 
could not easily convert an existing config into a table schema.
   
   Given these uncertainties, my thought was to leave the config alone and try 
to fit in the provided schema as best we can on top of the existing config.
   
   Can you suggest a better approach?
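   For the "run code to convert old configs" part, a hypothetical sketch of 
turning a legacy `List`-style schema (name plus optional type per field) into an 
inline schema string like the `schema => 'inline=(col1 varchar)'` examples 
earlier in this thread. `LegacyField` and `InlineSchemaConverter` are invented; 
treating an untyped field as `VARCHAR` and back-quoting names are assumptions. 
Rewriting the configs already stored in ZK is not shown.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical shape of one legacy `schema` list entry: a column name plus an
// optional type. Not the log plugin's actual class.
final class LegacyField {
  final String name;
  final String type; // may be null: regex text is then treated as VARCHAR

  LegacyField(String name, String type) {
    this.name = name;
    this.type = type;
  }
}

// Hypothetical converter from a legacy List-based schema to an inline
// schema string usable as a String-typed `schema` property.
final class InlineSchemaConverter {
  private InlineSchemaConverter() {}

  static String toInline(List<LegacyField> fields) {
    StringJoiner cols = new StringJoiner(", ", "inline=(", ")");
    for (LegacyField f : fields) {
      String type = (f.type == null || f.type.isEmpty()) ? "VARCHAR" : f.type;
      cols.add("`" + f.name + "` " + type); // back-quote names (assumed accepted)
    }
    return cols.toString();
  }
}
```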
 

[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866245#comment-16866245
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

paul-rogers commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r294614546
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/log/README.md
 ##
 @@ -143,6 +143,14 @@ cardinality.
 You may find it helpful to specify the regex and column names via the plugin
 config, types via the `CREATE SCHEMA` command.
 
+## Table Functions
+
+Log files come in many forms. It would be very convenient to use Drill table
 
 Review comment:
   As I recall, Drill does not have a good way to deal with changes to the 
schema of a storage plugin. Some time back, I remember struggling to understand 
why my server would not start, only to eventually learn that some plugin or 
other changed its config and so Drill failed when trying to load the existing 
config from ZK. Has this been fixed?
   
   If we change schema to a string, we'd need to run code to convert old 
configs. Also, we'd have the problem of what to do with the type property. We 
could not easily convert an existing config into a table schema.
   
   Given these uncertainties, my thought was to leave the config alone and try 
to fit in the provided schema as best we can on top of the existing config.
   
   Can you think of a better approach?
 

[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863791#comment-16863791
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r293692425
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/log/README.md
 ##
 @@ -129,19 +129,62 @@ Drill 1.16 introduced the `CREATE SCHEMA` command to allow you to define the
 schema for your table. This plugin was created earlier. Here is how the two schema
 systems interact.
 
+### Plugin Config Provides Regex and Field Names
+
+The first way to use the provided schema is just to define column types.
+In this use case, the plugin config provides the physical layout (pattern
+and column names), the provided schema provides data types and default
+values (for missing columns.)
+
+In this case:
+
 * The plugin config must provide the regex.
-* The plugin config should provide the list of column names. (If not provided,
+* The plugin config provides the list of column names. (If not provided,
 the names will be `field_1`, `field_2`, etc.)
-* The plugin config can provide a type for each field. Text data from the regex
-is converted to a nullable column of the specified type.
-* The table can provide a schema via `CREATE SCHEMA`. If so, the column names
-in the schema must match those in the plugin config. The types in the provided
-schema are used instead of those specified in the plugin config. The schema
+* The plugin config should not provide column types.
+* The table provides a schema via `CREATE SCHEMA`. Column names
+in the schema must match those in the plugin config by name. The types in the
+provided schema are used instead of those specified in the plugin config. The schema
 allows you to specify the data type, and either nullable or `not null`
 cardinality.
 
-You may find it helpful to specify the regex and column names via the plugin
-config, types via the `CREATE SCHEMA` command.
+### Provided Schema Provides The Regex
+
+Another way to use the provided schema is to define an empty plugin config; don't
+even provide the regex. Use table properties to define the regex (and the maximum
+error count, if desired.)
+
+In this case:
+
+* Set the table property `drill.regex.regex` to the desired pattern.
 
 Review comment:
   I think we should use different naming; `drill.regex.regex` looks awkward.
Maybe `drill.regex.pattern` or something like this?
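The first use case in the README diff above works like a merge keyed on column name: the plugin config supplies the regex and column names, and `CREATE SCHEMA` supplies the types. The sketch below is illustrative only. `ColumnResolutionSketch`, `resolve`, and the plain `Map` standing in for the provided schema are hypothetical; the real plugin works with Drill's `TupleMetadata`. The per-index `field_n` fallback for a short name list is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: merges plugin-config column names with provided-schema types.
// Not the plugin's code; a Map stands in for Drill's TupleMetadata.
class ColumnResolutionSketch {

  /**
   * @param groupCount    number of capturing groups in the regex
   * @param configNames   column names from the plugin config (may be shorter than groupCount)
   * @param providedTypes column name to type, standing in for a CREATE SCHEMA definition
   * @return one "name TYPE" entry per regex group
   */
  static List<String> resolve(int groupCount, List<String> configNames,
                              Map<String, String> providedTypes) {
    List<String> columns = new ArrayList<>();
    for (int i = 0; i < groupCount; i++) {
      // Names come from the config; missing names fall back to field_1, field_2, ...
      // (assumption: the fallback applies per index when the list is short).
      String name = i < configNames.size() ? configNames.get(i) : "field_" + (i + 1);
      // Types are matched by name against the provided schema; VARCHAR otherwise.
      String type = providedTypes.getOrDefault(name, "VARCHAR");
      columns.add(name + " " + type);
    }
    return columns;
  }
}
```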
 



[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863788#comment-16863788
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r293690640
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/log/LogFormatField.java
 ##
 @@ -18,35 +18,31 @@
 
 package org.apache.drill.exec.store.log;
 
+import org.apache.drill.shaded.guava.com.google.common.annotations.VisibleForTesting;
+
 import com.fasterxml.jackson.annotation.JsonInclude;
 import com.fasterxml.jackson.annotation.JsonTypeName;
 
+
+/**
+ * The three configuration options for a field are:
+ * 
+ * The field name
+ * The data type (fieldType).  Field type defaults to VARCHAR
 
 Review comment:
   Extra space before `Field`
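The Javadoc excerpt above says a field's type defaults to VARCHAR. The older README text in this thread also notes that regex text is converted to a nullable column of the specified type. A hypothetical converter along those lines might look like this. `FieldTypeSketch` and the handful of type names it accepts are assumptions made for illustration. The use of plain Java objects is another assumption: the real plugin writes through Drill's EVF column writers.

```java
import java.util.Locale;

// Hypothetical sketch: a field's declared type drives conversion of captured regex text.
// Not the plugin's code; the supported type names are an illustrative subset.
class FieldTypeSketch {
  static final String DEFAULT_TYPE = "VARCHAR";

  /** Returns the declared type, or VARCHAR when none was configured. */
  static String effectiveType(String fieldType) {
    return (fieldType == null || fieldType.isEmpty())
        ? DEFAULT_TYPE
        : fieldType.toUpperCase(Locale.ROOT);
  }

  /** Converts captured text; a null group (e.g. an unmatched optional group) stays null. */
  static Object convert(String text, String fieldType) {
    if (text == null) {
      return null; // nullable column: missing text becomes null
    }
    switch (effectiveType(fieldType)) {
      case "VARCHAR": return text;
      case "INT":     return Integer.parseInt(text.trim());
      case "BIGINT":  return Long.parseLong(text.trim());
      case "DOUBLE":  return Double.parseDouble(text.trim());
      default:
        throw new IllegalArgumentException("Unsupported type in sketch: " + fieldType);
    }
  }
}
```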
 



[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863789#comment-16863789
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r293691586
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/log/README.md
 ##
 @@ -143,6 +143,14 @@ cardinality.
 You may find it helpful to specify the regex and column names via the plugin
 config, types via the `CREATE SCHEMA` command.
 
+## Table Functions
+
+Log files come in many forms. It would be very convenient to use Drill table
 
 Review comment:
   Table functions will work for all log format properties except lists.
Knowing that lists are not supported, does it make sense to replace the list
`schema` parameter with a String and rename it, to avoid a clash with the `schema`
parameter used for schema provisioning?
 



[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863790#comment-16863790
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r293692739
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/log/LogFormatPlugin.java
 ##
 @@ -47,23 +47,30 @@
 import org.slf4j.LoggerFactory;
 
 public class LogFormatPlugin extends EasyFormatPlugin {
-  public static final String PLUGIN_NAME = "logRegex";
   private static final Logger logger = LoggerFactory.getLogger(LogFormatPlugin.class);
 
+  public static final String PLUGIN_NAME = "logRegex";
+  public static final String PROP_PREFIX = TupleMetadata.DRILL_PROP_PREFIX + "regex.";
 
 Review comment:
   Since the properties are `log`-specific, should we add `log` to the property
names, as we did for the `text` properties?
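The constant under review builds table-property names from a shared prefix. This sketch assumes `TupleMetadata.DRILL_PROP_PREFIX` is `drill.`, which is consistent with the `drill.regex.regex` property named in the README diff. It shows one way such properties could take precedence over plugin-config values. `RegexPropertySketch` is hypothetical, as are the `maxErrors` key suffix and the plain `Map` standing in for the provided schema's properties.

```java
import java.util.Map;

// Hypothetical sketch: table properties (e.g. from CREATE SCHEMA) override plugin-config values.
class RegexPropertySketch {
  // Assumed to mirror TupleMetadata.DRILL_PROP_PREFIX + "regex." from the diff.
  static final String PROP_PREFIX = "drill." + "regex.";
  static final String REGEX_PROP = PROP_PREFIX + "regex";          // name used in the README diff
  static final String MAX_ERRORS_PROP = PROP_PREFIX + "maxErrors"; // hypothetical name

  /** Prefers the table property; falls back to the plugin config's regex. */
  static String resolveRegex(Map<String, String> tableProps, String configRegex) {
    String regex = tableProps.getOrDefault(REGEX_PROP, configRegex);
    if (regex == null || regex.isEmpty()) {
      throw new IllegalStateException("No regex in table properties or plugin config");
    }
    return regex;
  }

  /** Prefers the (hypothetical) table property; falls back to the plugin config value. */
  static int resolveMaxErrors(Map<String, String> tableProps, int configMaxErrors) {
    String value = tableProps.get(MAX_ERRORS_PROP);
    return value == null ? configMaxErrors : Integer.parseInt(value.trim());
  }
}
```

Renaming the keys as suggested in this review would change only the constants, not the lookup logic.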
 



[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863780#comment-16863780
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r293690297
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/log/TestLogReader.java
 ##
 @@ -389,4 +416,56 @@ public void testRawUMNoSchema() throws RpcException {
 
 RowSetUtilities.verify(expected, results);
   }
+
+  @Test
+  public void testProvidedSchema() throws Exception {
 
 Review comment:
   `select * from table(t(schema=>'inline=(col1 varchar)'))` should work
regardless of format properties. But the log format has a `schema` property, so I
am wondering whether there will be a clash or whether the `schema` parameter will
be resolved correctly, since the log format defines it as a list while schema
provisioning uses a string.
   Since the log format now supports schema provisioning, the above query should
apply the schema to log files. Could you please check this query?
 



[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863659#comment-16863659
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

paul-rogers commented on issue #1807: DRILL-7293: Convert the regex ("log") 
plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#issuecomment-501956556
 
 
   Added the ability to specify the regex (and column schema) in the provided 
schema. Defined a table property for the regex. Although we can't (yet) use 
table properties to define the schema, we can now use `CREATE SCHEMA` to define 
both the regex and the schema.
 



[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863654#comment-16863654
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

paul-rogers commented on issue #1807: DRILL-7293: Convert the regex ("log") 
plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#issuecomment-501956415
 
 
   This PR is now failing due to the Protobuf errors. Thanks @vvysotskyi for 
fixing them. I'll rebase onto that fix once it is committed and the reviewers 
have approved the commits.
 



[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863601#comment-16863601
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

paul-rogers commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r293633490
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/log/TestLogReader.java
 ##
 @@ -389,4 +416,56 @@ public void testRawUMNoSchema() throws RpcException {
 
 RowSetUtilities.verify(expected, results);
   }
+
+  @Test
+  public void testProvidedSchema() throws Exception {
 
 Review comment:
   Short answer: it does not seem to work. I tried this in the past and found 
that table functions take only simple values (numbers, strings), not lists. 
Since this plugin uses a list, I never could figure out how to use it with 
table functions. In particular, how would the table function know how to create 
the instance of `LogFormatField` within the list? Am I missing something?
   
   This plugin, in particular, would very much benefit from the use of a table 
function so that the user does not have to define a new plugin config for each 
new file type.
   
   If there is a way to make this work, we can add the test and describe the 
answer in the README file.
 



[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862957#comment-16862957
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r293324471
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/log/README.md
 ##
 @@ -11,26 +18,50 @@ If you wanted to analyze log files such as the MySQL log sample shown below usin
 070917 16:29:01  21 Query   select * from location
 070917 16:29:12  21 Query   select * from location where id = 1 LIMIT 1
 ```
-This plugin will allow you to configure Drill to directly query logfiles of any configuration.
+
+Using this plugin, you can configure Drill to directly query log files of
+any configuration.
 
 ## Configuration Options
-* **`type`**:  This tells Drill which extension to use.  In this case, it must be `logRegex`.  This field is mandatory.
-* **`regex`**:  This is the regular expression which defines how the log file lines will be split.  You must enclose the parts of the regex in grouping parentheses that you wish to extract.  Note that this plugin uses Java regular expressions and requires that shortcuts such as `\d` have an additional slash:  ie `\\d`.  This field is mandatory.
-* **`extension`**:  This option tells Drill which file extensions should be mapped to this configuration.  Note that you can have multiple configurations of this plugin to allow you to query various log files.  This field is mandatory.
-* **`maxErrors`**:  Log files can be inconsistent and messy.  The `maxErrors` variable allows you to set how many errors the reader will ignore before halting execution and throwing an error.  Defaults to 10.
-* **`schema`**:  The `schema` field is where you define the structure of the log file.  This section is optional.  If you do not define a schema, all fields will be assigned a column name of `field_n` where `n` is the index of the field. The undefined fields will be assigned a default data type of `VARCHAR`.
+
+* **`type`**:  This tells Drill which extension to use.  In this case, it must
 
 Review comment:
   ```suggestion
   * **`type`**: This tells Drill which extension to use. In this case, it must
   ```
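The `regex` option in the README diff above splits each log line by its capturing groups. The minimal sketch below applies a hypothetical pattern to the MySQL sample lines quoted in the same diff. It is not the plugin's reader code, and `RegexSplitSketch` and `split` are illustrative names. A Java string literal needs the same backslash doubling the README describes, so `\d` is written `\\d`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: a hypothetical pattern for the MySQL log sample.
// This is not the plugin's reader code.
class RegexSplitSketch {
  // In a Java string literal, \d must be written \\d, \s as \\s, and so on.
  static final Pattern MYSQL_LINE = Pattern.compile(
      "(\\d{6})\\s(\\d{2}:\\d{2}:\\d{2})\\s+(\\d+)\\s+(\\w+)\\s+(.+)");

  /** Returns the text of each capturing group, or null if the line does not match. */
  static List<String> split(String line) {
    Matcher m = MYSQL_LINE.matcher(line);
    if (!m.matches()) {
      return null; // in this sketch an unmatched line simply yields null
    }
    List<String> fields = new ArrayList<>();
    // Group 0 is the whole match, so the extracted fields start at group 1.
    for (int i = 1; i <= m.groupCount(); i++) {
      fields.add(m.group(i));
    }
    return fields;
  }
}
```

Calling `split` on the first sample line yields the five fields `070917`, `16:29:01`, `21`, `Query`, and `select * from location`.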
 



[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862966#comment-16862966
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r293327844
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/log/TestLogReader.java
 ##
 @@ -389,4 +416,56 @@ public void testRawUMNoSchema() throws RpcException {
 
 RowSetUtilities.verify(expected, results);
   }
+
+  @Test
+  public void testProvidedSchema() throws Exception {
 
 Review comment:
   Could you please add unit tests to check how this format plugin works with the
`schema` parameter in a table function?
   Example: `org.apache.drill.TestSchemaWithTableFunction`
   We might need to check two cases:
   `select * from table(t(schema=>'inline=(col1 varchar)'))`
   `select * from table(t(type=>'logRegex', schema=>'inline=(col1 varchar)'))`
 



[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862968#comment-16862968
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r293324511
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/log/README.md
 ##
 @@ -11,26 +18,50 @@ If you wanted to analyze log files such as the MySQL log sample shown below usin
 070917 16:29:01  21 Query   select * from location
 070917 16:29:12  21 Query   select * from location where id = 1 LIMIT 1
 ```
-This plugin will allow you to configure Drill to directly query logfiles of any configuration.
+
+Using this plugin, you can configure Drill to directly query log files of
+any configuration.
 
 ## Configuration Options
-* **`type`**:  This tells Drill which extension to use.  In this case, it must be `logRegex`.  This field is mandatory.
-* **`regex`**:  This is the regular expression which defines how the log file lines will be split.  You must enclose the parts of the regex in grouping parentheses that you wish to extract.  Note that this plugin uses Java regular expressions and requires that shortcuts such as `\d` have an additional slash:  ie `\\d`.  This field is mandatory.
-* **`extension`**:  This option tells Drill which file extensions should be mapped to this configuration.  Note that you can have multiple configurations of this plugin to allow you to query various log files.  This field is mandatory.
-* **`maxErrors`**:  Log files can be inconsistent and messy.  The `maxErrors` variable allows you to set how many errors the reader will ignore before halting execution and throwing an error.  Defaults to 10.
-* **`schema`**:  The `schema` field is where you define the structure of the log file.  This section is optional.  If you do not define a schema, all fields will be assigned a column name of `field_n` where `n` is the index of the field. The undefined fields will be assigned a default data type of `VARCHAR`.
+
+* **`type`**:  This tells Drill which extension to use.  In this case, it must
+be `logRegex`.  This field is mandatory.
 
 Review comment:
   ```suggestion
   be `logRegex`. This field is mandatory.
   ```
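The `maxErrors` option in the README diff above controls how many bad lines the reader tolerates before failing, with a default of 10. The sketch below models that policy: a line the pattern does not match counts as an error, and reading fails once the count exceeds the limit. `MaxErrorsSketch` is a hypothetical name, and the inclusive reading of the limit is an assumption. The actual reader presumably reports the failure through Drill's `UserException`, which the new `LogBatchReader` imports, rather than `IllegalStateException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of a maxErrors policy: skip unmatched lines and fail once the
// number of skipped lines exceeds the limit. Not the plugin's reader code.
class MaxErrorsSketch {
  static final int DEFAULT_MAX_ERRORS = 10; // default stated in the README

  static List<String> readMatching(List<String> lines, Pattern pattern, int maxErrors) {
    List<String> rows = new ArrayList<>();
    int errorCount = 0;
    for (String line : lines) {
      if (pattern.matcher(line).matches()) {
        rows.add(line);
        continue;
      }
      errorCount++;
      // Assumption: up to maxErrors bad lines are tolerated; one more halts the read.
      if (errorCount > maxErrors) {
        throw new IllegalStateException(
            "Too many unmatched lines: " + errorCount + " (maxErrors = " + maxErrors + ")");
      }
    }
    return rows;
  }
}
```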
 



[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862955#comment-16862955
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex ("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r293323024
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/log/LogBatchReader.java
 ##
 @@ -0,0 +1,210 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.log;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import java.util.regex.PatternSyntaxException;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.rowSet.ResultSetLoader;
+import org.apache.drill.exec.physical.rowSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.mapred.FileSplit;
+
+public class LogBatchReader implements ManagedReader {
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(LogBatchReader.class);
 
 Review comment:
   No need to use full imports.
 

[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862967#comment-16862967
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex ("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r293327274
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/log/LogFormatPlugin.java
 ##
 @@ -18,86 +18,224 @@
 
 package org.apache.drill.exec.store.log;
 
-import java.io.IOException;
-import org.apache.drill.exec.planner.common.DrillStatsTable;
-import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import java.util.regex.PatternSyntaxException;
+
 import org.apache.drill.common.exceptions.ExecutionSetupException;
-import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.exceptions.UserException;
 import org.apache.drill.common.logical.StoragePluginConfig;
-import org.apache.drill.exec.ops.FragmentContext;
-import org.apache.drill.exec.proto.UserBitShared;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileReaderFactory;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileScanBuilder;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.proto.UserBitShared.CoreOperatorType;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
 import org.apache.drill.exec.server.DrillbitContext;
-import org.apache.drill.exec.store.RecordReader;
-import org.apache.drill.exec.store.RecordWriter;
-import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.server.options.OptionManager;
 import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
-import org.apache.drill.exec.store.dfs.easy.EasyWriter;
-import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
+import org.apache.drill.shaded.guava.com.google.common.base.Strings;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
 import org.apache.hadoop.conf.Configuration;
 
-import java.util.List;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-
 public class LogFormatPlugin extends EasyFormatPlugin {
+  public static final String PLUGIN_NAME = "logRegex";
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(LogFormatPlugin.class);
+
+  private static class LogReaderFactory extends FileReaderFactory {
+    private final LogFormatPlugin plugin;
+    private final Pattern pattern;
+    private final TupleMetadata schema;
+
+    public LogReaderFactory(LogFormatPlugin plugin, Pattern pattern, TupleMetadata schema) {
+      this.plugin = plugin;
+      this.pattern = pattern;
+      this.schema = schema;
+    }
 
-  public static final String DEFAULT_NAME = "logRegex";
-  private final LogFormatConfig formatConfig;
+    @Override
+    public ManagedReader newReader() {
+       return new LogBatchReader(plugin.getConfig(), pattern, schema);
+    }
+  }
 
   public LogFormatPlugin(String name, DrillbitContext context,
                          Configuration fsConf, StoragePluginConfig storageConfig,
                          LogFormatConfig formatConfig) {
-    super(name, context, fsConf, storageConfig, formatConfig,
-        true,  // readable
-        false, // writable
-        true, // blockSplittable
-        true,  // compressible
-        Lists.newArrayList(formatConfig.getExtension()),
-        DEFAULT_NAME);
-    this.formatConfig = formatConfig;
+    super(name, easyConfig(fsConf, formatConfig), context, storageConfig, formatConfig);
   }
 
-  @Override
-  public RecordReader getRecordReader(FragmentContext context,
-                                      DrillFileSystem dfs, FileWork fileWork, List columns,
-                                      String userName) throws ExecutionSetupException {
-    return new LogRecordReader(context, dfs, fileWork,
-        columns, userName, formatConfig);
+  private static EasyFormatConfig easyConfig(Configuration fsConf, LogFormatConfig pluginConfig) {
+    EasyFormatConfig config = new EasyFormatConfig();
+    config.readable = true;
+    config.writable = false;
+    // Should be block splitable, but logic not yet implemented.
+    config.blockSplittable = false;
+    config.compressible = true;
+    config.supportsProjectPushdown = true;
+    config.extensions 

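In the diff above, `LogReaderFactory` holds one compiled `Pattern` and passes it to every `LogBatchReader` it creates. Sharing the pattern is safe because `java.util.regex.Pattern` is immutable and thread-safe, while `Matcher` is not, so each reader should create its own `Matcher`. A minimal plain-Java sketch of that split follows; `SketchReaderFactory` and `SketchReader` are hypothetical stand-ins, not the Drill classes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for LogReaderFactory: compiles the regex once.
class SketchReaderFactory {
  private final Pattern pattern; // immutable, safe to share across readers

  SketchReaderFactory(String regex) {
    this.pattern = Pattern.compile(regex); // throws PatternSyntaxException on a bad regex
  }

  SketchReader newReader() {
    return new SketchReader(pattern);
  }
}

// Hypothetical stand-in for LogBatchReader: owns a private Matcher.
class SketchReader {
  private final Matcher matcher; // not thread-safe, so one per reader

  SketchReader(Pattern pattern) {
    this.matcher = pattern.matcher("");
  }

  // Returns the first capturing group of a fully matching line, or null.
  String firstGroup(String line) {
    matcher.reset(line);
    return matcher.matches() && matcher.groupCount() >= 1 ? matcher.group(1) : null;
  }

  public static void main(String[] args) {
    SketchReaderFactory factory = new SketchReaderFactory("(\\d+) .*");
    SketchReader a = factory.newReader();
    SketchReader b = factory.newReader();
    System.out.println(a.firstGroup("42 hello") + " " + b.firstGroup("7 world")); // 42 7
  }
}
```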
[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862958#comment-16862958
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex ("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r293324364
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/log/README.md
 ##
 @@ -1,8 +1,15 @@
 # Drill Regex/Logfile Plugin
-Plugin for Apache Drill that allows Drill to read and query arbitrary files where the schema can be defined by a regex.  The original intent was for this to be used for log files, however, it can be used for any structured data.
+
+Plugin for Apache Drill that allows Drill to read and query arbitrary files
+where the schema can be defined by a regex.  The original intent was for this
 
 Review comment:
   ```suggestion
   where the schema can be defined by a regex. The original intent was for this
   ```
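The suggestion above, and the later request in this thread to use one space between sentences throughout the README, can be applied mechanically with a regex replacement. The sketch below uses a hypothetical `SentenceSpacing` helper. It collapses two or more spaces after `.`, `!`, `?`, or `:` into a single space. Trailing spaces at the end of a line, which Markdown treats as a hard break, are left alone. Including `:` is a choice made here because the bullets use `**`name`**:  text`.

```java
import java.util.regex.Pattern;

// Hypothetical helper for normalizing sentence spacing in Markdown text.
class SentenceSpacing {
  // Punctuation, then two or more spaces, then a non-space character (lookahead).
  private static final Pattern DOUBLE_SPACE = Pattern.compile("([.!?:]) {2,}(?=\\S)");

  static String normalize(String text) {
    // Keep the punctuation (group 1) and follow it with exactly one space.
    return DOUBLE_SPACE.matcher(text).replaceAll("$1 ");
  }

  public static void main(String[] args) {
    System.out.println(normalize("be `logRegex`.  This field is mandatory."));
    // prints: be `logRegex`. This field is mandatory.
  }
}
```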
 

[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862961#comment-16862961
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex ("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r293325049
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/log/README.md
 ##
 @@ -11,26 +18,50 @@ If you wanted to analyze log files such as the MySQL log sample shown below usin
 070917 16:29:01  21 Query   select * from location
 070917 16:29:12  21 Query   select * from location where id = 1 LIMIT 1
 ```
-This plugin will allow you to configure Drill to directly query logfiles of any configuration.
+
+Using this plugin, you can configure Drill to directly query log files of
+any configuration.
 
 ## Configuration Options
-* **`type`**:  This tells Drill which extension to use.  In this case, it must be `logRegex`.  This field is mandatory.
-* **`regex`**:  This is the regular expression which defines how the log file lines will be split.  You must enclose the parts of the regex in grouping parentheses that you wish to extract.  Note that this plugin uses Java regular expressions and requires that shortcuts such as `\d` have an additional slash:  ie `\\d`.  This field is mandatory.
-* **`extension`**:  This option tells Drill which file extensions should be mapped to this configuration.  Note that you can have multiple configurations of this plugin to allow you to query various log files.  This field is mandatory.
-* **`maxErrors`**:  Log files can be inconsistent and messy.  The `maxErrors` variable allows you to set how many errors the reader will ignore before halting execution and throwing an error.  Defaults to 10.
-* **`schema`**:  The `schema` field is where you define the structure of the log file.  This section is optional.  If you do not define a schema, all fields will be assigned a column name of `field_n` where `n` is the index of the field. The undefined fields will be assigned a default data type of `VARCHAR`.
+
+* **`type`**:  This tells Drill which extension to use.  In this case, it must
+be `logRegex`.  This field is mandatory.
+* **`regex`**:  This is the regular expression which defines how the log file
+lines will be split.  You must enclose the parts of the regex in grouping
 
 Review comment:
   Looks like everywhere in the doc there are two spaces before sentences 
instead of one. Could you please check and fix?
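To make the `regex` option concrete, here is one possible pattern for the MySQL log sample quoted in this hunk. It uses one capturing group per column: date, time, thread id, command, and argument. The pattern and that column breakdown are assumptions for illustration, not taken from the plugin's documentation. The backslashes are doubled in the Java string literal in the same way the README says they must be doubled in the configuration. The class name `MySqlLogRegex` is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pattern for the MySQL log sample in the README.
class MySqlLogRegex {
  // Same escaping as the configured "regex" value: \\d in the source is \d in the regex.
  static final Pattern PATTERN = Pattern.compile(
      "(\\d{6})\\s+(\\d{2}:\\d{2}:\\d{2})\\s+(\\d+)\\s+(\\w+)\\s+(.+)");

  // Returns the captured groups in order, or null if the whole line does not match.
  static String[] split(String line) {
    Matcher m = PATTERN.matcher(line);
    if (!m.matches()) {
      return null;
    }
    String[] fields = new String[m.groupCount()];
    for (int i = 0; i < fields.length; i++) {
      fields[i] = m.group(i + 1); // group 0 is the whole line
    }
    return fields;
  }

  public static void main(String[] args) {
    String[] f = split("070917 16:29:01  21 Query   select * from location");
    System.out.println(String.join(" | ", f));
    // prints: 070917 | 16:29:01 | 21 | Query | select * from location
  }
}
```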
 

[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862962#comment-16862962
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex ("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r293326614
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/log/LogBatchReader.java
 ##
 @@ -0,0 +1,210 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.log;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import java.util.regex.PatternSyntaxException;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.rowSet.ResultSetLoader;
+import org.apache.drill.exec.physical.rowSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.mapred.FileSplit;
+
+public class LogBatchReader implements ManagedReader {
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(LogBatchReader.class);
+  public static final String RAW_LINE_COL_NAME = "_raw";
+  public static final String UNMATCHED_LINE_COL_NAME = "_unmatched_rows";
+
+  private FileSplit split;
+  private final LogFormatConfig formatConfig;
+  private final Pattern pattern;
+  private final TupleMetadata schema;
+  private BufferedReader reader;
+  private int capturingGroups;
+  private ResultSetLoader loader;
+  private ScalarWriter rawColWriter;
+  private ScalarWriter unmatchedColWriter;
+  private boolean saveMatchedRows;
+  private int maxErrors;
+  private int lineNumber;
+  private int errorCount;
+
+  public LogBatchReader(LogFormatConfig formatConfig, Pattern pattern, TupleMetadata schema) {
+    this.formatConfig = formatConfig;
+    this.maxErrors = Math.max(0, formatConfig.getMaxErrors());
+    this.pattern = pattern;
+    this.schema = schema;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    split = negotiator.split();
+    setupPattern();
+    negotiator.setTableSchema(schema, true);
+    loader = negotiator.build();
+    bindColumns(loader.writer());
+    openFile(negotiator);
+    return true;
+  }
+
+  private void setupPattern() {
+    try {
+      Matcher m = pattern.matcher("test");
+      capturingGroups = m.groupCount();
+    } catch (PatternSyntaxException e) {
+      throw UserException
+          .validationError(e)
+          .message("Failed to parse regex: \"%s\"", formatConfig.getRegex())
+          .build(logger);
+    }
+  }
+
+  private void bindColumns(RowSetLoader writer) {
+    for (int i = 0; i < capturingGroups; i++) {
+      saveMatchedRows |= writer.scalar(i).isProjected();
+    }
+    rawColWriter = writer.scalar(RAW_LINE_COL_NAME);
+    saveMatchedRows |= rawColWriter.isProjected();
+    unmatchedColWriter = writer.scalar(UNMATCHED_LINE_COL_NAME);
+
+    // If no match-case columns are projected, and the unmatched
+    // columns is unprojected, then we want to count (matched)
+    // rows.
+
+    saveMatchedRows |= !unmatchedColWriter.isProjected();
+  }
+
+  private void openFile(FileSchemaNegotiator negotiator) {
+    InputStream in;
+    try {
+      in = negotiator.fileSystem().open(split.getPath());
+    } catch (Exception e) {
+      throw UserException
+          .dataReadError(e)
+          .message("Failed to open open input file: %s", split.getPath())
+          .addContext("User name", negotiator.userName())
+          .build(logger);
+    }
+    reader = new BufferedReader(new InputStreamReader(in, Charsets.UTF_8));
+  }
+
+  @Override
+  public boolean next() {
+
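`setupPattern()` in the diff above counts capturing groups by creating a throwaway `Matcher`. In `java.util.regex`, `PatternSyntaxException` is thrown by `Pattern.compile`, not by `Pattern.matcher`. The `Pattern` handed to the constructor is already compiled, so that handler cannot fire there. The standalone sketch below uses a hypothetical `RegexSetup` class. It validates where the regex is compiled and then reads `groupCount()`. Wrapping the failure in an `IllegalArgumentException` is an arbitrary choice for this sketch; Drill code would use `UserException` as the diff does.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical sketch: validate the regex where it is compiled, then count groups.
class RegexSetup {
  static Pattern compile(String regex) {
    try {
      return Pattern.compile(regex); // the step that can throw PatternSyntaxException
    } catch (PatternSyntaxException e) {
      throw new IllegalArgumentException("Failed to parse regex: \"" + regex + "\"", e);
    }
  }

  // Number of capturing groups; group 0 (the whole match) and (?:...) groups are not counted.
  static int capturingGroups(Pattern pattern) {
    return pattern.matcher("").groupCount();
  }

  public static void main(String[] args) {
    Pattern p = compile("(\\d+)-(\\w+)");
    System.out.println(capturingGroups(p)); // prints 2
  }
}
```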

[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862959#comment-16862959
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex ("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r293323694
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/log/LogFormatPlugin.java
 ##

[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862969#comment-16862969
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex ("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r293326805
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/log/LogBatchReader.java
 ##

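The `saveMatchedRows` flag set in `LogBatchReader.bindColumns()`, quoted earlier in this thread, reduces to one boolean expression. Matched rows are saved if any capture-group column is projected, or if `_raw` is projected, or if `_unmatched_rows` is not projected. The last case is the one the code comment describes as counting matched rows. The restatement below is a hypothetical standalone helper, not Drill code. It takes plain booleans in place of the `ScalarWriter.isProjected()` calls.

```java
// Hypothetical restatement of the saveMatchedRows logic in bindColumns().
class ProjectionRule {
  static boolean saveMatchedRows(boolean[] groupProjected,
                                 boolean rawProjected,
                                 boolean unmatchedProjected) {
    boolean save = false;
    for (boolean projected : groupProjected) {
      save |= projected;          // any projected capture-group column
    }
    save |= rawProjected;         // the _raw column
    save |= !unmatchedProjected;  // _unmatched_rows not requested: count matched rows
    return save;
  }

  public static void main(String[] args) {
    // Only _unmatched_rows is projected, so matched rows need not be saved.
    System.out.println(saveMatchedRows(new boolean[] {false, false}, false, true)); // false
  }
}
```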
[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862956#comment-16862956
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex ("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r293323517
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/log/LogFormatPlugin.java
 ##
 @@ -18,86 +18,224 @@
 
 package org.apache.drill.exec.store.log;
 
-import java.io.IOException;
-import org.apache.drill.exec.planner.common.DrillStatsTable;
-import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import java.util.regex.PatternSyntaxException;
+
 import org.apache.drill.common.exceptions.ExecutionSetupException;
-import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.exceptions.UserException;
 import org.apache.drill.common.logical.StoragePluginConfig;
-import org.apache.drill.exec.ops.FragmentContext;
-import org.apache.drill.exec.proto.UserBitShared;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileReaderFactory;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileScanBuilder;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.proto.UserBitShared.CoreOperatorType;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
 import org.apache.drill.exec.server.DrillbitContext;
-import org.apache.drill.exec.store.RecordReader;
-import org.apache.drill.exec.store.RecordWriter;
-import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.server.options.OptionManager;
 import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
-import org.apache.drill.exec.store.dfs.easy.EasyWriter;
-import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
+import org.apache.drill.shaded.guava.com.google.common.base.Strings;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
 import org.apache.hadoop.conf.Configuration;
 
-import java.util.List;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-
 public class LogFormatPlugin extends EasyFormatPlugin {
+  public static final String PLUGIN_NAME = "logRegex";
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(LogFormatPlugin.class);
+
+  private static class LogReaderFactory extends FileReaderFactory {
+private final LogFormatPlugin plugin;
+private final Pattern pattern;
+private final TupleMetadata schema;
+
+public LogReaderFactory(LogFormatPlugin plugin, Pattern pattern, TupleMetadata schema) {
+  this.plugin = plugin;
+  this.pattern = pattern;
+  this.schema = schema;
+}
 
-  public static final String DEFAULT_NAME = "logRegex";
-  private final LogFormatConfig formatConfig;
+@Override
+public ManagedReader newReader() {
+   return new LogBatchReader(plugin.getConfig(), pattern, schema);
+}
+  }
 
   public LogFormatPlugin(String name, DrillbitContext context,
  Configuration fsConf, StoragePluginConfig storageConfig,
  LogFormatConfig formatConfig) {
-super(name, context, fsConf, storageConfig, formatConfig,
-true,  // readable
-false, // writable
-true, // blockSplittable
-true,  // compressible
-Lists.newArrayList(formatConfig.getExtension()),
-DEFAULT_NAME);
-this.formatConfig = formatConfig;
+super(name, easyConfig(fsConf, formatConfig), context, storageConfig, formatConfig);
   }
 
-  @Override
-  public RecordReader getRecordReader(FragmentContext context,
-  DrillFileSystem dfs, FileWork fileWork, List<SchemaPath> columns,
-  String userName) throws ExecutionSetupException {
-return new LogRecordReader(context, dfs, fileWork,
-columns, userName, formatConfig);
+  private static EasyFormatConfig easyConfig(Configuration fsConf, LogFormatConfig pluginConfig) {
+EasyFormatConfig config = new EasyFormatConfig();
+config.readable = true;
+config.writable = false;
+// Should be block splitable, but logic not yet implemented.
+config.blockSplittable = false;
+config.compressible = true;
+config.supportsProjectPushdown = true;
+config.extensions 

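The constructor change in this hunk replaces a run of positional `true`/`false` arguments with a configuration object whose fields are set by name in `easyConfig()`. A minimal plain-Java sketch of that pattern follows; `FormatOptions` and `forLogFiles` are hypothetical names that only mirror the field names visible in the diff, and because the `config.extensions` assignment is cut off, the single-extension list is an assumption based on the removed `Lists.newArrayList(formatConfig.getExtension())` call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a format-plugin configuration object (not Drill's
// EasyFormatConfig). Named fields make each capability explicit at the call
// site, unlike a constructor that takes several positional booleans.
class FormatOptions {
  boolean readable;
  boolean writable;
  boolean blockSplittable;
  boolean compressible;
  boolean supportsProjectPushdown;
  List<String> extensions = new ArrayList<>();

  // Mirrors the values set in easyConfig(). The single-extension list is an
  // assumption: the real assignment to extensions is cut off in the diff.
  static FormatOptions forLogFiles(String extension) {
    FormatOptions options = new FormatOptions();
    options.readable = true;
    options.writable = false;
    options.blockSplittable = false; // splitting not implemented yet
    options.compressible = true;
    options.supportsProjectPushdown = true;
    options.extensions.add(extension);
    return options;
  }
}
```

Compared with the removed `super(..., true, false, true, true, ...)` call, a reader of `forLogFiles` can tell which flag is which without counting arguments.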
[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862965#comment-16862965
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r29332
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/log/LogBatchReader.java
 ##
 @@ -0,0 +1,210 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.log;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import java.util.regex.PatternSyntaxException;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.rowSet.ResultSetLoader;
+import org.apache.drill.exec.physical.rowSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.mapred.FileSplit;
+
+public class LogBatchReader implements ManagedReader {
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(LogBatchReader.class);
+  public static final String RAW_LINE_COL_NAME = "_raw";
+  public static final String UNMATCHED_LINE_COL_NAME = "_unmatched_rows";
+
+  private FileSplit split;
+  private final LogFormatConfig formatConfig;
+  private final Pattern pattern;
+  private final TupleMetadata schema;
+  private BufferedReader reader;
+  private int capturingGroups;
+  private ResultSetLoader loader;
+  private ScalarWriter rawColWriter;
+  private ScalarWriter unmatchedColWriter;
+  private boolean saveMatchedRows;
+  private int maxErrors;
+  private int lineNumber;
+  private int errorCount;
+
+  public LogBatchReader(LogFormatConfig formatConfig, Pattern pattern, TupleMetadata schema) {
+this.formatConfig = formatConfig;
+this.maxErrors = Math.max(0, formatConfig.getMaxErrors());
+this.pattern = pattern;
+this.schema = schema;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+split = negotiator.split();
+setupPattern();
+negotiator.setTableSchema(schema, true);
+loader = negotiator.build();
+bindColumns(loader.writer());
+openFile(negotiator);
+return true;
+  }
+
+  private void setupPattern() {
+try {
+  Matcher m = pattern.matcher("test");
+  capturingGroups = m.groupCount();
+} catch (PatternSyntaxException e) {
+  throw UserException
+  .validationError(e)
+  .message("Failed to parse regex: \"%s\"", formatConfig.getRegex())
+  .build(logger);
+}
+  }
+
+  private void bindColumns(RowSetLoader writer) {
+for (int i = 0; i < capturingGroups; i++) {
+  saveMatchedRows |= writer.scalar(i).isProjected();
+}
+rawColWriter = writer.scalar(RAW_LINE_COL_NAME);
+saveMatchedRows |= rawColWriter.isProjected();
+unmatchedColWriter = writer.scalar(UNMATCHED_LINE_COL_NAME);
+
+// If no match-case columns are projected, and the unmatched
+// columns is unprojected, then we want to count (matched)
+// rows.
+
+saveMatchedRows |= !unmatchedColWriter.isProjected();
+  }
+
+  private void openFile(FileSchemaNegotiator negotiator) {
+InputStream in;
+try {
+  in = negotiator.fileSystem().open(split.getPath());
+} catch (Exception e) {
+  throw UserException
+  .dataReadError(e)
+  .message("Failed to open open input file: %s", split.getPath())
+  .addContext("User name", negotiator.userName())
+  .build(logger);
+}
+reader = new BufferedReader(new InputStreamReader(in, Charsets.UTF_8));
+  }
+
+  @Override
+  public boolean next() {
+

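`setupPattern()` in the diff above counts capturing groups by building a throwaway `Matcher`. In `java.util.regex`, a `PatternSyntaxException` is thrown by `Pattern.compile`, not by `Pattern.matcher` or `groupCount()`, so a `catch` around those calls on an already-compiled `Pattern` cannot fire; syntax errors have to be caught wherever the regex is compiled. A minimal sketch of that ordering, assuming a hypothetical `GroupCounter` helper and using `IllegalArgumentException` in place of Drill's `UserException`:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper: compile a user-supplied regex and report how many
// capturing groups (and therefore columns) it defines.
class GroupCounter {

  static int countGroups(String regex) {
    Pattern pattern;
    try {
      // PatternSyntaxException is raised here, at compile time.
      pattern = Pattern.compile(regex);
    } catch (PatternSyntaxException e) {
      // Stand-in for UserException.validationError(e)...build(logger)
      throw new IllegalArgumentException(
          String.format("Failed to parse regex: \"%s\"", regex), e);
    }
    // groupCount() depends only on the pattern, so any input string works.
    return pattern.matcher("").groupCount();
  }
}
```

Non-capturing groups such as `(?:x)` are not counted, so they do not produce columns.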
[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862964#comment-16862964
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r293327227
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/log/LogFormatPlugin.java
 ##
 @@ -18,86 +18,224 @@
 
 package org.apache.drill.exec.store.log;
 
-import java.io.IOException;
-import org.apache.drill.exec.planner.common.DrillStatsTable;
-import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import java.util.regex.PatternSyntaxException;
+
 import org.apache.drill.common.exceptions.ExecutionSetupException;
-import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.exceptions.UserException;
 import org.apache.drill.common.logical.StoragePluginConfig;
-import org.apache.drill.exec.ops.FragmentContext;
-import org.apache.drill.exec.proto.UserBitShared;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileReaderFactory;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileScanBuilder;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.proto.UserBitShared.CoreOperatorType;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
 import org.apache.drill.exec.server.DrillbitContext;
-import org.apache.drill.exec.store.RecordReader;
-import org.apache.drill.exec.store.RecordWriter;
-import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.server.options.OptionManager;
 import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
-import org.apache.drill.exec.store.dfs.easy.EasyWriter;
-import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
+import org.apache.drill.shaded.guava.com.google.common.base.Strings;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
 import org.apache.hadoop.conf.Configuration;
 
-import java.util.List;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-
 public class LogFormatPlugin extends EasyFormatPlugin {
+  public static final String PLUGIN_NAME = "logRegex";
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(LogFormatPlugin.class);
+
+  private static class LogReaderFactory extends FileReaderFactory {
+private final LogFormatPlugin plugin;
+private final Pattern pattern;
+private final TupleMetadata schema;
+
+public LogReaderFactory(LogFormatPlugin plugin, Pattern pattern, TupleMetadata schema) {
+  this.plugin = plugin;
+  this.pattern = pattern;
+  this.schema = schema;
+}
 
-  public static final String DEFAULT_NAME = "logRegex";
-  private final LogFormatConfig formatConfig;
+@Override
+public ManagedReader newReader() {
+   return new LogBatchReader(plugin.getConfig(), pattern, schema);
+}
+  }
 
   public LogFormatPlugin(String name, DrillbitContext context,
  Configuration fsConf, StoragePluginConfig storageConfig,
  LogFormatConfig formatConfig) {
-super(name, context, fsConf, storageConfig, formatConfig,
-true,  // readable
-false, // writable
-true, // blockSplittable
-true,  // compressible
-Lists.newArrayList(formatConfig.getExtension()),
-DEFAULT_NAME);
-this.formatConfig = formatConfig;
+super(name, easyConfig(fsConf, formatConfig), context, storageConfig, formatConfig);
   }
 
-  @Override
-  public RecordReader getRecordReader(FragmentContext context,
-  DrillFileSystem dfs, FileWork fileWork, List<SchemaPath> columns,
-  String userName) throws ExecutionSetupException {
-return new LogRecordReader(context, dfs, fileWork,
-columns, userName, formatConfig);
+  private static EasyFormatConfig easyConfig(Configuration fsConf, LogFormatConfig pluginConfig) {
+EasyFormatConfig config = new EasyFormatConfig();
+config.readable = true;
+config.writable = false;
+// Should be block splitable, but logic not yet implemented.
+config.blockSplittable = false;
+config.compressible = true;
+config.supportsProjectPushdown = true;
+config.extensions 

[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862963#comment-16862963
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r293326946
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/log/LogBatchReader.java
 ##
 @@ -0,0 +1,210 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.log;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import java.util.regex.PatternSyntaxException;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.rowSet.ResultSetLoader;
+import org.apache.drill.exec.physical.rowSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.mapred.FileSplit;
+
+public class LogBatchReader implements ManagedReader {
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(LogBatchReader.class);
+  public static final String RAW_LINE_COL_NAME = "_raw";
+  public static final String UNMATCHED_LINE_COL_NAME = "_unmatched_rows";
+
+  private FileSplit split;
+  private final LogFormatConfig formatConfig;
+  private final Pattern pattern;
+  private final TupleMetadata schema;
+  private BufferedReader reader;
+  private int capturingGroups;
+  private ResultSetLoader loader;
+  private ScalarWriter rawColWriter;
+  private ScalarWriter unmatchedColWriter;
+  private boolean saveMatchedRows;
+  private int maxErrors;
+  private int lineNumber;
+  private int errorCount;
+
+  public LogBatchReader(LogFormatConfig formatConfig, Pattern pattern, TupleMetadata schema) {
+this.formatConfig = formatConfig;
+this.maxErrors = Math.max(0, formatConfig.getMaxErrors());
+this.pattern = pattern;
+this.schema = schema;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+split = negotiator.split();
+setupPattern();
+negotiator.setTableSchema(schema, true);
+loader = negotiator.build();
+bindColumns(loader.writer());
+openFile(negotiator);
+return true;
+  }
+
+  private void setupPattern() {
+try {
+  Matcher m = pattern.matcher("test");
+  capturingGroups = m.groupCount();
+} catch (PatternSyntaxException e) {
+  throw UserException
+  .validationError(e)
+  .message("Failed to parse regex: \"%s\"", formatConfig.getRegex())
+  .build(logger);
+}
+  }
+
+  private void bindColumns(RowSetLoader writer) {
+for (int i = 0; i < capturingGroups; i++) {
+  saveMatchedRows |= writer.scalar(i).isProjected();
+}
+rawColWriter = writer.scalar(RAW_LINE_COL_NAME);
+saveMatchedRows |= rawColWriter.isProjected();
+unmatchedColWriter = writer.scalar(UNMATCHED_LINE_COL_NAME);
+
+// If no match-case columns are projected, and the unmatched
+// columns is unprojected, then we want to count (matched)
+// rows.
+
+saveMatchedRows |= !unmatchedColWriter.isProjected();
+  }
+
+  private void openFile(FileSchemaNegotiator negotiator) {
+InputStream in;
+try {
+  in = negotiator.fileSystem().open(split.getPath());
+} catch (Exception e) {
+  throw UserException
+  .dataReadError(e)
+  .message("Failed to open open input file: %s", split.getPath())
 
 Review comment:
   ```suggestion
 .message("Failed to open input file: %s", split.getPath())
   ```
 


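The projection rule in `bindColumns()` reduces to a single boolean over the writers' `isProjected()` flags. Restated as a pure function (the `ProjectionSketch` name is hypothetical and no Drill types are used), the cases can be checked directly:

```java
// Hypothetical restatement of the saveMatchedRows decision in bindColumns().
// Matched lines must produce rows if any regex column or the raw-line column
// is projected, or if the unmatched-row column is NOT projected (which covers
// the case where nothing at all is projected and rows only need counting).
class ProjectionSketch {

  static boolean saveMatchedRows(boolean[] groupProjected,
                                 boolean rawProjected,
                                 boolean unmatchedProjected) {
    boolean save = false;
    for (boolean projected : groupProjected) {
      save |= projected;
    }
    save |= rawProjected;
    save |= !unmatchedProjected;
    return save;
  }
}
```

Only when the unmatched-row column alone is projected can the reader skip matched lines; when nothing is projected, matched lines must still be saved so they can be counted, as the comment in `bindColumns()` explains.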
[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862960#comment-16862960
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#discussion_r293323284
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/log/LogFormatPlugin.java
 ##
 @@ -18,86 +18,224 @@
 
 package org.apache.drill.exec.store.log;
 
-import java.io.IOException;
-import org.apache.drill.exec.planner.common.DrillStatsTable;
-import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import java.util.regex.PatternSyntaxException;
+
 import org.apache.drill.common.exceptions.ExecutionSetupException;
-import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.exceptions.UserException;
 import org.apache.drill.common.logical.StoragePluginConfig;
-import org.apache.drill.exec.ops.FragmentContext;
-import org.apache.drill.exec.proto.UserBitShared;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileReaderFactory;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileScanBuilder;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.proto.UserBitShared.CoreOperatorType;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
 import org.apache.drill.exec.server.DrillbitContext;
-import org.apache.drill.exec.store.RecordReader;
-import org.apache.drill.exec.store.RecordWriter;
-import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.server.options.OptionManager;
 import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
-import org.apache.drill.exec.store.dfs.easy.EasyWriter;
-import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
+import org.apache.drill.shaded.guava.com.google.common.base.Strings;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
 import org.apache.hadoop.conf.Configuration;
 
-import java.util.List;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-
 public class LogFormatPlugin extends EasyFormatPlugin {
+  public static final String PLUGIN_NAME = "logRegex";
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(LogFormatPlugin.class);
 
 Review comment:
   Same here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Convert the regex ("log") plugin to use EVF
> ---
>
> Key: DRILL-7293
> URL: https://issues.apache.org/jira/browse/DRILL-7293
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.17.0
>
>
> The "log" plugin (which uses a regex to define the row format) is the subject 
> of Chapter 12 of the Learning Apache Drill book (though the version in the 
> book is simpler than the one in the master branch).
> The recently completed "Enhanced Vector Framework" (EVF, AKA the "row set 
> framework") gives Drill control over the size of batches created by readers, 
> and allows readers to use the recently added provided schema mechanism.
> We wish to use the log reader as an example of how to convert a Drill format 
> plugin to use the EVF so that other developers can convert their own plugins.
> This PR provides the first set of log plugin changes to enable us to publish 
> a tutorial on the EVF.
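To make "a regex defines the row format" concrete, here is a minimal plain-Java sketch that uses only `java.util.regex`, not Drill's classes. It assumes that each capturing group becomes one column, that a line must match the whole pattern, and that non-matching lines are tolerated up to a `maxErrors` limit. The reader's `next()` body is cut off in the diffs in this thread, so that error policy is inferred from the `maxErrors`/`errorCount` fields and is an assumption; `RegexRowSketch` and `parse` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration: one regex defines the row format.
// Each capturing group becomes one column of the output row.
class RegexRowSketch {

  // Rows for lines that matched; each row holds one value per capturing group.
  final List<List<String>> rows = new ArrayList<>();
  // Number of lines that did not match the pattern.
  int unmatched;

  void parse(Pattern pattern, List<String> lines, int maxErrors) {
    int groups = pattern.matcher("").groupCount();
    for (String line : lines) {
      Matcher m = pattern.matcher(line);
      if (!m.matches()) {
        // Assumed policy: tolerate up to maxErrors unmatched lines, then fail.
        if (++unmatched > maxErrors) {
          throw new IllegalStateException("Too many unmatched lines: " + unmatched);
        }
        continue;
      }
      List<String> row = new ArrayList<>(groups);
      for (int g = 1; g <= groups; g++) {
        row.add(m.group(g)); // group 0 is the whole match, so columns start at 1
      }
      rows.add(row);
    }
  }
}
```

For example, with the pattern `(\d{4}-\d{2}-\d{2}) (\w+) (.*)`, the line `2019-06-12 INFO started` yields the row `[2019-06-12, INFO, started]`.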



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862618#comment-16862618
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

paul-rogers commented on pull request #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807
 
 
   Converts the log format plugin (which uses a regex for parsing) to work
   with the Enhanced Vector Framework (EVF).
   
   This commit provides the basic conversion:
   
   * Use the plugin config object to pass config to the Easy framework.
   * Use the EVF scan mechanism in place of the legacy "ScanBatch"
   mechanism.
   * Minor code and README cleanup.
   
   This commit corresponds to the Basic Tutorial steps in the EVF tutorial.
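The EVF scan mechanism drives a reader through a fixed lifecycle: it is opened once, asked repeatedly for the next batch, and then closed, with the framework rather than the reader in control of batch size. The plain-Java sketch below illustrates that shape under stated assumptions: `SketchReader`, `LineBatchReader`, `ScanDriver`, and the `maxRowsPerBatch` row limit are hypothetical, and a fixed row count is a simplification of however the real framework sizes batches.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the open/next/close reader lifecycle.
// It is NOT Drill's ManagedReader interface; it only mirrors its shape.
interface SketchReader {
  boolean open(); // prepare the input; false means "nothing to read"
  boolean next(); // fill one batch; false once the input is exhausted
  void close();
}

class LineBatchReader implements SketchReader {
  private final String text;
  private final int maxRowsPerBatch; // stands in for the framework's batch-size limit
  private BufferedReader reader;
  final List<List<String>> batches = new ArrayList<>();

  LineBatchReader(String text, int maxRowsPerBatch) {
    if (maxRowsPerBatch < 1) {
      throw new IllegalArgumentException("maxRowsPerBatch must be positive");
    }
    this.text = text;
    this.maxRowsPerBatch = maxRowsPerBatch;
  }

  @Override
  public boolean open() {
    reader = new BufferedReader(new StringReader(text));
    return true;
  }

  @Override
  public boolean next() {
    List<String> batch = new ArrayList<>();
    try {
      String line;
      // Stop as soon as the batch is full or the input runs out.
      while (batch.size() < maxRowsPerBatch && (line = reader.readLine()) != null) {
        batch.add(line);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    if (!batch.isEmpty()) {
      batches.add(batch);
    }
    // A full batch means more input may remain; a short one means end of input.
    return batch.size() == maxRowsPerBatch;
  }

  @Override
  public void close() {
    try {
      if (reader != null) {
        reader.close();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}

// Plays the role of the scan operator: open once, pull batches, always close.
class ScanDriver {
  static void run(SketchReader reader) {
    if (!reader.open()) {
      return;
    }
    try {
      while (reader.next()) {
        // Each iteration would hand one batch downstream.
      }
    } finally {
      reader.close();
    }
  }
}
```

Keeping the loop in the driver rather than in the reader is what lets the caller, not the format plugin, decide how much data each call produces.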
 
