[GitHub] drill issue #1014: DRILL-5771: Fix serDe errors for format plugins
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1014 @arina-ielchiieva can you resolve the merge conflict? ---
[GitHub] drill issue #914: DRILL-5657: Size-aware vector writer structure
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/914 +1. Ship it! ---
[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...
Github user chunhui-shi commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151798147 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java --- @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.planner.sql; + +import com.google.common.collect.ImmutableSortedSet; +import com.google.common.collect.Lists; +import com.google.common.collect.Sets; +import org.apache.calcite.DataContext; +import org.apache.calcite.jdbc.CalciteRootSchema; +import org.apache.calcite.jdbc.CalciteSchema; + +import org.apache.calcite.linq4j.tree.Expression; +import org.apache.calcite.linq4j.tree.Expressions; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.schema.impl.AbstractSchema; +import org.apache.calcite.util.BuiltInMethod; +import org.apache.calcite.util.Compatible; +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.exec.store.SchemaConfig; +import org.apache.drill.exec.store.StoragePlugin; +import org.apache.drill.exec.store.StoragePluginRegistry; +import org.apache.drill.exec.store.SubSchemaWrapper; + +import java.io.IOException; +import java.util.List; +import java.util.Map; +import java.util.NavigableSet; +import java.util.Set; + +/** + * This class is to allow us loading schemas from storage plugins later when {@link #getSubSchema(String, boolean)} + * is called. + */ +public class DynamicRootSchema extends DynamicSchema +implements CalciteRootSchema { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class); + /** Creates a root schema. 
 */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
+    super(null, new RootSchema(), "");
+    this.schemaConfig = schemaConfig;
+    this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
+    CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+    if (retSchema != null) {
+      return retSchema;
+    }
+
+    loadSchemaFactory(schemaName, caseSensitive);
+    retSchema = getSubSchemaMap().get(schemaName);
+    return retSchema;
+  }
+
+  @Override
+  public NavigableSet<String> getTableNames() {
+    Set<String> pluginNames = Sets.newHashSet();
+    for (Map.Entry<String, StoragePlugin> storageEntry : getSchemaFactories()) {
+      pluginNames.add(storageEntry.getKey());
+    }
+    return Compatible.INSTANCE.navigableSet(
+        ImmutableSortedSet.copyOf(
+            Sets.union(pluginNames, getSubSchemaMap().keySet())));
+  }
+
+  /**
+   * load schema factory(storage plugin) for schemaName
+   * @param schemaName
+   * @param caseSensitive
+   */
+  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
+    try {
+      SchemaPlus thisPlus = this.plus();
+      StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
+      if (plugin != null) {
+        plugin.registerSchemas(schemaConfig, thisPlus);
+        return;
+      }
+
+      // we could not find the plugin, the schemaName could be `dfs.tmp`, a 2nd level schema under 'dfs'
+      String[] paths = schemaName.split("\\.");
+      if (paths.length == 2) {
+        plugin = getSchemaFactories().getPlugin(paths[0]);
+        if (plugin == null) {
+          return;
+        }
+
+        // we could find storage plugin for first part (e.g. 'dfs') of schemaName (e.g. 'dfs.tmp');
+        // register schema for this storage plugin to 'this'.
+        plugin.registerSchemas(schemaConfig, thisPlus);
+
+        // we load second level schemas for this storage plugin
+        final SchemaPlus firstlevelSchema = thisPlus.getSubSchema(paths[0]);
[GitHub] drill issue #914: DRILL-5657: Size-aware vector writer structure
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/914 I was hoping that we would just get better encapsulation without losing performance. This performance boost is rather serendipitous. One possible explanation might be in the JVM optimizing away code paths in the benchmark itself. This means that we might not see the same performance gains in real life. @bitblender knows more about this than I do. ---
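[Editor's note] For context on the dead-code concern above: if a microbenchmark never consumes its result, the JIT is free to eliminate the measured work, making the timing look far better than real-life performance. A minimal stdlib-only sketch of the usual guard (class and method names here are illustrative, not Drill code; JMH's `Blackhole` is the standard tool for this):

```java
public class NaiveBench {
    // Sketch of the benchmark pitfall: if nothing consumes the loop's result,
    // the JIT may prove the work dead and eliminate it, so the elapsed time
    // measures an empty loop (or nothing at all).
    static long checksum;

    static long timeRun(int n) {
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i * 31L;
        }
        checksum = sum; // publish the result so the loop cannot be optimized away
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long nanos = timeRun(1_000_000);
        System.out.println("checksum=" + checksum + " nanos=" + nanos);
    }
}
```

Writing the sum to a static field is a cheap way to make the result observable; without it, the reported nanoseconds may reflect optimized-away code rather than the code under test.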
[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...
Github user chunhui-shi commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151793647

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---

@@ -373,12 +402,12 @@ public String toString() {
   public class WorkspaceSchema extends AbstractSchema implements ExpandingConcurrentMap.MapValueFactory<TableInstance, DrillTable> {
     private final ExpandingConcurrentMap<TableInstance, DrillTable> tables = new ExpandingConcurrentMap<>(this);
     private final SchemaConfig schemaConfig;
-    private final DrillFileSystem fs;
+    private DrillFileSystem fs;

-    public WorkspaceSchema(List<String> parentSchemaPath, String wsName, SchemaConfig schemaConfig) throws IOException {
+    public WorkspaceSchema(List<String> parentSchemaPath, String wsName, SchemaConfig schemaConfig, DrillFileSystem fs) throws IOException {
       super(parentSchemaPath, wsName);
       this.schemaConfig = schemaConfig;
-      this.fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), fsConf);
+      this.fs = fs;

--- End diff --

Now we pass in fs instead of creating it inside WorkspaceSchema. ---
JSON reader enhancement
Hi All,

I’d like to propose a minor enhancement to the JSON reader to better handle non-relational JSON structures. (See DRILL-5974 [1].)

As background, Drill handles simple tuples:

{a: 10, b: “fred”}

Drill also handles arrays:

{name: “fred”, hobbies: [“bowling”, “golf”]}

Drill even handles arrays of tuples:

{name: “fred”, orders: [ {id: 1001, amount: 12.34}, {id: 1002, amount: 56.78}]}

The above are termed "relational" because there is a straightforward mapping between tables and the above JSON structures. Things get interesting with non-relational types, such as 2-D arrays:

{id: 4, shape: “square”, points: [[0, 0], [0, 5], [5, 0], [5, 5]]}

Drill has two solutions today:

* Turn on the experimental list and union support.
* Enable all-text mode to read all fields as JSON text.

Here, I’d like to propose a middle ground:

* Read fields with relational types into vectors.
* Read non-relational fields using text mode.

Thus, the first three examples would all be parsed into Drill vectors. But the fourth, non-relational example would produce a row that looks like this:

id, shape, points
4, “square”, “[[0, 0], [0, 5], [5, 0], [5, 5]]”

Although Drill can’t parse the 2-D array, Drill will pass the array along to the client, which can use its favorite JSON parser to parse it and do something useful (like draw the square in this case).

In particular, the proposal is to:

* Apply this change only to the revised “batch size aware” JSON reader.
* Use the above parsing model by default.
* Use the experimental list-and-union support if the existing `exec.enable_union_type` system/session option is set.

Existing queries should “just work.” In fact, JSON with non-relational types will now work “out of the box” without all-text mode or the experimental types.

Thoughts?

- Paul

[1] https://issues.apache.org/jira/browse/DRILL-5974
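[Editor's note] Once `points` comes back as a text-mode string, recovering the numbers on the client side is straightforward. A minimal, stdlib-only sketch (class and method names are illustrative, not part of Drill; a real client would more likely use a full JSON library such as Jackson):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PointsClient {
    // Hypothetical client-side helper: recover the 2-D "points" array from the
    // text-mode string Drill would return under this proposal.
    static List<int[]> parsePoints(String json) {
        List<int[]> points = new ArrayList<>();
        // match each innermost [x, y] pair; a real client would use a JSON parser
        Matcher m = Pattern.compile("\\[\\s*(-?\\d+)\\s*,\\s*(-?\\d+)\\s*\\]").matcher(json);
        while (m.find()) {
            points.add(new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))});
        }
        return points;
    }

    public static void main(String[] args) {
        List<int[]> pts = parsePoints("[[0, 0], [0, 5], [5, 0], [5, 5]]");
        System.out.println(pts.size() + " points; last = (" + pts.get(3)[0] + ", " + pts.get(3)[1] + ")");
        // prints: 4 points; last = (5, 5)
    }
}
```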
[jira] [Created] (DRILL-5974) Read JSON non-relational fields using text mode
Paul Rogers created DRILL-5974:
--

Summary: Read JSON non-relational fields using text mode
Key: DRILL-5974
URL: https://issues.apache.org/jira/browse/DRILL-5974
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.13.0
Reporter: Paul Rogers
Assignee: Paul Rogers
Fix For: 1.13.0

Proposed is a minor enhancement to the JSON reader to better handle non-relational JSON structures.

As background, Drill handles simple tuples:

{code}
{a: 10, b: “fred”}
{code}

Drill also handles arrays:

{code}
{name: “fred”, hobbies: [“bowling”, “golf”]}
{code}

Drill even handles arrays of tuples:

{code}
{name: “fred”, orders: [ {id: 1001, amount: 12.34}, {id: 1002, amount: 56.78}]}
{code}

The above are termed "relational" because there is a straightforward mapping between tables and the above JSON structures. Things get interesting with non-relational types, such as 2-D arrays:

{code}
{id: 4, shape: “square”, points: [[0, 0], [0, 5], [5, 0], [5, 5]]}
{code}

Drill has two solutions today:

* Turn on the experimental list and union support.
* Enable all-text mode to read all fields as JSON text.

Proposed is a middle ground:

* Read fields with relational types into vectors.
* Read non-relational fields using text mode.

Thus, the first three examples would all be parsed into Drill vectors. But the fourth, non-relational example would produce a row that looks like this:

{noformat}
id, shape, points
4, “square”, “[[0, 0], [0, 5], [5, 0], [5, 5]]”
{noformat}

Although Drill can’t parse the 2-D array, Drill will pass the array along to the client, which can use its favorite JSON parser to parse it and do something useful (like draw the square in this case).

Specifically, the proposal is to:

* Apply this change only to the revised “batch size aware” JSON reader.
* Use the above parsing model by default.
* Use the experimental list-and-union support if the existing {{exec.enable_union_type}} system/session option is set.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] drill issue #258: DRILL-4091: Support for additional gis operations in gis c...
Github user cgivre commented on the issue: https://github.com/apache/drill/pull/258 I'm getting some unit test failures when I build this. [org.apache.drill.exec.expr.fn.impl.gis.TestGeometryFunctions.txt](https://github.com/apache/drill/files/1483754/org.apache.drill.exec.expr.fn.impl.gis.TestGeometryFunctions.txt) ---
[GitHub] drill issue #914: DRILL-5657: Size-aware vector writer structure
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/914 There has been discussion recently about Drill's goal of integrating with Arrow. The work to use the `DrillBuf` highlights how we can integrate with Arrow. Simply replace the `DrillBuf` usage with `ArrowBuf`, make the required changes in vector names, add a few methods to the Arrow API, and this entire mechanism can be easily ported over to use Arrow. ---
[GitHub] drill pull request #914: DRILL-5657: Size-aware vector writer structure
Github user bitblender commented on a diff in the pull request: https://github.com/apache/drill/pull/914#discussion_r140644571 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/TupleState.java --- @@ -0,0 +1,353 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.physical.rowSet.impl; + +import java.util.ArrayList; +import java.util.List; + +import org.apache.drill.exec.expr.TypeHelper; +import org.apache.drill.exec.physical.rowSet.impl.ColumnState.BaseMapColumnState; +import org.apache.drill.exec.physical.rowSet.impl.ColumnState.MapArrayColumnState; +import org.apache.drill.exec.physical.rowSet.impl.ColumnState.MapColumnState; +import org.apache.drill.exec.record.ColumnMetadata; +import org.apache.drill.exec.record.MaterializedField; +import org.apache.drill.exec.record.TupleMetadata; +import org.apache.drill.exec.record.TupleSchema; +import org.apache.drill.exec.record.TupleSchema.AbstractColumnMetadata; +import org.apache.drill.exec.vector.ValueVector; +import org.apache.drill.exec.vector.accessor.ObjectType; +import org.apache.drill.exec.vector.accessor.ObjectWriter; +import org.apache.drill.exec.vector.accessor.TupleWriter; +import org.apache.drill.exec.vector.accessor.TupleWriter.TupleWriterListener; +import org.apache.drill.exec.vector.accessor.impl.HierarchicalFormatter; +import org.apache.drill.exec.vector.accessor.writer.AbstractObjectWriter; +import org.apache.drill.exec.vector.accessor.writer.AbstractTupleWriter; +import org.apache.drill.exec.vector.accessor.writer.ColumnWriterFactory; +import org.apache.drill.exec.vector.complex.AbstractMapVector; + +public abstract class TupleState implements TupleWriterListener { + + public static class RowState extends TupleState { + +/** + * The row-level writer for stepping through rows as they are written, + * and for accessing top-level columns. 
+ */ + +private final RowSetLoaderImpl writer; + +public RowState(ResultSetLoaderImpl rsLoader) { + super(rsLoader, rsLoader.projectionSet); + writer = new RowSetLoaderImpl(rsLoader, schema); + writer.bindListener(this); +} + +public RowSetLoaderImpl rootWriter() { return writer; } + +@Override +public AbstractTupleWriter writer() { return writer; } + +@Override +public int innerCardinality() { return resultSetLoader.targetRowCount();} + } + + public static class MapState extends TupleState { + +protected final AbstractMapVector mapVector; +protected final BaseMapColumnState mapColumnState; +protected int outerCardinality; + +public MapState(ResultSetLoaderImpl rsLoader, +BaseMapColumnState mapColumnState, +AbstractMapVector mapVector, +ProjectionSet projectionSet) { + super(rsLoader, projectionSet); + this.mapVector = mapVector; + this.mapColumnState = mapColumnState; + mapColumnState.writer().bindListener(this); +} + +@Override +protected void columnAdded(ColumnState colState) { + @SuppressWarnings("resource") + ValueVector vector = colState.vector(); + + // Can't materialize the child if the map itself is + // not materialized. + + assert mapVector != null || vector == null; + if (vector != null) { +mapVector.putChild(vector.getField().getName(), vector); + } +} + +@Override +public AbstractTupleWriter writer() { + AbstractObjectWriter objWriter = mapColumnState.writer(); + TupleWriter tupleWriter; + if (objWriter.type() == ObjectType.ARRAY) { +tupleWriter = objWriter.array().tuple(); + } else { +tupleWriter = objWriter.tuple(); + } + return (AbstractTupleWriter) tupleWriter; +} + +@Override
[GitHub] drill pull request #914: DRILL-5657: Size-aware vector writer structure
Github user bitblender commented on a diff in the pull request: https://github.com/apache/drill/pull/914#discussion_r151762298 --- Diff: exec/vector/src/main/codegen/templates/ColumnAccessors.java --- @@ -275,17 +273,17 @@ public boolean isNull() { final int offset = writeIndex(len); <#else> final int writeIndex = writeIndex(); - <#assign putAddr = "bufAddr + writeIndex * VALUE_WIDTH"> + <#assign putAddr = "writeIndex * VALUE_WIDTH"> --- End diff -- putAddr is not an address but an offset, right ? Should be renamed. ---
[GitHub] drill pull request #914: DRILL-5657: Size-aware vector writer structure
Github user bitblender commented on a diff in the pull request: https://github.com/apache/drill/pull/914#discussion_r151762286 --- Diff: exec/memory/base/src/main/java/io/netty/buffer/DrillBuf.java --- @@ -882,4 +882,71 @@ public void print(StringBuilder sb, int indent, Verbosity verbosity) { } } + // The "unsafe" methods are for use ONLY by code that does its own + // bounds checking. They are called "unsafe" for a reason: they will crash + // the JVM if values are addressed out of bounds. + + /** + * Write an integer to the buffer at the given byte index, without + * bounds checks. + * + * @param index byte (not int) index of the location to write + * @param value the value to write + */ + + public void unsafePutInt(int index, int value) { --- End diff -- The first argument in these unsafePutXXX methods is an offset right? Should the 'index' be changed to an 'offset' ? ---
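[Editor's note] Naming aside, the contract these `unsafePutXXX` methods imply — one bounds check up front by the caller, then raw offset-based writes — can be sketched with a plain `ByteBuffer` standing in for `DrillBuf` (class and method names here are illustrative, not Drill APIs):

```java
import java.nio.ByteBuffer;

public class BatchWriter {
    // Illustrative only: mimics the "unsafe put" contract with a ByteBuffer.
    // The caller bounds-checks once for the whole batch, then writes each
    // value at a raw byte offset (not an element index) with no per-write check.
    static final int VALUE_WIDTH = 4;

    static void writeBatch(ByteBuffer buf, int[] values) {
        int needed = values.length * VALUE_WIDTH;
        if (buf.capacity() < needed) { // single up-front bounds check
            throw new IndexOutOfBoundsException("batch does not fit: " + needed);
        }
        for (int i = 0; i < values.length; i++) {
            // absolute put: the first argument is a byte offset, which is
            // exactly the index-vs-offset naming question raised above
            buf.putInt(i * VALUE_WIDTH, values[i]);
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(4 * VALUE_WIDTH);
        writeBatch(buf, new int[] {10, 20, 30, 40});
        System.out.println(buf.getInt(3 * VALUE_WIDTH)); // prints 40
    }
}
```

The difference with the real unsafe methods is the failure mode: `ByteBuffer` still checks each absolute put, while an unchecked write past the buffer can crash the JVM, which is why hoisting the single check into the caller matters.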
[GitHub] drill issue #335: DRILL-4303: ESRI Shapefile (shp) format plugin
Github user cgivre commented on the issue: https://github.com/apache/drill/pull/335 Hi @k255, are you still interested in this? I think if we're going to get the GIS functions into Drill, we really should get this in as well, and I'm happy to help. For whatever reason, I didn't see this until this morning. Anyway, if you'd be willing to rebase this, I can help shepherd it through the review process. -- C ---
[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151713734 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockBreakageStorage.java --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.store.mock; + +import org.apache.calcite.schema.SchemaPlus; +import org.apache.drill.exec.server.DrillbitContext; +import org.apache.drill.exec.store.SchemaConfig; + +import java.io.IOException; + +public class MockBreakageStorage extends MockStorageEngine { + + boolean breakRegister; --- End diff -- private ---
[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151713266 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSchema.java --- @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.physical.impl; + +import org.apache.drill.test.ClientFixture; +import org.apache.drill.test.ClusterFixture; +import org.apache.drill.test.ClusterMockStorageFixture; +import org.apache.drill.test.DrillTest; + +import org.junit.AfterClass; +import org.junit.BeforeClass; +import org.junit.Test; + +import static org.junit.Assert.assertTrue; + +public class TestSchema extends DrillTest { + + private static ClusterMockStorageFixture cluster; + private static ClientFixture client; + + @BeforeClass + public static void setup() throws Exception { +cluster = ClusterFixture.builder().buildCustomMockStorage(); +boolean breakRegisterSchema = true; +cluster.insertMockStorage("mock_broken", breakRegisterSchema); +cluster.insertMockStorage("mock_good", !breakRegisterSchema); +client = cluster.clientFixture(); + } + + @Test + public void testQueryBrokenStorage() { +String sql = "SELECT id_i, name_s10 FROM `mock_broken`.`employees_5`"; +try { + client.queryBuilder().sql(sql).printCsv(); +} catch (Exception ex) { + assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema")); +} + } + + @Test + public void testQueryGoodStorage() { +String sql = "SELECT id_i, name_s10 FROM `mock_good`.`employees_5`"; +client.queryBuilder().sql(sql).printCsv(); + } + + @Test + public void testQueryGoodStorageWithDefaultSchema() throws Exception { +String use_dfs = "use dfs.tmp"; +client.queryBuilder().sql(use_dfs).run(); +String sql = "SELECT id_i, name_s10 FROM `mock_good`.`employees_5`"; +client.queryBuilder().sql(sql).printCsv(); --- End diff -- Do we actually want to print csv here? I suggest we produce no output here. ---
[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151732407 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java --- @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.planner.sql; + +import com.google.common.collect.ImmutableSortedSet; +import com.google.common.collect.Lists; +import com.google.common.collect.Sets; +import org.apache.calcite.DataContext; +import org.apache.calcite.jdbc.CalciteRootSchema; +import org.apache.calcite.jdbc.CalciteSchema; + +import org.apache.calcite.linq4j.tree.Expression; +import org.apache.calcite.linq4j.tree.Expressions; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.schema.impl.AbstractSchema; +import org.apache.calcite.util.BuiltInMethod; +import org.apache.calcite.util.Compatible; +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.exec.store.SchemaConfig; +import org.apache.drill.exec.store.StoragePlugin; +import org.apache.drill.exec.store.StoragePluginRegistry; +import org.apache.drill.exec.store.SubSchemaWrapper; + +import java.io.IOException; +import java.util.List; +import java.util.Map; +import java.util.NavigableSet; +import java.util.Set; + +/** + * This class is to allow us loading schemas from storage plugins later when {@link #getSubSchema(String, boolean)} + * is called. + */ +public class DynamicRootSchema extends DynamicSchema +implements CalciteRootSchema { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class); + /** Creates a root schema. 
 */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
+    super(null, new RootSchema(), "");
+    this.schemaConfig = schemaConfig;
+    this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
+    CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+    if (retSchema != null) {
+      return retSchema;
+    }
+
+    loadSchemaFactory(schemaName, caseSensitive);
+    retSchema = getSubSchemaMap().get(schemaName);
+    return retSchema;
+  }
+
+  @Override
+  public NavigableSet<String> getTableNames() {
+    Set<String> pluginNames = Sets.newHashSet();
+    for (Map.Entry<String, StoragePlugin> storageEntry : getSchemaFactories()) {
+      pluginNames.add(storageEntry.getKey());
+    }
+    return Compatible.INSTANCE.navigableSet(
+        ImmutableSortedSet.copyOf(
+            Sets.union(pluginNames, getSubSchemaMap().keySet())));
+  }
+
+  /**
+   * load schema factory(storage plugin) for schemaName
+   * @param schemaName
+   * @param caseSensitive
+   */
+  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
+    try {
+      SchemaPlus thisPlus = this.plus();
+      StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
+      if (plugin != null) {
+        plugin.registerSchemas(schemaConfig, thisPlus);
+        return;
+      }
+
+      // we could not find the plugin, the schemaName could be `dfs.tmp`, a 2nd level schema under 'dfs'
+      String[] paths = schemaName.split("\\.");
+      if (paths.length == 2) {
+        plugin = getSchemaFactories().getPlugin(paths[0]);
+        if (plugin == null) {
+          return;
+        }
+
+        // we could find storage plugin for first part (e.g. 'dfs') of schemaName (e.g. 'dfs.tmp');
+        // register schema for this storage plugin to 'this'.
+        plugin.registerSchemas(schemaConfig, thisPlus);
+
+        // we load second level schemas for this storage plugin
+        final SchemaPlus firstlevelSchema =
[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151732963

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---

@@ -373,12 +402,12 @@ public String toString() {
   public class WorkspaceSchema extends AbstractSchema implements ExpandingConcurrentMap.MapValueFactory<TableInstance, DrillTable> {
     private final ExpandingConcurrentMap<TableInstance, DrillTable> tables = new ExpandingConcurrentMap<>(this);
     private final SchemaConfig schemaConfig;
-    private final DrillFileSystem fs;
+    private DrillFileSystem fs;

-    public WorkspaceSchema(List<String> parentSchemaPath, String wsName, SchemaConfig schemaConfig) throws IOException {
+    public WorkspaceSchema(List<String> parentSchemaPath, String wsName, SchemaConfig schemaConfig, DrillFileSystem fs) throws IOException {
       super(parentSchemaPath, wsName);
       this.schemaConfig = schemaConfig;
-      this.fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), fsConf);
+      this.fs = fs;

--- End diff --

Why do we no longer need to create fs using `ImpersonationUtil`, when it was needed before? ---
[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1032#discussion_r151713624 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSchema.java --- @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.physical.impl; + +import org.apache.drill.test.ClientFixture; +import org.apache.drill.test.ClusterFixture; +import org.apache.drill.test.ClusterMockStorageFixture; +import org.apache.drill.test.DrillTest; + +import org.junit.AfterClass; +import org.junit.BeforeClass; +import org.junit.Test; + +import static org.junit.Assert.assertTrue; + +public class TestSchema extends DrillTest { + + private static ClusterMockStorageFixture cluster; + private static ClientFixture client; + + @BeforeClass + public static void setup() throws Exception { +cluster = ClusterFixture.builder().buildCustomMockStorage(); +boolean breakRegisterSchema = true; +cluster.insertMockStorage("mock_broken", breakRegisterSchema); +cluster.insertMockStorage("mock_good", !breakRegisterSchema); +client = cluster.clientFixture(); + } + + @Test + public void testQueryBrokenStorage() { +String sql = "SELECT id_i, name_s10 FROM `mock_broken`.`employees_5`"; +try { + client.queryBuilder().sql(sql).printCsv(); +} catch (Exception ex) { + assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema")); +} + } + + @Test + public void testQueryGoodStorage() { +String sql = "SELECT id_i, name_s10 FROM `mock_good`.`employees_5`"; +client.queryBuilder().sql(sql).printCsv(); + } + + @Test + public void testQueryGoodStorageWithDefaultSchema() throws Exception { +String use_dfs = "use dfs.tmp"; +client.queryBuilder().sql(use_dfs).run(); +String sql = "SELECT id_i, name_s10 FROM `mock_good`.`employees_5`"; +client.queryBuilder().sql(sql).printCsv(); + } + + @Test + public void testUseBrokenStorage() throws Exception { +try { + String use_dfs = "use mock_broken"; + client.queryBuilder().sql(use_dfs).run(); +} catch(Exception ex) { + assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema")); +} + } + + @AfterClass --- End diff -- Can be removed. ---
[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151716880

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+/**
+ * This class is to allow us loading schemas from storage plugins later when
+ * {@link #getSubSchema(String, boolean)} is called.
+ */
+public class DynamicRootSchema extends DynamicSchema
+    implements CalciteRootSchema {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class);
+
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
+    super(null, new RootSchema(), "");
+    this.schemaConfig = schemaConfig;
+    this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
+    CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+    if (retSchema != null) {
+      return retSchema;
+    }
+
+    loadSchemaFactory(schemaName, caseSensitive);
+    retSchema = getSubSchemaMap().get(schemaName);
+    return retSchema;
+  }
+
+  @Override
+  public NavigableSet<String> getTableNames() {
+    Set<String> pluginNames = Sets.newHashSet();
+    for (Map.Entry<String, StoragePlugin> storageEntry : getSchemaFactories()) {
+      pluginNames.add(storageEntry.getKey());
+    }
+    return Compatible.INSTANCE.navigableSet(
+        ImmutableSortedSet.copyOf(
+            Sets.union(pluginNames, getSubSchemaMap().keySet())));
+  }

--- End diff --

Could you please explain what this method actually returns? According to its name it should return table names, but it seems to return something different...

---
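The question above can be made concrete with a plain-Java sketch (Guava-free, hypothetical class and method names) of what the quoted `getTableNames()` computes: a sorted union of registered storage-plugin names and already-loaded sub-schema names — that is, schema names rather than table names.

```java
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the quoted getTableNames() logic: union the
// plugin names with the loaded sub-schema names and return them sorted.
public class TableNamesSketch {
  static NavigableSet<String> schemaNames(Set<String> pluginNames, Set<String> subSchemaNames) {
    NavigableSet<String> union = new TreeSet<>(pluginNames); // sorted, like ImmutableSortedSet
    union.addAll(subSchemaNames);                            // like Sets.union(...)
    return union;
  }

  public static void main(String[] args) {
    // prints [cp, dfs, sys]
    System.out.println(schemaNames(Set.of("dfs", "cp"), Set.of("dfs", "sys")));
  }
}
```

Under this reading, the method's result is the set of schema names visible at the root, which is why the name `getTableNames` is questioned.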
[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151714217

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
@@ -150,14 +152,30 @@ public WorkspaceSchemaFactory(
    * @return True if the user has access. False otherwise.
    */
   public boolean accessible(final String userName) throws IOException {
-    final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
+    final DrillFileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
+    return accessible(fs);
+  }
+
+  /**
+   * Checks whether a FileSystem object has the permission to list/read workspace directory
+   * @param fs a DrillFileSystem object that was created with certain user privilege
+   * @return True if the user has access. False otherwise.
+   * @throws IOException
+   */
+  public boolean accessible(DrillFileSystem fs) throws IOException {
     try {
-      // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style
-      // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem
-      // has a limited private API (FileSystem.access) to check the permissions directly
-      // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of
-      // FileClient. TODO: Update this when DRILL-3749 is fixed.
-      fs.listStatus(wsPath);
+      /**
+       * For Windows local file system, fs.access ends up using DeprecatedRawLocalFileStatus which has
+       * TrustedInstaller as owner, and a member of Administrators group could not satisfy the permission.
+       * In this case, we will still use method listStatus.
+       * In other cases, we use access method since it is cheaper.
+       */
+      if (SystemUtils.IS_OS_WINDOWS && fs.getUri().getScheme().equalsIgnoreCase(FileSystemSchemaFactory.LOCAL_FS_SCHEME)) {

--- End diff --

Just in case, did you check that everything works on Windows?

---
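The branch under discussion can be condensed into a small sketch (hypothetical method and parameter names; `LOCAL_FS_SCHEME` is `"file"` per the companion diff in FileSystemSchemaFactory): fall back to `listStatus` only for a Windows local file system, and use the cheaper `fs.access` call everywhere else.

```java
// Hypothetical sketch of the guard condition in the quoted diff. The OS
// check stands in for SystemUtils.IS_OS_WINDOWS; the scheme check mirrors
// fs.getUri().getScheme().equalsIgnoreCase(LOCAL_FS_SCHEME).
public class AccessCheckSketch {
  static final String LOCAL_FS_SCHEME = "file";

  static boolean useListStatusFallback(String osName, String uriScheme) {
    boolean isWindows = osName.toLowerCase().startsWith("windows");
    return isWindows && LOCAL_FS_SCHEME.equalsIgnoreCase(uriScheme);
  }

  public static void main(String[] args) {
    System.out.println(useListStatusFallback("Windows 10", "file")); // true
    System.out.println(useListStatusFallback("Linux", "file"));      // false
    System.out.println(useListStatusFallback("Windows 10", "hdfs")); // false
  }
}
```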
[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151717700

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+/**
+ * This class is to allow us loading schemas from storage plugins later when
+ * {@link #getSubSchema(String, boolean)} is called.
+ */
+public class DynamicRootSchema extends DynamicSchema
+    implements CalciteRootSchema {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class);
+
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
+    super(null, new RootSchema(), "");
+    this.schemaConfig = schemaConfig;
+    this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
+    CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+    if (retSchema != null) {
+      return retSchema;
+    }
+
+    loadSchemaFactory(schemaName, caseSensitive);
+    retSchema = getSubSchemaMap().get(schemaName);
+    return retSchema;
+  }
+
+  @Override
+  public NavigableSet<String> getTableNames() {
+    Set<String> pluginNames = Sets.newHashSet();
+    for (Map.Entry<String, StoragePlugin> storageEntry : getSchemaFactories()) {
+      pluginNames.add(storageEntry.getKey());
+    }
+    return Compatible.INSTANCE.navigableSet(
+        ImmutableSortedSet.copyOf(
+            Sets.union(pluginNames, getSubSchemaMap().keySet())));
+  }
+
+  /**
+   * load schema factory(storage plugin) for schemaName
+   * @param schemaName
+   * @param caseSensitive
+   */
+  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
+    try {
+      SchemaPlus thisPlus = this.plus();
+      StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
+      if (plugin != null) {
+        plugin.registerSchemas(schemaConfig, thisPlus);
+        return;
+      }
+
+      // we could not find the plugin, the schemaName could be `dfs.tmp`, a 2nd level schema under 'dfs'
+      String[] paths = schemaName.split("\\.");
+      if (paths.length == 2) {
+        plugin = getSchemaFactories().getPlugin(paths[0]);
+        if (plugin == null) {
+          return;
+        }
+
+        // we could find storage plugin for first part(e.g. 'dfs') of schemaName (e.g. 'dfs.tmp')
+        // register schema for this storage plugin to 'this'.
+        plugin.registerSchemas(schemaConfig, thisPlus);

--- End diff --

Can we get an NPE here? Let's say that after the split on line 97 we get a length of 1 or 3?

---
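The split behavior behind the NPE question can be checked in isolation. This is a minimal sketch (hypothetical class and method names, not the Drill code) showing how `String.split("\\.")` behaves for the lengths the reviewer asks about, with a null-returning guard for the degenerate case:

```java
// Hypothetical sketch: guard the result of splitting a schema name such as
// "dfs.tmp" before indexing into it. Note that in Java, ".".split("\\.")
// yields a zero-length array because trailing empty strings are removed.
public class SchemaNameSplitSketch {
  /** Returns the root plugin name, or null when the name has no usable parts. */
  static String rootPluginName(String schemaName) {
    String[] paths = schemaName.split("\\.");
    if (paths.length == 0) {
      return null;   // e.g. schemaName was "."
    }
    return paths[0]; // safe for length 1 ("dfs"), 2 ("dfs.tmp"), or 3+ ("a.b.c")
  }

  public static void main(String[] args) {
    System.out.println(rootPluginName("dfs.tmp")); // dfs
    System.out.println(rootPluginName("dfs"));     // dfs
    System.out.println(rootPluginName("a.b.c"));   // a
  }
}
```

In the quoted diff itself, lengths 1 and 3 simply fail the `paths.length == 2` check and fall through, so the sharper risk is silent no-op rather than an NPE.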
[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151714418

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java ---
@@ -50,11 +53,23 @@
   public static final String DEFAULT_WS_NAME = "default";

+  public static final String LOCAL_FS_SCHEME = "file";
+
   private List<WorkspaceSchemaFactory> factories;
   private String schemaName;
+  protected FileSystemPlugin plugin;

   public FileSystemSchemaFactory(String schemaName, List<WorkspaceSchemaFactory> factories) {
-    super();
+    // when the correspondent FileSystemPlugin is not passed in, we dig into ANY workspace factory to get it.
+    if (factories.size() > 0 ) {

--- End diff --

Please remove the space: `if (factories.size() > 0) {`.

---
[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151713558

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSchema.java ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl;
+
+import org.apache.drill.test.ClientFixture;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterMockStorageFixture;
+import org.apache.drill.test.DrillTest;
+
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import static org.junit.Assert.assertTrue;
+
+public class TestSchema extends DrillTest {
+
+  private static ClusterMockStorageFixture cluster;
+  private static ClientFixture client;
+
+  @BeforeClass
+  public static void setup() throws Exception {
+    cluster = ClusterFixture.builder().buildCustomMockStorage();
+    boolean breakRegisterSchema = true;
+    cluster.insertMockStorage("mock_broken", breakRegisterSchema);
+    cluster.insertMockStorage("mock_good", !breakRegisterSchema);
+    client = cluster.clientFixture();
+  }
+
+  @Test
+  public void testQueryBrokenStorage() {
+    String sql = "SELECT id_i, name_s10 FROM `mock_broken`.`employees_5`";
+    try {
+      client.queryBuilder().sql(sql).printCsv();
+    } catch (Exception ex) {
+      assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema"));

--- End diff --

This test can give a false positive when no exception is thrown at all. Please re-throw the exception after the check and add `@Test(expected = Exception.class)`.

---
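The false-positive pattern the review points out can be shown without any Drill or JUnit machinery. In this plain-Java sketch (all names hypothetical), a check that lives only inside the `catch` block silently "passes" when no exception is thrown; returning/re-throwing from the happy path closes that gap:

```java
// Hypothetical illustration of the review point: asserting only inside a
// catch block means a missing exception goes unnoticed. The helper below
// reports true only when the expected exception actually occurred.
public class ExpectedExceptionSketch {
  static void runQuery(boolean broken) throws Exception {
    if (broken) {
      throw new Exception("VALIDATION ERROR: Schema [mock_broken] is not valid");
    }
    // a "working" query throws nothing
  }

  /** Returns true only if the expected exception was actually thrown. */
  static boolean failsWithValidationError(boolean broken) {
    try {
      runQuery(broken);
    } catch (Exception ex) {
      return ex.getMessage().contains("VALIDATION ERROR: Schema");
    }
    return false; // no exception at all -> the test should fail, not pass
  }

  public static void main(String[] args) {
    System.out.println(failsWithValidationError(true));  // true
    System.out.println(failsWithValidationError(false)); // false
  }
}
```

In JUnit terms, the equivalent of the final `return false` is re-throwing the caught exception and annotating the test with `@Test(expected = Exception.class)`, as the review suggests.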
[GitHub] drill issue #1031: DRILL-5917: Ban org.json:json library in Drill
Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1031

I already tested the mapr and default profiles locally. Once tests pass on your cluster, let me know and I'll squash the commits.

---
[GitHub] drill issue #1031: DRILL-5917: Ban org.json:json library in Drill
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1031

The test cluster is down for the weekend. You can test it locally; just run {{mvn clean install -Pmapr}}.

---
[GitHub] drill issue #1031: DRILL-5917: Ban org.json:json library in Drill
Github user vrozov commented on the issue: https://github.com/apache/drill/pull/1031 @arina-ielchiieva Please test ---
[GitHub] drill pull request #1034: DRILL-5960: Adding asGeoJSON function
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1034#discussion_r151713040

--- Diff: contrib/gis/src/main/java/org/apache/drill/exec/expr/fn/impl/gis/STAsGeoJson.java ---
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Wrapper for ESRI ST_AsGeoJson function to convert geometry to valid geojson
+ */
+package org.apache.drill.exec.expr.fn.impl.gis;
+
+import javax.inject.Inject;
+
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.holders.VarBinaryHolder;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+
+import io.netty.buffer.DrillBuf;
+
+@FunctionTemplate(name = "st_as_geo_json", scope = FunctionTemplate.FunctionScope.SIMPLE,

--- End diff --

@ChrisSandison since you have changed the function names here but not in the unit tests below, they will fail. Please note that functions are cached when Drill is built, so to make sure your changes take effect you need to build Drill first and then run the unit tests.

It seems I saw a comment suggesting that the function name stay consistent with previously created functions like `st_astext`, which makes sense. I looked in Calcite and their geographic functions follow the same naming convention as well, so I guess we should revert to the previous name.

---