[jira] [Commented] (DRILL-3522) IllegalStateException from Mongo storage plugin
[ https://issues.apache.org/jira/browse/DRILL-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253059#comment-15253059 ] Steven Phillips commented on DRILL-3522: Just merged. commit id: a07f4de > IllegalStateException from Mongo storage plugin > --- > > Key: DRILL-3522 > URL: https://issues.apache.org/jira/browse/DRILL-3522 > Project: Apache Drill > Issue Type: Bug > Components: Storage - MongoDB >Affects Versions: 1.1.0 >Reporter: Adam Gilmore >Assignee: Adam Gilmore >Priority: Critical > Attachments: DRILL-3522.1.patch.txt > > > With a Mongo storage plugin enabled, we are sporadically getting the > following exception when running queries (even not against the Mongo storage > plugin): > {code} > SYSTEM ERROR: IllegalStateException: state should be: open > (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception > during fragment initialization: > org.apache.drill.common.exceptions.DrillRuntimeException: state should be: > open > org.apache.drill.exec.work.foreman.Foreman.run():253 > java.util.concurrent.ThreadPoolExecutor.runWorker():1145 > java.util.concurrent.ThreadPoolExecutor$Worker.run():615 > java.lang.Thread.run():745 > Caused By (com.google.common.util.concurrent.UncheckedExecutionException) > org.apache.drill.common.exceptions.DrillRuntimeException: state should be: > open > com.google.common.cache.LocalCache$Segment.get():2263 > com.google.common.cache.LocalCache.get():4000 > com.google.common.cache.LocalCache.getOrLoad():4004 > com.google.common.cache.LocalCache$LocalLoadingCache.get():4874 > > org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$MongoSchema.getSubSchemaNames():172 > > org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$MongoSchema.setHolder():159 > > org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory.registerSchemas():127 > org.apache.drill.exec.store.mongo.MongoStoragePlugin.registerSchemas():86 > > 
org.apache.drill.exec.store.StoragePluginRegistry$DrillSchemaFactory.registerSchemas():328 > org.apache.drill.exec.ops.QueryContext.getRootSchema():165 > org.apache.drill.exec.ops.QueryContext.getRootSchema():154 > org.apache.drill.exec.ops.QueryContext.getRootSchema():142 > org.apache.drill.exec.ops.QueryContext.getNewDefaultSchema():128 > org.apache.drill.exec.planner.sql.DrillSqlWorker.():91 > org.apache.drill.exec.work.foreman.Foreman.runSQL():901 > org.apache.drill.exec.work.foreman.Foreman.run():242 > java.util.concurrent.ThreadPoolExecutor.runWorker():1145 > java.util.concurrent.ThreadPoolExecutor$Worker.run():615 > java.lang.Thread.run():745 > Caused By (org.apache.drill.common.exceptions.DrillRuntimeException) state > should be: open > > org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$DatabaseLoader.load():98 > > org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$DatabaseLoader.load():82 > com.google.common.cache.LocalCache$LoadingValueReference.loadFuture():3599 > com.google.common.cache.LocalCache$Segment.loadSync():2379 > com.google.common.cache.LocalCache$Segment.lockedGetOrLoad():2342 > com.google.common.cache.LocalCache$Segment.get():2257 > com.google.common.cache.LocalCache.get():4000 > com.google.common.cache.LocalCache.getOrLoad():4004 > com.google.common.cache.LocalCache$LocalLoadingCache.get():4874 > > org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$MongoSchema.getSubSchemaNames():172 > > org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$MongoSchema.setHolder():159 > > org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory.registerSchemas():127 > org.apache.drill.exec.store.mongo.MongoStoragePlugin.registerSchemas():86 > > org.apache.drill.exec.store.StoragePluginRegistry$DrillSchemaFactory.registerSchemas():328 > org.apache.drill.exec.ops.QueryContext.getRootSchema():165 > org.apache.drill.exec.ops.QueryContext.getRootSchema():154 > org.apache.drill.exec.ops.QueryContext.getRootSchema():142 > 
org.apache.drill.exec.ops.QueryContext.getNewDefaultSchema():128 > org.apache.drill.exec.planner.sql.DrillSqlWorker.():91 > org.apache.drill.exec.work.foreman.Foreman.runSQL():901 > org.apache.drill.exec.work.foreman.Foreman.run():242 > java.util.concurrent.ThreadPoolExecutor.runWorker():1145 > java.util.concurrent.ThreadPoolExecutor$Worker.run():615 > java.lang.Thread.run():745 > Caused By (java.lang.IllegalStateException) state should be: open > com.mongodb.assertions.Assertions.isTrue():70 > com.mongodb.connection.BaseCluster.selectServer():79 > com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.():75 >
[jira] [Commented] (DRILL-4615) Support directory names in schema
[ https://issues.apache.org/jira/browse/DRILL-4615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248163#comment-15248163 ] Steven Phillips commented on DRILL-4615: It seems what you are describing is an alternative way of interpreting directory attributes. Drill's current approach is to create the columns dir0, dir1, etc., which contain the string values of the directory names. These column names and values are currently used in two different places in Drill: first for partition pruning during the planning stage, and then when the columns are materialized during the actual execution of the scan. You can see examples of these uses in the classes FileSystemPartitionDescriptor and ParquetScanBatchCreator. We should probably refactor and abstract the code which materializes the partition column names and values into some sort of Attribute Provider, and then we could implement an alternate version which interprets the directories the way Spark and Hive do. If this is something you are interested in working on, I can help out. > Support directory names in schema > - > > Key: DRILL-4615 > URL: https://issues.apache.org/jira/browse/DRILL-4615 > Project: Apache Drill > Issue Type: Improvement >Reporter: Jesse Yates > > In Spark, partitioned parquet output is written with directories like: > {code} > /column1=1 > /column2=hello > /data.parquet > /column2=world > /moredata.parquet > /column1=2 > {code} > However, when querying these files with Drill we end up interpreting the > directories as strings when what they really are is column names + values. In > the data files we only have the remaining columns. Querying this with Drill > means that you can really only have a couple of data types (far short of what > Spark/Parquet supports) in the column and still have correct operations. > Given the size of the data, I don't want to have to CTAS all the parquet > files (especially as they are being periodically updated). 
> I think this ends up being a nice addition for general file directory reads > as well since many people already encode meaning into their directory > structure, but having self describing directories is even better. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
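The Spark/Hive directory convention discussed above (path segments of the form `column=value`) can be sketched as a small parser. This is an illustrative standalone example, not part of Drill's codebase; the class and method names are invented for this sketch:

```java
// Hypothetical sketch: interpret Spark/Hive-style partition directories
// (e.g. "column1=1/column2=hello") as column name/value pairs, instead of
// Drill's current dir0/dir1 string columns. Illustrative names only.
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionDirParser {
    // Parses a relative path like "column1=1/column2=hello" into an ordered
    // map of partition column names to their (string-encoded) values.
    public static Map<String, String> parse(String relativePath) {
        Map<String, String> partitions = new LinkedHashMap<>();
        for (String segment : relativePath.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                partitions.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Prints the partition columns recovered from the path.
        System.out.println(parse("column1=1/column2=hello"));
    }
}
```

A real implementation would also need to decide the column's type (Spark infers it from the values), which is exactly why the comment suggests abstracting this behind an attribute-provider interface.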
[jira] [Commented] (DRILL-4558) When a query returns diacritics in a string, the string is cut
[ https://issues.apache.org/jira/browse/DRILL-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218975#comment-15218975 ] Steven Phillips commented on DRILL-4558: This looks like a problem in the BsonRecordReader:
{code}
private void writeString(String readString, final MapOrListWriterImpl writer, String fieldName, boolean isList) {
  final int length = readString.length();
  final VarCharHolder vh = new VarCharHolder();
  ensure(length);
  try {
    workBuf.setBytes(0, readString.getBytes("UTF-8"));
  } catch (UnsupportedEncodingException e) {
    throw new DrillRuntimeException("Unable to read string value for field: " + fieldName, e);
  }
  vh.buffer = workBuf;
  vh.start = 0;
  vh.end = length;
  if (isList == false) {
    writer.varChar(fieldName).write(vh);
  } else {
    writer.list.varChar().write(vh);
  }
}
{code}
The length variable should be the length of the byte array, not the length of the String. A quick work-around would be to disable the BSON reader: set store.mongo.bson.record.reader = false; > When a query returns diacritics in a string, the string is cut > -- > > Key: DRILL-4558 > URL: https://issues.apache.org/jira/browse/DRILL-4558 > Project: Apache Drill > Issue Type: Bug > Components: Storage - MongoDB > Environment: Apache Drill 1.6 > MongoDB 3.2.1 >Reporter: Vincent Uribe > > With the given document in a collection "Test" from a database testDb : > { > "_id" : ObjectId("56e7f1bd0944228aab06d0e2"), > "ID_ATTRIBUT" : "3", > "VAL_ATTRIBUT" : "Végétaux", > "UPDATED" : ISODate("2016-01-09T23:00:00.000Z") > } > When querying select * from mongoStorage.testDb.Test I get > _id: [B@affb65 > ID_ATTRIBUT: 3 > VAL_ATTRIBUT: *Végéta* > UPDATED: 2016-01-09T23:00:00.000Z > As you can see, the two 'é' cut the string "végétaux" by 2 characters, giving > végéta. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
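The character-count versus byte-count mismatch behind this bug can be reproduced in isolation. This is a standalone sketch, independent of Drill's BsonRecordReader:

```java
// Demonstrates why using String.length() as a UTF-8 byte length truncates
// strings containing multi-byte characters: each 'é' encodes to 2 bytes.
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {
    public static void main(String[] args) {
        String s = "Végétaux";
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(s.length());  // 8 characters
        System.out.println(utf8.length); // 10 bytes
        // Setting the buffer's end offset to s.length() (8) instead of
        // utf8.length (10) drops the final 2 bytes of the encoded string --
        // exactly the "Végéta" symptom reported in this issue.
    }
}
```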
[jira] [Created] (DRILL-4566) Add TDigest functions for computing median and quantile
Steven Phillips created DRILL-4566: -- Summary: Add TDigest functions for computing median and quantile Key: DRILL-4566 URL: https://issues.apache.org/jira/browse/DRILL-4566 Project: Apache Drill Issue Type: New Feature Reporter: Steven Phillips Assignee: Steven Phillips The tdigest library can be used by Drill to compute approximate medians and percentiles without using too much memory or spilling to disk, which would otherwise be required to compute them exactly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
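For contrast, here is the cost the sketch avoids. This is a hypothetical illustration, not the t-digest API: an exact median forces the engine to buffer and sort every value, which is precisely the memory (or disk-spill) requirement a streaming summary like t-digest sidesteps:

```java
// Exact median: requires a full in-memory copy and a sort of all values.
// A t-digest, by comparison, maintains only a small fixed-size summary.
import java.util.Arrays;

public class ExactMedian {
    public static double median(double[] values) {
        double[] copy = Arrays.copyOf(values, values.length); // O(n) memory
        Arrays.sort(copy);                                    // O(n log n) time
        int n = copy.length;
        return n % 2 == 1 ? copy[n / 2]
                          : (copy[n / 2 - 1] + copy[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(median(new double[]{3, 1, 4, 1, 5})); // 3.0
    }
}
```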
[jira] [Created] (DRILL-4562) NPE when evaluating expression on nested union type
Steven Phillips created DRILL-4562: -- Summary: NPE when evaluating expression on nested union type Key: DRILL-4562 URL: https://issues.apache.org/jira/browse/DRILL-4562 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips A simple reproduction:
{code}
select typeof(t.a.b) c from `f.json` t
{code}
where f.json contains:
{code}
{a : { b : 1 }}
{a : { b: "hello" }}
{a : { b: { c : 2} }}
{code}
Fails with the following:
{code}
(java.lang.NullPointerException) null
org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldIdIfMatchesUnion():40
org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldIdIfMatches():141
org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldId():207
org.apache.drill.exec.record.SimpleVectorWrapper.getFieldIdIfMatches():101
org.apache.drill.exec.record.VectorContainer.getValueVectorId():269
org.apache.drill.exec.physical.impl.ScanBatch.getValueVectorId():325
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.getValueVectorId():182
org.apache.drill.exec.expr.ExpressionTreeMaterializer$MaterializeVisitor.visitSchemaPath():628
org.apache.drill.exec.expr.ExpressionTreeMaterializer$MaterializeVisitor.visitSchemaPath():217
org.apache.drill.common.expression.SchemaPath.accept():152
org.apache.drill.exec.expr.ExpressionTreeMaterializer$MaterializeVisitor.visitFunctionCall():274
org.apache.drill.exec.expr.ExpressionTreeMaterializer$MaterializeVisitor.visitFunctionCall():217
org.apache.drill.common.expression.FunctionCall.accept():60
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3317) when ProtobufLengthDecoder couldn't allocate a new DrillBuf, this error is just logged and nothing else is done
[ https://issues.apache.org/jira/browse/DRILL-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217658#comment-15217658 ] Steven Phillips commented on DRILL-3317: Until a few months ago, the OutOfMemoryHandler would cause a message to be propagated to the operators, which could potentially be handled by ExternalSort. But with the allocator changes in DRILL-4134 (809f4620d7d82c72240212de13b993049550959d), this is no longer happening, and now it just logs a message. Was there a particular reason that functionality was removed? In the comments for DRILL-3241, [~jnadeau] says that the functionality should be removed because it does not work. Do you mean that it's not working now because it was removed? Or that it didn't work even before it was removed? > when ProtobufLengthDecoder couldn't allocate a new DrillBuf, this error is > just logged and nothing else is done > --- > > Key: DRILL-3317 > URL: https://issues.apache.org/jira/browse/DRILL-3317 > Project: Apache Drill > Issue Type: Bug > Components: Execution - RPC >Reporter: Deneche A. Hakim >Assignee: Jacques Nadeau > Fix For: 1.7.0 > > > Trying to reproduce DRILL-3241 I sometimes get the following error in the > logs: > {noformat} > ERROR: Out of memory outside any particular fragment. 
> at > org.apache.drill.exec.rpc.data.DataResponseHandlerImpl.informOutOfMemory(DataResponseHandlerImpl.java:40) > at > org.apache.drill.exec.rpc.data.DataServer$2.handle(DataServer.java:227) > at > org.apache.drill.exec.rpc.ProtobufLengthDecoder.decode(ProtobufLengthDecoder.java:87) > at > org.apache.drill.exec.rpc.data.DataProtobufLengthDecoder$Server.decode(DataProtobufLengthDecoder.java:52) > at > io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:315) > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:229) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > WARN: Failure allocating buffer on incoming stream due to memory limits. > Current Allocation: 1372678764. > at > org.apache.drill.exec.rpc.ProtobufLengthDecoder.decode(ProtobufLengthDecoder.java:85) > at > org.apache.drill.exec.rpc.data.DataProtobufLengthDecoder$Server.decode(DataProtobufLengthDecoder.java:52) > at > io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:315) > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:229) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > {noformat} > ProtobufLengthDecoder.decode() does call OutOfMemoryHandler.handle() which > calls DataResponseHandlerImpl.informOutOfMemory() which just logs the error > in the logs. 
> If we have fragments waiting for data they will be stuck waiting forever, and > the query will hang (behavior observed in DRILL-3241 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (DRILL-4489) Add ValueVector tests from Drill
[ https://issues.apache.org/jira/browse/DRILL-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips closed DRILL-4489. -- Resolution: Invalid This jira should be in the Arrow project, not Drill > Add ValueVector tests from Drill > > > Key: DRILL-4489 > URL: https://issues.apache.org/jira/browse/DRILL-4489 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips > > There are some simple ValueVector tests that should be included in the Arrow > project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4489) Add ValueVector tests from Drill
Steven Phillips created DRILL-4489: -- Summary: Add ValueVector tests from Drill Key: DRILL-4489 URL: https://issues.apache.org/jira/browse/DRILL-4489 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips There are some simple ValueVector tests that should be included in the Arrow project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4486) Expression serializer incorrectly serializes escaped characters
Steven Phillips created DRILL-4486: -- Summary: Expression serializer incorrectly serializes escaped characters Key: DRILL-4486 URL: https://issues.apache.org/jira/browse/DRILL-4486 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips The Drill expression parser requires backslashes to be escaped, but the ExpressionStringBuilder is not properly escaping them. This causes problems, especially in the case of regex expressions run with parallel execution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4455) Depend on Apache Arrow for Vector and Memory
Steven Phillips created DRILL-4455: -- Summary: Depend on Apache Arrow for Vector and Memory Key: DRILL-4455 URL: https://issues.apache.org/jira/browse/DRILL-4455 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips Fix For: 1.7.0 The code for value vectors and memory has been split and contributed to the apache arrow repository. In order to help this project advance, Drill should depend on the arrow project instead of internal value vector code. This change will require recompiling any external code, such as UDFs and StoragePlugins. The changes will mainly just involve renaming the classes to the org.apache.arrow namespace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4382) Remove dependency on drill-logical from vector submodule
[ https://issues.apache.org/jira/browse/DRILL-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-4382: --- Assignee: Hanifi Gunes (was: Steven Phillips) > Remove dependency on drill-logical from vector submodule > > > Key: DRILL-4382 > URL: https://issues.apache.org/jira/browse/DRILL-4382 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Hanifi Gunes > > This is in preparation for transitioning the code to the Apache Arrow project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4382) Remove dependency on drill-logical from vector submodule
Steven Phillips created DRILL-4382: -- Summary: Remove dependency on drill-logical from vector submodule Key: DRILL-4382 URL: https://issues.apache.org/jira/browse/DRILL-4382 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips This is in preparation for transitioning the code to the Apache Arrow project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4297) Provide a new interface to send custom messages
[ https://issues.apache.org/jira/browse/DRILL-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133748#comment-15133748 ] Steven Phillips commented on DRILL-4297: +1 > Provide a new interface to send custom messages > --- > > Key: DRILL-4297 > URL: https://issues.apache.org/jira/browse/DRILL-4297 > Project: Apache Drill > Issue Type: Improvement >Reporter: amit hadke >Assignee: Steven Phillips > Attachments: DRILL-4297.patch > > > Currently custom messages are restricted to protobuf messages. > Provide a new interface to custom message that allows to send/receives > Pojos/bytes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4339) Avro Reader can not read records - Regression
[ https://issues.apache.org/jira/browse/DRILL-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129381#comment-15129381 ] Steven Phillips commented on DRILL-4339: I personally don't have much problem with having to recompile my code, I was just wondering if this would create problems for others. If reverting the signature change could avoid a few headaches, and there is very little cost in making the change, I say we go ahead and merge it. +1 > Avro Reader can not read records - Regression > - > > Key: DRILL-4339 > URL: https://issues.apache.org/jira/browse/DRILL-4339 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Other >Affects Versions: 1.5.0 >Reporter: Stefán Baxter >Priority: Blocker > Fix For: 1.5.0 > > > Simple reading of Avro records no longer works > 0: jdbc:drill:zk=local> select * from dfs.asa.`/`; > Exception in thread "drill-executor-2" java.lang.NoSuchMethodError: > org.apache.drill.exec.store.avro.AvroRecordReader.setColumns(Ljava/util/Collection;)V > at > org.apache.drill.exec.store.avro.AvroRecordReader.(AvroRecordReader.java:99) > at > org.apache.drill.exec.store.avro.AvroFormatPlugin.getRecordReader(AvroFormatPlugin.java:73) > at > org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin.getReaderBatch(EasyFormatPlugin.java:172) > at > org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator.getBatch(EasyReaderBatchCreator.java:35) > at > org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator.getBatch(EasyReaderBatchCreator.java:28) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:147) > at > org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:170) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:127) > at > org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:170) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:127) > at > 
org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:170) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:101) > at > org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:79) > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:230) > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > We have been using the Avro reader for a while and this looks like a > regression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4215) Transfer ownership of buffers when doing transfers
Steven Phillips created DRILL-4215: -- Summary: Transfer ownership of buffers when doing transfers Key: DRILL-4215 URL: https://issues.apache.org/jira/browse/DRILL-4215 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips The new allocator has the feature of allowing the transfer of ownership of buffers from one allocator to another. We should make use of this feature by transferring ownership whenever we transfer buffers between vectors. This will allow better tracking of how much memory operators are holding on to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4208) Storage plugin configuration persistence not working for Apache Drill
[ https://issues.apache.org/jira/browse/DRILL-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063271#comment-15063271 ] Steven Phillips commented on DRILL-4208: The properties in drill-override.conf are hierarchical. Since you are already inside drill.exec, you don't include it in the key. So it should be like this:
drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "localhost:2181",
  sys.store.provider.local.path = "/home/dev/abc"
}
> Storage plugin configuration persistence not working for Apache Drill > - > > Key: DRILL-4208 > URL: https://issues.apache.org/jira/browse/DRILL-4208 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.3.0 > Environment: Ubuntu 14.0.4 >Reporter: Devender Yadav > Fix For: Future > > > According to Drill's documentation : > Drill uses /tmp/drill/sys.storage_plugins to store storage plugin > configurations. The temporary directory clears when you quit the Drill shell. > To save your storage plugin configurations from one session to the next, set > the following option in the drill-override.conf file if you are running Drill > in embedded mode. > drill.exec.sys.store.provider.local.path = "/mypath" > I checked /tmp/drill/sys.storage_plugins, there is some data in this file. > Then I modified drill-override.conf : > drill.exec: { > cluster-id: "drillbits1", > zk.connect: "localhost:2181", > drill.exec.sys.store.provider.local.path = "/home/dev/abc" > } > I restarted drill & even restarted my machine. Nothing is created at this > location. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4159) TestCsvHeader sometimes fails due to ordering issue
Steven Phillips created DRILL-4159: -- Summary: TestCsvHeader sometimes fails due to ordering issue Key: DRILL-4159 URL: https://issues.apache.org/jira/browse/DRILL-4159 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips This test should be rewritten to use the query test framework, rather than doing a string comparison of the entire result set. And it should be specified as unordered, so that results aren't affected by the random order in which files are read. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4160) Order By on a flattened column throws SchemaChangeException - Missing function implementation
[ https://issues.apache.org/jira/browse/DRILL-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039586#comment-15039586 ] Steven Phillips commented on DRILL-4160: This doesn't have anything to do with flatten. You can't order by or group by a map type. The message could be better, but it's tricky because the failure doesn't occur in the sort operator, it happens in the exchange before the data even gets to the sort. > Order By on a flattened column throws SchemaChangeException - Missing > function implementation > - > > Key: DRILL-4160 > URL: https://issues.apache.org/jira/browse/DRILL-4160 > Project: Apache Drill > Issue Type: Bug > Components: SQL Parser, Storage - JSON >Reporter: Abhishek Girish > Attachments: drillbit.log.txt > > > Query with an ORDER BY clause on a flattened column fails: > {code} > > select `name`, `type`, flatten(kvgen(`compliments`)) as `compliments` from > > `user` order by `name`, `type`, `compliments` limit 2; > Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to > materialize incoming schema. Errors: > Error in expression at index 2. Error: Missing function implementation: > [hash64asdouble(MAP-REQUIRED, BIGINT-REQUIRED)]. Full expression: null.. > Fragment 3:0 > [Error Id: 3b3d3224-953a-46a2-8caa-fa6949e58ffd on abhi1:31010] > (state=,code=0) > {code} > Query without order by on the flatten column executes fine. > {code} > > select `name`, `type`, flatten(kvgen(`compliments`)) as `compliments` from > > `user` order by `name`, `type` limit 2; > ++---++ > |name| type | compliments | > ++---++ > | Kurt | user | {"key":"cute","value":1.0} | > | Kurt | user | {"key":"writer","value":1.0} | > ++---++ > 2 rows selected (4.239 seconds) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (DRILL-4160) Order By on a flattened column throws SchemaChangeException - Missing function implementation
[ https://issues.apache.org/jira/browse/DRILL-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-4160: --- Comment: was deleted (was: never mind, i misread the query, you are not ordering by compliments.) > Order By on a flattened column throws SchemaChangeException - Missing > function implementation > - > > Key: DRILL-4160 > URL: https://issues.apache.org/jira/browse/DRILL-4160 > Project: Apache Drill > Issue Type: Bug > Components: SQL Parser, Storage - JSON >Reporter: Abhishek Girish > Attachments: drillbit.log.txt > > > Query with an ORDER BY clause on a flattened column fails: > {code} > > select `name`, `type`, flatten(kvgen(`compliments`)) as `compliments` from > > `user` order by `name`, `type`, `compliments` limit 2; > Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to > materialize incoming schema. Errors: > Error in expression at index 2. Error: Missing function implementation: > [hash64asdouble(MAP-REQUIRED, BIGINT-REQUIRED)]. Full expression: null.. > Fragment 3:0 > [Error Id: 3b3d3224-953a-46a2-8caa-fa6949e58ffd on abhi1:31010] > (state=,code=0) > {code} > Query without order by on the flatten column executes fine. > {code} > > select `name`, `type`, flatten(kvgen(`compliments`)) as `compliments` from > > `user` order by `name`, `type` limit 2; > ++---++ > |name| type | compliments | > ++---++ > | Kurt | user | {"key":"cute","value":1.0} | > | Kurt | user | {"key":"writer","value":1.0} | > ++---++ > 2 rows selected (4.239 seconds) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-4145) IndexOutOfBoundsException raised during select * query on S3 csv file
[ https://issues.apache.org/jira/browse/DRILL-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips reassigned DRILL-4145: -- Assignee: Steven Phillips > IndexOutOfBoundsException raised during select * query on S3 csv file > - > > Key: DRILL-4145 > URL: https://issues.apache.org/jira/browse/DRILL-4145 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.3.0 > Environment: Drill 1.3.0 on a 3 node distributed-mode cluster on AWS. > Data files on S3. > S3 storage plugin configuration: > { > "type": "file", > "enabled": true, > "connection": "s3a://", > "workspaces": { > "root": { > "location": "/", > "writable": false, > "defaultInputFormat": null > }, > "views": { > "location": "/processed", > "writable": true, > "defaultInputFormat": null > }, > "tmp": { > "location": "/tmp", > "writable": true, > "defaultInputFormat": null > } > }, > "formats": { > "psv": { > "type": "text", > "extensions": [ > "tbl" > ], > "delimiter": "|" > }, > "csv": { > "type": "text", > "extensions": [ > "csv" > ], > "extractHeader": true, > "delimiter": "," > }, > "tsv": { > "type": "text", > "extensions": [ > "tsv" > ], > "delimiter": "\t" > }, > "parquet": { > "type": "parquet" > }, > "json": { > "type": "json" > }, > "avro": { > "type": "avro" > }, > "sequencefile": { > "type": "sequencefile", > "extensions": [ > "seq" > ] > }, > "csvh": { > "type": "text", > "extensions": [ > "csvh", > "csv" > ], > "extractHeader": true, > "delimiter": "," > } > } > } >Reporter: Peter McTaggart >Assignee: Steven Phillips > Attachments: apps1-bad.csv, apps1.csv > > > When trying to query (via sqlline or WebUI) a .csv file I am getting an > IndexOutofBoundsException: > {noformat} 0: jdbc:drill:> select * from > s3data.root.`staging/data/apps1-bad.csv` limit 1; > Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 > (expected: range(0, 16384)) > Fragment 0:0 > [Error Id: be9856d2-0b80-4b9c-94a4-a1ca38ec5db0 on 
> ip-X.compute.internal:31010] (state=,code=0) > 0: jdbc:drill:> select * from s3data.root.`staging/data/apps1.csv` limit 1; > +--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ > | FIELD_1 | FIELD_2| FIELD_3 | FIELD_4 | FIELD_5 | FIELD_6 > | FIELD_7 | FIELD_8 | FIELD_9 | FIELD_10 | FIELD_11 | > FIELD_12 | FIELD_13 | FIELD_14 | FIELD_15 | FIELD_16 | FIELD_17 | > FIELD_18 | FIELD_19 | FIELD_20 | FIELD_21 | FIELD_22 | > FIELD_23 | FIELD_24 | FIELD_25 | FIELD_26 | FIELD_27 | FIELD_28 | > FIELD_29 | FIELD_30 | FIELD_31 | FIELD_32 | FIELD_33 | FIELD_34 | > FIELD_35 | > +--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ > | 489517 | 27/10/2015 02:05:27 | 261 | 1130232 | 0| > 925630488 | 0| 925630488 | -1 | 19531580547 | | > 27/10/2015 02:00:00 | | 30| 300 | 0 | 0 >| | | 27/10/2015 02:05:27 | 0 | 1 | 0 > | 35.0 | | | | 505 | 872.0 > | | aBc | | | | | >
[jira] [Commented] (DRILL-4145) IndexOutOfBoundsException raised during select * query on S3 csv file
[ https://issues.apache.org/jira/browse/DRILL-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035589#comment-15035589 ] Steven Phillips commented on DRILL-4145: There is a bug in the case where there is an empty string for the last field. Basically, when the parser sees the pattern , the parser calls the "endEmptyField()" method of the TextInput. This was ok when using the RepeatedVarCharInput, because calling this method resulted in an empty string element being added to the array. But in the FieldVarCharOutput, ending the field doesn't do anything unless you first start the field. > IndexOutOfBoundsException raised during select * query on S3 csv file > - > > Key: DRILL-4145 > URL: https://issues.apache.org/jira/browse/DRILL-4145 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.3.0 > Environment: Drill 1.3.0 on a 3 node distributed-mode cluster on AWS. > Data files on S3. > S3 storage plugin configuration: > { > "type": "file", > "enabled": true, > "connection": "s3a://", > "workspaces": { > "root": { > "location": "/", > "writable": false, > "defaultInputFormat": null > }, > "views": { > "location": "/processed", > "writable": true, > "defaultInputFormat": null > }, > "tmp": { > "location": "/tmp", > "writable": true, > "defaultInputFormat": null > } > }, > "formats": { > "psv": { > "type": "text", > "extensions": [ > "tbl" > ], > "delimiter": "|" > }, > "csv": { > "type": "text", > "extensions": [ > "csv" > ], > "extractHeader": true, > "delimiter": "," > }, > "tsv": { > "type": "text", > "extensions": [ > "tsv" > ], > "delimiter": "\t" > }, > "parquet": { > "type": "parquet" > }, > "json": { > "type": "json" > }, > "avro": { > "type": "avro" > }, > "sequencefile": { > "type": "sequencefile", > "extensions": [ > "seq" > ] > }, > "csvh": { > "type": "text", > "extensions": [ > "csvh", > "csv" > ], > "extractHeader": true, > "delimiter": "," > } > } > } >Reporter: Peter McTaggart > 
Attachments: apps1-bad.csv, apps1.csv > > > When trying to query (via sqlline or WebUI) a .csv file I am getting an > IndexOutofBoundsException: > {noformat} 0: jdbc:drill:> select * from > s3data.root.`staging/data/apps1-bad.csv` limit 1; > Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 > (expected: range(0, 16384)) > Fragment 0:0 > [Error Id: be9856d2-0b80-4b9c-94a4-a1ca38ec5db0 on > ip-X.compute.internal:31010] (state=,code=0) > 0: jdbc:drill:> select * from s3data.root.`staging/data/apps1.csv` limit 1; > +--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ > | FIELD_1 | FIELD_2| FIELD_3 | FIELD_4 | FIELD_5 | FIELD_6 > | FIELD_7 | FIELD_8 | FIELD_9 | FIELD_10 | FIELD_11 | > FIELD_12 | FIELD_13 | FIELD_14 | FIELD_15 | FIELD_16 | FIELD_17 | > FIELD_18 | FIELD_19 | FIELD_20 | FIELD_21 | FIELD_22 | > FIELD_23 | FIELD_24 | FIELD_25 | FIELD_26 | FIELD_27 | FIELD_28 | > FIELD_29 | FIELD_30 | FIELD_31 | FIELD_32 | FIELD_33 | FIELD_34 | > FIELD_35 | > +--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ > | 489517 | 27/10/2015 02:05:27 | 261 | 1130232 | 0| > 925630488 | 0| 925630488 | -1 | 19531580547 | | > 27/10/2015 02:00:00 | | 30| 300 | 0 | 0 >| | | 27/10/2015 02:05:27 | 0 | 1 | 0 > | 35.0 | | | | 505 | 872.0 > | | aBc | | | | | >
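The trailing-delimiter case Steven describes can be reproduced with any conforming CSV reader: a record that ends in a delimiter has an empty final field, which is exactly the element that gets lost when the field is "ended" without first being started. A minimal illustration in Python (this is the expected semantics, not Drill's actual TextInput/FieldVarCharOutput code):

```python
import csv
import io

# A record that ends with a delimiter has an empty last field. The
# RepeatedVarCharInput path appended an empty array element on
# endEmptyField(); the FieldVarCharOutput path dropped it because the
# field was never started.
rows = list(csv.reader(io.StringIO("489517,27/10/2015,\n")))
assert rows[0] == ["489517", "27/10/2015", ""]  # three fields, last empty
```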
[jira] [Updated] (DRILL-4145) IndexOutOfBoundsException raised during select * query on S3 csv file
[ https://issues.apache.org/jira/browse/DRILL-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-4145: --- Assignee: Jacques Nadeau (was: Steven Phillips)
[jira] [Commented] (DRILL-2419) UDF that returns string representation of expression type
[ https://issues.apache.org/jira/browse/DRILL-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034828#comment-15034828 ] Steven Phillips commented on DRILL-2419: Yes, the typeof function is in the UnionFunctions class. > UDF that returns string representation of expression type > - > > Key: DRILL-2419 > URL: https://issues.apache.org/jira/browse/DRILL-2419 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Reporter: Victoria Markman >Assignee: Mehant Baid > Fix For: Future > > > Suggested name: typeof (credit goes to Aman) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4081) Handle schema changes in ExternalSort
Steven Phillips created DRILL-4081: -- Summary: Handle schema changes in ExternalSort Key: DRILL-4081 URL: https://issues.apache.org/jira/browse/DRILL-4081 Project: Apache Drill Issue Type: Improvement Reporter: Steven Phillips Assignee: Steven Phillips This improvement will make use of the Union vector to handle schema changes. When a new schema appears, the schema will be "merged" with the previous schema. The result will be a new schema that uses the Union type to store the columns where there is a type conflict. All of the batches (including the batches that have already arrived) will be coerced into this new schema. A new comparison function will be included to handle the comparison of the Union type. Comparison of the Union type will work as follows: 1. All numeric types can be mutually compared, and will be compared using Drill implicit cast rules. 2. All other types will not be compared against other types, but only among values of the same type. 3. There will be an overall precedence of types with regard to ordering. This precedence is not yet defined, but will be defined as part of the work on this issue.
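The three comparison rules above can be sketched as a sort key. The type names and the precedence order below are illustrative placeholders, since the issue explicitly says the precedence is not yet defined:

```python
# Sketch of the proposed Union comparison rules (names and precedence
# are hypothetical, not Drill's final definition).
NUMERIC = {"int", "bigint", "float", "double"}
# Rule 3: a hypothetical overall precedence of types for ordering.
PRECEDENCE = {"int": 0, "bigint": 0, "float": 0, "double": 0,
              "varchar": 1, "bool": 2}

def union_sort_key(typed_value):
    vtype, value = typed_value
    if vtype in NUMERIC:
        # Rule 1: all numeric types compare mutually, as if implicitly cast.
        return (PRECEDENCE[vtype], 0, float(value))
    # Rule 2: other types order by precedence first, then within the type.
    return (PRECEDENCE[vtype], 1, vtype, value)

batch = [("varchar", "a"), ("int", 10), ("double", 2.5)]
assert sorted(batch, key=union_sort_key) == [
    ("double", 2.5), ("int", 10), ("varchar", "a")]
```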
[jira] [Commented] (DRILL-4054) convert_from(,'JSON') gives JsonParseException
[ https://issues.apache.org/jira/browse/DRILL-4054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997926#comment-14997926 ] Steven Phillips commented on DRILL-4054: (1) I agree the message is bad. If that's the issue, please be explicit. It wasn't clear from the description whether there was an actual bug or just a request for better error message. (2) This doesn't seem to have anything to do with the original bug. In fact, this isn't even a bug. The convert_from function requires a varbinary or varchar input. It is not possible to perform this function against a MAP type. > convert_from(,'JSON') gives JsonParseException > --- > > Key: DRILL-4054 > URL: https://issues.apache.org/jira/browse/DRILL-4054 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.3.0 >Reporter: Khurram Faraaz > > convert_from(,'JSON') gives JsonParseException > sys.version => 3a73f098 > Drill 1.3 > 4 node cluster CentOS > {code} > 0: jdbc:drill:schema=dfs.tmp> select columns[3], convert_from(CAST(columns[3] > AS VARCHAR(64)),'JSON') json FROM `allData.csv`; > Error: SYSTEM ERROR: JsonParseException: Unrecognized token > 'AXCB': was expecting > ('true', 'false' or 'null') > at [Source: > org.apache.drill.exec.vector.complex.fn.DrillBufInputStream@5441715d; line: > 1, column: 105] > Fragment 0:0 > [Error Id: 7f8cb677-20e9-4e99-bbec-3ada707671ee on centos-03.qa.lab:31010] > (state=,code=0) > Stack trace from drillbit.log > [Error Id: 7f8cb677-20e9-4e99-bbec-3ada707671ee on centos-03.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > JsonParseException: Unrecognized token > 'AXCB': was expecting > ('true', 'false' or 'null') > at [Source: > org.apache.drill.exec.vector.complex.fn.DrillBufInputStream@5441715d; line: > 1, column: 105] > Fragment 0:0 > [Error Id: 7f8cb677-20e9-4e99-bbec-3ada707671ee on centos-03.qa.lab:31010] > at > 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) > ~[drill-common-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321) > [drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184) > [drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290) > [drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.3.0.jar:1.3.0] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_85] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_85] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85] > Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error > while converting from JSON. 
> at > org.apache.drill.exec.test.generated.ProjectorGen6.doEval(ProjectorTemplate.java:126) > ~[na:na] > at > org.apache.drill.exec.test.generated.ProjectorGen6.projectRecords(ProjectorTemplate.java:62) > ~[na:na] > at > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:174) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:131) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:156) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:80) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:256) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:250) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at java.security.AccessController.doPrivileged(Native Method) > ~[na:1.7.0_85] > at javax.security.auth.Subject.doAs(Subject.java:415)
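On point (2) of Steven's comment: the quoted failure is ordinary JSON lexing, because the bare token AXCB in the column is not valid JSON, so the parser reports that it expected true, false, or null. The same behavior is reproducible with any JSON parser:

```python
import json

# A bare token is not a valid JSON value, mirroring the quoted
# "Unrecognized token 'AXCB'" failure from convert_from(..., 'JSON').
try:
    json.loads("AXCB")
    raise AssertionError("should not parse")
except json.JSONDecodeError:
    pass

# The same bytes parse fine once they are a quoted JSON string.
assert json.loads('"AXCB"') == "AXCB"
```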
[jira] [Commented] (DRILL-4054) convert_from(,'JSON') gives JsonParseException
[ https://issues.apache.org/jira/browse/DRILL-4054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997896#comment-14997896 ] Steven Phillips commented on DRILL-4054: Could you explain what the bug is here? > convert_from(,'JSON') gives JsonParseException > --- > > Key: DRILL-4054 > URL: https://issues.apache.org/jira/browse/DRILL-4054 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.3.0 >Reporter: Khurram Faraaz > > convert_from(,'JSON') gives JsonParseException > sys.version => 3a73f098 > Drill 1.3 > 4 node cluster CentOS > {code} > 0: jdbc:drill:schema=dfs.tmp> select columns[3], convert_from(CAST(columns[3] > AS VARCHAR(64)),'JSON') json FROM `allData.csv`; > Error: SYSTEM ERROR: JsonParseException: Unrecognized token > 'AXCB': was expecting > ('true', 'false' or 'null') > at [Source: > org.apache.drill.exec.vector.complex.fn.DrillBufInputStream@5441715d; line: > 1, column: 105] > Fragment 0:0 > [Error Id: 7f8cb677-20e9-4e99-bbec-3ada707671ee on centos-03.qa.lab:31010] > (state=,code=0) > Stack trace from drillbit.log > [Error Id: 7f8cb677-20e9-4e99-bbec-3ada707671ee on centos-03.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > JsonParseException: Unrecognized token > 'AXCB': was expecting > ('true', 'false' or 'null') > at [Source: > org.apache.drill.exec.vector.complex.fn.DrillBufInputStream@5441715d; line: > 1, column: 105] > Fragment 0:0 > [Error Id: 7f8cb677-20e9-4e99-bbec-3ada707671ee on centos-03.qa.lab:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) > ~[drill-common-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321) > [drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184) > [drill-java-exec-1.3.0.jar:1.3.0] > at > 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290) > [drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.3.0.jar:1.3.0] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_85] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_85] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85] > Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error > while converting from JSON. > at > org.apache.drill.exec.test.generated.ProjectorGen6.doEval(ProjectorTemplate.java:126) > ~[na:na] > at > org.apache.drill.exec.test.generated.ProjectorGen6.projectRecords(ProjectorTemplate.java:62) > ~[na:na] > at > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:174) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:131) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:156) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:80) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:256) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:250) > 
~[drill-java-exec-1.3.0.jar:1.3.0] > at java.security.AccessController.doPrivileged(Native Method) > ~[na:1.7.0_85] > at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_85] > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595) > ~[hadoop-common-2.7.0-mapr-1506.jar:na] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250) > [drill-java-exec-1.3.0.jar:1.3.0] > ... 4 common frames omitted > Caused by:
[jira] [Commented] (DRILL-3845) Partition sender shouldn't send the "last batch" to a receiver that sent a "receiver finished" to the sender
[ https://issues.apache.org/jira/browse/DRILL-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992319#comment-14992319 ] Steven Phillips commented on DRILL-3845: Based on your comment, it seems like the upstream fragment that runs for an hour is supposed to be terminated. Is that correct? If so, that seems to be the real problem. Why is it not terminating? > Partition sender shouldn't send the "last batch" to a receiver that sent a > "receiver finished" to the sender > > > Key: DRILL-3845 > URL: https://issues.apache.org/jira/browse/DRILL-3845 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Reporter: Deneche A. Hakim >Assignee: Deneche A. Hakim > Fix For: 1.4.0 > > Attachments: 29c45a5b-e2b9-72d6-89f2-d49ba88e2939.sys.drill > > > Even if a receiver has finished and informed the corresponding partition > sender, the sender will still try to send a "last batch" to the receiver when > it's done. In most cases this is fine as those batches will be silently > dropped by the receiving DataServer, but if a receiver has finished +10 > minutes ago, DataServer will throw an exception as it couldn't find the > corresponding FragmentManager (WorkEventBus has a 10 minutes recentlyFinished > cache). > DRILL-2274 is a reproduction for this case (after the corresponding fix is > applied). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
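The behavior described in the summary can be sketched as a sender that tracks which receivers have already reported "receiver finished" and excludes them when broadcasting the final batch. All names below are illustrative, not Drill's actual classes:

```python
# Sketch: skip the "last batch" for receivers that already finished,
# so DataServer never has to look up an expired FragmentManager.
class PartitionSender:
    def __init__(self, receivers):
        self.receivers = set(receivers)
        self.finished = set()   # receivers that sent "receiver finished"
        self.sent = []

    def receiver_finished(self, receiver):
        self.finished.add(receiver)

    def send_last_batch(self):
        # Only send to receivers that are still listening.
        for r in self.receivers - self.finished:
            self.sent.append(r)

s = PartitionSender(["r0", "r1", "r2"])
s.receiver_finished("r1")
s.send_last_batch()
assert sorted(s.sent) == ["r0", "r2"]
```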
[jira] [Commented] (DRILL-4041) Parquet library update causing random "Buffer has negative reference count"
[ https://issues.apache.org/jira/browse/DRILL-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992873#comment-14992873 ] Steven Phillips commented on DRILL-4041: The first error definitely looks like the same thing. As for the accounting error, I don't know if that's related. It could just be a side-effect of the first. > Parquet library update causing random "Buffer has negative reference count" > --- > > Key: DRILL-4041 > URL: https://issues.apache.org/jira/browse/DRILL-4041 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.3.0 >Reporter: Rahul Challapalli >Assignee: Steven Phillips >Priority: Critical > > git commit # 39582bd60c9e9b16aba4f099d434e927e7e5 > After the parquet library update commit, we started seeing the below error > randomly causing failures in the Extended Functional Suite. > {code} > Failed with exception > java.lang.IllegalArgumentException: Buffer has negative reference count. > at > oadd.com.google.common.base.Preconditions.checkArgument(Preconditions.java:92) > at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:250) > at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:259) > at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:259) > at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:259) > at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:239) > at > oadd.org.apache.drill.exec.vector.BaseDataValueVector.clear(BaseDataValueVector.java:39) > at > oadd.org.apache.drill.exec.vector.NullableIntVector.clear(NullableIntVector.java:150) > at > oadd.org.apache.drill.exec.record.SimpleVectorWrapper.clear(SimpleVectorWrapper.java:84) > at > oadd.org.apache.drill.exec.record.VectorContainer.zeroVectors(VectorContainer.java:312) > at > oadd.org.apache.drill.exec.record.VectorContainer.clear(VectorContainer.java:296) > at > oadd.org.apache.drill.exec.record.RecordBatchLoader.clear(RecordBatchLoader.java:183) > at > 
org.apache.drill.jdbc.impl.DrillResultSetImpl.cleanup(DrillResultSetImpl.java:139) > at org.apache.drill.jdbc.impl.DrillCursor.close(DrillCursor.java:333) > at > oadd.net.hydromatic.avatica.AvaticaResultSet.close(AvaticaResultSet.java:110) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.close(DrillResultSetImpl.java:169) > at > org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:233) > at > org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:89) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
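The "Buffer has negative reference count" message in the stack trace is a reference-counting invariant check: a buffer was released more times than it was retained. A minimal model of that invariant (not DrillBuf's actual implementation):

```python
# Model of the invariant the stack trace violates: releasing past zero
# is rejected, just as DrillBuf.release() rejects a negative count.
class CountedBuf:
    def __init__(self):
        self.refcnt = 1

    def retain(self):
        self.refcnt += 1

    def release(self):
        if self.refcnt <= 0:
            raise ValueError("Buffer has negative reference count.")
        self.refcnt -= 1

buf = CountedBuf()
buf.release()            # ok: count drops to 0, buffer is freed
try:
    buf.release()        # double release: the invariant check fires
    raise AssertionError("expected ValueError")
except ValueError:
    pass
```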
[jira] [Commented] (DRILL-3992) Unable to query Oracle DB using JDBC Storage Plug-In
[ https://issues.apache.org/jira/browse/DRILL-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984635#comment-14984635 ] Steven Phillips commented on DRILL-3992: +1 > Unable to query Oracle DB using JDBC Storage Plug-In > > > Key: DRILL-3992 > URL: https://issues.apache.org/jira/browse/DRILL-3992 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.2.0 > Environment: Windows 7 Enterprise 64-bit, Oracle 10g, Teradata 15.00 >Reporter: Eric Roma >Priority: Minor > Labels: newbie > Fix For: 1.2.0 > > > *See External Issue URL for Stack Overflow Post* > *Appears to be similar issue at > http://stackoverflow.com/questions/33370438/apache-drill-1-2-and-sql-server-jdbc* > Using Apache Drill v1.2 and Oracle Database 10g Enterprise Edition Release > 10.2.0.4.0 - 64bit in embedded mode. > I'm curious if anyone has had any success connecting Apache Drill to an > Oracle DB. I've updated the drill-override.conf with the following > configurations (per documents): > drill.exec: { > cluster-id: "drillbits1", > zk.connect: "localhost:2181", > drill.exec.sys.store.provider.local.path = "/mypath" > } > and placed the ojdbc6.jar in \apache-drill-1.2.0\jars\3rdparty. I can > successfully create the storage plug-in: > { > "type": "jdbc", > "driver": "oracle.jdbc.driver.OracleDriver", > "url": "jdbc:oracle:thin:@::", > "username": "USERNAME", > "password": "PASSWORD", > "enabled": true > } > but when I issue a query such as: > select * from ..`dual`; > I get the following error: > Query Failed: An Error Occurred > org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: > From line 1, column 15 to line 1, column 20: Table > '..dual' not found [Error Id: > 57a4153c-6378-4026-b90c-9bb727e131ae on :]. > I've tried to query other schema/tables and get a similar result. I've also > tried connecting to Teradata and get the same error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3956) TEXT MySQL type unsupported
[ https://issues.apache.org/jira/browse/DRILL-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984629#comment-14984629 ] Steven Phillips commented on DRILL-3956: +1 > TEXT MySQL type unsupported > --- > > Key: DRILL-3956 > URL: https://issues.apache.org/jira/browse/DRILL-3956 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Other >Affects Versions: 1.2.0 >Reporter: Andrew >Assignee: Steven Phillips > Attachments: DRILL-3956.patch > > > The JDBC storage plugin will fail with an NPE when querying a MySQL table > that has a 'TEXT' column. The underlying problem appears to be that Calcite > has no notion of this type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3995) Scalar replacement bug with Common Subexpression Elimination
Steven Phillips created DRILL-3995: -- Summary: Scalar replacement bug with Common Subexpression Elimination Key: DRILL-3995 URL: https://issues.apache.org/jira/browse/DRILL-3995 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips The following query: {code} select t1.full_name from cp.`employee.json` t1, cp.`department.json` t2 where t1.department_id = t2.department_id and t1.position_id = t2.department_id {code} fails with the following: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: RuntimeException: Error at instruction 43: Expected an object reference, but found . setValue(II)V 0 R I I . . . . : :L0 1 R I I . . . . : : LINENUMBER 249 L0 2 R I I . . . . : : ICONST_0 3 R I I . . . . : I : ISTORE 3 4 R I I I . . . : : LCONST_0 5 R I I I . . . : J : LSTORE 4 6 R I I I J . . : :L1 7 R I I I J . . : : LINENUMBER 251 L1 8 R I I I J . . : : ALOAD 0 9 R I I I J . . : R : GETFIELD org/apache/drill/exec/test/generated/HashTableGen2$BatchHolder.vv20 : Lorg/apache/drill/exec/vector/NullableBigIntVector; 00010 R I I I J . . : R : INVOKEVIRTUAL org/apache/drill/exec/vector/NullableBigIntVector.getAccessor ()Lorg/apache/drill/exec/vector/NullableBigIntVector$Accessor; 00011 R I I I J . . : R : ILOAD 1 00012 R I I I J . . : R I : INVOKEVIRTUAL org/apache/drill/exec/vector/NullableBigIntVector$Accessor.isSet (I)I 00013 R I I I J . . : I : ISTORE 3 00014 R I I I J . . : :L2 00015 R I I I J . . : : LINENUMBER 252 L2 00016 R I I I J . . : : ILOAD 3 00017 R I I I J . . : I : ICONST_1 00018 R I I I J . . : I I : IF_ICMPNE L3 00019 R I I I J . . : :L4 00020 ? : LINENUMBER 253 L4 00021 ? : ALOAD 0 00022 ? : GETFIELD org/apache/drill/exec/test/generated/HashTableGen2$BatchHolder.vv20 : Lorg/apache/drill/exec/vector/NullableBigIntVector; 00023 ? : INVOKEVIRTUAL org/apache/drill/exec/vector/NullableBigIntVector.getAccessor ()Lorg/apache/drill/exec/vector/NullableBigIntVector$Accessor; 00024 ? : ILOAD 1 00025 ? 
: INVOKEVIRTUAL org/apache/drill/exec/vector/NullableBigIntVector$Accessor.get (I)J 00026 ? : LSTORE 4 00027 R I I I J . . : :L3 00028 R I I I J . . : : LINENUMBER 256 L3 00029 R I I I J . . : : ILOAD 3 00030 R I I I J . . : I : ICONST_0 00031 R I I I J . . : I I : IF_ICMPEQ L5 00032 R I I I J . . : :L6 00033 ? : LINENUMBER 257 L6 00034 ? : ALOAD 0 00035 ? : GETFIELD org/apache/drill/exec/test/generated/HashTableGen2$BatchHolder.vv24 : Lorg/apache/drill/exec/vector/NullableBigIntVector; 00036 ? : INVOKEVIRTUAL org/apache/drill/exec/vector/NullableBigIntVector.getMutator ()Lorg/apache/drill/exec/vector/NullableBigIntVector$Mutator; 00037 ? : ILOAD 2 00038 ? : ILOAD 3 00039 ? : LLOAD 4 00040 ? : INVOKEVIRTUAL org/apache/drill/exec/vector/NullableBigIntVector$Mutator.set (IIJ)V 00041 R I I I J . . : :L5 00042 R I I I J . . : : LINENUMBER 259 L5 00043 R I I I J . . : : ALOAD 6 00044 ? : GETFIELD org/apache/drill/exec/expr/holders/NullableBigIntHolder.isSet : I 00045 ? : ICONST_0 00046 ? : IF_ICMPEQ L7 00047 ? :L8 00048 ? : LINENUMBER 260 L8 00049 ? : ALOAD 0 00050 ? : GETFIELD org/apache/drill/exec/test/generated/HashTableGen2$BatchHolder.vv27 : Lorg/apache/drill/exec/vector/NullableBigIntVector; 00051 ? : INVOKEVIRTUAL org/apache/drill/exec/vector/NullableBigIntVector.getMutator ()Lorg/apache/drill/exec/vector/NullableBigIntVector$Mutator; 00052 ? : ILOAD 2 00053 ? : ALOAD 6 00054 ? : GETFIELD org/apache/drill/exec/expr/holders/NullableBigIntHolder.isSet : I 00055 ? : ALOAD 6 00056 ? : GETFIELD org/apache/drill/exec/expr/holders/NullableBigIntHolder.value : J 00057 ? : INVOKEVIRTUAL org/apache/drill/exec/vector/NullableBigIntVector$Mutator.set (IIJ)V 00058 ? :L7 00059 ? : LINENUMBER 245 L7 00060 ? : RETURN 00061 ? :L9 when common subexpressions are eliminated (see DRILL-3912). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3975) Partition Planning rule causes query failure due to IndexOutOfBoundsException on HDFS
[ https://issues.apache.org/jira/browse/DRILL-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972984#comment-14972984 ] Steven Phillips commented on DRILL-3975: My approach has been to remove the scheme and authority from the paths any time I encounter code that uses the path as a key, or does any sort of string comparison. This is an area where I think we need to clean up. I don't think we are very consistent throughout the code base in how we handle paths. The usual trick I use to strip away the scheme and authority is the method Path.getPathWithoutSchemeAndAuthority(Path p). If I have String objects and not Path objects, I will convert the String to a Path, use the utility method to remove scheme and authority, and then call toString(). > Partition Planning rule causes query failure due to IndexOutOfBoundsException > on HDFS > - > > Key: DRILL-3975 > URL: https://issues.apache.org/jira/browse/DRILL-3975 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Reporter: Jacques Nadeau > > In attempting to run the extended test suite provided by MapR, there are a > large number of queries that fail due to issues in the PruneScanRule and > specifically the DFSPartitionLocation constructor line 31. It is likely due > to issues with the code that are related to running on HDFS where this code > path has apparently not been tested.
> An example test query this type of failure occurred: > /src/drill-test-framework/resources/Functional/ctas/ctas_auto_partition/tpch0.01_multiple_partitions/data/q11.q > Example stack trace below: > {code} > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > StringIndexOutOfBoundsException: String index out of range: -12 > [Error Id: f2941267-49b1-4f67-a17f-610ffb13fcb7 on > ip-172-31-30-32.us-west-2.compute.internal:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) > ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:742) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:841) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:786) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73) > [drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:788) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:894) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:255) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_85] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_85] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85] > Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected > exception during fragment initialization: Internal error: Error while > applying rule 
PruneScanRule:Filter_On_Scan_Parquet, args > [rel#43148:DrillFilterRel.LOGICAL.ANY([]).[](input=rel#43147:Subset#4.LOGICAL.ANY([]).[],condition==($0, > 1)), rel#43241:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, > ctasAutoPartition, > tpch_multiple_partitions/lineitem_twopart_ordered2],groupscan=ParquetGroupScan > [entries=[ReadEntryWithPath > [path=hdfs://ip-172-31-30-32:54310/drill/testdata/ctas_auto_partition/tpch_multiple_partitions/lineitem_twopart_ordered2]], > > selectionRoot=hdfs://ip-172-31-30-32:54310/drill/testdata/ctas_auto_partition/tpch_multiple_partitions/lineitem_twopart_ordered2, > numFiles=1, usedMetadataFile=false, columns=[`l_modline`, `l_moddate`]])] > ... 4 common frames omitted > Caused by: java.lang.AssertionError: Internal error: Error while applying > rule PruneScanRule:Filter_On_Scan_Parquet, args > [rel#43148:DrillFilterRel.LOGICAL.ANY([]).[](input=rel#43147:Subset#4.LOGICAL.ANY([]).[],condition==($0, > 1)), rel#43241:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, > ctasAutoPartition, > tpch_multiple_partitions/lineitem_twopart_ordered2],groupscan=ParquetGroupScan > [entries=[ReadEntryWithPath >
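The scheme-and-authority stripping Steven describes (Hadoop's Path.getPathWithoutSchemeAndAuthority) can be illustrated with a standard-library stand-in. The helper below is hypothetical, not the Hadoop API; it just shows why dropping "hdfs://host:port" makes path strings compare consistently:

```python
from urllib.parse import urlparse

# Stand-in for Path.getPathWithoutSchemeAndAuthority(Path p): drop the
# scheme ("hdfs") and authority ("ip-172-31-30-32:54310"), keep the path.
def without_scheme_and_authority(path):
    return urlparse(path).path

full = "hdfs://ip-172-31-30-32:54310/drill/testdata/ctas_auto_partition"
assert without_scheme_and_authority(full) == "/drill/testdata/ctas_auto_partition"
# A path with no scheme is returned unchanged, so both forms compare equal
# when used as a key.
assert without_scheme_and_authority("/drill/testdata/ctas_auto_partition") == \
    without_scheme_and_authority(full)
```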
[jira] [Commented] (DRILL-3229) Create a new EmbeddedVector
[ https://issues.apache.org/jira/browse/DRILL-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971766#comment-14971766 ] Steven Phillips commented on DRILL-3229: Regarding the list writer, I know it is a bit confusing, so I will try to give a better explanation of how it works. It confuses me at times as well. The type promotion was designed with the possibility of allowing other promotions in mind, but I am currently only doing promotion to Union. We should have a discussion about what other promotions we want to allow. Screen currently returns a Union type to the user. This is an area that will require additional enhancement. The DrillClient has no problem dealing with a Union vector. The JDBC driver, on the other hand, currently has only limited support for the Union type. I think we might need to add a feature similar to what we have for complex types, which determines whether the client is able to handle Union types and converts to JSON if it cannot. So metadata queries will also return a Union type. As for case statements, I am leaning toward a general philosophy of not failing queries whenever we can avoid it: if there is something Drill can do to execute a query, it should do that. So I am leaning toward option 3. An untyped null is supported as part of a Union vector; this null value is encoded in the 'type' vector. This patch does not introduce a standalone Untyped Null vector. That will be a separate patch. I will update the design document with what I have said here.
> Create a new EmbeddedVector > --- > > Key: DRILL-3229 > URL: https://issues.apache.org/jira/browse/DRILL-3229 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Jacques Nadeau >Assignee: Hanifi Gunes > Fix For: Future > > > Embedded Vector will leverage a binary encoding for holding information about > type for each individual field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
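The point in the comment above that an untyped null is encoded purely in the 'type' vector can be sketched with a toy model. This is an illustration only, not Drill's actual vector layout or class names:

```python
# Toy union "vector": a per-row type tag plus per-type value storage.
# A null row is represented only by its tag in the type vector -- no
# value is stored anywhere, which is the untyped-null encoding the
# comment describes.
class UnionVector:
    NULL = 'null'

    def __init__(self):
        self.types = []    # one type tag per row
        self.values = {}   # type tag -> list of values of that type

    def append(self, type_tag, value=None):
        self.types.append(type_tag)
        if type_tag != self.NULL:
            self.values.setdefault(type_tag, []).append(value)

    def get(self, row):
        tag = self.types[row]
        if tag == self.NULL:
            return None
        # index of this row among earlier rows of the same type
        offset = self.types[:row].count(tag)
        return self.values[tag][offset]

v = UnionVector()
v.append('int', 7)
v.append(UnionVector.NULL)      # null lives only in the type vector
v.append('varchar', 'seven')
```

Reading row 1 returns `None` without consulting any value storage, which is why no standalone untyped-null vector is needed for this case.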
[jira] [Updated] (DRILL-3912) Common subexpression elimination in code generation
[ https://issues.apache.org/jira/browse/DRILL-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3912: --- Issue Type: Improvement (was: Bug) > Common subexpression elimination in code generation > --- > > Key: DRILL-3912 > URL: https://issues.apache.org/jira/browse/DRILL-3912 > Project: Apache Drill > Issue Type: Improvement >Reporter: Steven Phillips >Assignee: Jinfeng Ni > > Drill currently will evaluate the full expression tree, even if there are > redundant subtrees. Many of these redundant evaluations can be eliminated by > reusing the results from previously evaluated expression trees. > For example, > {code} > select a + 1, (a + 1)* (a - 1) from t > {code} > Will compute the entire (a + 1) expression twice. With CSE, it will only be > evaluated once. > The benefit will be reducing the work done when evaluating expressions, as > well as reducing the amount of code that is generated, which could also lead > to better JIT optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
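The description's example can be made concrete with a small sketch. This is not Drill's code generator, just an illustration of the technique: cache each subtree's result keyed by the subtree's structure, so `(a + 1)` is computed once even though it appears in both output expressions:

```python
# Illustrative common-subexpression elimination: evaluate an expression
# tree while caching results keyed by the subtree itself (tuples are
# hashable), so a repeated subtree such as (a + 1) is computed only once.
def evaluate(expr, env, cache, counts):
    if not isinstance(expr, tuple):          # leaf: variable or constant
        return env.get(expr, expr)
    if expr in cache:                        # already evaluated: reuse it
        return cache[expr]
    op, left, right = expr
    counts[expr] = counts.get(expr, 0) + 1   # track real evaluations
    l = evaluate(left, env, cache, counts)
    r = evaluate(right, env, cache, counts)
    result = l + r if op == '+' else l * r if op == '*' else l - r
    cache[expr] = result
    return result

# select a + 1, (a + 1) * (a - 1) from t   -- with a = 4
env, cache, counts = {'a': 4}, {}, {}
col1 = evaluate(('+', 'a', 1), env, cache, counts)
col2 = evaluate(('*', ('+', 'a', 1), ('-', 'a', 1)), env, cache, counts)
```

With the shared cache, `counts` records a single evaluation of `('+', 'a', 1)` even though it is referenced twice.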
[jira] [Updated] (DRILL-3963) Read raw key value bytes from sequence files
[ https://issues.apache.org/jira/browse/DRILL-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3963: --- Issue Type: New Feature (was: Bug) > Read raw key value bytes from sequence files > > > Key: DRILL-3963 > URL: https://issues.apache.org/jira/browse/DRILL-3963 > Project: Apache Drill > Issue Type: New Feature >Reporter: amit hadke >Assignee: amit hadke > > Sequence files store list of key-value pairs. Keys/values are of type hadoop > writable. > Provide a format plugin that reads raw bytes out of sequence files which can > be further deserialized by a udf(from hadoop writable -> drill type) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
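As a sketch of the deserialization step the description mentions: a Hadoop `IntWritable` serializes its value as a 4-byte big-endian int (via `DataOutput.writeInt`), so a UDF handed the raw value bytes could decode it as below. The function name is illustrative, not from the Drill plugin:

```python
import struct

def decode_int_writable(raw: bytes) -> int:
    # Hadoop's IntWritable writes its value as a 4-byte big-endian
    # signed int, so the raw bytes decode directly with '>i'.
    (value,) = struct.unpack('>i', raw)
    return value

# bytes as an IntWritable would have written them for the value 2015
assert decode_int_writable(b'\x00\x00\x07\xdf') == 2015
```

More complex Writables (Text, custom records) would need their own decoders, but the shape is the same: the format plugin hands over raw bytes, and the UDF owns the interpretation.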
[jira] [Commented] (DRILL-3232) Modify existing vectors to allow type promotion
[ https://issues.apache.org/jira/browse/DRILL-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969616#comment-14969616 ] Steven Phillips commented on DRILL-3232: Design document: https://gist.github.com/StevenMPhillips/41b4a1bd745943d508d2 > Modify existing vectors to allow type promotion > --- > > Key: DRILL-3232 > URL: https://issues.apache.org/jira/browse/DRILL-3232 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Steven Phillips >Assignee: Hanifi Gunes > Fix For: 1.3.0 > > > Support the ability for existing vectors to be promoted similar to supported > implicit casting rules. > For example: > INT > DOUBLE > STRING > EMBEDDED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
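The promotion chain in the description (INT > DOUBLE > STRING > EMBEDDED) can be sketched as a simple lattice lookup: when an incoming value's type differs from the vector's current type, promote to whichever of the two sits later in the chain. Type names here are illustrative, not Drill's enums:

```python
# Illustrative promotion lattice following the order in the description:
# INT -> DOUBLE -> STRING -> EMBEDDED.
PROMOTION_ORDER = ['INT', 'DOUBLE', 'STRING', 'EMBEDDED']

def promote(current: str, incoming: str) -> str:
    # pick the type that appears later in the promotion chain
    return max(current, incoming, key=PROMOTION_ORDER.index)
```

For example, an INT vector receiving a DOUBLE promotes to DOUBLE, while a STRING vector receiving an INT stays STRING.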
[jira] [Commented] (DRILL-3229) Create a new EmbeddedVector
[ https://issues.apache.org/jira/browse/DRILL-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969613#comment-14969613 ] Steven Phillips commented on DRILL-3229: Design document: https://gist.github.com/StevenMPhillips/41b4a1bd745943d508d2 > Create a new EmbeddedVector > --- > > Key: DRILL-3229 > URL: https://issues.apache.org/jira/browse/DRILL-3229 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Jacques Nadeau >Assignee: Steven Phillips > Fix For: Future > > > Embedded Vector will leverage a binary encoding for holding information about > type for each individual field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3228) Implement Embedded Type
[ https://issues.apache.org/jira/browse/DRILL-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968144#comment-14968144 ] Steven Phillips commented on DRILL-3228: Design document for Union Type: https://gist.github.com/StevenMPhillips/41b4a1bd745943d508d2 > Implement Embedded Type > --- > > Key: DRILL-3228 > URL: https://issues.apache.org/jira/browse/DRILL-3228 > Project: Apache Drill > Issue Type: Task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Jacques Nadeau >Assignee: Steven Phillips > Fix For: 1.3.0 > > > An Umbrella for the implementation of Embedded types within Drill. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3232) Modify existing vectors to allow type promotion
[ https://issues.apache.org/jira/browse/DRILL-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3232: --- Fix Version/s: (was: Future) 1.3.0 > Modify existing vectors to allow type promotion > --- > > Key: DRILL-3232 > URL: https://issues.apache.org/jira/browse/DRILL-3232 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Steven Phillips > Fix For: 1.3.0 > > > Support the ability for existing vectors to be promoted similar to supported > implicit casting rules. > For example: > INT > DOUBLE > STRING > EMBEDDED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3232) Modify existing vectors to allow type promotion
[ https://issues.apache.org/jira/browse/DRILL-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964034#comment-14964034 ] Steven Phillips commented on DRILL-3232: PR at https://github.com/apache/drill/pull/207 > Modify existing vectors to allow type promotion > --- > > Key: DRILL-3232 > URL: https://issues.apache.org/jira/browse/DRILL-3232 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Steven Phillips > Fix For: Future > > > Support the ability for existing vectors to be promoted similar to supported > implicit casting rules. > For example: > INT > DOUBLE > STRING > EMBEDDED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3232) Modify existing vectors to allow type promotion
[ https://issues.apache.org/jira/browse/DRILL-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964036#comment-14964036 ] Steven Phillips commented on DRILL-3232: [~hgunes], could you please review this PR? > Modify existing vectors to allow type promotion > --- > > Key: DRILL-3232 > URL: https://issues.apache.org/jira/browse/DRILL-3232 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Steven Phillips >Assignee: Hanifi Gunes > Fix For: 1.3.0 > > > Support the ability for existing vectors to be promoted similar to supported > implicit casting rules. > For example: > INT > DOUBLE > STRING > EMBEDDED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-3232) Modify existing vectors to allow type promotion
[ https://issues.apache.org/jira/browse/DRILL-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips reassigned DRILL-3232: -- Assignee: Steven Phillips > Modify existing vectors to allow type promotion > --- > > Key: DRILL-3232 > URL: https://issues.apache.org/jira/browse/DRILL-3232 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Steven Phillips >Assignee: Steven Phillips > Fix For: 1.3.0 > > > Support the ability for existing vectors to be promoted similar to supported > implicit casting rules. > For example: > INT > DOUBLE > STRING > EMBEDDED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3232) Modify existing vectors to allow type promotion
[ https://issues.apache.org/jira/browse/DRILL-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3232: --- Assignee: Hanifi Gunes (was: Steven Phillips) > Modify existing vectors to allow type promotion > --- > > Key: DRILL-3232 > URL: https://issues.apache.org/jira/browse/DRILL-3232 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Steven Phillips >Assignee: Hanifi Gunes > Fix For: 1.3.0 > > > Support the ability for existing vectors to be promoted similar to supported > implicit casting rules. > For example: > INT > DOUBLE > STRING > EMBEDDED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3953) Apache Drill - Memory Issue when using against hbase db on Windows machine
[ https://issues.apache.org/jira/browse/DRILL-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964190#comment-14964190 ] Steven Phillips commented on DRILL-3953: How much data is in the "tsdb" table, and what does it look like? If it is an OpenTSDB table, there could be thousands of unique column names, and Drill will create a vector and allocate memory for each one. It's possible that this is causing the problem. > Apache Drill - Memory Issue when using against hbase db on Windows machine > -- > > Key: DRILL-3953 > URL: https://issues.apache.org/jira/browse/DRILL-3953 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Reporter: Pete > > Trying a sandbox run using Drill on a Windows laptop with 4gbs of memory. > The Drill Explorer connection shows a successful execution to database (test > button). When trying to connect it shows processing but never comes back. > When trying to run query against database Drill on drill prompt it blows up > with out of memory. The query is simple enough that it shouldn't blow > up.. > select * from tsdbdatabase.tsdb limit 1; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
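To see why thousands of unique column names can exhaust a 4 GB laptop: if each column gets its own vector with some fixed initial allocation, the arithmetic grows linearly with column count. The 32 KiB per-vector figure below is an assumption for illustration only, not Drill's actual default:

```python
# Back-of-the-envelope: per-column vector allocation for a wide table.
# The 32 KiB initial allocation is an assumed figure for illustration.
ASSUMED_BYTES_PER_VECTOR = 32 * 1024

def allocation_mib(unique_columns: int) -> float:
    return unique_columns * ASSUMED_BYTES_PER_VECTOR / (1024 * 1024)

# 50,000 unique OpenTSDB-style column names -> 1562.5 MiB of vectors,
# before any actual row data is stored.
```

Even a modest per-vector allocation multiplied by an OpenTSDB-scale column count can swamp a small heap, which is consistent with the diagnosis in the comment.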
[jira] [Assigned] (DRILL-3233) Update code generation & function code to support reading and writing embedded type
[ https://issues.apache.org/jira/browse/DRILL-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips reassigned DRILL-3233: -- Assignee: Steven Phillips > Update code generation & function code to support reading and writing > embedded type > --- > > Key: DRILL-3233 > URL: https://issues.apache.org/jira/browse/DRILL-3233 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Jacques Nadeau >Assignee: Steven Phillips > Fix For: Future > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3749) Upgrade Hadoop dependency to latest version (2.7.1)
[ https://issues.apache.org/jira/browse/DRILL-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3749: --- Assignee: Jason Altekruse (was: Steven Phillips) > Upgrade Hadoop dependency to latest version (2.7.1) > --- > > Key: DRILL-3749 > URL: https://issues.apache.org/jira/browse/DRILL-3749 > Project: Apache Drill > Issue Type: New Feature > Components: Tools, Build & Test >Affects Versions: 1.1.0 >Reporter: Venki Korukanti >Assignee: Jason Altekruse > Fix For: Future > > > Logging a JIRA to track and discuss upgrading Drill's Hadoop dependency > version. Currently Drill depends on Hadoop 2.5.0 version. Newer version of > Hadoop (2.7.1) has following features. > 1) Better S3 support > 2) Ability to check if a user has certain permissions on file/directory > without performing operations on the file/dir. Useful for cases like > DRILL-3467. > As Drill is going to use higher version of Hadoop fileclient, there could be > potential issues when interacting with Hadoop services (such as HDFS) of > lower version than the fileclient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3712) Drill does not recognize UTF-16-LE encoding
[ https://issues.apache.org/jira/browse/DRILL-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949503#comment-14949503 ] Steven Phillips commented on DRILL-3712: I think one solution would be to write a UDF to convert from utf16 to utf8. We already have a function that does the reverse: CastVarCharVar16Char . > Drill does not recognize UTF-16-LE encoding > --- > > Key: DRILL-3712 > URL: https://issues.apache.org/jira/browse/DRILL-3712 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Text & CSV >Affects Versions: 1.1.0 > Environment: OSX, likely Linux. >Reporter: Edmon Begoli > Fix For: Future > > > We are unable to process files that OSX identifies as character sete UTF16LE. > After unzipping and converting to UTF8, we are able to process one fine. > There are CONVERT_TO and CONVERT_FROM commands that appear to address the > issue, but we were unable to make them work on a gzipped or unzipped version > of the UTF16 file. We were able to use CONVERT_FROM ok, but when we tried > to wrap the results of that to cast as a date, or anything else, it failed. > Trying to work with it natively caused the double-byte nature to appear (a > substring 1,4 only return the first two characters). > I cannot post the data because it is proprietary in nature, but I am posting > this code that might be useful in re-creating an issue: > {noformat} > #!/usr/bin/env python > """ Generates a test psv file with some text fields encoded as UTF-16-LE. """ > def write_utf16le_encoded_psv(): > total_lines = 10 > encoded = "Encoded B".encode("utf-16-le") > with open("test.psv","wb") as csv_file: > csv_file.write("header 1|header 2|header 3\n") > for i in xrange(total_lines): > csv_file.write("value > A"+str(i)+"|"+encoded+"|value C"+str(i)+"\n") > if __name__ == "__main__": > write_utf16le_encoded_psv() > {noformat} > then: > tar zcvf test.psv.tar.gz test.psv -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3712) Drill does not recognize UTF-16-LE encoding
[ https://issues.apache.org/jira/browse/DRILL-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949499#comment-14949499 ] Steven Phillips commented on DRILL-3712: The second column is utf16 encoded. I don't think any of our cast functions will deal with it properly. Nor will any of the string functions. > Drill does not recognize UTF-16-LE encoding > --- > > Key: DRILL-3712 > URL: https://issues.apache.org/jira/browse/DRILL-3712 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Text & CSV >Affects Versions: 1.1.0 > Environment: OSX, likely Linux. >Reporter: Edmon Begoli > Fix For: Future > > > We are unable to process files that OSX identifies as character sete UTF16LE. > After unzipping and converting to UTF8, we are able to process one fine. > There are CONVERT_TO and CONVERT_FROM commands that appear to address the > issue, but we were unable to make them work on a gzipped or unzipped version > of the UTF16 file. We were able to use CONVERT_FROM ok, but when we tried > to wrap the results of that to cast as a date, or anything else, it failed. > Trying to work with it natively caused the double-byte nature to appear (a > substring 1,4 only return the first two characters). > I cannot post the data because it is proprietary in nature, but I am posting > this code that might be useful in re-creating an issue: > {noformat} > #!/usr/bin/env python > """ Generates a test psv file with some text fields encoded as UTF-16-LE. """ > def write_utf16le_encoded_psv(): > total_lines = 10 > encoded = "Encoded B".encode("utf-16-le") > with open("test.psv","wb") as csv_file: > csv_file.write("header 1|header 2|header 3\n") > for i in xrange(total_lines): > csv_file.write("value > A"+str(i)+"|"+encoded+"|value C"+str(i)+"\n") > if __name__ == "__main__": > write_utf16le_encoded_psv() > {noformat} > then: > tar zcvf test.psv.tar.gz test.psv -- This message was sent by Atlassian JIRA (v6.3.4#6332)
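The repro script quoted in the report is Python 2 (`xrange`, text written to a binary file) and, as pasted, concatenates UTF-16-LE bytes into a `str`. A Python 3 version that actually produces the mixed-encoding file might look like this; the file layout follows the quoted script, and nothing here is from the Drill code base:

```python
"""Generate test .psv content whose middle column is UTF-16-LE encoded
while the rest of the content is ASCII -- the mix the report describes."""
import io

def write_utf16le_encoded_psv(out, total_lines=10):
    # keep everything as bytes so the UTF-16-LE column is not re-encoded
    encoded = "Encoded B".encode("utf-16-le")
    out.write(b"header 1|header 2|header 3\n")
    for i in range(total_lines):
        out.write(b"value A%d|" % i + encoded + b"|value C%d\n" % i)

buf = io.BytesIO()
write_utf16le_encoded_psv(buf, total_lines=2)
data = buf.getvalue()
```

To write an actual file, pass `open("test.psv", "wb")` instead of the `BytesIO` buffer; the stream-based signature just makes the sketch self-contained.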
[jira] [Assigned] (DRILL-3912) Common subexpression elimination
[ https://issues.apache.org/jira/browse/DRILL-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips reassigned DRILL-3912: -- Assignee: Steven Phillips > Common subexpression elimination > > > Key: DRILL-3912 > URL: https://issues.apache.org/jira/browse/DRILL-3912 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips > > Drill currently will evaluate the full expression tree, even if there are > redundant subtrees. Many of these redundant evaluations can be eliminated by > reusing the results from previously evaluated expression trees. > For example, > {code} > select a + 1, (a + 1)* (a - 1) from t > {code} > Will compute the entire (a + 1) expression twice. With CSE, it will only be > evaluated once. > The benefit will be reducing the work done when evaluating expressions, as > well as reducing the amount of code that is generated, which could also lead > to better JIT optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3912) Common subexpression elimination
Steven Phillips created DRILL-3912: -- Summary: Common subexpression elimination Key: DRILL-3912 URL: https://issues.apache.org/jira/browse/DRILL-3912 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips Drill currently will evaluate the full expression tree, even if there are redundant subtrees. Many of these redundant evaluations can be eliminated by reusing the results from previously evaluated expression trees. For example, {code} select a + 1, (a + 1)* (a - 1) from t {code} Will compute the entire (a + 1) expression twice. With CSE, it will only be evaluated once. The benefit will be reducing the work done when evaluating expressions, as well as reducing the amount of code that is generated, which could also lead to better JIT optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3912) Common subexpression elimination in code generation
[ https://issues.apache.org/jira/browse/DRILL-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947810#comment-14947810 ] Steven Phillips commented on DRILL-3912: It looks like your patch does a subset of my patch. It will eliminate common vector read expressions in the same JBlock. My patch will eliminate any redundant expression as long as the previously evaluated expression is in scope. For example, with filter: ( a + b > 0 and ( a + b = c or a + b = d)) the expression (a + b) would currently have to be computed 3 times, and each reference to a and b would require accessing the corresponding vectors. With my patch, (a + b) would only be calculated once. > Common subexpression elimination in code generation > --- > > Key: DRILL-3912 > URL: https://issues.apache.org/jira/browse/DRILL-3912 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Jinfeng Ni > > Drill currently will evaluate the full expression tree, even if there are > redundant subtrees. Many of these redundant evaluations can be eliminated by > reusing the results from previously evaluated expression trees. > For example, > {code} > select a + 1, (a + 1)* (a - 1) from t > {code} > Will compute the entire (a + 1) expression twice. With CSE, it will only be > evaluated once. > The benefit will be reducing the work done when evaluating expressions, as > well as reducing the amount of code that is generated, which could also lead > to better JIT optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3909) Decimal round functions corrupts input data
Steven Phillips created DRILL-3909: -- Summary: Decimal round functions corrupts input data Key: DRILL-3909 URL: https://issues.apache.org/jira/browse/DRILL-3909 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips Fix For: 1.3.0 The Decimal 28 and 38 round functions, instead of creating a new buffer and copying data from the incoming buffer, set the output buffer equal to the input buffer, and then subsequently mutate the data in that buffer. This causes the data in the input buffer to be corrupted. A simple example to reproduce: {code} $ cat a.json { a : "9.95678" } 0: jdbc:drill:drillbit=localhost> create table a as select cast(a as decimal(38,18)) a from `a.json`; +---++ | Fragment | Number of records written | +---++ | 0_0 | 1 | +---++ 1 row selected (0.206 seconds) 0: jdbc:drill:drillbit=localhost> select round(a, 9) from a; +---+ |EXPR$0 | +---+ | 10.0 | +---+ 1 row selected (0.121 seconds) 0: jdbc:drill:drillbit=localhost> select round(a, 11) from a; ++ | EXPR$0 | ++ | 9.957 | ++ 1 row selected (0.115 seconds) 0: jdbc:drill:drillbit=localhost> select round(a, 9), round(a, 11) from a; +---++ |EXPR$0 | EXPR$1 | +---++ | 10.0 | 1.000 | +---++ {code} In the third example, there are two round expressions operating on the same incoming decimal vector, and you can see that the result for the second expression is incorrect. Not critical because Decimal type is considered alpha right now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
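The bug pattern described above, an output that aliases its input buffer and then mutates it, is easy to reproduce outside Drill. A toy sketch with bytearrays standing in for value vector buffers:

```python
# A "round" that aliases the input buffer corrupts it for later readers;
# copying first does not.  The bytearrays stand in for vector buffers.
def round_aliasing(buf):
    out = buf              # BUG: output buffer *is* the input buffer
    out[0] += 1            # mutation is visible through the input
    return out

def round_copying(buf):
    out = bytearray(buf)   # allocate a new buffer and copy the data
    out[0] += 1
    return out

source = bytearray([9, 9, 5])
safe = round_copying(source)      # source is still [9, 9, 5] here
broken = round_aliasing(source)   # source is now [10, 9, 5]
```

This is why the second round expression in the report's third query sees already-mutated input: both expressions read the same incoming buffer, and the first one wrote into it.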
[jira] [Updated] (DRILL-3912) Common subexpression elimination in code generation
[ https://issues.apache.org/jira/browse/DRILL-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3912: --- Assignee: Jinfeng Ni (was: Steven Phillips) > Common subexpression elimination in code generation > --- > > Key: DRILL-3912 > URL: https://issues.apache.org/jira/browse/DRILL-3912 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Jinfeng Ni > > Drill currently will evaluate the full expression tree, even if there are > redundant subtrees. Many of these redundant evaluations can be eliminated by > reusing the results from previously evaluated expression trees. > For example, > {code} > select a + 1, (a + 1)* (a - 1) from t > {code} > Will compute the entire (a + 1) expression twice. With CSE, it will only be > evaluated once. > The benefit will be reducing the work done when evaluating expressions, as > well as reducing the amount of code that is generated, which could also lead > to better JIT optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3912) Common subexpression elimination
[ https://issues.apache.org/jira/browse/DRILL-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947731#comment-14947731 ] Steven Phillips commented on DRILL-3912: Yes, Drill physical plans are currently trees only. What you are suggesting requires a more general DAG execution. This patch only deals with common expressions within operators, and does its work right at code-generation time. > Common subexpression elimination > > > Key: DRILL-3912 > URL: https://issues.apache.org/jira/browse/DRILL-3912 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips > > Drill currently will evaluate the full expression tree, even if there are > redundant subtrees. Many of these redundant evaluations can be eliminated by > reusing the results from previously evaluated expression trees. > For example, > {code} > select a + 1, (a + 1)* (a - 1) from t > {code} > Will compute the entire (a + 1) expression twice. With CSE, it will only be > evaluated once. > The benefit will be reducing the work done when evaluating expressions, as > well as reducing the amount of code that is generated, which could also lead > to better JIT optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3912) Common subexpression elimination in code generation
[ https://issues.apache.org/jira/browse/DRILL-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947906#comment-14947906 ] Steven Phillips commented on DRILL-3912: 1) I had not enabled CSE in hash join, so it didn't have that problem. Now that I have enabled in hash join, I am seeing the same SR error. 2) In this case, it looks like the ConstantFilter is causing the '1 + 2' and '1 + 3' parts of the expressions to be resolved first, and then 'a + 1' is no longer common. Duplicate vectors reads are removed, though. I think this behavior is probably fine. 3) I am not targeting this for 1.2. Probably for 1.3. My main motivation here was to solve a problem I was running into in my Union-type work. Function resolution when there is Union type for the input involves case statements that check the current type of the input, and then executes a branch based on that type. In this case, both the condition expression as well as both branches will reference the input. For example, 1 + a would become something like {code} case when typeOf(a) = int then 1 + cast(a as int) when typeOf(a) = varchar then 1 + cast(cast(a as varchar) as int) end {code} So you can see that a single reference to 'a' becomes 3 references. And 'a' might not just be a ValueVectorReadExpression, it could be the output from some other expression tree. And if an input has more than 2 types, or if a function has multiple Union-type inputs, the complexity of the expression increases dramatically, and the amount of generated code gets to be quite large. I needed to find some way to fix this. > Common subexpression elimination in code generation > --- > > Key: DRILL-3912 > URL: https://issues.apache.org/jira/browse/DRILL-3912 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Jinfeng Ni > > Drill currently will evaluate the full expression tree, even if there are > redundant subtrees. 
Many of these redundant evaluations can be eliminated by > reusing the results from previously evaluated expression trees. > For example, > {code} > select a + 1, (a + 1)* (a - 1) from t > {code} > Will compute the entire (a + 1) expression twice. With CSE, it will only be > evaluated once. > The benefit will be reducing the work done when evaluating expressions, as > well as reducing the amount of code that is generated, which could also lead > to better JIT optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
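A rough way to quantify the blowup described in the comment above: model the expanded case expression as a tree and compare total subtree occurrences against unique subtrees; the gap is what CSE removes. This is an illustration only, not Drill's expression representation:

```python
# Count total vs. unique subtrees in the expanded case-statement form
# of `1 + a` over a Union-typed input `a`.
def count_nodes(expr, total, unique):
    total[0] += 1
    unique.add(expr)
    if isinstance(expr, tuple):
        for child in expr[1:]:        # expr[0] is the operator tag
            count_nodes(child, total, unique)

# case when typeOf(a) = int     then 1 + cast(a as int)
#      when typeOf(a) = varchar then 1 + cast(cast(a as varchar) as int) end
expanded = ('case',
            ('=', ('typeOf', 'a'), 'int'),
            ('+', 1, ('cast', 'a', 'int')),
            ('=', ('typeOf', 'a'), 'varchar'),
            ('+', 1, ('cast', ('cast', 'a', 'varchar'), 'int')))

total, unique = [0], set()
count_nodes(expanded, total, unique)
```

Here a single logical reference to `a` fans out into one occurrence per branch, and with more types per Union input (or multiple Union inputs) the total-versus-unique gap, and hence the generated code, grows much faster than the unique set.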
[jira] [Commented] (DRILL-3901) Performance regression with doing Explain of COUNT(*) over 100K files
[ https://issues.apache.org/jira/browse/DRILL-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945559#comment-14945559 ] Steven Phillips commented on DRILL-3901: I'm not sure about doing the directory expansion twice, but I do know that in the case where there is a metadata file, we are loading the file twice. The first time we read the metadata file, we should pass the metadata object to ParquetGroupScan, and continue passing the metadata object to any clones of the ParquetGroupScan, so that we don't have to read and deserialize the file more than once. I didn't think this was a big enough deal to stop the release, but looking at these numbers, it might be worth fixing now rather than putting off to the next release. > Performance regression with doing Explain of COUNT(*) over 100K files > - > > Key: DRILL-3901 > URL: https://issues.apache.org/jira/browse/DRILL-3901 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Aman Sinha >Assignee: Mehant Baid > > We are seeing a performance regression when doing an Explain of SELECT > COUNT(*) over 100K files in a flat directory (no subdirectories) on latest > master branch compared to a run that was done on Sept 26. Some initial > details (I will have more later): > {code} > master branch on Sept 26 >No metadata cache: 71.452 secs >With metadata cache: 15.804 secs > Latest master branch >No metadata cache: 110 secs >With metadata cache: 32 secs > {code} > So, both cases show regression. > [~mehant] and I took an initial look at this and it appears we might be doing > the directory expansion twice. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
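The fix sketched in the comment, read and deserialize the metadata file once, then hand the parsed object to every clone, is a standard memoization shape. An illustrative sketch with invented names (`GroupScan` and `_read` here are not the Drill classes):

```python
# Sketch: deserialize the metadata file once and share the parsed object
# with every clone of the scan, instead of re-reading it per clone.
class GroupScan:
    reads = 0   # counts how many times the "file" is actually parsed

    def __init__(self, path, metadata=None):
        self.path = path
        # only parse when no already-parsed object was handed in
        self.metadata = metadata if metadata is not None else self._read()

    def _read(self):
        GroupScan.reads += 1   # stands in for the expensive read+parse
        return {'path': self.path, 'files': ['f1.parquet', 'f2.parquet']}

    def clone(self):
        # pass the already-parsed metadata object to the clone
        return GroupScan(self.path, metadata=self.metadata)

scan = GroupScan('/data/t')
clones = [scan.clone() for _ in range(5)]
```

However many clones planning creates, the expensive parse happens once, which is exactly the saving the comment argues is worth taking before the release.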
[jira] [Commented] (DRILL-3892) Metadata cache not being leveraged when partition pruning is taking place
[ https://issues.apache.org/jira/browse/DRILL-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943027#comment-14943027 ] Steven Phillips commented on DRILL-3892: +1 > Metadata cache not being leveraged when partition pruning is taking place > - > > Key: DRILL-3892 > URL: https://issues.apache.org/jira/browse/DRILL-3892 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Assignee: Aman Sinha > Fix For: 1.3.0 > > Attachments: > 0001-DRILL-3892-Once-usedMetadataFile-is-set-to-true-don-.patch, > lineitem_deletecache.tgz > > > git.commit.id.abbrev=92638dc > As we can see from the below plan, metadata cache is not being leveraged even > when the cache file is being present > {code} > 0: jdbc:drill:zk=10.10.100.190:5181> refresh table metadata > dfs.`/drill/testdata/metadata_caching/lineitem_deletecache`; > +---+-+ > | ok | summary > | > +---+-+ > | true | Successfully updated metadata for table > /drill/testdata/metadata_caching/lineitem_deletecache. | > +---+-+ > 1 row selected (0.402 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> explain plan for select count(*) from > dfs.`/drill/testdata/metadata_caching/lineitem_deletecache` where dir0=2006 > group by l_linestatus; > +--+--+ > | text | json | > +--+--+ > | 00-00Screen > 00-01 Project(EXPR$0=[$1]) > 00-02HashAgg(group=[{0}], EXPR$0=[COUNT()]) > 00-03 Project(l_linestatus=[$0]) > 00-04Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=maprfs:/drill/testdata/metadata_caching/lineitem_deletecache/2006/1/lineitem_999.parquet]], > selectionRoot=maprfs:/drill/testdata/metadata_caching/lineitem_deletecache, > numFiles=1, usedMetadataFile=false, columns=[`l_linestatus`, `dir0`]]]) > {code} > I attached the data set used. Let me know if you need anything more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3887) Parquet metadata cache not being used
[ https://issues.apache.org/jira/browse/DRILL-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940799#comment-14940799 ] Steven Phillips commented on DRILL-3887: See https://github.com/apache/drill/pull/186 > Parquet metadata cache not being used > - > > Key: DRILL-3887 > URL: https://issues.apache.org/jira/browse/DRILL-3887 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips >Priority: Critical > > The fix for DRILL-3788 causes a directory to be expanded to its list of files > early in the query, and this change causes the ParquetGroupScan to no longer > use the parquet metadata file, even when it is there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3887) Parquet metadata cache not being used
[ https://issues.apache.org/jira/browse/DRILL-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3887: --- Assignee: Mehant Baid (was: Steven Phillips) > Parquet metadata cache not being used > - > > Key: DRILL-3887 > URL: https://issues.apache.org/jira/browse/DRILL-3887 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Mehant Baid >Priority: Critical > > The fix for DRILL-3788 causes a directory to be expanded to its list of files > early in the query, and this change causes the ParquetGroupScan to no longer > use the parquet metadata file, even when it is there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3867) Metadata Caching : Moving a directory which contains a cache file causes subsequent queries to fail
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3867: --- Assignee: Mehant Baid (was: Steven Phillips) > Metadata Caching : Moving a directory which contains a cache file causes > subsequent queries to fail > --- > > Key: DRILL-3867 > URL: https://issues.apache.org/jira/browse/DRILL-3867 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Assignee: Mehant Baid > Fix For: 1.2.0 > > > git.commit.id.abbrev=cf4f745 > git.commit.time=29.09.2015 @ 23\:19\:52 UTC > The below sequence of steps reproduces the issue > 1. Create the cache file > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata > dfs.`/drill/testdata/metadata_caching/lineitem`; > +---+-+ > | ok | summary > | > +---+-+ > | true | Successfully updated metadata for table > /drill/testdata/metadata_caching/lineitem. | > +---+-+ > 1 row selected (1.558 seconds) > {code} > 2. Move the directory > {code} > hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/ > {code} > 3. Now run a query on top of it > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit > 1; > Error: SYSTEM ERROR: FileNotFoundException: Requested file > maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist. > [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] > (state=,code=0) > {code} > This is obvious given the fact that we are storing absolute file paths in the > cache file -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3867) Metadata Caching : Moving a directory which contains a cache file causes subsequent queries to fail
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940804#comment-14940804 ] Steven Phillips commented on DRILL-3867: See https://github.com/apache/drill/pull/186 > Metadata Caching : Moving a directory which contains a cache file causes > subsequent queries to fail > --- > > Key: DRILL-3867 > URL: https://issues.apache.org/jira/browse/DRILL-3867 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Assignee: Steven Phillips > Fix For: 1.2.0 > > > git.commit.id.abbrev=cf4f745 > git.commit.time=29.09.2015 @ 23\:19\:52 UTC > The below sequence of steps reproduces the issue > 1. Create the cache file > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata > dfs.`/drill/testdata/metadata_caching/lineitem`; > +---+-+ > | ok | summary > | > +---+-+ > | true | Successfully updated metadata for table > /drill/testdata/metadata_caching/lineitem. | > +---+-+ > 1 row selected (1.558 seconds) > {code} > 2. Move the directory > {code} > hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/ > {code} > 3. Now run a query on top of it > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit > 1; > Error: SYSTEM ERROR: FileNotFoundException: Requested file > maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist. > [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] > (state=,code=0) > {code} > This is obvious given the fact that we are storing absolute file paths in the > cache file -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3887) Parquet metadata cache not being used
[ https://issues.apache.org/jira/browse/DRILL-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940803#comment-14940803 ] Steven Phillips commented on DRILL-3887: It was a detail in the code that I missed. I added the field "usedCache", which will show up in the physical plan. There is a unit test that tests this, and this can also be used by qa for functional testing. > Parquet metadata cache not being used > - > > Key: DRILL-3887 > URL: https://issues.apache.org/jira/browse/DRILL-3887 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips >Priority: Critical > > The fix for DRILL-3788 causes a directory to be expanded to its list of files > early in the query, and this change causes the ParquetGroupScan to no longer > use the parquet metadata file, even when it is there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3820) Nested Directories : Metadata Cache in a directory stores information from sub-directories as well creating security issues
[ https://issues.apache.org/jira/browse/DRILL-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941735#comment-14941735 ] Steven Phillips commented on DRILL-3820: My initial thought was to simply set the permissions to 700 for the metadata file. But that would cause problems when there is impersonation, as the impersonated user would not be able to read the metadata file. I actually think the best approach is to have the REFRESH command run as the user who gave the command, not the drill process user. That way, only a user who has permission to read all of the subdirectories and files, as well as write to all of the directories, will be able to run the REFRESH command. The metadata file should have the same owner and permissions as the directory it is placed in. It should be documented that running this command will expose some amount of metadata in all underlying directories to anyone who has permission to read the top level directory. This will at the very least prevent someone from exploiting the REFRESH command in order to access metadata in a directory that they don't have permission to read. > Nested Directories : Metadata Cache in a directory stores information from > sub-directories as well creating security issues > --- > > Key: DRILL-3820 > URL: https://issues.apache.org/jira/browse/DRILL-3820 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Reporter: Rahul Challapalli >Assignee: Steven Phillips >Priority: Critical > Fix For: 1.2.0 > > > git.commit.id.abbrev=3c89b30 > User A has access to lineitem folder and its subfolders > User B had access to lineitem folder but not its sub-folders. > Now when User A runs the "refresh table metadata lineitem" command, the cache > file gets created under lineitem folder. This file contains information from > the underlying sub-directories as well. > Now User B can download this file and get access to information which he > should not be seeing in the first place. 
> This can be very easily reproducible if impersonation is enabled on the > cluster. > Let me know if you need more information to reproduce this issue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
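The comment above suggests the metadata file should inherit the owner and permissions of the directory it is placed in. A minimal sketch of that idea with the standard `java.nio.file` API follows; Drill itself would go through the Hadoop `FileSystem` API, so this helper is purely a hypothetical illustration of the permission-mirroring step, not Drill code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

public class MetadataFilePerms {
    // Hypothetical helper: make the cache file inherit its parent
    // directory's POSIX permissions and owner, so only users who can
    // already read the directory can read the metadata file.
    static void mirrorParentPermissions(Path cacheFile) throws IOException {
        Path dir = cacheFile.getParent();
        Set<PosixFilePermission> perms =
                EnumSet.copyOf(Files.getPosixFilePermissions(dir));
        // A plain file should not be executable even though the directory is.
        perms.remove(PosixFilePermission.OWNER_EXECUTE);
        perms.remove(PosixFilePermission.GROUP_EXECUTE);
        perms.remove(PosixFilePermission.OTHERS_EXECUTE);
        Files.setPosixFilePermissions(cacheFile, perms);
        Files.setOwner(cacheFile, Files.getOwner(dir));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lineitem");
        Files.setPosixFilePermissions(dir,
                PosixFilePermissions.fromString("rwxr-x---"));
        Path cache = Files.createFile(dir.resolve(".drill.parquet_metadata"));
        mirrorParentPermissions(cache);
        System.out.println(PosixFilePermissions.toString(
                Files.getPosixFilePermissions(cache))); // rw-r-----
    }
}
```

Note this alone does not close the hole described in the issue: a user who can read the top-level directory still sees metadata for subdirectories they cannot read, which is why the comment also proposes running REFRESH as the issuing user.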
[jira] [Updated] (DRILL-3820) Nested Directories : Metadata Cache in a directory stores information from sub-directories as well creating security issues
[ https://issues.apache.org/jira/browse/DRILL-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3820: --- Assignee: Aman Sinha (was: Steven Phillips) > Nested Directories : Metadata Cache in a directory stores information from > sub-directories as well creating security issues > --- > > Key: DRILL-3820 > URL: https://issues.apache.org/jira/browse/DRILL-3820 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Reporter: Rahul Challapalli >Assignee: Aman Sinha >Priority: Critical > Fix For: 1.2.0 > > > git.commit.id.abbrev=3c89b30 > User A has access to lineitem folder and its subfolders > User B had access to lineitem folder but not its sub-folders. > Now when User A runs the "refresh table metadata lineitem" command, the cache > file gets created under lineitem folder. This file contains information from > the underlying sub-directories as well. > Now User B can download this file and get access to information which he > should not be seeing in the first place. > This can be very easily reproducible if impersonation is enabled on the > cluster. > Let me know if you need more information to reproduce this issue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3820) Nested Directories : Metadata Cache in a directory stores information from sub-directories as well creating security issues
[ https://issues.apache.org/jira/browse/DRILL-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3820: --- Fix Version/s: (was: 1.2.0) 1.3.0 > Nested Directories : Metadata Cache in a directory stores information from > sub-directories as well creating security issues > --- > > Key: DRILL-3820 > URL: https://issues.apache.org/jira/browse/DRILL-3820 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Reporter: Rahul Challapalli >Assignee: Aman Sinha >Priority: Critical > Fix For: 1.3.0 > > > git.commit.id.abbrev=3c89b30 > User A has access to lineitem folder and its subfolders > User B had access to lineitem folder but not its sub-folders. > Now when User A runs the "refresh table metadata lineitem" command, the cache > file gets created under lineitem folder. This file contains information from > the underlying sub-directories as well. > Now User B can download this file and get access to information which he > should not be seeing in the first place. > This can be very easily reproducible if impersonation is enabled on the > cluster. > Let me know if you need more information to reproduce this issue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3867) Metadata Caching : Moving a directory which contains a cache file causes subsequent queries to fail
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3867: --- Fix Version/s: (was: 1.2.0) 1.3.0 > Metadata Caching : Moving a directory which contains a cache file causes > subsequent queries to fail > --- > > Key: DRILL-3867 > URL: https://issues.apache.org/jira/browse/DRILL-3867 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Assignee: Mehant Baid > Fix For: 1.3.0 > > > git.commit.id.abbrev=cf4f745 > git.commit.time=29.09.2015 @ 23\:19\:52 UTC > The below sequence of steps reproduces the issue > 1. Create the cache file > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata > dfs.`/drill/testdata/metadata_caching/lineitem`; > +---+-+ > | ok | summary > | > +---+-+ > | true | Successfully updated metadata for table > /drill/testdata/metadata_caching/lineitem. | > +---+-+ > 1 row selected (1.558 seconds) > {code} > 2. Move the directory > {code} > hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/ > {code} > 3. Now run a query on top of it > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit > 1; > Error: SYSTEM ERROR: FileNotFoundException: Requested file > maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist. > [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] > (state=,code=0) > {code} > This is obvious given the fact that we are storing absolute file paths in the > cache file -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3887) Parquet metadata cache not being used
[ https://issues.apache.org/jira/browse/DRILL-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips resolved DRILL-3887. Resolution: Fixed > Parquet metadata cache not being used > - > > Key: DRILL-3887 > URL: https://issues.apache.org/jira/browse/DRILL-3887 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Mehant Baid >Priority: Critical > > The fix for DRILL-3788 causes a directory to be expanded to its list of files > early in the query, and this change causes the ParquetGroupScan to no longer > use the parquet metadata file, even when it is there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3887) Parquet metadata cache not being used
[ https://issues.apache.org/jira/browse/DRILL-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942007#comment-14942007 ] Steven Phillips commented on DRILL-3887: Fixed by 1cfd4c2 > Parquet metadata cache not being used > - > > Key: DRILL-3887 > URL: https://issues.apache.org/jira/browse/DRILL-3887 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Mehant Baid >Priority: Critical > > The fix for DRILL-3788 causes a directory to be expanded to its list of files > early in the query, and this change causes the ParquetGroupScan to no longer > use the parquet metadata file, even when it is there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3229) Create a new EmbeddedVector
[ https://issues.apache.org/jira/browse/DRILL-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939640#comment-14939640 ] Steven Phillips commented on DRILL-3229: i) In this first iteration, Union types will be enabled with an option, and they will be created in Json Reader and Mongo reader automatically if the option is enabled. Everything will be a Union type in this case. A future patch will work on promoting from non-union once it is necessary to promote. ii) Your understanding is correct. One change from the earlier comment: there is no "bits" vector. The underlying primitive type vectors will have their own "bits" for tracking nulls. The type vector with a value of zero will also indicate null. Without going into much detail at this point, I can answer the next paragraph of questions by saying that this patch will allow reading of any valid json. It also has a more literal representation of the json, e.g. null values will be treated as null, instead of empty maps/lists. The patch also includes functions for inspecting the type of a field, which can be used with case statements to handle the data based on which type it is. Though it may be somewhat cumbersome, with these tools you should be able to run almost any query against dynamic json data. This will generally involve using introspection and case statements to remove the Union types early in the query. Future work will eliminate the need for this in many cases. One notable exception is that flatten is not supported in this initial patch. 
> Create a new EmbeddedVector > --- > > Key: DRILL-3229 > URL: https://issues.apache.org/jira/browse/DRILL-3229 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Jacques Nadeau >Assignee: Steven Phillips > Fix For: Future > > > Embedded Vector will leverage a binary encoding for holding information about > type for each individual field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
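The layout described in (ii) — a per-row type id alongside per-type value vectors, with type id 0 indicating null — can be sketched as a toy model. This is an illustration of the idea only, not Drill's actual UnionVector, and the type-id constants and accessors are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the union layout described above: a per-row type id
// (0 = null) selects which typed value list holds the row's value.
public class ToyUnionVector {
    static final int NULL = 0, BIGINT = 1, VARCHAR = 2;

    private final List<Integer> typeIds = new ArrayList<>();
    private final List<Long> bigints = new ArrayList<>();
    private final List<String> varchars = new ArrayList<>();

    void addNull()     { typeIds.add(NULL);    bigints.add(0L); varchars.add(null); }
    void add(long v)   { typeIds.add(BIGINT);  bigints.add(v);  varchars.add(null); }
    void add(String v) { typeIds.add(VARCHAR); bigints.add(0L); varchars.add(v); }

    int typeOf(int row) { return typeIds.get(row); }

    Object get(int row) {
        switch (typeIds.get(row)) {      // the case-statement pattern
            case BIGINT:  return bigints.get(row);
            case VARCHAR: return varchars.get(row);
            default:      return null;   // type id 0 indicates null
        }
    }
}
```

A query over such a column would use the type-inspection functions with a CASE expression in just this way, collapsing the Union to a single type early in the query, as the comment suggests.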
[jira] [Updated] (DRILL-3820) Nested Directories : Metadata Cache in a directory stores information from sub-directories as well creating security issues
[ https://issues.apache.org/jira/browse/DRILL-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3820: --- Assignee: (was: Steven Phillips) > Nested Directories : Metadata Cache in a directory stores information from > sub-directories as well creating security issues > --- > > Key: DRILL-3820 > URL: https://issues.apache.org/jira/browse/DRILL-3820 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Reporter: Rahul Challapalli >Priority: Critical > Fix For: 1.2.0 > > > git.commit.id.abbrev=3c89b30 > User A has access to lineitem folder and its subfolders > User B had access to lineitem folder but not its sub-folders. > Now when User A runs the "refresh table metadata lineitem" command, the cache > file gets created under lineitem folder. This file contains information from > the underlying sub-directories as well. > Now User B can download this file and get access to information which he > should not be seeing in the first place. > This can be very easily reproducible if impersonation is enabled on the > cluster. > Let me know if you need more information to reproduce this issue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3468) CTAS IOB
[ https://issues.apache.org/jira/browse/DRILL-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3468: --- Assignee: (was: Steven Phillips) > CTAS IOB > > > Key: DRILL-3468 > URL: https://issues.apache.org/jira/browse/DRILL-3468 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.2.0 >Reporter: Khurram Faraaz >Priority: Critical > > I am seeing a IOB when I use same table name in CTAS, after deleting the > previously create parquet file. > {code} > 0: jdbc:drill:schema=dfs.tmp> CREATE TABLE tbl_allData AS SELECT > CAST(columns[0] as INT ), CAST(columns[1] as BIGINT ), CAST(columns[2] as > CHAR(2) ), CAST(columns[3] as VARCHAR(52) ), CAST(columns[4] as TIMESTAMP ), > CAST(columns[5] as DATE ), CAST(columns[6] as BOOLEAN ), CAST(columns[7] as > DOUBLE), CAST( columns[8] as TIME) FROM `allData.csv`; > +---++ > | Fragment | Number of records written | > +---++ > | 0_0 | 11196 | > +---++ > 1 row selected (1.864 seconds) > {code} > Remove the parquet file that was created by the above CTAS. > {code} > [root@centos-01 aggregates]# hadoop fs -ls /tmp/tbl_allData > Found 1 items > -rwxr-xr-x 3 mapr mapr 397868 2015-07-07 21:08 > /tmp/tbl_allData/0_0_0.parquet > [root@centos-01 aggregates]# hadoop fs -rm /tmp/tbl_allData/0_0_0.parquet > 15/07/07 21:10:47 INFO Configuration.deprecation: io.bytes.per.checksum is > deprecated. Instead, use dfs.bytes-per-checksum > 15/07/07 21:10:47 INFO fs.TrashPolicyDefault: Namenode trash configuration: > Deletion interval = 0 minutes, Emptier interval = 0 minutes. > Deleted /tmp/tbl_allData/0_0_0.parquet > {code} > I see a IOB when I CTAS with same table name as the one that was removed in > the above step. 
> {code} > 0: jdbc:drill:schema=dfs.tmp> CREATE TABLE tbl_allData AS SELECT > CAST(columns[0] as INT ), CAST(columns[1] as BIGINT ), CAST(columns[2] as > CHAR(2) ), CAST(columns[3] as VARCHAR(52) ), CAST(columns[4] as TIMESTAMP ), > CAST(columns[5] as DATE ), CAST(columns[6] as BOOLEAN ), CAST(columns[7] as > DOUBLE), CAST( columns[8] as TIME) FROM `lessData.csv`; > Error: SYSTEM ERROR: IndexOutOfBoundsException: Index: 0, Size: 0 > [Error Id: 6d6df8e9-699c-4475-8ad3-183c0a91dc99 on centos-02.qa.lab:31010] > (state=,code=0) > {code} > stack trace from drillbit.log > {code} > org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception > during fragment initialization: Failure while trying to check if a table or > view with given name [tbl_allData] already exists in schema [dfs.tmp]: Index: > 0, Size: 0 > at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:253) > [drill-java-exec-1.1.0.jar:1.1.0] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_45] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_45] > at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] > Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Failure > while trying to check if a table or view with given name [tbl_allData] > already exists in schema [dfs.tmp]: Index: 0, Size: 0 > at > org.apache.drill.exec.planner.sql.handlers.SqlHandlerUtil.getTableFromSchema(SqlHandlerUtil.java:222) > ~[drill-java-exec-1.1.0.jar:1.1.0] > at > org.apache.drill.exec.planner.sql.handlers.CreateTableHandler.getPlan(CreateTableHandler.java:88) > ~[drill-java-exec-1.1.0.jar:1.1.0] > at > org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:178) > ~[drill-java-exec-1.1.0.jar:1.1.0] > at > org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:903) > [drill-java-exec-1.1.0.jar:1.1.0] > at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:242) 
> [drill-java-exec-1.1.0.jar:1.1.0] > ... 3 common frames omitted > Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:635) ~[na:1.7.0_45] > at java.util.ArrayList.get(ArrayList.java:411) ~[na:1.7.0_45] > at > org.apache.drill.exec.store.dfs.FileSelection.getFirstPath(FileSelection.java:100) > ~[drill-java-exec-1.1.0.jar:1.1.0] > at > org.apache.drill.exec.store.dfs.BasicFormatMatcher.isReadable(BasicFormatMatcher.java:75) > ~[drill-java-exec-1.1.0.jar:1.1.0] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create(WorkspaceSchemaFactory.java:303) > ~[drill-java-exec-1.1.0.jar:1.1.0] > at >
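The trace above bottoms out in `FileSelection.getFirstPath` calling `ArrayList.get(0)` on an empty list: the table directory still exists, but its only parquet file was deleted. A defensive guard of the following shape avoids the IOB — this is a hypothetical stand-in written for illustration, not the actual Drill fix:

```java
import java.util.Collections;
import java.util.List;

public class FileSelectionSketch {
    // Hypothetical stand-in for FileSelection.getFirstPath(): return null
    // for an empty selection instead of letting ArrayList.get(0) throw,
    // so callers such as BasicFormatMatcher.isReadable can report
    // "no readable files" rather than crash the query.
    static String getFirstPath(List<String> files) {
        return files.isEmpty() ? null : files.get(0);
    }

    public static void main(String[] args) {
        // Directory still exists, but the parquet file was removed.
        System.out.println(getFirstPath(Collections.emptyList()));          // null
        System.out.println(getFirstPath(List.of("/tmp/tbl_allData/0_0_0.parquet")));
    }
}
```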
[jira] [Updated] (DRILL-2475) Handle IterOutcome.NONE correctly in operators
[ https://issues.apache.org/jira/browse/DRILL-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-2475: --- Assignee: (was: Steven Phillips) > Handle IterOutcome.NONE correctly in operators > -- > > Key: DRILL-2475 > URL: https://issues.apache.org/jira/browse/DRILL-2475 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 0.8.0 >Reporter: Venki Korukanti > Fix For: 1.2.0 > > > Currently not all operators are handling the NONE (with no OK_NEW_SCHEMA) > correctly. This JIRA is to go through the operators and check if it handling > the NONE correctly or not and modify accordingly. > (from DRILL-2453) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2975) Extended Json : Time type reporting data which is dependent on the system on which it ran
[ https://issues.apache.org/jira/browse/DRILL-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-2975: --- Assignee: (was: Steven Phillips) > Extended Json : Time type reporting data which is dependent on the system on > which it ran > - > > Key: DRILL-2975 > URL: https://issues.apache.org/jira/browse/DRILL-2975 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JSON >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Priority: Critical > Fix For: 1.3.0 > > > git.commit.id.abbrev=3b19076 > Data : > {code} > { > "int_col" : {"$numberLong": 1}, > "date_col" : {"$dateDay": "2012-05-22"}, > "time_col" : {"$time": "19:20:30.45Z"} > } > {code} > System 1 : > {code} > 0: jdbc:drill:schema=dfs_eea> select time_col from `extended_json/data1.json` > d; > ++ > | time_col | > ++ > | 19:20:30.450 | > ++ > {code} > System 2 : > {code} > 0: jdbc:drill:schema=dfs.drillTestDirComplexP> select time_col from > `temp.json`; > ++ > | time_col | > ++ > | 11:20:30.450 | > ++ > {code} > The above results are inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2385) count on complex objects failed with missing function implementation
[ https://issues.apache.org/jira/browse/DRILL-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-2385: --- Assignee: (was: Steven Phillips) > count on complex objects failed with missing function implementation > > > Key: DRILL-2385 > URL: https://issues.apache.org/jira/browse/DRILL-2385 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 0.8.0 >Reporter: Chun Chang >Priority: Minor > Fix For: 1.4.0 > > > #Wed Mar 04 01:23:42 EST 2015 > git.commit.id.abbrev=71b6bfe > Have a complex type looks like the following: > {code} > 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.sia from > `complex.json` t limit 1; > ++ > |sia | > ++ > | [1,11,101,1001] | > ++ > {code} > A count on the complex type will fail with missing function implementation: > {code} > 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.gbyi, count(t.sia) > countsia from `complex.json` t group by t.gbyi; > Query failed: RemoteRpcException: Failure while running fragment., Schema is > currently null. You must call buildSchema(SelectionVectorMode) before this > container can return a schema. [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on > qa-node119.qa.lab:31010 ] > [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on qa-node119.qa.lab:31010 ] > Error: exception while executing query: Failure while executing query. > (state=,code=0) > {code} > drillbit.log > {code} > 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] ERROR > o.a.drill.exec.ops.FragmentContext - Fragment Context received failure. > org.apache.drill.exec.exception.SchemaChangeException: Failure while > materializing expression. > Error in expression at index 0. Error: Missing function implementation: > [count(BIGINT-REPEATED)]. Full expression: null. 
> at > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregatorInternal(HashAggBatch.java:210) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregator(HashAggBatch.java:158) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema(HashAggBatch.java:101) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:130) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:114) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:121) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_45] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_45] > at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] > 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] WARN > 
o.a.d.e.w.fragment.FragmentExecutor - Error while initializing or executing > fragment > java.lang.NullPointerException: Schema is currently null. You must call > buildSchema(SelectionVectorMode) before this container can return a schema. > at > com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208) > ~[guava-14.0.1.jar:na] > at > org.apache.drill.exec.record.VectorContainer.getSchema(VectorContainer.java:261) > ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.getSchema(AbstractRecordBatch.java:155) > ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at >
[jira] [Updated] (DRILL-1681) select with limit on directory with csv files takes quite long to terminate
[ https://issues.apache.org/jira/browse/DRILL-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-1681: --- Assignee: (was: Steven Phillips) > select with limit on directory with csv files takes quite long to terminate > --- > > Key: DRILL-1681 > URL: https://issues.apache.org/jira/browse/DRILL-1681 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Text & CSV >Reporter: Suresh Ollala >Priority: Minor > Fix For: 1.3.0 > > > query like select * from `/drill/data` limit 100 takes quite long to > terminate, about 20+ seconds. > /drill/data includes overall 1100 csv files, all in single directory. > select * from `/drill/data/d2.csv` limit 100; terminates in 0.2 seconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2428) Drill Build failed : git.properties isn't a file.
[ https://issues.apache.org/jira/browse/DRILL-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-2428: --- Assignee: (was: Steven Phillips) > Drill Build failed : git.properties isn't a file. > - > > Key: DRILL-2428 > URL: https://issues.apache.org/jira/browse/DRILL-2428 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Reporter: Praveen >Priority: Critical > Fix For: Future > > > I am build the Drill from source . i am getting the following error. > Applied patch provide for the same issue. but not working . Can you please > provide the solution. > ties to archive location: apache-drill-0.7.0-SNAPSHOT/git.properties > [INFO] > > [INFO] Reactor Summary: > [INFO] > [INFO] Apache Drill Root POM . SUCCESS [ 10.186 s] > [INFO] Drill Protocol SUCCESS [ 7.479 s] > [INFO] Common (Logical Plan, Base expressions) ... SUCCESS [ 10.150 s] > [INFO] contrib/Parent Pom SUCCESS [ 2.490 s] > [INFO] contrib/data/Parent Pom ... SUCCESS [ 0.302 s] > [INFO] contrib/data/tpch-sample-data . SUCCESS [ 3.259 s] > [INFO] exec/Parent Pom ... SUCCESS [ 3.465 s] > [INFO] exec/Java Execution Engine SUCCESS [02:12 min] > [INFO] contrib/hive-storage-plugin/Parent Pom SUCCESS [ 2.250 s] > [INFO] contrib/hive-storage-plugin/hive-exec-shaded .. SUCCESS [ 32.738 s] > [INFO] contrib/hive-storage-plugin/core .. SUCCESS [ 9.415 s] > [INFO] exec/JDBC Driver using dependencies ... SUCCESS [ 7.383 s] > [INFO] JDBC JAR with all dependencies SUCCESS [01:47 min] > [INFO] exec/Drill expression interpreter . SUCCESS [ 20.441 s] > [INFO] contrib/mongo-storage-plugin .. SUCCESS [ 7.914 s] > [INFO] contrib/hbase-storage-plugin .. SUCCESS [ 8.501 s] > [INFO] Packaging and Distribution Assembly ... FAILURE [ 2.770 s] > [INFO] contrib/sqlline ... 
SKIPPED > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 06:09 min > [INFO] Finished at: 2015-03-11T16:22:49+05:30 > [INFO] Final Memory: 69M/526M > [INFO] > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-assembly-plugin:2. > 4:single (distro-assembly) on project distribution: Failed to create > assembly: E > rror adding file to archive: > D:\drill\drill-0.7.0\distribution\target\classes\gi > t.properties isn't a file. -> [Help 1] > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute > goal o > rg.apache.maven.plugins:maven-assembly-plugin:2.4:single (distro-assembly) on > pr > oject distribution: Failed to create assembly: Error adding file to archive: > D:\ > drill\drill-0.7.0\distribution\target\classes\git.properties isn't a file. > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor > .java:216) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor > .java:153) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor > .java:145) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProje > ct(LifecycleModuleBuilder.java:108) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProje > ct(LifecycleModuleBuilder.java:76) > at > org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThre > adedBuilder.build(SingleThreadedBuilder.java:51) > at > org.apache.maven.lifecycle.internal.LifecycleStarter.execute(Lifecycl > eStarter.java:116) > at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:361) > at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155) > at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584) > at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:213) > at org.apache.maven.cli.MavenCli.main(MavenCli.java:157) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl. 
> java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces > sorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at >
[jira] [Commented] (DRILL-3867) Metadata Caching : Moving a directory which contains a cache file causes subsequent queries to fail
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940523#comment-14940523 ] Steven Phillips commented on DRILL-3867: We should store the paths relative to the directory containing the metadata file. > Metadata Caching : Moving a directory which contains a cache file causes > subsequent queries to fail > --- > > Key: DRILL-3867 > URL: https://issues.apache.org/jira/browse/DRILL-3867 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Assignee: Steven Phillips > Fix For: 1.2.0 > > > git.commit.id.abbrev=cf4f745 > git.commit.time=29.09.2015 @ 23\:19\:52 UTC > The below sequence of steps reproduces the issue > 1. Create the cache file > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata > dfs.`/drill/testdata/metadata_caching/lineitem`; > +---+-+ > | ok | summary > | > +---+-+ > | true | Successfully updated metadata for table > /drill/testdata/metadata_caching/lineitem. | > +---+-+ > 1 row selected (1.558 seconds) > {code} > 2. Move the directory > {code} > hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/ > {code} > 3. Now run a query on top of it > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit > 1; > Error: SYSTEM ERROR: FileNotFoundException: Requested file > maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist. > [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] > (state=,code=0) > {code} > This is obvious given the fact that we are storing absolute file paths in the > cache file -- This message was sent by Atlassian JIRA (v6.3.4#6332)
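The suggested fix (store paths relative to the directory containing the metadata file) can be sketched with `java.nio.file`; the directory layout below is illustrative, taken from the reproduction above, and this is not Drill's actual cache-writing code:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: store cache entries relative to the directory that holds the cache
// file, so "hadoop fs -mv" of the whole tree does not invalidate them.
public class RelativeCachePaths {
    public static void main(String[] args) {
        Path cacheDir = Paths.get("/drill/testdata/metadata_caching/lineitem");
        Path dataFile = cacheDir.resolve("2006/1/part-0.parquet");

        // What would be written into the cache file: a relative path.
        Path relative = cacheDir.relativize(dataFile);
        System.out.println(relative); // 2006/1/part-0.parquet

        // After the directory is moved, resolve against its new location.
        Path movedDir = Paths.get("/drill/lineitem");
        System.out.println(movedDir.resolve(relative)); // /drill/lineitem/2006/1/part-0.parquet
    }
}
```

Because only the resolution base changes, a moved table keeps working without rerunning `refresh table metadata`.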
[jira] [Assigned] (DRILL-3820) Nested Directories : Metadata Cache in a directory stores information from sub-directories as well creating security issues
[ https://issues.apache.org/jira/browse/DRILL-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips reassigned DRILL-3820: -- Assignee: Steven Phillips > Nested Directories : Metadata Cache in a directory stores information from > sub-directories as well creating security issues > --- > > Key: DRILL-3820 > URL: https://issues.apache.org/jira/browse/DRILL-3820 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Reporter: Rahul Challapalli >Assignee: Steven Phillips >Priority: Critical > Fix For: 1.2.0 > > > git.commit.id.abbrev=3c89b30 > User A has access to lineitem folder and its subfolders > User B had access to lineitem folder but not its sub-folders. > Now when User A runs the "refresh table metadata lineitem" command, the cache > file gets created under lineitem folder. This file contains information from > the underlying sub-directories as well. > Now User B can download this file and get access to information which he > should not be seeing in the first place. > This can be very easily reproducible if impersonation is enabled on the > cluster. > Let me know if you need more information to reproduce this issue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-3887) Parquet metadata cache not being used
[ https://issues.apache.org/jira/browse/DRILL-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips reassigned DRILL-3887: -- Assignee: Steven Phillips > Parquet metadata cache not being used > - > > Key: DRILL-3887 > URL: https://issues.apache.org/jira/browse/DRILL-3887 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips >Priority: Critical > > The fix for DRILL-3788 causes a directory to be expanded to its list of files > early in the query, and this change causes the ParquetGroupScan to no longer > use the parquet metadata file, even when it is there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3887) Parquet metadata cache not being used
Steven Phillips created DRILL-3887: -- Summary: Parquet metadata cache not being used Key: DRILL-3887 URL: https://issues.apache.org/jira/browse/DRILL-3887 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips Priority: Critical The fix for DRILL-3788 causes a directory to be expanded to its list of files early in the query, and this change causes the ParquetGroupScan to no longer use the parquet metadata file, even when it is there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3844) HOME and END keys do not work in drill console
[ https://issues.apache.org/jira/browse/DRILL-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933684#comment-14933684 ] Steven Phillips commented on DRILL-3844: They are working fine on my system (Mac OS Yosemite). What system are you using? The drill console is based on sqlline, which is based on jline, so it seems it could be related to this: https://github.com/jline/jline2/issues/54 > HOME and END keys do not work in drill console > -- > > Key: DRILL-3844 > URL: https://issues.apache.org/jira/browse/DRILL-3844 > Project: Apache Drill > Issue Type: Bug > Components: Client - CLI >Affects Versions: 1.1.0, 1.2.0 >Reporter: Philip Deegan > > Is there a reason for this? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3229) Create a new EmbeddedVector
[ https://issues.apache.org/jira/browse/DRILL-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804637#comment-14804637 ] Steven Phillips commented on DRILL-3229: Basic design outline: A Union type represents a field where the type can vary between records. The data for a field of type Union will be stored in a UnionVector. h4. UnionVector Internally uses a MapVector to hold the vectors for the various types. The types include all of the MinorTypes, including List and Map. For example, the internal MapVector will have a subfield named "bigInt", which will refer to a NullableBigIntVector. In addition to the vectors corresponding to the minor types, there will be two additional fields, both represented by UInt1Vectors. These are "bits" and "types", which will represent the nullability and types of the underlying data. The "bits" vector will work the same way it works in other nullable vectors. The "types" vector will store the number corresponding to the value of the MinorType as defined in the protobuf definition. There will be mutator methods for setting null and type. h4. UnionWriter The UnionWriter implements and overrides all of the methods of FieldWriter. It holds field writers corresponding to each of the types included in the underlying UnionVector, and delegates the method calls for each type to the corresponding writer.
For example, the BigIntWriter interface: {code} public interface BigIntWriter extends BaseWriter { public void write(BigIntHolder h); public void writeBigInt(long value); } {code} UnionWriter overrides these methods: {code} @Override public void writeBigInt(long value) { data.getMutator().setType(idx(), MinorType.BIGINT); data.getMutator().setNotNull(idx()); getBigIntWriter().setPosition(idx()); getBigIntWriter().writeBigInt(value); } @Override public void write(BigIntHolder h) { data.getMutator().setType(idx(), MinorType.BIGINT); data.getMutator().setNotNull(idx()); getBigIntWriter().setPosition(idx()); getBigIntWriter().write(h); } {code} This requires users of the interface to go through the UnionWriter, rather than using the underlying BigIntWriter directly. Otherwise, the "types" and "bits" vectors would not get set correctly. h4. UnionReader Much the same as the UnionWriter, the UnionReader overrides the methods of FieldReader, and delegates to a corresponding specific FieldReader implementation depending on which type the current value is. h4. UnionListVector UnionListVector extends BaseRepeatedVector. It works much the same as other Repeated vectors; there is a data vector and an offset vector. The data vector in this case is a UnionVector. h4. UnionListWriter The UnionListWriter overrides all FieldWriter methods. When starting a new list, the startList() method is called. This calls the startNewValue(int index) method of the underlying UnionListVector.Mutator. Subsequent calls to the ListWriter methods (such as bigint()) return the UnionListWriter itself, and calls to write are handled by calling the appropriate method on the underlying UnionListVector.Mutator, which handles updating the offset vector. In the case that the map() method is called (i.e. repeated map), the UnionListWriter is itself returned, but a state variable is updated to indicate that it should operate as a MapWriter.
While in MapWriter mode, calls to the MapWriter methods will also return the UnionListWriter itself, but will also update the field indicating what the name of the current field is. Subsequent writes to the ScalarWriter methods will write to the underlying UnionVector using the UnionWriter interface. For example, {code} UnionListWriter list; ... list.startList(); list.map().bigInt("a").writeBigInt(1); {code} This code first indicates that a new list is starting. By doing this, the offset vector is correctly set. Calling map() sets the internal state of the writer to "MAP". bigInt("a") sets the current field of the writer to "a", and writeBigInt(1) writes the value 1 to the underlying UnionVector. Another example: {code} MapWriter mapWriter = list.map().map("a") {code} In this case, the final call to map("a") delegates to the underlying UnionWriter, and returns a new MapWriter, with the position set according to the current offset. > Create a new EmbeddedVector > --- > > Key: DRILL-3229 > URL: https://issues.apache.org/jira/browse/DRILL-3229 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter:
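The "types"/"bits" bookkeeping that the UnionWriter must keep in sync with every write can be modeled with a toy sketch. This is plain Java illustrating the idea only; none of these classes or type ids are Drill's actual vector implementations:

```java
// Toy model of a union column: parallel "types" and "bits" arrays plus one
// storage array per supported type. Not Drill's UnionVector.
public class ToyUnionColumn {
    static final byte TYPE_BIGINT = 1, TYPE_VARCHAR = 2; // hypothetical type ids

    final byte[] types = new byte[16];    // which type each record holds
    final byte[] bits = new byte[16];     // 1 = not null, as in other nullable vectors
    final long[] bigInts = new long[16];
    final String[] varChars = new String[16];

    // Mirrors UnionWriter.writeBigInt: record type and nullability, then delegate.
    public void writeBigInt(int index, long value) {
        types[index] = TYPE_BIGINT;
        bits[index] = 1;
        bigInts[index] = value;
    }

    public void writeVarChar(int index, String value) {
        types[index] = TYPE_VARCHAR;
        bits[index] = 1;
        varChars[index] = value;
    }

    public Object get(int index) {
        if (bits[index] == 0) return null; // never written
        return types[index] == TYPE_BIGINT ? (Object) bigInts[index] : varChars[index];
    }
}
```

Writing through the union-level methods is what keeps `types` and `bits` consistent, which is why callers must not use the underlying per-type writer directly.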
[jira] [Assigned] (DRILL-3228) Implement Embedded Type
[ https://issues.apache.org/jira/browse/DRILL-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips reassigned DRILL-3228: -- Assignee: Steven Phillips (was: Jacques Nadeau) > Implement Embedded Type > --- > > Key: DRILL-3228 > URL: https://issues.apache.org/jira/browse/DRILL-3228 > Project: Apache Drill > Issue Type: Task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Jacques Nadeau >Assignee: Steven Phillips > Fix For: 1.3.0 > > > An Umbrella for the implementation of Embedded types within Drill. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-3229) Create a new EmbeddedVector
[ https://issues.apache.org/jira/browse/DRILL-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips reassigned DRILL-3229: -- Assignee: Steven Phillips (was: Jacques Nadeau) > Create a new EmbeddedVector > --- > > Key: DRILL-3229 > URL: https://issues.apache.org/jira/browse/DRILL-3229 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Jacques Nadeau >Assignee: Steven Phillips > Fix For: Future > > > Embedded Vector will leverage a binary encoding for holding information about > type for each individual field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3788) Partition Pruning not taking place with metadata caching when we have ~20k files
[ https://issues.apache.org/jira/browse/DRILL-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746645#comment-14746645 ] Steven Phillips commented on DRILL-3788: I am a bit confused. This jira seems to be related to directory-based partition pruning, not single-valued column based pruning. As far as I know they should both be working, though. I would have to use a debugger to find out why it's failing. > Partition Pruning not taking place with metadata caching when we have ~20k > files > > > Key: DRILL-3788 > URL: https://issues.apache.org/jira/browse/DRILL-3788 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Assignee: Aman Sinha >Priority: Critical > Fix For: 1.2.0 > > Attachments: plan.txt > > > git.commit.id.abbrev=240a455 > Partition Pruning did not take place for the below query after I executed the > "refresh table metadata command" > {code} > explain plan for > select > l_returnflag, > l_linestatus > from > `lineitem/2006/1` > where > dir0=1 or dir0=2 > {code} > The logs did not indicate that "pruning did not take place" > Before executing the refresh table metadata command, partition pruning did > take effect > I am not attaching the data set as it is larger than 10MB. Reach out to me if > you need more information -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3180) Apache Drill JDBC storage plugin to query rdbms systems such as MySQL and Netezza from Apache Drill
[ https://issues.apache.org/jira/browse/DRILL-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742880#comment-14742880 ] Steven Phillips commented on DRILL-3180: +1 > Apache Drill JDBC storage plugin to query rdbms systems such as MySQL and > Netezza from Apache Drill > --- > > Key: DRILL-3180 > URL: https://issues.apache.org/jira/browse/DRILL-3180 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.0.0 >Reporter: Magnus Pierre >Assignee: Jacques Nadeau > Labels: Drill, JDBC, plugin > Fix For: 1.3.0 > > Attachments: patch.diff, pom.xml, storage-mpjdbc.zip > > Original Estimate: 1m > Remaining Estimate: 1m > > I have developed the base code for a JDBC storage-plugin for Apache Drill. > The code is primitive but constitutes a good starting point for further > coding. Today it provides primitive support for SELECT against RDBMS with > JDBC. > The goal is to provide complete SELECT support against RDBMS with push down > capabilities. > Currently the code is using standard JDBC classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3767) SchemaPath.getCompoundPath(String...strings) reverses its input array
[ https://issues.apache.org/jira/browse/DRILL-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741328#comment-14741328 ] Steven Phillips commented on DRILL-3767: I think the side effect should be removed, rather than documented. > SchemaPath.getCompoundPath(String...strings) reverses it's input array > -- > > Key: DRILL-3767 > URL: https://issues.apache.org/jira/browse/DRILL-3767 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Codegen >Reporter: Deneche A. Hakim >Assignee: Deneche A. Hakim >Priority: Minor > Fix For: 1.2.0 > > > If you pass an array of strings to {{SchemaPath.getCompoundPath()}}, the > input array will be reversed. This side effect is *undocumented* and has led > to at least one known bug DRILL-3758 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
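Removing the side effect, as suggested, amounts to reversing a defensive copy instead of mutating the caller's array. A minimal sketch (`reversedCopy` is a hypothetical helper, not Drill's actual `getCompoundPath` code):

```java
import java.util.Arrays;
import java.util.Collections;

public class CompoundPathSketch {
    // Reverse a copy so the caller's array is left untouched.
    public static String[] reversedCopy(String... strings) {
        String[] copy = Arrays.copyOf(strings, strings.length);
        // Arrays.asList returns a fixed-size view backed by the copy,
        // so reversing the list reverses the copy in place.
        Collections.reverse(Arrays.asList(copy));
        return copy;
    }

    public static void main(String[] args) {
        String[] input = {"a", "b", "c"};
        String[] out = reversedCopy(input);
        System.out.println(Arrays.toString(out));   // [c, b, a]
        System.out.println(Arrays.toString(input)); // [a, b, c] -- unchanged
    }
}
```

With the copy in place, callers like the one in DRILL-3758 could no longer be broken by the hidden mutation.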
[jira] [Commented] (DRILL-3723) RemoteServiceSet.getServiceSetWithFullCache() ignores arguments
[ https://issues.apache.org/jira/browse/DRILL-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720716#comment-14720716 ] Steven Phillips commented on DRILL-3723: That's leftover from the days when Drill had a distributed cache. When that was removed, we should have removed the WithFullCache method, as it no longer has any meaning. RemoteServiceSet.getServiceSetWithFullCache() ignores arguments --- Key: DRILL-3723 URL: https://issues.apache.org/jira/browse/DRILL-3723 Project: Apache Drill Issue Type: Bug Components: Execution - RPC Affects Versions: 1.1.0 Reporter: Andrew Assignee: Jacques Nadeau Priority: Minor Fix For: 1.2.0 RemoteServiceSet.getServiceSetWithFullCache() ignores both of its arguments and is therefore functionally equivalent to getLocalServiceSet(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2743) Parquet file metadata caching
[ https://issues.apache.org/jira/browse/DRILL-2743?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14710078#comment-14710078 ] Steven Phillips commented on DRILL-2743: No, this case is not dealt with, so it is possible for the locations to get out of date. This won't cause any wrong results, but could give non-optimal performance. The only workaround is to manually rerun the refresh metadata command. Parquet file metadata caching - Key: DRILL-2743 URL: https://issues.apache.org/jira/browse/DRILL-2743 Project: Apache Drill Issue Type: New Feature Components: Storage - Parquet Reporter: Steven Phillips Assignee: Steven Phillips Fix For: 1.2.0 Attachments: DRILL-2743.patch, drill.parquet_metadata To run a query against parquet files, we have to first recursively search the directory tree for all of the files, get the block locations for each file, and read the footer from each file, and this is done during the planning phase. When there are many files, this can result in a very large delay in running the query, and it does not scale. However, there isn't really any need to read the footers during planning, if we instead treat each parquet file as a single work unit, all we need to know are the block locations for the file, the number of rows, and the columns. We should store only the information which we need for planning in a file located in the top directory for a given parquet table, and then we can delay reading of the footers until execution time, which can be done in parallel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2743) Parquet file metadata caching
[ https://issues.apache.org/jira/browse/DRILL-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14704132#comment-14704132 ] Steven Phillips commented on DRILL-2743: They can come from any source. Parquet file metadata caching - Key: DRILL-2743 URL: https://issues.apache.org/jira/browse/DRILL-2743 Project: Apache Drill Issue Type: New Feature Components: Storage - Parquet Reporter: Steven Phillips Assignee: Aman Sinha Fix For: 1.2.0 Attachments: DRILL-2743.patch, drill.parquet_metadata To run a query against parquet files, we have to first recursively search the directory tree for all of the files, get the block locations for each file, and read the footer from each file, and this is done during the planning phase. When there are many files, this can result in a very large delay in running the query, and it does not scale. However, there isn't really any need to read the footers during planning, if we instead treat each parquet file as a single work unit, all we need to know are the block locations for the file, the number of rows, and the columns. We should store only the information which we need for planning in a file located in the top directory for a given parquet table, and then we can delay reading of the footers until execution time, which can be done in parallel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2743) Parquet file metadata caching
[ https://issues.apache.org/jira/browse/DRILL-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14704027#comment-14704027 ] Steven Phillips commented on DRILL-2743: 1. Currently, there is no log message, but I could add one. 2. I am not sure what you mean by change anything, but the case of both files and directories is handled. 3. I don't think there will be changes to the format, but I can't guarantee that. I also expect there to be changes to the format in future releases. 4. Those permissions will allow anyone to read the file. I do see a potential problem, though. Currently, if a change is detected to the underlying files, the metadata is updated automatically when a query is run. If the user doesn't have write permission, this will cause a failure. Parquet file metadata caching - Key: DRILL-2743 URL: https://issues.apache.org/jira/browse/DRILL-2743 Project: Apache Drill Issue Type: New Feature Components: Storage - Parquet Reporter: Steven Phillips Assignee: Aman Sinha Fix For: 1.2.0 Attachments: DRILL-2743.patch, drill.parquet_metadata To run a query against parquet files, we have to first recursively search the directory tree for all of the files, get the block locations for each file, and read the footer from each file, and this is done during the planning phase. When there are many files, this can result in a very large delay in running the query, and it does not scale. However, there isn't really any need to read the footers during planning, if we instead treat each parquet file as a single work unit, all we need to know are the block locations for the file, the number of rows, and the columns. We should store only the information which we need for planning in a file located in the top directory for a given parquet table, and then we can delay reading of the footers until execution time, which can be done in parallel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2743) Parquet file metadata caching
[ https://issues.apache.org/jira/browse/DRILL-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-2743: --- Assignee: Aman Sinha (was: Steven Phillips) Parquet file metadata caching - Key: DRILL-2743 URL: https://issues.apache.org/jira/browse/DRILL-2743 Project: Apache Drill Issue Type: New Feature Components: Storage - Parquet Reporter: Steven Phillips Assignee: Aman Sinha Fix For: 1.2.0 Attachments: DRILL-2743.patch, drill.parquet_metadata To run a query against parquet files, we have to first recursively search the directory tree for all of the files, get the block locations for each file, and read the footer from each file, and this is done during the planning phase. When there are many files, this can result in a very large delay in running the query, and it does not scale. However, there isn't really any need to read the footers during planning, if we instead treat each parquet file as a single work unit, all we need to know are the block locations for the file, the number of rows, and the columns. We should store only the information which we need for planning in a file located in the top directory for a given parquet table, and then we can delay reading of the footers until execution time, which can be done in parallel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3353) Non data-type related schema changes errors
[ https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3353: --- Assignee: Hanifi Gunes (was: Steven Phillips) Non data-type related schema changes errors --- Key: DRILL-3353 URL: https://issues.apache.org/jira/browse/DRILL-3353 Project: Apache Drill Issue Type: Bug Components: Storage - JSON Affects Versions: 1.0.0 Reporter: Oscar Bernal Assignee: Hanifi Gunes Fix For: 1.2.0 Attachments: i-bfbc0a5c-ios-PulsarEvent-2015-06-23_19.json.zip I'm having trouble querying a data set with varying schema for a nested object fields. The majority of my data for a specific type of record has the following nested data: {code} attributes:{daysSinceInstall:0,destination:none,logged:no,nth:1,type:organic,wearable:no}} {code} Among those records (hundreds of them) I have only two with a slightly different schema: {code} attributes:{adSet:Teste-Adwords-Engagement-Branch-iOS-230615-adset,campaign:Teste-Adwords-Engagement-Branch-iOS-230615,channel:Adwords,daysSinceInstall:0,destination:none,logged:no,nth:4,type:branch,wearable:no}} {code} When trying to query the new fields, my queries fail: With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = true;{code} {noformat} 0: jdbc:drill:zk=local select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.event.attributes.ad = 'Teste-FB-Engagement-Puro-iOS-230615'; Error: SYSTEM ERROR: java.lang.NumberFormatException: Teste-FB-Engagement-Puro-iOS-230615 Fragment 0:0 [Error Id: 22d37a65-7dd0-4661-bbfc-7a50bbee9388 on ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0) {noformat} With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = false;`{code} {noformat} 0: jdbc:drill:zk=local select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE'; Error: DATA_READ ERROR: Error parsing JSON - You tried to 
write a Bit type when you are using a ValueWriter of type NullableVarCharWriterImpl. File file.json Record 35 Fragment 0:0 [Error Id: 5746e3e9-48c0-44b1-8e5f-7c94e7c64d0f on ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0) {noformat} If I try to extract all attributes from those events, Drill will only return a subset of the fields, ignoring the others. {noformat} 0: jdbc:drill:zk=local select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.type ='Opens App'; ++ | EXPR$0 | ++ | {logged:no,wearable:no,type:} | | {logged:no,wearable:no,type:} | | {logged:no,wearable:no,type:} | | {logged:no,wearable:no,type:}| | {logged:no,wearable:no,type:} | ++ {noformat} What I find strange is that I have thousands of records in the same file with different schema for different record types and all other queries seem run well. Is there something about how Drill infers schema that I might be missing here? Does it infer based on a sample % of the data and fail for records that were not taken into account while inferring schema? I suspect I wouldn't have this error if I had 100's of records with that other schema inside the file, but I can't find anything in the docs or code to support that hypothesis. Perhaps it's just a bug? Is it expected? Troubleshooting guide seems to mention something about this but it's very vague in implying Drill doesn't fully support schema changes. I thought that was for data type changes mostly, for which there are other well documented issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3150) Error when filtering non-existent field with a string
[ https://issues.apache.org/jira/browse/DRILL-3150?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14622954#comment-14622954 ] Steven Phillips commented on DRILL-3150: I actually think the correct thing is to use VARBINARY as the default. It's true that comparing a non-numeric string to a valid integer field would fail, but that's ok. Our rules for implicit cast require casting the VARCHAR to NUMERIC, since NUMERIC types have a higher precedence. So doing a comparison between a non-numeric string and a numeric type should fail. In that case, it is necessary to explicitly cast the int as a string. I actually filed DRILL-3477 the other day, without realizing this issue was here. Error when filtering non-existent field with a string - Key: DRILL-3150 URL: https://issues.apache.org/jira/browse/DRILL-3150 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.0.0 Reporter: Adam Gilmore Assignee: Parth Chandra Priority: Critical Fix For: 1.2.0 Attachments: DRILL-3150.1.patch.txt The following query throws an exception: {code} select count(*) from cp.`employee.json` where `blah` = 'test' {code} blah does not exist as a field in the JSON. The expected behaviour would be to filter out all rows as that field is not present (thus cannot equal the string 'test'). Instead, the following exception occurs: {code} org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: test Fragment 0:0 [Error Id: 5d6c9a82-8f87-41b2-a496-67b360302b76 on ip-10-1-50-208.ec2.internal:31010] {code} Apart from the fact that the real error message is hidden, the issue is that we're trying to cast the varchar to int ('test' to an int). This seems to be because the projection out of the scan when a field is not found becomes INT:OPTIONAL. The filter should not fail on this - if the varchar fails to convert to an int, the filter should simply not allow any records through.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
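The precedence argument in the comment above can be seen in miniature: because NUMERIC outranks VARCHAR, the string side is the one converted, so a non-numeric literal fails rather than simply comparing unequal. A plain-Java analogue (not Drill's cast machinery; `numericCompare` is a hypothetical helper):

```java
public class ImplicitCastSketch {
    // Compare an integer column value against a string literal under
    // numeric-precedence rules: the string is cast to the numeric side.
    static boolean numericCompare(long columnValue, String literal) {
        return columnValue == Long.parseLong(literal); // throws if non-numeric
    }

    public static void main(String[] args) {
        System.out.println(numericCompare(10, "10")); // true
        try {
            numericCompare(10, "test");
        } catch (NumberFormatException e) {
            // Mirrors the query erroring out instead of filtering the row.
            System.out.println("cast failed for 'test'");
        }
    }
}
```

Defaulting the unknown column to VARBINARY instead of INT would sidestep this path, at the cost of requiring an explicit cast to compare against a real integer field.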
[jira] [Commented] (DRILL-3353) Non data-type related schema changes errors
[ https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14621537#comment-14621537 ] Steven Phillips commented on DRILL-3353: There are several issues here. 1. {code} Error: DATA_READ ERROR: Error parsing JSON - You tried to write a Bit type when you are using a ValueWriter of type NullableVarCharWriterImpl. {code} This is due to the fact that in one of the records, the boolean value true has quotes around it. Thus, it is parsed as a string. Drill does not currently support changing the type of a specific field. See DRILL-3228 and DRILL-3229 for future work that will enhance our flexibility in this regard. The current workaround for this is to set all_text_mode to true, which you already know. 2. {code} Error: SYSTEM ERROR: java.lang.NumberFormatException: Teste-FB-Engagement-Puro-iOS-230615 {code} This is due to a problem with implicit cast and null fields. I filed DRILL-3477 for this issue. 3. Missing fields This is due to some bugs in Drill's processing of complex data that occurs in some operations when new fields are added. I will be posting a fix for this shortly. Non data-type related schema changes errors --- Key: DRILL-3353 URL: https://issues.apache.org/jira/browse/DRILL-3353 Project: Apache Drill Issue Type: Bug Components: Storage - JSON Affects Versions: 1.0.0 Reporter: Oscar Bernal Assignee: Steven Phillips Fix For: 1.2.0 Attachments: i-bfbc0a5c-ios-PulsarEvent-2015-06-23_19.json.zip I'm having trouble querying a data set with varying schema for nested object fields.
The majority of my data for a specific type of record has the following nested data: {code} attributes:{daysSinceInstall:0,destination:none,logged:no,nth:1,type:organic,wearable:no}} {code} Among those records (hundreds of them) I have only two with a slightly different schema: {code} attributes:{adSet:Teste-Adwords-Engagement-Branch-iOS-230615-adset,campaign:Teste-Adwords-Engagement-Branch-iOS-230615,channel:Adwords,daysSinceInstall:0,destination:none,logged:no,nth:4,type:branch,wearable:no}} {code} When trying to query the new fields, my queries fail: With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = true;{code} {noformat} 0: jdbc:drill:zk=local select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.event.attributes.ad = 'Teste-FB-Engagement-Puro-iOS-230615'; Error: SYSTEM ERROR: java.lang.NumberFormatException: Teste-FB-Engagement-Puro-iOS-230615 Fragment 0:0 [Error Id: 22d37a65-7dd0-4661-bbfc-7a50bbee9388 on ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0) {noformat} With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = false;`{code} {noformat} 0: jdbc:drill:zk=local select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE'; Error: DATA_READ ERROR: Error parsing JSON - You tried to write a Bit type when you are using a ValueWriter of type NullableVarCharWriterImpl. File file.json Record 35 Fragment 0:0 [Error Id: 5746e3e9-48c0-44b1-8e5f-7c94e7c64d0f on ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0) {noformat} If I try to extract all attributes from those events, Drill will only return a subset of the fields, ignoring the others. 
{noformat}
0: jdbc:drill:zk=local> select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.type = 'Opens App';
+--------------------------------+
|             EXPR$0             |
+--------------------------------+
| {logged:no,wearable:no,type:}  |
| {logged:no,wearable:no,type:}  |
| {logged:no,wearable:no,type:}  |
| {logged:no,wearable:no,type:}  |
| {logged:no,wearable:no,type:}  |
+--------------------------------+
{noformat}
What I find strange is that I have thousands of records in the same file with different schemas for different record types, and all other queries seem to run well. Is there something about how Drill infers schema that I might be missing here? Does it infer based on a sample percentage of the data and fail for records that were not taken into account while inferring the schema? I suspect I wouldn't have this error if I had hundreds of records with that other schema inside the file, but I can't find anything in the docs or code to support that hypothesis. Perhaps it's just a bug? Is it expected? The troubleshooting guide seems to mention something about this, but it's very vague in implying Drill doesn't fully support schema changes. I thought that was
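The quoted-boolean failure described in point 1 of the comment above can be reproduced outside Drill. The following is a minimal Python sketch (ordinary `json` parsing standing in for Drill's JSON reader; the field name mirrors the data above):

```python
import json

# One record has a real boolean; the other quotes it, so any JSON parser
# sees a string. Drill's value writers are typed per field, so a Bit
# (boolean) writer cannot accept a string value for the same field.
rec_a = json.loads('{"attributes": {"logged": true}}')
rec_b = json.loads('{"attributes": {"logged": "true"}}')

types = {type(r["attributes"]["logged"]) for r in (rec_a, rec_b)}
assert types == {bool, str}  # same field, two types -> schema change

# all_text_mode sidesteps the conflict by reading every scalar as text,
# which is roughly equivalent to:
as_text = [str(r["attributes"]["logged"]).lower() for r in (rec_a, rec_b)]
assert as_text == ["true", "true"]  # uniform type, no writer conflict
```

This is why all_text_mode works as a stopgap: a single text type can absorb every record, at the cost of losing the native boolean/numeric types.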
[jira] [Created] (DRILL-3487) MaterializedField equality doesn't check if nested fields are equal
Steven Phillips created DRILL-3487:
--------------------------------------

Summary: MaterializedField equality doesn't check if nested fields are equal
Key: DRILL-3487
URL: https://issues.apache.org/jira/browse/DRILL-3487
Project: Apache Drill
Issue Type: Bug
Components: Metadata
Reporter: Steven Phillips
Assignee: Hanifi Gunes

In several places, we use BatchSchema.equals() to determine if two schemas are the same. A BatchSchema is a set of MaterializedField objects. But ever since DRILL-1872, the child fields are no longer checked. What this means, essentially, is that BatchSchema.equals() is not valid for determining schema changes if the batch contains any nested fields.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
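The shallow-equality pitfall described in DRILL-3487 can be illustrated with a small Python sketch. `Field`, its attributes, and `deep_eq` are hypothetical stand-ins, not Drill's actual `MaterializedField` API; the point is only that equality ignoring children misses nested schema changes:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Field:
    """Hypothetical analogue of a MaterializedField: a named, typed
    field that may contain nested child fields (e.g. a MAP)."""
    name: str
    type: str
    children: List["Field"] = field(default_factory=list)

    # Shallow equality, mirroring the post-DRILL-1872 behavior:
    # children are deliberately not compared.
    def __eq__(self, other):
        return (self.name, self.type) == (other.name, other.type)

    # What a schema-change check actually needs: recurse into children.
    def deep_eq(self, other):
        return (self == other
                and len(self.children) == len(other.children)
                and all(a.deep_eq(b)
                        for a, b in zip(self.children, other.children)))

old = Field("attributes", "MAP", [Field("logged", "VARCHAR")])
new = Field("attributes", "MAP", [Field("logged", "VARCHAR"),
                                  Field("adSet", "VARCHAR")])

assert old == new            # shallow check: nested change goes unnoticed
assert not old.deep_eq(new)  # deep comparison catches the new child field
```

A batch whose map column gained a child would thus pass a shallow `equals()` check, which is exactly why operators relying on it fail to detect the schema change.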
[jira] [Created] (DRILL-3477) Using IntVector for null expressions causes problems with implicit cast
Steven Phillips created DRILL-3477:
--------------------------------------

Summary: Using IntVector for null expressions causes problems with implicit cast
Key: DRILL-3477
URL: https://issues.apache.org/jira/browse/DRILL-3477
Project: Apache Drill
Issue Type: Bug
Reporter: Steven Phillips
Assignee: Steven Phillips

See DRILL-3353, for example. A simple example is this:
{code}
select * from t where a = 's';
{code}
If the first batch scanned from table t does not contain the column a, the expression materializer in Project defaults to Nullable Int as the type. The Filter then sees an Equals expression between a VarChar and an Int type, so it does an implicit cast. Implicit cast rules give Int higher precedence, so the literal 's' is cast to Int, which ends up throwing a NumberFormatException. In the class ResolverTypePrecedence, we see that the Null type has the lowest precedence, which makes sense. But since we don't currently have an implementation for NullVector, we should materialize the Null type as the vector with the lowest possible precedence, which is VarBinary. My suggestion is that we use VarBinary as the default type in ExpressionMaterializer instead of Int.
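The failure mode and the proposed fix can be sketched in Python (the function names are hypothetical; `ValueError` plays the role of Java's `NumberFormatException`, and byte-string comparison approximates a VarBinary comparison):

```python
def compare_with_int_precedence(column_value, literal):
    # Int outranks VarChar, so the VarChar literal is cast to Int.
    # For a non-numeric literal this blows up before any comparison.
    return column_value == int(literal)

def compare_with_varbinary_precedence(column_value, literal):
    # With VarBinary materialized for the missing column, both sides
    # are compared as bytes: no numeric parse, no exception.
    left = b"" if column_value is None else str(column_value).encode()
    return left == str(literal).encode()

# Column `a` is absent from the first batch, so its values are null.
caught = None
try:
    compare_with_int_precedence(None, "s")
except ValueError as e:        # analogue of NumberFormatException
    caught = e
assert caught is not None

# The VarBinary route simply finds no match instead of erroring out.
assert compare_with_varbinary_precedence(None, "s") is False
```

This mirrors the suggestion above: defaulting the unknown column to the lowest-precedence type means the literal is never forced through a numeric cast.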
[jira] [Commented] (DRILL-3477) Using IntVector for null expressions causes problems with implicit cast
[ https://issues.apache.org/jira/browse/DRILL-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619762#comment-14619762 ] Steven Phillips commented on DRILL-3477: I was thinking that might be somewhat involved, but I guess it could be pretty simple: just an implementation that contains no buffers, always returns null when accessed, and cannot be written to.
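The NullVector idea from the comment above can be sketched in a few lines. This is a hypothetical Python analogue, not Drill's Java vector API: no backing buffers, every read yields null, and writes are rejected.

```python
class NullVector:
    """Sketch of a buffer-free vector: only a count, no data."""

    def __init__(self, value_count=0):
        self.value_count = value_count   # metadata only, no buffers

    def get_object(self, index):
        if not 0 <= index < self.value_count:
            raise IndexError(index)
        return None                      # every access yields null

    def set(self, index, value):
        raise TypeError("NullVector cannot be written to")

v = NullVector(value_count=3)
assert all(v.get_object(i) is None for i in range(3))

rejected = False
try:
    v.set(0, 42)
except TypeError:
    rejected = True
assert rejected
```

Because no memory is allocated per value, such a vector is essentially free regardless of row count, which is what makes it attractive for representing an all-null, type-unknown column.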
[jira] [Commented] (DRILL-3477) Using IntVector for null expressions causes problems with implicit cast
[ https://issues.apache.org/jira/browse/DRILL-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619772#comment-14619772 ] Steven Phillips commented on DRILL-3477: I haven't run the test yet, just posting now to get some feedback on the idea. Are there places in the code that are expecting it to be an IntVector? I thought it was a somewhat arbitrary choice, and that using a different type wouldn't cause any additional problems.
[jira] [Updated] (DRILL-3477) Using IntVector for null expressions causes problems with implicit cast
[ https://issues.apache.org/jira/browse/DRILL-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3477: Assignee: Jinfeng Ni (was: Steven Phillips)