[jira] [Commented] (DRILL-3522) IllegalStateException from Mongo storage plugin
[ https://issues.apache.org/jira/browse/DRILL-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253059#comment-15253059 ] Steven Phillips commented on DRILL-3522: Just merged. commit id: a07f4de > IllegalStateException from Mongo storage plugin > --- > > Key: DRILL-3522 > URL: https://issues.apache.org/jira/browse/DRILL-3522 > Project: Apache Drill > Issue Type: Bug > Components: Storage - MongoDB >Affects Versions: 1.1.0 >Reporter: Adam Gilmore >Assignee: Adam Gilmore >Priority: Critical > Attachments: DRILL-3522.1.patch.txt > > > With a Mongo storage plugin enabled, we are sporadically getting the > following exception when running queries (even not against the Mongo storage > plugin): > {code} > SYSTEM ERROR: IllegalStateException: state should be: open > (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception > during fragment initialization: > org.apache.drill.common.exceptions.DrillRuntimeException: state should be: > open > org.apache.drill.exec.work.foreman.Foreman.run():253 > java.util.concurrent.ThreadPoolExecutor.runWorker():1145 > java.util.concurrent.ThreadPoolExecutor$Worker.run():615 > java.lang.Thread.run():745 > Caused By (com.google.common.util.concurrent.UncheckedExecutionException) > org.apache.drill.common.exceptions.DrillRuntimeException: state should be: > open > com.google.common.cache.LocalCache$Segment.get():2263 > com.google.common.cache.LocalCache.get():4000 > com.google.common.cache.LocalCache.getOrLoad():4004 > com.google.common.cache.LocalCache$LocalLoadingCache.get():4874 > > org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$MongoSchema.getSubSchemaNames():172 > > org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$MongoSchema.setHolder():159 > > org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory.registerSchemas():127 > org.apache.drill.exec.store.mongo.MongoStoragePlugin.registerSchemas():86 > > 
org.apache.drill.exec.store.StoragePluginRegistry$DrillSchemaFactory.registerSchemas():328 > org.apache.drill.exec.ops.QueryContext.getRootSchema():165 > org.apache.drill.exec.ops.QueryContext.getRootSchema():154 > org.apache.drill.exec.ops.QueryContext.getRootSchema():142 > org.apache.drill.exec.ops.QueryContext.getNewDefaultSchema():128 > org.apache.drill.exec.planner.sql.DrillSqlWorker.():91 > org.apache.drill.exec.work.foreman.Foreman.runSQL():901 > org.apache.drill.exec.work.foreman.Foreman.run():242 > java.util.concurrent.ThreadPoolExecutor.runWorker():1145 > java.util.concurrent.ThreadPoolExecutor$Worker.run():615 > java.lang.Thread.run():745 > Caused By (org.apache.drill.common.exceptions.DrillRuntimeException) state > should be: open > > org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$DatabaseLoader.load():98 > > org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$DatabaseLoader.load():82 > com.google.common.cache.LocalCache$LoadingValueReference.loadFuture():3599 > com.google.common.cache.LocalCache$Segment.loadSync():2379 > com.google.common.cache.LocalCache$Segment.lockedGetOrLoad():2342 > com.google.common.cache.LocalCache$Segment.get():2257 > com.google.common.cache.LocalCache.get():4000 > com.google.common.cache.LocalCache.getOrLoad():4004 > com.google.common.cache.LocalCache$LocalLoadingCache.get():4874 > > org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$MongoSchema.getSubSchemaNames():172 > > org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$MongoSchema.setHolder():159 > > org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory.registerSchemas():127 > org.apache.drill.exec.store.mongo.MongoStoragePlugin.registerSchemas():86 > > org.apache.drill.exec.store.StoragePluginRegistry$DrillSchemaFactory.registerSchemas():328 > org.apache.drill.exec.ops.QueryContext.getRootSchema():165 > org.apache.drill.exec.ops.QueryContext.getRootSchema():154 > org.apache.drill.exec.ops.QueryContext.getRootSchema():142 > 
org.apache.drill.exec.ops.QueryContext.getNewDefaultSchema():128 > org.apache.drill.exec.planner.sql.DrillSqlWorker.():91 > org.apache.drill.exec.work.foreman.Foreman.runSQL():901 > org.apache.drill.exec.work.foreman.Foreman.run():242 > java.util.concurrent.ThreadPoolExecutor.runWorker():1145 > java.util.concurrent.ThreadPoolExecutor$Worker.run():615 > java.lang.Thread.run():745 > Caused By (java.lang.IllegalStateException) state should be: open > com.mongodb.assertions.Assertions.isTrue():70 > com.mongodb.connection.BaseCluster.selectServer():79 > com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.():75 >
[jira] [Commented] (DRILL-4615) Support directory names in schema
[ https://issues.apache.org/jira/browse/DRILL-4615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248163#comment-15248163 ] Steven Phillips commented on DRILL-4615: It seems what you are describing is an alternative way of interpreting directory attributes. Drill's current approach is to create the columns dir0, dir1, etc., which contain the string values of the directory names. These column names and values are currently used in two different places in Drill: first for partition pruning during the planning stage, and then when the columns are materialized during the actual execution of the scan. You can see examples of these uses in the classes FileSystemPartitionDescriptor and ParquetScanBatchCreator. We should probably refactor and abstract the code which materializes the partition column names and values into some sort of Attribute Provider, and then we could implement an alternate version which interprets the directories the way Spark and Hive do. If this is something you are interested in working on, I can help out. > Support directory names in schema > - > > Key: DRILL-4615 > URL: https://issues.apache.org/jira/browse/DRILL-4615 > Project: Apache Drill > Issue Type: Improvement >Reporter: Jesse Yates > > In Spark, partitioned parquet output is written with directories like: > {code} > /column1=1 > /column2=hello > /data.parquet > /column2=world > /moredata.parquet > /column1=2 > {code} > However, when querying these files with Drill we end up interpreting the > directories as strings when what they really are is column names + values. In > the data files we only have the remaining columns. Querying this with Drill > means that you can really only have a couple of data types (far short of what > Spark/Parquet supports) in the column and still have correct operations. > Given the size of the data, I don't want to have to CTAS all the parquet > files (especially as they are being periodically updated). 
> I think this ends up being a nice addition for general file directory reads > as well since many people already encode meaning into their directory > structure, but having self describing directories is even better. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
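The Spark/Hive directory convention discussed above (path segments of the form `column=value`) can be sketched as a small parser. This is an illustrative standalone example, not part of Drill's codebase; the class and method names are invented for this sketch:

```java
// Hypothetical sketch: interpret Spark/Hive-style partition directories
// (e.g. "column1=1/column2=hello") as column name/value pairs, instead of
// Drill's current dir0/dir1 string columns. Illustrative names only.
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionDirParser {
    // Parses a relative path like "column1=1/column2=hello" into an ordered
    // map of partition column names to their (string-encoded) values.
    public static Map<String, String> parse(String relativePath) {
        Map<String, String> partitions = new LinkedHashMap<>();
        for (String segment : relativePath.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                partitions.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Prints the partition columns recovered from the path.
        System.out.println(parse("column1=1/column2=hello"));
    }
}
```

A real implementation would also need to decide the column's type (Spark infers it from the values), which is exactly why the comment suggests abstracting this behind an attribute-provider interface.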
[jira] [Commented] (DRILL-4558) When a query returns diacritics in a string, the string is cut
[ https://issues.apache.org/jira/browse/DRILL-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218975#comment-15218975 ] Steven Phillips commented on DRILL-4558: This looks like a problem in the BsonRecordReader:
{code}
private void writeString(String readString, final MapOrListWriterImpl writer, String fieldName, boolean isList) {
  final int length = readString.length();
  final VarCharHolder vh = new VarCharHolder();
  ensure(length);
  try {
    workBuf.setBytes(0, readString.getBytes("UTF-8"));
  } catch (UnsupportedEncodingException e) {
    throw new DrillRuntimeException("Unable to read string value for field: " + fieldName, e);
  }
  vh.buffer = workBuf;
  vh.start = 0;
  vh.end = length;
  if (isList == false) {
    writer.varChar(fieldName).write(vh);
  } else {
    writer.list.varChar().write(vh);
  }
}
{code}
The length variable should be the length of the byte array, not the length of the String. A quick work-around would be to disable the BSON reader: set store.mongo.bson.record.reader = false; > When a query returns diacritics in a string, the string is cut > -- > > Key: DRILL-4558 > URL: https://issues.apache.org/jira/browse/DRILL-4558 > Project: Apache Drill > Issue Type: Bug > Components: Storage - MongoDB > Environment: Apache Drill 1.6 > MongoDB 3.2.1 >Reporter: Vincent Uribe > > With the given document in a collection "Test" from a database testDb : > { > "_id" : ObjectId("56e7f1bd0944228aab06d0e2"), > "ID_ATTRIBUT" : "3", > "VAL_ATTRIBUT" : "Végétaux", > "UPDATED" : ISODate("2016-01-09T23:00:00.000Z") > } > When querying select * from mongoStorage.testDb.Test I get > _id: [B@affb65 > ID_ATTRIBUT: 3 > VAL_ATTRIBUT: *Végéta* > UPDATED: 2016-01-09T23:00:00.000Z > As you can see, the two 'é' cut the string "végétaux" by 2 characters, giving > végéta. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
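The character-count versus byte-count mismatch behind this bug can be reproduced in isolation. This is a standalone sketch, independent of Drill's BsonRecordReader:

```java
// Demonstrates why using String.length() as a UTF-8 byte length truncates
// strings containing multi-byte characters: each 'é' encodes to 2 bytes.
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {
    public static void main(String[] args) {
        String s = "Végétaux";
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(s.length());  // 8 characters
        System.out.println(utf8.length); // 10 bytes
        // Setting the buffer's end offset to s.length() (8) instead of
        // utf8.length (10) drops the final 2 bytes of the encoded string --
        // exactly the "Végéta" symptom reported in this issue.
    }
}
```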
[jira] [Created] (DRILL-4566) Add TDigest functions for computing median and quantile
Steven Phillips created DRILL-4566: -- Summary: Add TDigest functions for computing median and quantile Key: DRILL-4566 URL: https://issues.apache.org/jira/browse/DRILL-4566 Project: Apache Drill Issue Type: New Feature Reporter: Steven Phillips Assignee: Steven Phillips The tdigest library can be used by Drill to compute approximate medians and percentiles without using too much memory or spilling to disk, which would otherwise be required to compute them exactly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
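For contrast, here is the cost the sketch avoids. This is a hypothetical illustration, not the t-digest API: an exact median forces the engine to buffer and sort every value, which is precisely the memory (or disk-spill) requirement a streaming summary like t-digest sidesteps:

```java
// Exact median: requires a full in-memory copy and a sort of all values.
// A t-digest, by comparison, maintains only a small fixed-size summary.
import java.util.Arrays;

public class ExactMedian {
    public static double median(double[] values) {
        double[] copy = Arrays.copyOf(values, values.length); // O(n) memory
        Arrays.sort(copy);                                    // O(n log n) time
        int n = copy.length;
        return n % 2 == 1 ? copy[n / 2]
                          : (copy[n / 2 - 1] + copy[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(median(new double[]{3, 1, 4, 1, 5})); // 3.0
    }
}
```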
[jira] [Created] (DRILL-4562) NPE when evaluating expression on nested union type
Steven Phillips created DRILL-4562: -- Summary: NPE when evaluating expression on nested union type Key: DRILL-4562 URL: https://issues.apache.org/jira/browse/DRILL-4562 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips A simple reproduction:
{code}
select typeof(t.a.b) c from `f.json` t
{code}
where f.json contains:
{code}
{a : { b : 1 }}
{a : { b: "hello" }}
{a : { b: { c : 2} }}
{code}
Fails with the following:
{code}
(java.lang.NullPointerException) null
org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldIdIfMatchesUnion():40
org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldIdIfMatches():141
org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldId():207
org.apache.drill.exec.record.SimpleVectorWrapper.getFieldIdIfMatches():101
org.apache.drill.exec.record.VectorContainer.getValueVectorId():269
org.apache.drill.exec.physical.impl.ScanBatch.getValueVectorId():325
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.getValueVectorId():182
org.apache.drill.exec.expr.ExpressionTreeMaterializer$MaterializeVisitor.visitSchemaPath():628
org.apache.drill.exec.expr.ExpressionTreeMaterializer$MaterializeVisitor.visitSchemaPath():217
org.apache.drill.common.expression.SchemaPath.accept():152
org.apache.drill.exec.expr.ExpressionTreeMaterializer$MaterializeVisitor.visitFunctionCall():274
org.apache.drill.exec.expr.ExpressionTreeMaterializer$MaterializeVisitor.visitFunctionCall():217
org.apache.drill.common.expression.FunctionCall.accept():60
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3317) when ProtobufLengthDecoder couldn't allocate a new DrillBuf, this error is just logged and nothing else is done
[ https://issues.apache.org/jira/browse/DRILL-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217658#comment-15217658 ] Steven Phillips commented on DRILL-3317: Until a few months ago, the OutOfMemoryHandler would cause a message to be propagated to the operators, which could potentially be handled by ExternalSort. But with the allocator changes in DRILL-4134 (809f4620d7d82c72240212de13b993049550959d), this is no longer happening, and now it just logs a message. Was there a particular reason that functionality was removed? In the comments for DRILL-3241, [~jnadeau] says that the functionality should be removed because it does not work. Do you mean that it's not working now because it was removed? Or that it didn't work even before it was removed? > when ProtobufLengthDecoder couldn't allocate a new DrillBuf, this error is > just logged and nothing else is done > --- > > Key: DRILL-3317 > URL: https://issues.apache.org/jira/browse/DRILL-3317 > Project: Apache Drill > Issue Type: Bug > Components: Execution - RPC >Reporter: Deneche A. Hakim >Assignee: Jacques Nadeau > Fix For: 1.7.0 > > > Trying to reproduce DRILL-3241 I sometimes get the following error in the > logs: > {noformat} > ERROR: Out of memory outside any particular fragment. 
> at > org.apache.drill.exec.rpc.data.DataResponseHandlerImpl.informOutOfMemory(DataResponseHandlerImpl.java:40) > at > org.apache.drill.exec.rpc.data.DataServer$2.handle(DataServer.java:227) > at > org.apache.drill.exec.rpc.ProtobufLengthDecoder.decode(ProtobufLengthDecoder.java:87) > at > org.apache.drill.exec.rpc.data.DataProtobufLengthDecoder$Server.decode(DataProtobufLengthDecoder.java:52) > at > io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:315) > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:229) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > WARN: Failure allocating buffer on incoming stream due to memory limits. > Current Allocation: 1372678764. > at > org.apache.drill.exec.rpc.ProtobufLengthDecoder.decode(ProtobufLengthDecoder.java:85) > at > org.apache.drill.exec.rpc.data.DataProtobufLengthDecoder$Server.decode(DataProtobufLengthDecoder.java:52) > at > io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:315) > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:229) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > {noformat} > ProtobufLengthDecoder.decode() does call OutOfMemoryHandler.handle() which > calls DataResponseHandlerImpl.informOutOfMemory() which just logs the error > in the logs. 
> If we have fragments waiting for data they will be stuck waiting forever, and > the query will hang (behavior observed in DRILL-3241 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (DRILL-4489) Add ValueVector tests from Drill
[ https://issues.apache.org/jira/browse/DRILL-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips closed DRILL-4489. -- Resolution: Invalid This jira should be in the Arrow project, not Drill > Add ValueVector tests from Drill > > > Key: DRILL-4489 > URL: https://issues.apache.org/jira/browse/DRILL-4489 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips > > There are some simple ValueVector tests that should be included in the Arrow > project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4489) Add ValueVector tests from Drill
Steven Phillips created DRILL-4489: -- Summary: Add ValueVector tests from Drill Key: DRILL-4489 URL: https://issues.apache.org/jira/browse/DRILL-4489 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips There are some simple ValueVector tests that should be included in the Arrow project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4486) Expression serializer incorrectly serializes escaped characters
Steven Phillips created DRILL-4486: -- Summary: Expression serializer incorrectly serializes escaped characters Key: DRILL-4486 URL: https://issues.apache.org/jira/browse/DRILL-4486 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips The Drill expression parser requires backslashes to be escaped, but the ExpressionStringBuilder is not properly escaping them. This causes problems, especially in the case of regex expressions run with parallel execution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4455) Depend on Apache Arrow for Vector and Memory
Steven Phillips created DRILL-4455: -- Summary: Depend on Apache Arrow for Vector and Memory Key: DRILL-4455 URL: https://issues.apache.org/jira/browse/DRILL-4455 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips Fix For: 1.7.0 The code for value vectors and memory has been split and contributed to the apache arrow repository. In order to help this project advance, Drill should depend on the arrow project instead of internal value vector code. This change will require recompiling any external code, such as UDFs and StoragePlugins. The changes will mainly just involve renaming the classes to the org.apache.arrow namespace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4382) Remove dependency on drill-logical from vector submodule
[ https://issues.apache.org/jira/browse/DRILL-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-4382: --- Assignee: Hanifi Gunes (was: Steven Phillips) > Remove dependency on drill-logical from vector submodule > > > Key: DRILL-4382 > URL: https://issues.apache.org/jira/browse/DRILL-4382 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Hanifi Gunes > > This is in preparation for transitioning the code to the Apache Arrow project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4382) Remove dependency on drill-logical from vector submodule
Steven Phillips created DRILL-4382: -- Summary: Remove dependency on drill-logical from vector submodule Key: DRILL-4382 URL: https://issues.apache.org/jira/browse/DRILL-4382 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips This is in preparation for transitioning the code to the Apache Arrow project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4297) Provide a new interface to send custom messages
[ https://issues.apache.org/jira/browse/DRILL-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133748#comment-15133748 ] Steven Phillips commented on DRILL-4297: +1 > Provide a new interface to send custom messages > --- > > Key: DRILL-4297 > URL: https://issues.apache.org/jira/browse/DRILL-4297 > Project: Apache Drill > Issue Type: Improvement >Reporter: amit hadke >Assignee: Steven Phillips > Attachments: DRILL-4297.patch > > > Currently custom messages are restricted to protobuf messages. > Provide a new interface to custom message that allows to send/receives > Pojos/bytes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4339) Avro Reader can not read records - Regression
[ https://issues.apache.org/jira/browse/DRILL-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129381#comment-15129381 ] Steven Phillips commented on DRILL-4339: I personally don't have much problem with having to recompile my code, I was just wondering if this would create problems for others. If reverting the signature change could avoid a few headaches, and there is very little cost in making the change, I say we go ahead and merge it. +1 > Avro Reader can not read records - Regression > - > > Key: DRILL-4339 > URL: https://issues.apache.org/jira/browse/DRILL-4339 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Other >Affects Versions: 1.5.0 >Reporter: Stefán Baxter >Priority: Blocker > Fix For: 1.5.0 > > > Simple reading of Avro records no longer works > 0: jdbc:drill:zk=local> select * from dfs.asa.`/`; > Exception in thread "drill-executor-2" java.lang.NoSuchMethodError: > org.apache.drill.exec.store.avro.AvroRecordReader.setColumns(Ljava/util/Collection;)V > at > org.apache.drill.exec.store.avro.AvroRecordReader.(AvroRecordReader.java:99) > at > org.apache.drill.exec.store.avro.AvroFormatPlugin.getRecordReader(AvroFormatPlugin.java:73) > at > org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin.getReaderBatch(EasyFormatPlugin.java:172) > at > org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator.getBatch(EasyReaderBatchCreator.java:35) > at > org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator.getBatch(EasyReaderBatchCreator.java:28) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:147) > at > org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:170) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:127) > at > org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:170) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:127) > at > 
org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:170) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:101) > at > org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:79) > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:230) > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > We have been using the Avro reader for a while and this looks like a > regression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4215) Transfer ownership of buffers when doing transfers
Steven Phillips created DRILL-4215: -- Summary: Transfer ownership of buffers when doing transfers Key: DRILL-4215 URL: https://issues.apache.org/jira/browse/DRILL-4215 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips The new allocator has the feature of allowing the transfer of ownership of buffers from one allocator to another. We should make use of this feature by transferring ownership whenever we transfer buffers between vectors. This will allow better tracking of how much memory operators are holding on to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4208) Storage plugin configuration persistence not working for Apache Drill
[ https://issues.apache.org/jira/browse/DRILL-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063271#comment-15063271 ] Steven Phillips commented on DRILL-4208: The properties in drill-override.conf are hierarchical. Since you are already inside drill.exec, you don't include it in the key. So it should be like this:
drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "localhost:2181",
  sys.store.provider.local.path = "/home/dev/abc"
}
> Storage plugin configuration persistence not working for Apache Drill > - > > Key: DRILL-4208 > URL: https://issues.apache.org/jira/browse/DRILL-4208 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.3.0 > Environment: Ubuntu 14.0.4 >Reporter: Devender Yadav > Fix For: Future > > > According to Drill's documentation : > Drill uses /tmp/drill/sys.storage_plugins to store storage plugin > configurations. The temporary directory clears when you quit the Drill shell. > To save your storage plugin configurations from one session to the next, set > the following option in the drill-override.conf file if you are running Drill > in embedded mode. > drill.exec.sys.store.provider.local.path = "/mypath" > I checked /tmp/drill/sys.storage_plugins, there is some data in this file. > Then I modified drill-override.conf : > drill.exec: { > cluster-id: "drillbits1", > zk.connect: "localhost:2181", > drill.exec.sys.store.provider.local.path = "/home/dev/abc" > } > I restarted drill & even restarted my machine. Nothing is created at this > location. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4159) TestCsvHeader sometimes fails due to ordering issue
Steven Phillips created DRILL-4159: -- Summary: TestCsvHeader sometimes fails due to ordering issue Key: DRILL-4159 URL: https://issues.apache.org/jira/browse/DRILL-4159 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips Assignee: Steven Phillips This test should be rewritten to use the query test framework, rather than doing a string comparison of the entire result set. And it should be specified as unordered, so that results aren't affected by the random order in which files are read. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4160) Order By on a flattened column throws SchemaChangeException - Missing function implementation
[ https://issues.apache.org/jira/browse/DRILL-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039586#comment-15039586 ] Steven Phillips commented on DRILL-4160: This doesn't have anything to do with flatten. You can't order by or group by a map type. The message could be better, but it's tricky because the failure doesn't occur in the sort operator, it happens in the exchange before the data even gets to the sort. > Order By on a flattened column throws SchemaChangeException - Missing > function implementation > - > > Key: DRILL-4160 > URL: https://issues.apache.org/jira/browse/DRILL-4160 > Project: Apache Drill > Issue Type: Bug > Components: SQL Parser, Storage - JSON >Reporter: Abhishek Girish > Attachments: drillbit.log.txt > > > Query with an ORDER BY clause on a flattened column fails: > {code} > > select `name`, `type`, flatten(kvgen(`compliments`)) as `compliments` from > > `user` order by `name`, `type`, `compliments` limit 2; > Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to > materialize incoming schema. Errors: > Error in expression at index 2. Error: Missing function implementation: > [hash64asdouble(MAP-REQUIRED, BIGINT-REQUIRED)]. Full expression: null.. > Fragment 3:0 > [Error Id: 3b3d3224-953a-46a2-8caa-fa6949e58ffd on abhi1:31010] > (state=,code=0) > {code} > Query without order by on the flatten column executes fine. > {code} > > select `name`, `type`, flatten(kvgen(`compliments`)) as `compliments` from > > `user` order by `name`, `type` limit 2; > ++---++ > |name| type | compliments | > ++---++ > | Kurt | user | {"key":"cute","value":1.0} | > | Kurt | user | {"key":"writer","value":1.0} | > ++---++ > 2 rows selected (4.239 seconds) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (DRILL-4160) Order By on a flattened column throws SchemaChangeException - Missing function implementation
[ https://issues.apache.org/jira/browse/DRILL-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-4160: --- Comment: was deleted (was: never mind, i misread the query, you are not ordering by compliments.) > Order By on a flattened column throws SchemaChangeException - Missing > function implementation > - > > Key: DRILL-4160 > URL: https://issues.apache.org/jira/browse/DRILL-4160 > Project: Apache Drill > Issue Type: Bug > Components: SQL Parser, Storage - JSON >Reporter: Abhishek Girish > Attachments: drillbit.log.txt > > > Query with an ORDER BY clause on a flattened column fails: > {code} > > select `name`, `type`, flatten(kvgen(`compliments`)) as `compliments` from > > `user` order by `name`, `type`, `compliments` limit 2; > Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to > materialize incoming schema. Errors: > Error in expression at index 2. Error: Missing function implementation: > [hash64asdouble(MAP-REQUIRED, BIGINT-REQUIRED)]. Full expression: null.. > Fragment 3:0 > [Error Id: 3b3d3224-953a-46a2-8caa-fa6949e58ffd on abhi1:31010] > (state=,code=0) > {code} > Query without order by on the flatten column executes fine. > {code} > > select `name`, `type`, flatten(kvgen(`compliments`)) as `compliments` from > > `user` order by `name`, `type` limit 2; > ++---++ > |name| type | compliments | > ++---++ > | Kurt | user | {"key":"cute","value":1.0} | > | Kurt | user | {"key":"writer","value":1.0} | > ++---++ > 2 rows selected (4.239 seconds) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-4145) IndexOutOfBoundsException raised during select * query on S3 csv file
[ https://issues.apache.org/jira/browse/DRILL-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips reassigned DRILL-4145: -- Assignee: Steven Phillips > IndexOutOfBoundsException raised during select * query on S3 csv file > - > > Key: DRILL-4145 > URL: https://issues.apache.org/jira/browse/DRILL-4145 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.3.0 > Environment: Drill 1.3.0 on a 3 node distributed-mode cluster on AWS. > Data files on S3. > S3 storage plugin configuration: > { > "type": "file", > "enabled": true, > "connection": "s3a://", > "workspaces": { > "root": { > "location": "/", > "writable": false, > "defaultInputFormat": null > }, > "views": { > "location": "/processed", > "writable": true, > "defaultInputFormat": null > }, > "tmp": { > "location": "/tmp", > "writable": true, > "defaultInputFormat": null > } > }, > "formats": { > "psv": { > "type": "text", > "extensions": [ > "tbl" > ], > "delimiter": "|" > }, > "csv": { > "type": "text", > "extensions": [ > "csv" > ], > "extractHeader": true, > "delimiter": "," > }, > "tsv": { > "type": "text", > "extensions": [ > "tsv" > ], > "delimiter": "\t" > }, > "parquet": { > "type": "parquet" > }, > "json": { > "type": "json" > }, > "avro": { > "type": "avro" > }, > "sequencefile": { > "type": "sequencefile", > "extensions": [ > "seq" > ] > }, > "csvh": { > "type": "text", > "extensions": [ > "csvh", > "csv" > ], > "extractHeader": true, > "delimiter": "," > } > } > } >Reporter: Peter McTaggart >Assignee: Steven Phillips > Attachments: apps1-bad.csv, apps1.csv > > > When trying to query (via sqlline or WebUI) a .csv file I am getting an > IndexOutofBoundsException: > {noformat} 0: jdbc:drill:> select * from > s3data.root.`staging/data/apps1-bad.csv` limit 1; > Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 > (expected: range(0, 16384)) > Fragment 0:0 > [Error Id: be9856d2-0b80-4b9c-94a4-a1ca38ec5db0 on 
> ip-X.compute.internal:31010] (state=,code=0) > 0: jdbc:drill:> select * from s3data.root.`staging/data/apps1.csv` limit 1; > +--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ > | FIELD_1 | FIELD_2| FIELD_3 | FIELD_4 | FIELD_5 | FIELD_6 > | FIELD_7 | FIELD_8 | FIELD_9 | FIELD_10 | FIELD_11 | > FIELD_12 | FIELD_13 | FIELD_14 | FIELD_15 | FIELD_16 | FIELD_17 | > FIELD_18 | FIELD_19 | FIELD_20 | FIELD_21 | FIELD_22 | > FIELD_23 | FIELD_24 | FIELD_25 | FIELD_26 | FIELD_27 | FIELD_28 | > FIELD_29 | FIELD_30 | FIELD_31 | FIELD_32 | FIELD_33 | FIELD_34 | > FIELD_35 | > +--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ > | 489517 | 27/10/2015 02:05:27 | 261 | 1130232 | 0| > 925630488 | 0| 925630488 | -1 | 19531580547 | | > 27/10/2015 02:00:00 | | 30| 300 | 0 | 0 >| | | 27/10/2015 02:05:27 | 0 | 1 | 0 > | 35.0 | | | | 505 | 872.0 > | | aBc | | | | | >
[jira] [Commented] (DRILL-4145) IndexOutOfBoundsException raised during select * query on S3 csv file
[ https://issues.apache.org/jira/browse/DRILL-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035589#comment-15035589 ] Steven Phillips commented on DRILL-4145: There is a bug in the case where there is an empty string for the last field. Basically, when the parser sees the pattern , the parser calls the "endEmptyField()" method of the TextInput. This was ok when using the RepeatedVarCharInput, because calling this method resulted in an empty string element being added to the array. But in the FieldVarCharOutput, ending the field doesn't do anything unless you first start the field. > IndexOutOfBoundsException raised during select * query on S3 csv file > - > > Key: DRILL-4145 > URL: https://issues.apache.org/jira/browse/DRILL-4145 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.3.0 > Environment: Drill 1.3.0 on a 3 node distributed-mode cluster on AWS. > Data files on S3. > S3 storage plugin configuration: > { > "type": "file", > "enabled": true, > "connection": "s3a://", > "workspaces": { > "root": { > "location": "/", > "writable": false, > "defaultInputFormat": null > }, > "views": { > "location": "/processed", > "writable": true, > "defaultInputFormat": null > }, > "tmp": { > "location": "/tmp", > "writable": true, > "defaultInputFormat": null > } > }, > "formats": { > "psv": { > "type": "text", > "extensions": [ > "tbl" > ], > "delimiter": "|" > }, > "csv": { > "type": "text", > "extensions": [ > "csv" > ], > "extractHeader": true, > "delimiter": "," > }, > "tsv": { > "type": "text", > "extensions": [ > "tsv" > ], > "delimiter": "\t" > }, > "parquet": { > "type": "parquet" > }, > "json": { > "type": "json" > }, > "avro": { > "type": "avro" > }, > "sequencefile": { > "type": "sequencefile", > "extensions": [ > "seq" > ] > }, > "csvh": { > "type": "text", > "extensions": [ > "csvh", > "csv" > ], > "extractHeader": true, > "delimiter": "," > } > } > } >Reporter: Peter McTaggart > 
Attachments: apps1-bad.csv, apps1.csv > > > When trying to query (via sqlline or WebUI) a .csv file I am getting an > IndexOutofBoundsException: > {noformat} 0: jdbc:drill:> select * from > s3data.root.`staging/data/apps1-bad.csv` limit 1; > Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 > (expected: range(0, 16384)) > Fragment 0:0 > [Error Id: be9856d2-0b80-4b9c-94a4-a1ca38ec5db0 on > ip-X.compute.internal:31010] (state=,code=0) > 0: jdbc:drill:> select * from s3data.root.`staging/data/apps1.csv` limit 1; > +--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ > | FIELD_1 | FIELD_2| FIELD_3 | FIELD_4 | FIELD_5 | FIELD_6 > | FIELD_7 | FIELD_8 | FIELD_9 | FIELD_10 | FIELD_11 | > FIELD_12 | FIELD_13 | FIELD_14 | FIELD_15 | FIELD_16 | FIELD_17 | > FIELD_18 | FIELD_19 | FIELD_20 | FIELD_21 | FIELD_22 | > FIELD_23 | FIELD_24 | FIELD_25 | FIELD_26 | FIELD_27 | FIELD_28 | > FIELD_29 | FIELD_30 | FIELD_31 | FIELD_32 | FIELD_33 | FIELD_34 | > FIELD_35 | > +--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ > | 489517 | 27/10/2015 02:05:27 | 261 | 1130232 | 0| > 925630488 | 0| 925630488 | -1 | 19531580547 | | > 27/10/2015 02:00:00 | | 30| 300 | 0 | 0 >| | | 27/10/2015 02:05:27 | 0 | 1 | 0 > | 35.0 | | | | 505 | 872.0 > | | aBc | | | | | >
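The trailing-delimiter case Steven describes can be reproduced with any conforming CSV reader: a record that ends in a delimiter has an empty final field, which is exactly the element that gets lost when the field is "ended" without first being started. A minimal illustration in Python (this is the expected semantics, not Drill's actual TextInput/FieldVarCharOutput code):

```python
import csv
import io

# A record that ends with a delimiter has an empty last field. The
# RepeatedVarCharInput path appended an empty array element on
# endEmptyField(); the FieldVarCharOutput path dropped it because the
# field was never started.
rows = list(csv.reader(io.StringIO("489517,27/10/2015,\n")))
assert rows[0] == ["489517", "27/10/2015", ""]  # three fields, last empty
```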
[jira] [Updated] (DRILL-4145) IndexOutOfBoundsException raised during select * query on S3 csv file
[ https://issues.apache.org/jira/browse/DRILL-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-4145: --- Assignee: Jacques Nadeau (was: Steven Phillips)
[jira] [Commented] (DRILL-2419) UDF that returns string representation of expression type
[ https://issues.apache.org/jira/browse/DRILL-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034828#comment-15034828 ] Steven Phillips commented on DRILL-2419: Yes, the typeof function is in the UnionFunctions class. > UDF that returns string representation of expression type > - > > Key: DRILL-2419 > URL: https://issues.apache.org/jira/browse/DRILL-2419 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Reporter: Victoria Markman >Assignee: Mehant Baid > Fix For: Future > > > Suggested name: typeof (credit goes to Aman) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4081) Handle schema changes in ExternalSort
Steven Phillips created DRILL-4081: -- Summary: Handle schema changes in ExternalSort Key: DRILL-4081 URL: https://issues.apache.org/jira/browse/DRILL-4081 Project: Apache Drill Issue Type: Improvement Reporter: Steven Phillips Assignee: Steven Phillips This improvement will make use of the Union vector to handle schema changes. When a new schema appears, the schema will be "merged" with the previous schema. The result will be a new schema that uses the Union type to store the columns where there is a type conflict. All of the batches (including the batches that have already arrived) will be coerced into this new schema. A new comparison function will be included to handle the comparison of the Union type. Comparison of the Union type will work as follows: 1. All numeric types can be mutually compared, and will be compared using Drill implicit cast rules. 2. All other types will not be compared against other types, but only among values of the same type. 3. There will be an overall precedence of types with regard to ordering. This precedence is not yet defined, but will be defined as part of the work on this issue.
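The three comparison rules above can be sketched as a sort key. The type names and the precedence order below are illustrative placeholders, since the issue explicitly says the precedence is not yet defined:

```python
# Sketch of the proposed Union comparison rules (names and precedence
# are hypothetical, not Drill's final definition).
NUMERIC = {"int", "bigint", "float", "double"}
# Rule 3: a hypothetical overall precedence of types for ordering.
PRECEDENCE = {"int": 0, "bigint": 0, "float": 0, "double": 0,
              "varchar": 1, "bool": 2}

def union_sort_key(typed_value):
    vtype, value = typed_value
    if vtype in NUMERIC:
        # Rule 1: all numeric types compare mutually, as if implicitly cast.
        return (PRECEDENCE[vtype], 0, float(value))
    # Rule 2: other types order by precedence first, then within the type.
    return (PRECEDENCE[vtype], 1, vtype, value)

batch = [("varchar", "a"), ("int", 10), ("double", 2.5)]
assert sorted(batch, key=union_sort_key) == [
    ("double", 2.5), ("int", 10), ("varchar", "a")]
```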
[jira] [Commented] (DRILL-4054) convert_from(,'JSON') gives JsonParseException
[ https://issues.apache.org/jira/browse/DRILL-4054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997926#comment-14997926 ] Steven Phillips commented on DRILL-4054: (1) I agree the message is bad. If that's the issue, please be explicit. It wasn't clear from the description whether there was an actual bug or just a request for better error message. (2) This doesn't seem to have anything to do with the original bug. In fact, this isn't even a bug. The convert_from function requires a varbinary or varchar input. It is not possible to perform this function against a MAP type. > convert_from(,'JSON') gives JsonParseException > --- > > Key: DRILL-4054 > URL: https://issues.apache.org/jira/browse/DRILL-4054 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.3.0 >Reporter: Khurram Faraaz > > convert_from(,'JSON') gives JsonParseException > sys.version => 3a73f098 > Drill 1.3 > 4 node cluster CentOS > {code} > 0: jdbc:drill:schema=dfs.tmp> select columns[3], convert_from(CAST(columns[3] > AS VARCHAR(64)),'JSON') json FROM `allData.csv`; > Error: SYSTEM ERROR: JsonParseException: Unrecognized token > 'AXCB': was expecting > ('true', 'false' or 'null') > at [Source: > org.apache.drill.exec.vector.complex.fn.DrillBufInputStream@5441715d; line: > 1, column: 105] > Fragment 0:0 > [Error Id: 7f8cb677-20e9-4e99-bbec-3ada707671ee on centos-03.qa.lab:31010] > (state=,code=0) > Stack trace from drillbit.log > [Error Id: 7f8cb677-20e9-4e99-bbec-3ada707671ee on centos-03.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > JsonParseException: Unrecognized token > 'AXCB': was expecting > ('true', 'false' or 'null') > at [Source: > org.apache.drill.exec.vector.complex.fn.DrillBufInputStream@5441715d; line: > 1, column: 105] > Fragment 0:0 > [Error Id: 7f8cb677-20e9-4e99-bbec-3ada707671ee on centos-03.qa.lab:31010] > at > 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) > ~[drill-common-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321) > [drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184) > [drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290) > [drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.3.0.jar:1.3.0] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_85] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_85] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85] > Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error > while converting from JSON. 
> at > org.apache.drill.exec.test.generated.ProjectorGen6.doEval(ProjectorTemplate.java:126) > ~[na:na] > at > org.apache.drill.exec.test.generated.ProjectorGen6.projectRecords(ProjectorTemplate.java:62) > ~[na:na] > at > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:174) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:131) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:156) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:80) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:256) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:250) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at java.security.AccessController.doPrivileged(Native Method) > ~[na:1.7.0_85] > at javax.security.auth.Subject.doAs(Subject.java:415)
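On point (2) of Steven's comment: the quoted failure is ordinary JSON lexing, because the bare token AXCB in the column is not valid JSON, so the parser reports that it expected true, false, or null. The same behavior is reproducible with any JSON parser:

```python
import json

# A bare token is not a valid JSON value, mirroring the quoted
# "Unrecognized token 'AXCB'" failure from convert_from(..., 'JSON').
try:
    json.loads("AXCB")
    raise AssertionError("should not parse")
except json.JSONDecodeError:
    pass

# The same bytes parse fine once they are a quoted JSON string.
assert json.loads('"AXCB"') == "AXCB"
```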
[jira] [Commented] (DRILL-4054) convert_from(,'JSON') gives JsonParseException
[ https://issues.apache.org/jira/browse/DRILL-4054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997896#comment-14997896 ] Steven Phillips commented on DRILL-4054: Could you explain what the bug is here? > convert_from(,'JSON') gives JsonParseException > --- > > Key: DRILL-4054 > URL: https://issues.apache.org/jira/browse/DRILL-4054 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.3.0 >Reporter: Khurram Faraaz > > convert_from(,'JSON') gives JsonParseException > sys.version => 3a73f098 > Drill 1.3 > 4 node cluster CentOS > {code} > 0: jdbc:drill:schema=dfs.tmp> select columns[3], convert_from(CAST(columns[3] > AS VARCHAR(64)),'JSON') json FROM `allData.csv`; > Error: SYSTEM ERROR: JsonParseException: Unrecognized token > 'AXCB': was expecting > ('true', 'false' or 'null') > at [Source: > org.apache.drill.exec.vector.complex.fn.DrillBufInputStream@5441715d; line: > 1, column: 105] > Fragment 0:0 > [Error Id: 7f8cb677-20e9-4e99-bbec-3ada707671ee on centos-03.qa.lab:31010] > (state=,code=0) > Stack trace from drillbit.log > [Error Id: 7f8cb677-20e9-4e99-bbec-3ada707671ee on centos-03.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > JsonParseException: Unrecognized token > 'AXCB': was expecting > ('true', 'false' or 'null') > at [Source: > org.apache.drill.exec.vector.complex.fn.DrillBufInputStream@5441715d; line: > 1, column: 105] > Fragment 0:0 > [Error Id: 7f8cb677-20e9-4e99-bbec-3ada707671ee on centos-03.qa.lab:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) > ~[drill-common-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321) > [drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184) > [drill-java-exec-1.3.0.jar:1.3.0] > at > 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290) > [drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.3.0.jar:1.3.0] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_85] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_85] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85] > Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error > while converting from JSON. > at > org.apache.drill.exec.test.generated.ProjectorGen6.doEval(ProjectorTemplate.java:126) > ~[na:na] > at > org.apache.drill.exec.test.generated.ProjectorGen6.projectRecords(ProjectorTemplate.java:62) > ~[na:na] > at > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:174) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:131) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:156) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:80) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:256) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:250) > 
~[drill-java-exec-1.3.0.jar:1.3.0] > at java.security.AccessController.doPrivileged(Native Method) > ~[na:1.7.0_85] > at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_85] > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595) > ~[hadoop-common-2.7.0-mapr-1506.jar:na] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250) > [drill-java-exec-1.3.0.jar:1.3.0] > ... 4 common frames omitted > Caused by:
[jira] [Commented] (DRILL-3845) Partition sender shouldn't send the "last batch" to a receiver that sent a "receiver finished" to the sender
[ https://issues.apache.org/jira/browse/DRILL-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992319#comment-14992319 ] Steven Phillips commented on DRILL-3845: Based on your comment, it seems like the upstream fragment that runs for an hour is supposed to be terminated. Is that correct? If so, that seems to be the real problem. Why is it not terminating? > Partition sender shouldn't send the "last batch" to a receiver that sent a > "receiver finished" to the sender > > > Key: DRILL-3845 > URL: https://issues.apache.org/jira/browse/DRILL-3845 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Reporter: Deneche A. Hakim >Assignee: Deneche A. Hakim > Fix For: 1.4.0 > > Attachments: 29c45a5b-e2b9-72d6-89f2-d49ba88e2939.sys.drill > > > Even if a receiver has finished and informed the corresponding partition > sender, the sender will still try to send a "last batch" to the receiver when > it's done. In most cases this is fine as those batches will be silently > dropped by the receiving DataServer, but if a receiver has finished +10 > minutes ago, DataServer will throw an exception as it couldn't find the > corresponding FragmentManager (WorkEventBus has a 10 minutes recentlyFinished > cache). > DRILL-2274 is a reproduction for this case (after the corresponding fix is > applied). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
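The behavior described in the summary can be sketched as a sender that tracks which receivers have already reported "receiver finished" and excludes them when broadcasting the final batch. All names below are illustrative, not Drill's actual classes:

```python
# Sketch: skip the "last batch" for receivers that already finished,
# so DataServer never has to look up an expired FragmentManager.
class PartitionSender:
    def __init__(self, receivers):
        self.receivers = set(receivers)
        self.finished = set()   # receivers that sent "receiver finished"
        self.sent = []

    def receiver_finished(self, receiver):
        self.finished.add(receiver)

    def send_last_batch(self):
        # Only send to receivers that are still listening.
        for r in self.receivers - self.finished:
            self.sent.append(r)

s = PartitionSender(["r0", "r1", "r2"])
s.receiver_finished("r1")
s.send_last_batch()
assert sorted(s.sent) == ["r0", "r2"]
```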
[jira] [Commented] (DRILL-4041) Parquet library update causing random "Buffer has negative reference count"
[ https://issues.apache.org/jira/browse/DRILL-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992873#comment-14992873 ] Steven Phillips commented on DRILL-4041: The first error definitely looks like the same thing. As for the accounting error, I don't know if that's related. It could just be a side-effect of the first. > Parquet library update causing random "Buffer has negative reference count" > --- > > Key: DRILL-4041 > URL: https://issues.apache.org/jira/browse/DRILL-4041 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.3.0 >Reporter: Rahul Challapalli >Assignee: Steven Phillips >Priority: Critical > > git commit # 39582bd60c9e9b16aba4f099d434e927e7e5 > After the parquet library update commit, we started seeing the below error > randomly causing failures in the Extended Functional Suite. > {code} > Failed with exception > java.lang.IllegalArgumentException: Buffer has negative reference count. > at > oadd.com.google.common.base.Preconditions.checkArgument(Preconditions.java:92) > at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:250) > at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:259) > at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:259) > at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:259) > at oadd.io.netty.buffer.DrillBuf.release(DrillBuf.java:239) > at > oadd.org.apache.drill.exec.vector.BaseDataValueVector.clear(BaseDataValueVector.java:39) > at > oadd.org.apache.drill.exec.vector.NullableIntVector.clear(NullableIntVector.java:150) > at > oadd.org.apache.drill.exec.record.SimpleVectorWrapper.clear(SimpleVectorWrapper.java:84) > at > oadd.org.apache.drill.exec.record.VectorContainer.zeroVectors(VectorContainer.java:312) > at > oadd.org.apache.drill.exec.record.VectorContainer.clear(VectorContainer.java:296) > at > oadd.org.apache.drill.exec.record.RecordBatchLoader.clear(RecordBatchLoader.java:183) > at > 
org.apache.drill.jdbc.impl.DrillResultSetImpl.cleanup(DrillResultSetImpl.java:139) > at org.apache.drill.jdbc.impl.DrillCursor.close(DrillCursor.java:333) > at > oadd.net.hydromatic.avatica.AvaticaResultSet.close(AvaticaResultSet.java:110) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.close(DrillResultSetImpl.java:169) > at > org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:233) > at > org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:89) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
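The "Buffer has negative reference count" message in the stack trace is a reference-counting invariant check: a buffer was released more times than it was retained. A minimal model of that invariant (not DrillBuf's actual implementation):

```python
# Model of the invariant the stack trace violates: releasing past zero
# is rejected, just as DrillBuf.release() rejects a negative count.
class CountedBuf:
    def __init__(self):
        self.refcnt = 1

    def retain(self):
        self.refcnt += 1

    def release(self):
        if self.refcnt <= 0:
            raise ValueError("Buffer has negative reference count.")
        self.refcnt -= 1

buf = CountedBuf()
buf.release()            # ok: count drops to 0, buffer is freed
try:
    buf.release()        # double release: the invariant check fires
    raise AssertionError("expected ValueError")
except ValueError:
    pass
```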
[jira] [Commented] (DRILL-3992) Unable to query Oracle DB using JDBC Storage Plug-In
[ https://issues.apache.org/jira/browse/DRILL-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984635#comment-14984635 ] Steven Phillips commented on DRILL-3992: +1 > Unable to query Oracle DB using JDBC Storage Plug-In > > > Key: DRILL-3992 > URL: https://issues.apache.org/jira/browse/DRILL-3992 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.2.0 > Environment: Windows 7 Enterprise 64-bit, Oracle 10g, Teradata 15.00 >Reporter: Eric Roma >Priority: Minor > Labels: newbie > Fix For: 1.2.0 > > > *See External Issue URL for Stack Overflow Post* > *Appears to be similar issue at > http://stackoverflow.com/questions/33370438/apache-drill-1-2-and-sql-server-jdbc* > Using Apache Drill v1.2 and Oracle Database 10g Enterprise Edition Release > 10.2.0.4.0 - 64bit in embedded mode. > I'm curious if anyone has had any success connecting Apache Drill to an > Oracle DB. I've updated the drill-override.conf with the following > configurations (per documents): > drill.exec: { > cluster-id: "drillbits1", > zk.connect: "localhost:2181", > drill.exec.sys.store.provider.local.path = "/mypath" > } > and placed the ojdbc6.jar in \apache-drill-1.2.0\jars\3rdparty. I can > successfully create the storage plug-in: > { > "type": "jdbc", > "driver": "oracle.jdbc.driver.OracleDriver", > "url": "jdbc:oracle:thin:@::", > "username": "USERNAME", > "password": "PASSWORD", > "enabled": true > } > but when I issue a query such as: > select * from ..`dual`; > I get the following error: > Query Failed: An Error Occurred > org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: > From line 1, column 15 to line 1, column 20: Table > '..dual' not found [Error Id: > 57a4153c-6378-4026-b90c-9bb727e131ae on :]. > I've tried to query other schema/tables and get a similar result. I've also > tried connecting to Teradata and get the same error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3956) TEXT MySQL type unsupported
[ https://issues.apache.org/jira/browse/DRILL-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984629#comment-14984629 ] Steven Phillips commented on DRILL-3956: +1 > TEXT MySQL type unsupported > --- > > Key: DRILL-3956 > URL: https://issues.apache.org/jira/browse/DRILL-3956 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Other >Affects Versions: 1.2.0 >Reporter: Andrew >Assignee: Steven Phillips > Attachments: DRILL-3956.patch > > > The JDBC storage plugin will fail with an NPE when querying a MySQL table > that has a 'TEXT' column. The underlying problem appears to be that Calcite > has no notion of this type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3995) Scalar replacement bug with Common Subexpression Elimination
Steven Phillips created DRILL-3995: -- Summary: Scalar replacement bug with Common Subexpression Elimination Key: DRILL-3995 URL: https://issues.apache.org/jira/browse/DRILL-3995 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips The following query: {code} select t1.full_name from cp.`employee.json` t1, cp.`department.json` t2 where t1.department_id = t2.department_id and t1.position_id = t2.department_id {code} fails with the following: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: RuntimeException: Error at instruction 43: Expected an object reference, but found . setValue(II)V 0 R I I . . . . : :L0 1 R I I . . . . : : LINENUMBER 249 L0 2 R I I . . . . : : ICONST_0 3 R I I . . . . : I : ISTORE 3 4 R I I I . . . : : LCONST_0 5 R I I I . . . : J : LSTORE 4 6 R I I I J . . : :L1 7 R I I I J . . : : LINENUMBER 251 L1 8 R I I I J . . : : ALOAD 0 9 R I I I J . . : R : GETFIELD org/apache/drill/exec/test/generated/HashTableGen2$BatchHolder.vv20 : Lorg/apache/drill/exec/vector/NullableBigIntVector; 00010 R I I I J . . : R : INVOKEVIRTUAL org/apache/drill/exec/vector/NullableBigIntVector.getAccessor ()Lorg/apache/drill/exec/vector/NullableBigIntVector$Accessor; 00011 R I I I J . . : R : ILOAD 1 00012 R I I I J . . : R I : INVOKEVIRTUAL org/apache/drill/exec/vector/NullableBigIntVector$Accessor.isSet (I)I 00013 R I I I J . . : I : ISTORE 3 00014 R I I I J . . : :L2 00015 R I I I J . . : : LINENUMBER 252 L2 00016 R I I I J . . : : ILOAD 3 00017 R I I I J . . : I : ICONST_1 00018 R I I I J . . : I I : IF_ICMPNE L3 00019 R I I I J . . : :L4 00020 ? : LINENUMBER 253 L4 00021 ? : ALOAD 0 00022 ? : GETFIELD org/apache/drill/exec/test/generated/HashTableGen2$BatchHolder.vv20 : Lorg/apache/drill/exec/vector/NullableBigIntVector; 00023 ? : INVOKEVIRTUAL org/apache/drill/exec/vector/NullableBigIntVector.getAccessor ()Lorg/apache/drill/exec/vector/NullableBigIntVector$Accessor; 00024 ? : ILOAD 1 00025 ? 
: INVOKEVIRTUAL org/apache/drill/exec/vector/NullableBigIntVector$Accessor.get (I)J 00026 ? : LSTORE 4 00027 R I I I J . . : :L3 00028 R I I I J . . : : LINENUMBER 256 L3 00029 R I I I J . . : : ILOAD 3 00030 R I I I J . . : I : ICONST_0 00031 R I I I J . . : I I : IF_ICMPEQ L5 00032 R I I I J . . : :L6 00033 ? : LINENUMBER 257 L6 00034 ? : ALOAD 0 00035 ? : GETFIELD org/apache/drill/exec/test/generated/HashTableGen2$BatchHolder.vv24 : Lorg/apache/drill/exec/vector/NullableBigIntVector; 00036 ? : INVOKEVIRTUAL org/apache/drill/exec/vector/NullableBigIntVector.getMutator ()Lorg/apache/drill/exec/vector/NullableBigIntVector$Mutator; 00037 ? : ILOAD 2 00038 ? : ILOAD 3 00039 ? : LLOAD 4 00040 ? : INVOKEVIRTUAL org/apache/drill/exec/vector/NullableBigIntVector$Mutator.set (IIJ)V 00041 R I I I J . . : :L5 00042 R I I I J . . : : LINENUMBER 259 L5 00043 R I I I J . . : : ALOAD 6 00044 ? : GETFIELD org/apache/drill/exec/expr/holders/NullableBigIntHolder.isSet : I 00045 ? : ICONST_0 00046 ? : IF_ICMPEQ L7 00047 ? :L8 00048 ? : LINENUMBER 260 L8 00049 ? : ALOAD 0 00050 ? : GETFIELD org/apache/drill/exec/test/generated/HashTableGen2$BatchHolder.vv27 : Lorg/apache/drill/exec/vector/NullableBigIntVector; 00051 ? : INVOKEVIRTUAL org/apache/drill/exec/vector/NullableBigIntVector.getMutator ()Lorg/apache/drill/exec/vector/NullableBigIntVector$Mutator; 00052 ? : ILOAD 2 00053 ? : ALOAD 6 00054 ? : GETFIELD org/apache/drill/exec/expr/holders/NullableBigIntHolder.isSet : I 00055 ? : ALOAD 6 00056 ? : GETFIELD org/apache/drill/exec/expr/holders/NullableBigIntHolder.value : J 00057 ? : INVOKEVIRTUAL org/apache/drill/exec/vector/NullableBigIntVector$Mutator.set (IIJ)V 00058 ? :L7 00059 ? : LINENUMBER 245 L7 00060 ? : RETURN 00061 ? :L9 when common subexpressions are eliminated (see DRILL-3912). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3975) Partition Planning rule causes query failure due to IndexOutOfBoundsException on HDFS
[ https://issues.apache.org/jira/browse/DRILL-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972984#comment-14972984 ] Steven Phillips commented on DRILL-3975: My approach has been to remove the scheme and authority from the paths any time I encounter code that uses the path as a key, or does any sort of string comparison. This is an area where I think we need to clean up. I don't think we are very consistent throughout the code base in how we handle paths. The usual trick I use to strip away the scheme and authority is the method Path.getPathWithoutSchemeAndAuthority(Path p). If I have String objects and not Path objects, I will convert the String to a Path, use the utility method to remove scheme and authority, and then call toString(). > Partition Planning rule causes query failure due to IndexOutOfBoundsException > on HDFS > - > > Key: DRILL-3975 > URL: https://issues.apache.org/jira/browse/DRILL-3975 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Reporter: Jacques Nadeau > > In attempting to run the extended test suite provided by MapR, there are a > large number of queries that fail due to issues in the PruneScanRule and > specifically the DFSPartitionLocation constructor line 31. It is likely due > to issues with the code that are related to running on HDFS where this code > path has apparently not been tested.
> An example test query this type of failure occurred: > /src/drill-test-framework/resources/Functional/ctas/ctas_auto_partition/tpch0.01_multiple_partitions/data/q11.q > Example stack trace below: > {code} > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > StringIndexOutOfBoundsException: String index out of range: -12 > [Error Id: f2941267-49b1-4f67-a17f-610ffb13fcb7 on > ip-172-31-30-32.us-west-2.compute.internal:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) > ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:742) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:841) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:786) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73) > [drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:788) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:894) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:255) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_85] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_85] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85] > Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected > exception during fragment initialization: Internal error: Error while > applying rule 
PruneScanRule:Filter_On_Scan_Parquet, args > [rel#43148:DrillFilterRel.LOGICAL.ANY([]).[](input=rel#43147:Subset#4.LOGICAL.ANY([]).[],condition==($0, > 1)), rel#43241:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, > ctasAutoPartition, > tpch_multiple_partitions/lineitem_twopart_ordered2],groupscan=ParquetGroupScan > [entries=[ReadEntryWithPath > [path=hdfs://ip-172-31-30-32:54310/drill/testdata/ctas_auto_partition/tpch_multiple_partitions/lineitem_twopart_ordered2]], > > selectionRoot=hdfs://ip-172-31-30-32:54310/drill/testdata/ctas_auto_partition/tpch_multiple_partitions/lineitem_twopart_ordered2, > numFiles=1, usedMetadataFile=false, columns=[`l_modline`, `l_moddate`]])] > ... 4 common frames omitted > Caused by: java.lang.AssertionError: Internal error: Error while applying > rule PruneScanRule:Filter_On_Scan_Parquet, args > [rel#43148:DrillFilterRel.LOGICAL.ANY([]).[](input=rel#43147:Subset#4.LOGICAL.ANY([]).[],condition==($0, > 1)), rel#43241:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, > ctasAutoPartition, > tpch_multiple_partitions/lineitem_twopart_ordered2],groupscan=ParquetGroupScan > [entries=[ReadEntryWithPath >
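The scheme-and-authority stripping Steven describes (Hadoop's Path.getPathWithoutSchemeAndAuthority) can be illustrated with a standard-library stand-in. The helper below is hypothetical, not the Hadoop API; it just shows why dropping "hdfs://host:port" makes path strings compare consistently:

```python
from urllib.parse import urlparse

# Stand-in for Path.getPathWithoutSchemeAndAuthority(Path p): drop the
# scheme ("hdfs") and authority ("ip-172-31-30-32:54310"), keep the path.
def without_scheme_and_authority(path):
    return urlparse(path).path

full = "hdfs://ip-172-31-30-32:54310/drill/testdata/ctas_auto_partition"
assert without_scheme_and_authority(full) == "/drill/testdata/ctas_auto_partition"
# A path with no scheme is returned unchanged, so both forms compare equal
# when used as a key.
assert without_scheme_and_authority("/drill/testdata/ctas_auto_partition") == \
    without_scheme_and_authority(full)
```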
[jira] [Commented] (DRILL-3229) Create a new EmbeddedVector
[ https://issues.apache.org/jira/browse/DRILL-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971766#comment-14971766 ] Steven Phillips commented on DRILL-3229: Regarding the list writer, I know it is a bit confusing, so I will try to give a better explanation of how it works. It confuses me at times as well. The type promotion was designed with the possibility of allowing other promotions in mind, but I am currently only doing promotion to Union. We should have a discussion about what other promotions we want to allow. Screen currently returns a Union type to the user. This is an area that will require additional enhancement. The DrillClient has no problem dealing with a Union vector. The JDBC driver, on the other hand, currently has only limited support for the Union type. I think we might need to add a feature similar to what we have for complex types, which determines whether the client is able to handle Union types and converts to JSON if it cannot. So metadata queries will also return a Union type. As for case statements, I am leaning toward a general philosophy of not failing queries whenever we can avoid it: if there is something Drill can do to execute a query, it should do that. So I am leaning toward option 3. An untyped null is supported as part of a Union vector; this null value is encoded in the 'type' vector. This patch does not introduce a standalone Untyped Null vector. That will be a separate patch. I will update the design document with what I have said here.
> Create a new EmbeddedVector > --- > > Key: DRILL-3229 > URL: https://issues.apache.org/jira/browse/DRILL-3229 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Jacques Nadeau >Assignee: Hanifi Gunes > Fix For: Future > > > Embedded Vector will leverage a binary encoding for holding information about > type for each individual field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
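The point in the comment above that an untyped null is encoded purely in the 'type' vector can be sketched with a toy model. This is an illustration only, not Drill's actual vector layout or class names:

```python
# Toy union "vector": a per-row type tag plus per-type value storage.
# A null row is represented only by its tag in the type vector -- no
# value is stored anywhere, which is the untyped-null encoding the
# comment describes.
class UnionVector:
    NULL = 'null'

    def __init__(self):
        self.types = []    # one type tag per row
        self.values = {}   # type tag -> list of values of that type

    def append(self, type_tag, value=None):
        self.types.append(type_tag)
        if type_tag != self.NULL:
            self.values.setdefault(type_tag, []).append(value)

    def get(self, row):
        tag = self.types[row]
        if tag == self.NULL:
            return None
        # index of this row among earlier rows of the same type
        offset = self.types[:row].count(tag)
        return self.values[tag][offset]

v = UnionVector()
v.append('int', 7)
v.append(UnionVector.NULL)      # null lives only in the type vector
v.append('varchar', 'seven')
```

Reading row 1 returns `None` without consulting any value storage, which is why no standalone untyped-null vector is needed for this case.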
[jira] [Updated] (DRILL-3912) Common subexpression elimination in code generation
[ https://issues.apache.org/jira/browse/DRILL-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3912: --- Issue Type: Improvement (was: Bug) > Common subexpression elimination in code generation > --- > > Key: DRILL-3912 > URL: https://issues.apache.org/jira/browse/DRILL-3912 > Project: Apache Drill > Issue Type: Improvement >Reporter: Steven Phillips >Assignee: Jinfeng Ni > > Drill currently will evaluate the full expression tree, even if there are > redundant subtrees. Many of these redundant evaluations can be eliminated by > reusing the results from previously evaluated expression trees. > For example, > {code} > select a + 1, (a + 1)* (a - 1) from t > {code} > Will compute the entire (a + 1) expression twice. With CSE, it will only be > evaluated once. > The benefit will be reducing the work done when evaluating expressions, as > well as reducing the amount of code that is generated, which could also lead > to better JIT optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
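The description's example can be made concrete with a small sketch. This is not Drill's code generator, just an illustration of the technique: cache each subtree's result keyed by the subtree's structure, so `(a + 1)` is computed once even though it appears in both output expressions:

```python
# Illustrative common-subexpression elimination: evaluate an expression
# tree while caching results keyed by the subtree itself (tuples are
# hashable), so a repeated subtree such as (a + 1) is computed only once.
def evaluate(expr, env, cache, counts):
    if not isinstance(expr, tuple):          # leaf: variable or constant
        return env.get(expr, expr)
    if expr in cache:                        # already evaluated: reuse it
        return cache[expr]
    op, left, right = expr
    counts[expr] = counts.get(expr, 0) + 1   # track real evaluations
    l = evaluate(left, env, cache, counts)
    r = evaluate(right, env, cache, counts)
    result = l + r if op == '+' else l * r if op == '*' else l - r
    cache[expr] = result
    return result

# select a + 1, (a + 1) * (a - 1) from t   -- with a = 4
env, cache, counts = {'a': 4}, {}, {}
col1 = evaluate(('+', 'a', 1), env, cache, counts)
col2 = evaluate(('*', ('+', 'a', 1), ('-', 'a', 1)), env, cache, counts)
```

With the shared cache, `counts` records a single evaluation of `('+', 'a', 1)` even though it is referenced twice.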
[jira] [Updated] (DRILL-3963) Read raw key value bytes from sequence files
[ https://issues.apache.org/jira/browse/DRILL-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3963: --- Issue Type: New Feature (was: Bug) > Read raw key value bytes from sequence files > > > Key: DRILL-3963 > URL: https://issues.apache.org/jira/browse/DRILL-3963 > Project: Apache Drill > Issue Type: New Feature >Reporter: amit hadke >Assignee: amit hadke > > Sequence files store list of key-value pairs. Keys/values are of type hadoop > writable. > Provide a format plugin that reads raw bytes out of sequence files which can > be further deserialized by a udf(from hadoop writable -> drill type) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
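As a sketch of the deserialization step the description mentions: a Hadoop `IntWritable` serializes its value as a 4-byte big-endian int (via `DataOutput.writeInt`), so a UDF handed the raw value bytes could decode it as below. The function name is illustrative, not from the Drill plugin:

```python
import struct

def decode_int_writable(raw: bytes) -> int:
    # Hadoop's IntWritable writes its value as a 4-byte big-endian
    # signed int, so the raw bytes decode directly with '>i'.
    (value,) = struct.unpack('>i', raw)
    return value

# bytes as an IntWritable would have written them for the value 2015
assert decode_int_writable(b'\x00\x00\x07\xdf') == 2015
```

More complex Writables (Text, custom records) would need their own decoders, but the shape is the same: the format plugin hands over raw bytes, and the UDF owns the interpretation.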
[jira] [Commented] (DRILL-3232) Modify existing vectors to allow type promotion
[ https://issues.apache.org/jira/browse/DRILL-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969616#comment-14969616 ] Steven Phillips commented on DRILL-3232: Design document: https://gist.github.com/StevenMPhillips/41b4a1bd745943d508d2 > Modify existing vectors to allow type promotion > --- > > Key: DRILL-3232 > URL: https://issues.apache.org/jira/browse/DRILL-3232 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Steven Phillips >Assignee: Hanifi Gunes > Fix For: 1.3.0 > > > Support the ability for existing vectors to be promoted similar to supported > implicit casting rules. > For example: > INT > DOUBLE > STRING > EMBEDDED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
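The promotion chain in the description (INT > DOUBLE > STRING > EMBEDDED) can be sketched as a simple lattice lookup: when an incoming value's type differs from the vector's current type, promote to whichever of the two sits later in the chain. Type names here are illustrative, not Drill's enums:

```python
# Illustrative promotion lattice following the order in the description:
# INT -> DOUBLE -> STRING -> EMBEDDED.
PROMOTION_ORDER = ['INT', 'DOUBLE', 'STRING', 'EMBEDDED']

def promote(current: str, incoming: str) -> str:
    # pick the type that appears later in the promotion chain
    return max(current, incoming, key=PROMOTION_ORDER.index)
```

For example, an INT vector receiving a DOUBLE promotes to DOUBLE, while a STRING vector receiving an INT stays STRING.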
[jira] [Commented] (DRILL-3229) Create a new EmbeddedVector
[ https://issues.apache.org/jira/browse/DRILL-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969613#comment-14969613 ] Steven Phillips commented on DRILL-3229: Design document: https://gist.github.com/StevenMPhillips/41b4a1bd745943d508d2 > Create a new EmbeddedVector > --- > > Key: DRILL-3229 > URL: https://issues.apache.org/jira/browse/DRILL-3229 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Jacques Nadeau >Assignee: Steven Phillips > Fix For: Future > > > Embedded Vector will leverage a binary encoding for holding information about > type for each individual field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3228) Implement Embedded Type
[ https://issues.apache.org/jira/browse/DRILL-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968144#comment-14968144 ] Steven Phillips commented on DRILL-3228: Design document for Union Type: https://gist.github.com/StevenMPhillips/41b4a1bd745943d508d2 > Implement Embedded Type > --- > > Key: DRILL-3228 > URL: https://issues.apache.org/jira/browse/DRILL-3228 > Project: Apache Drill > Issue Type: Task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Jacques Nadeau >Assignee: Steven Phillips > Fix For: 1.3.0 > > > An Umbrella for the implementation of Embedded types within Drill. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3232) Modify existing vectors to allow type promotion
[ https://issues.apache.org/jira/browse/DRILL-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3232: --- Fix Version/s: (was: Future) 1.3.0 > Modify existing vectors to allow type promotion > --- > > Key: DRILL-3232 > URL: https://issues.apache.org/jira/browse/DRILL-3232 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Steven Phillips > Fix For: 1.3.0 > > > Support the ability for existing vectors to be promoted similar to supported > implicit casting rules. > For example: > INT > DOUBLE > STRING > EMBEDDED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3232) Modify existing vectors to allow type promotion
[ https://issues.apache.org/jira/browse/DRILL-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964034#comment-14964034 ] Steven Phillips commented on DRILL-3232: PR at https://github.com/apache/drill/pull/207 > Modify existing vectors to allow type promotion > --- > > Key: DRILL-3232 > URL: https://issues.apache.org/jira/browse/DRILL-3232 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Steven Phillips > Fix For: Future > > > Support the ability for existing vectors to be promoted similar to supported > implicit casting rules. > For example: > INT > DOUBLE > STRING > EMBEDDED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3232) Modify existing vectors to allow type promotion
[ https://issues.apache.org/jira/browse/DRILL-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964036#comment-14964036 ] Steven Phillips commented on DRILL-3232: [~hgunes], could you please review this PR? > Modify existing vectors to allow type promotion > --- > > Key: DRILL-3232 > URL: https://issues.apache.org/jira/browse/DRILL-3232 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Steven Phillips >Assignee: Hanifi Gunes > Fix For: 1.3.0 > > > Support the ability for existing vectors to be promoted similar to supported > implicit casting rules. > For example: > INT > DOUBLE > STRING > EMBEDDED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-3232) Modify existing vectors to allow type promotion
[ https://issues.apache.org/jira/browse/DRILL-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips reassigned DRILL-3232: -- Assignee: Steven Phillips > Modify existing vectors to allow type promotion > --- > > Key: DRILL-3232 > URL: https://issues.apache.org/jira/browse/DRILL-3232 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Steven Phillips >Assignee: Steven Phillips > Fix For: 1.3.0 > > > Support the ability for existing vectors to be promoted similar to supported > implicit casting rules. > For example: > INT > DOUBLE > STRING > EMBEDDED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3232) Modify existing vectors to allow type promotion
[ https://issues.apache.org/jira/browse/DRILL-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3232: --- Assignee: Hanifi Gunes (was: Steven Phillips) > Modify existing vectors to allow type promotion > --- > > Key: DRILL-3232 > URL: https://issues.apache.org/jira/browse/DRILL-3232 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Steven Phillips >Assignee: Hanifi Gunes > Fix For: 1.3.0 > > > Support the ability for existing vectors to be promoted similar to supported > implicit casting rules. > For example: > INT > DOUBLE > STRING > EMBEDDED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3953) Apache Drill - Memory Issue when using against hbase db on Windows machine
[ https://issues.apache.org/jira/browse/DRILL-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964190#comment-14964190 ] Steven Phillips commented on DRILL-3953: How much data is in the "tsdb" table, and what does it look like? If it is an OpenTSDB table, there could be thousands of unique column names, and Drill will create a vector and allocate memory for each one. It's possible that this is causing the problem. > Apache Drill - Memory Issue when using against hbase db on Windows machine > -- > > Key: DRILL-3953 > URL: https://issues.apache.org/jira/browse/DRILL-3953 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Reporter: Pete > > Trying a sandbox run using Drill on a Windows laptop with 4gbs of memory. > The Drill Explorer connection shows a successful execution to database (test > button). When trying to connect it shows processing but never comes back. > When trying to run query against database Drill on drill prompt it blows up > with out of memory. The query is simple enough that it shouldn't blow > up.. > select * from tsdbdatabase.tsdb limit 1; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
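To see why thousands of unique column names can exhaust a 4 GB laptop: if each column gets its own vector with some fixed initial allocation, the arithmetic grows linearly with column count. The 32 KiB per-vector figure below is an assumption for illustration only, not Drill's actual default:

```python
# Back-of-the-envelope: per-column vector allocation for a wide table.
# The 32 KiB initial allocation is an assumed figure for illustration.
ASSUMED_BYTES_PER_VECTOR = 32 * 1024

def allocation_mib(unique_columns: int) -> float:
    return unique_columns * ASSUMED_BYTES_PER_VECTOR / (1024 * 1024)

# 50,000 unique OpenTSDB-style column names -> 1562.5 MiB of vectors,
# before any actual row data is stored.
```

Even a modest per-vector allocation multiplied by an OpenTSDB-scale column count can swamp a small heap, which is consistent with the diagnosis in the comment.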
[jira] [Assigned] (DRILL-3233) Update code generation & function code to support reading and writing embedded type
[ https://issues.apache.org/jira/browse/DRILL-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips reassigned DRILL-3233: -- Assignee: Steven Phillips > Update code generation & function code to support reading and writing > embedded type > --- > > Key: DRILL-3233 > URL: https://issues.apache.org/jira/browse/DRILL-3233 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Jacques Nadeau >Assignee: Steven Phillips > Fix For: Future > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3749) Upgrade Hadoop dependency to latest version (2.7.1)
[ https://issues.apache.org/jira/browse/DRILL-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3749: --- Assignee: Jason Altekruse (was: Steven Phillips) > Upgrade Hadoop dependency to latest version (2.7.1) > --- > > Key: DRILL-3749 > URL: https://issues.apache.org/jira/browse/DRILL-3749 > Project: Apache Drill > Issue Type: New Feature > Components: Tools, Build & Test >Affects Versions: 1.1.0 >Reporter: Venki Korukanti >Assignee: Jason Altekruse > Fix For: Future > > > Logging a JIRA to track and discuss upgrading Drill's Hadoop dependency > version. Currently Drill depends on Hadoop 2.5.0 version. Newer version of > Hadoop (2.7.1) has following features. > 1) Better S3 support > 2) Ability to check if a user has certain permissions on file/directory > without performing operations on the file/dir. Useful for cases like > DRILL-3467. > As Drill is going to use higher version of Hadoop fileclient, there could be > potential issues when interacting with Hadoop services (such as HDFS) of > lower version than the fileclient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3712) Drill does not recognize UTF-16-LE encoding
[ https://issues.apache.org/jira/browse/DRILL-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949503#comment-14949503 ] Steven Phillips commented on DRILL-3712: I think one solution would be to write a UDF to convert from utf16 to utf8. We already have a function that does the reverse: CastVarCharVar16Char . > Drill does not recognize UTF-16-LE encoding > --- > > Key: DRILL-3712 > URL: https://issues.apache.org/jira/browse/DRILL-3712 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Text & CSV >Affects Versions: 1.1.0 > Environment: OSX, likely Linux. >Reporter: Edmon Begoli > Fix For: Future > > > We are unable to process files that OSX identifies as character sete UTF16LE. > After unzipping and converting to UTF8, we are able to process one fine. > There are CONVERT_TO and CONVERT_FROM commands that appear to address the > issue, but we were unable to make them work on a gzipped or unzipped version > of the UTF16 file. We were able to use CONVERT_FROM ok, but when we tried > to wrap the results of that to cast as a date, or anything else, it failed. > Trying to work with it natively caused the double-byte nature to appear (a > substring 1,4 only return the first two characters). > I cannot post the data because it is proprietary in nature, but I am posting > this code that might be useful in re-creating an issue: > {noformat} > #!/usr/bin/env python > """ Generates a test psv file with some text fields encoded as UTF-16-LE. """ > def write_utf16le_encoded_psv(): > total_lines = 10 > encoded = "Encoded B".encode("utf-16-le") > with open("test.psv","wb") as csv_file: > csv_file.write("header 1|header 2|header 3\n") > for i in xrange(total_lines): > csv_file.write("value > A"+str(i)+"|"+encoded+"|value C"+str(i)+"\n") > if __name__ == "__main__": > write_utf16le_encoded_psv() > {noformat} > then: > tar zcvf test.psv.tar.gz test.psv -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3712) Drill does not recognize UTF-16-LE encoding
[ https://issues.apache.org/jira/browse/DRILL-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949499#comment-14949499 ] Steven Phillips commented on DRILL-3712: The second column is utf16 encoded. I don't think any of our cast functions will deal with it properly. Nor will any of the string functions. > Drill does not recognize UTF-16-LE encoding > --- > > Key: DRILL-3712 > URL: https://issues.apache.org/jira/browse/DRILL-3712 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Text & CSV >Affects Versions: 1.1.0 > Environment: OSX, likely Linux. >Reporter: Edmon Begoli > Fix For: Future > > > We are unable to process files that OSX identifies as character sete UTF16LE. > After unzipping and converting to UTF8, we are able to process one fine. > There are CONVERT_TO and CONVERT_FROM commands that appear to address the > issue, but we were unable to make them work on a gzipped or unzipped version > of the UTF16 file. We were able to use CONVERT_FROM ok, but when we tried > to wrap the results of that to cast as a date, or anything else, it failed. > Trying to work with it natively caused the double-byte nature to appear (a > substring 1,4 only return the first two characters). > I cannot post the data because it is proprietary in nature, but I am posting > this code that might be useful in re-creating an issue: > {noformat} > #!/usr/bin/env python > """ Generates a test psv file with some text fields encoded as UTF-16-LE. """ > def write_utf16le_encoded_psv(): > total_lines = 10 > encoded = "Encoded B".encode("utf-16-le") > with open("test.psv","wb") as csv_file: > csv_file.write("header 1|header 2|header 3\n") > for i in xrange(total_lines): > csv_file.write("value > A"+str(i)+"|"+encoded+"|value C"+str(i)+"\n") > if __name__ == "__main__": > write_utf16le_encoded_psv() > {noformat} > then: > tar zcvf test.psv.tar.gz test.psv -- This message was sent by Atlassian JIRA (v6.3.4#6332)
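The repro script quoted in the report is Python 2 (`xrange`, text written to a binary file) and, as pasted, concatenates UTF-16-LE bytes into a `str`. A Python 3 version that actually produces the mixed-encoding file might look like this; the file layout follows the quoted script, and nothing here is from the Drill code base:

```python
"""Generate test .psv content whose middle column is UTF-16-LE encoded
while the rest of the content is ASCII -- the mix the report describes."""
import io

def write_utf16le_encoded_psv(out, total_lines=10):
    # keep everything as bytes so the UTF-16-LE column is not re-encoded
    encoded = "Encoded B".encode("utf-16-le")
    out.write(b"header 1|header 2|header 3\n")
    for i in range(total_lines):
        out.write(b"value A%d|" % i + encoded + b"|value C%d\n" % i)

buf = io.BytesIO()
write_utf16le_encoded_psv(buf, total_lines=2)
data = buf.getvalue()
```

To write an actual file, pass `open("test.psv", "wb")` instead of the `BytesIO` buffer; the stream-based signature just makes the sketch self-contained.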
[jira] [Assigned] (DRILL-3912) Common subexpression elimination
[ https://issues.apache.org/jira/browse/DRILL-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips reassigned DRILL-3912: -- Assignee: Steven Phillips > Common subexpression elimination > > > Key: DRILL-3912 > URL: https://issues.apache.org/jira/browse/DRILL-3912 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips > > Drill currently will evaluate the full expression tree, even if there are > redundant subtrees. Many of these redundant evaluations can be eliminated by > reusing the results from previously evaluated expression trees. > For example, > {code} > select a + 1, (a + 1)* (a - 1) from t > {code} > Will compute the entire (a + 1) expression twice. With CSE, it will only be > evaluated once. > The benefit will be reducing the work done when evaluating expressions, as > well as reducing the amount of code that is generated, which could also lead > to better JIT optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3912) Common subexpression elimination
Steven Phillips created DRILL-3912: -- Summary: Common subexpression elimination Key: DRILL-3912 URL: https://issues.apache.org/jira/browse/DRILL-3912 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips Drill currently will evaluate the full expression tree, even if there are redundant subtrees. Many of these redundant evaluations can be eliminated by reusing the results from previously evaluated expression trees. For example, {code} select a + 1, (a + 1)* (a - 1) from t {code} Will compute the entire (a + 1) expression twice. With CSE, it will only be evaluated once. The benefit will be reducing the work done when evaluating expressions, as well as reducing the amount of code that is generated, which could also lead to better JIT optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3912) Common subexpression elimination in code generation
[ https://issues.apache.org/jira/browse/DRILL-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947810#comment-14947810 ] Steven Phillips commented on DRILL-3912: It looks like your patch does a subset of my patch. It will eliminate common vector read expressions in the same JBlock. My patch will eliminate any redundant expression as long as the previously evaluated expression is in scope. For example, with filter: ( a + b > 0 and ( a + b = c or a + b = d)) the expression (a + b) would currently have to be computed 3 times, and each reference to a and b would require accessing the corresponding vectors. With my patch, (a + b) would only be calculated once. > Common subexpression elimination in code generation > --- > > Key: DRILL-3912 > URL: https://issues.apache.org/jira/browse/DRILL-3912 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Jinfeng Ni > > Drill currently will evaluate the full expression tree, even if there are > redundant subtrees. Many of these redundant evaluations can be eliminated by > reusing the results from previously evaluated expression trees. > For example, > {code} > select a + 1, (a + 1)* (a - 1) from t > {code} > Will compute the entire (a + 1) expression twice. With CSE, it will only be > evaluated once. > The benefit will be reducing the work done when evaluating expressions, as > well as reducing the amount of code that is generated, which could also lead > to better JIT optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3909) Decimal round functions corrupts input data
Steven Phillips created DRILL-3909: -- Summary: Decimal round functions corrupts input data Key: DRILL-3909 URL: https://issues.apache.org/jira/browse/DRILL-3909 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips Fix For: 1.3.0 The Decimal 28 and 38 round functions, instead of creating a new buffer and copying data from the incoming buffer, set the output buffer equal to the input buffer, and then subsequently mutate the data in that buffer. This causes the data in the input buffer to be corrupted. A simple example to reproduce: {code} $ cat a.json { a : "9.95678" } 0: jdbc:drill:drillbit=localhost> create table a as select cast(a as decimal(38,18)) a from `a.json`; +---++ | Fragment | Number of records written | +---++ | 0_0 | 1 | +---++ 1 row selected (0.206 seconds) 0: jdbc:drill:drillbit=localhost> select round(a, 9) from a; +---+ |EXPR$0 | +---+ | 10.0 | +---+ 1 row selected (0.121 seconds) 0: jdbc:drill:drillbit=localhost> select round(a, 11) from a; ++ | EXPR$0 | ++ | 9.957 | ++ 1 row selected (0.115 seconds) 0: jdbc:drill:drillbit=localhost> select round(a, 9), round(a, 11) from a; +---++ |EXPR$0 | EXPR$1 | +---++ | 10.0 | 1.000 | +---++ {code} In the third example, there are two round expressions operating on the same incoming decimal vector, and you can see that the result for the second expression is incorrect. Not critical because Decimal type is considered alpha right now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
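The bug pattern described above, an output that aliases its input buffer and then mutates it, is easy to reproduce outside Drill. A toy sketch with bytearrays standing in for value vector buffers:

```python
# A "round" that aliases the input buffer corrupts it for later readers;
# copying first does not.  The bytearrays stand in for vector buffers.
def round_aliasing(buf):
    out = buf              # BUG: output buffer *is* the input buffer
    out[0] += 1            # mutation is visible through the input
    return out

def round_copying(buf):
    out = bytearray(buf)   # allocate a new buffer and copy the data
    out[0] += 1
    return out

source = bytearray([9, 9, 5])
safe = round_copying(source)      # source is still [9, 9, 5] here
broken = round_aliasing(source)   # source is now [10, 9, 5]
```

This is why the second round expression in the report's third query sees already-mutated input: both expressions read the same incoming buffer, and the first one wrote into it.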
[jira] [Updated] (DRILL-3912) Common subexpression elimination in code generation
[ https://issues.apache.org/jira/browse/DRILL-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3912: --- Assignee: Jinfeng Ni (was: Steven Phillips) > Common subexpression elimination in code generation > --- > > Key: DRILL-3912 > URL: https://issues.apache.org/jira/browse/DRILL-3912 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Jinfeng Ni > > Drill currently will evaluate the full expression tree, even if there are > redundant subtrees. Many of these redundant evaluations can be eliminated by > reusing the results from previously evaluated expression trees. > For example, > {code} > select a + 1, (a + 1)* (a - 1) from t > {code} > Will compute the entire (a + 1) expression twice. With CSE, it will only be > evaluated once. > The benefit will be reducing the work done when evaluating expressions, as > well as reducing the amount of code that is generated, which could also lead > to better JIT optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3912) Common subexpression elimination
[ https://issues.apache.org/jira/browse/DRILL-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947731#comment-14947731 ] Steven Phillips commented on DRILL-3912: Yes, Drill physical plans are currently trees only. What you are suggesting requires a more general DAG execution. This patch only deals with common expressions within operators, and does its work right at code-generation time. > Common subexpression elimination > > > Key: DRILL-3912 > URL: https://issues.apache.org/jira/browse/DRILL-3912 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips > > Drill currently will evaluate the full expression tree, even if there are > redundant subtrees. Many of these redundant evaluations can be eliminated by > reusing the results from previously evaluated expression trees. > For example, > {code} > select a + 1, (a + 1)* (a - 1) from t > {code} > Will compute the entire (a + 1) expression twice. With CSE, it will only be > evaluated once. > The benefit will be reducing the work done when evaluating expressions, as > well as reducing the amount of code that is generated, which could also lead > to better JIT optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3912) Common subexpression elimination in code generation
[ https://issues.apache.org/jira/browse/DRILL-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947906#comment-14947906 ] Steven Phillips commented on DRILL-3912: 1) I had not enabled CSE in hash join, so it didn't have that problem. Now that I have enabled in hash join, I am seeing the same SR error. 2) In this case, it looks like the ConstantFilter is causing the '1 + 2' and '1 + 3' parts of the expressions to be resolved first, and then 'a + 1' is no longer common. Duplicate vectors reads are removed, though. I think this behavior is probably fine. 3) I am not targeting this for 1.2. Probably for 1.3. My main motivation here was to solve a problem I was running into in my Union-type work. Function resolution when there is Union type for the input involves case statements that check the current type of the input, and then executes a branch based on that type. In this case, both the condition expression as well as both branches will reference the input. For example, 1 + a would become something like {code} case when typeOf(a) = int then 1 + cast(a as int) when typeOf(a) = varchar then 1 + cast(cast(a as varchar) as int) end {code} So you can see that a single reference to 'a' becomes 3 references. And 'a' might not just be a ValueVectorReadExpression, it could be the output from some other expression tree. And if an input has more than 2 types, or if a function has multiple Union-type inputs, the complexity of the expression increases dramatically, and the amount of generated code gets to be quite large. I needed to find some way to fix this. > Common subexpression elimination in code generation > --- > > Key: DRILL-3912 > URL: https://issues.apache.org/jira/browse/DRILL-3912 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Jinfeng Ni > > Drill currently will evaluate the full expression tree, even if there are > redundant subtrees. 
Many of these redundant evaluations can be eliminated by > reusing the results from previously evaluated expression trees. > For example, > {code} > select a + 1, (a + 1)* (a - 1) from t > {code} > Will compute the entire (a + 1) expression twice. With CSE, it will only be > evaluated once. > The benefit will be reducing the work done when evaluating expressions, as > well as reducing the amount of code that is generated, which could also lead > to better JIT optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
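A rough way to quantify the blowup described in the comment above: model the expanded case expression as a tree and compare total subtree occurrences against unique subtrees; the gap is what CSE removes. This is an illustration only, not Drill's expression representation:

```python
# Count total vs. unique subtrees in the expanded case-statement form
# of `1 + a` over a Union-typed input `a`.
def count_nodes(expr, total, unique):
    total[0] += 1
    unique.add(expr)
    if isinstance(expr, tuple):
        for child in expr[1:]:        # expr[0] is the operator tag
            count_nodes(child, total, unique)

# case when typeOf(a) = int     then 1 + cast(a as int)
#      when typeOf(a) = varchar then 1 + cast(cast(a as varchar) as int) end
expanded = ('case',
            ('=', ('typeOf', 'a'), 'int'),
            ('+', 1, ('cast', 'a', 'int')),
            ('=', ('typeOf', 'a'), 'varchar'),
            ('+', 1, ('cast', ('cast', 'a', 'varchar'), 'int')))

total, unique = [0], set()
count_nodes(expanded, total, unique)
```

Here a single logical reference to `a` fans out into one occurrence per branch, and with more types per Union input (or multiple Union inputs) the total-versus-unique gap, and hence the generated code, grows much faster than the unique set.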
[jira] [Commented] (DRILL-3901) Performance regression with doing Explain of COUNT(*) over 100K files
[ https://issues.apache.org/jira/browse/DRILL-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945559#comment-14945559 ] Steven Phillips commented on DRILL-3901: I'm not sure about doing the directory expansion twice, but I do know that in the case where there is a metadata file, we are loading the file twice. The first time we read the metadata file, we should pass the metadata object to ParquetGroupScan, and continue passing the metadata object to any clones of the ParquetGroupScan, so that we don't have to read and deserialize the file more than once. I didn't think this was a big enough deal to stop the release, but looking at these numbers, it might be worth fixing now rather than putting off to the next release. > Performance regression with doing Explain of COUNT(*) over 100K files > - > > Key: DRILL-3901 > URL: https://issues.apache.org/jira/browse/DRILL-3901 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Aman Sinha >Assignee: Mehant Baid > > We are seeing a performance regression when doing an Explain of SELECT > COUNT(*) over 100K files in a flat directory (no subdirectories) on latest > master branch compared to a run that was done on Sept 26. Some initial > details (I will have more later): > {code} > master branch on Sept 26 >No metadata cache: 71.452 secs >With metadata cache: 15.804 secs > Latest master branch >No metadata cache: 110 secs >With metadata cache: 32 secs > {code} > So, both cases show regression. > [~mehant] and I took an initial look at this and it appears we might be doing > the directory expansion twice. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
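The fix sketched in the comment, read and deserialize the metadata file once, then hand the parsed object to every clone, is a standard memoization shape. An illustrative sketch with invented names (`GroupScan` and `_read` here are not the Drill classes):

```python
# Sketch: deserialize the metadata file once and share the parsed object
# with every clone of the scan, instead of re-reading it per clone.
class GroupScan:
    reads = 0   # counts how many times the "file" is actually parsed

    def __init__(self, path, metadata=None):
        self.path = path
        # only parse when no already-parsed object was handed in
        self.metadata = metadata if metadata is not None else self._read()

    def _read(self):
        GroupScan.reads += 1   # stands in for the expensive read+parse
        return {'path': self.path, 'files': ['f1.parquet', 'f2.parquet']}

    def clone(self):
        # pass the already-parsed metadata object to the clone
        return GroupScan(self.path, metadata=self.metadata)

scan = GroupScan('/data/t')
clones = [scan.clone() for _ in range(5)]
```

However many clones planning creates, the expensive parse happens once, which is exactly the saving the comment argues is worth taking before the release.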
[jira] [Commented] (DRILL-3892) Metadata cache not being leveraged when partition pruning is taking place
[ https://issues.apache.org/jira/browse/DRILL-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943027#comment-14943027 ] Steven Phillips commented on DRILL-3892: +1 > Metadata cache not being leveraged when partition pruning is taking place > - > > Key: DRILL-3892 > URL: https://issues.apache.org/jira/browse/DRILL-3892 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Assignee: Aman Sinha > Fix For: 1.3.0 > > Attachments: > 0001-DRILL-3892-Once-usedMetadataFile-is-set-to-true-don-.patch, > lineitem_deletecache.tgz > > > git.commit.id.abbrev=92638dc > As we can see from the below plan, metadata cache is not being leveraged even > when the cache file is being present > {code} > 0: jdbc:drill:zk=10.10.100.190:5181> refresh table metadata > dfs.`/drill/testdata/metadata_caching/lineitem_deletecache`; > +---+-+ > | ok | summary > | > +---+-+ > | true | Successfully updated metadata for table > /drill/testdata/metadata_caching/lineitem_deletecache. | > +---+-+ > 1 row selected (0.402 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> explain plan for select count(*) from > dfs.`/drill/testdata/metadata_caching/lineitem_deletecache` where dir0=2006 > group by l_linestatus; > +--+--+ > | text | json | > +--+--+ > | 00-00Screen > 00-01 Project(EXPR$0=[$1]) > 00-02HashAgg(group=[{0}], EXPR$0=[COUNT()]) > 00-03 Project(l_linestatus=[$0]) > 00-04Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=maprfs:/drill/testdata/metadata_caching/lineitem_deletecache/2006/1/lineitem_999.parquet]], > selectionRoot=maprfs:/drill/testdata/metadata_caching/lineitem_deletecache, > numFiles=1, usedMetadataFile=false, columns=[`l_linestatus`, `dir0`]]]) > {code} > I attached the data set used. Let me know if you need anything more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3887) Parquet metadata cache not being used
[ https://issues.apache.org/jira/browse/DRILL-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940799#comment-14940799 ] Steven Phillips commented on DRILL-3887: See https://github.com/apache/drill/pull/186 > Parquet metadata cache not being used > - > > Key: DRILL-3887 > URL: https://issues.apache.org/jira/browse/DRILL-3887 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips >Priority: Critical > > The fix for DRILL-3788 causes a directory to be expanded to its list of files > early in the query, and this change causes the ParquetGroupScan to no longer > use the parquet metadata file, even when it is there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3887) Parquet metadata cache not being used
[ https://issues.apache.org/jira/browse/DRILL-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3887: --- Assignee: Mehant Baid (was: Steven Phillips) > Parquet metadata cache not being used > - > > Key: DRILL-3887 > URL: https://issues.apache.org/jira/browse/DRILL-3887 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Mehant Baid >Priority: Critical > > The fix for DRILL-3788 causes a directory to be expanded to its list of files > early in the query, and this change causes the ParquetGroupScan to no longer > use the parquet metadata file, even when it is there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3867) Metadata Caching : Moving a directory which contains a cache file causes subsequent queries to fail
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3867: --- Assignee: Mehant Baid (was: Steven Phillips) > Metadata Caching : Moving a directory which contains a cache file causes > subsequent queries to fail > --- > > Key: DRILL-3867 > URL: https://issues.apache.org/jira/browse/DRILL-3867 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Assignee: Mehant Baid > Fix For: 1.2.0 > > > git.commit.id.abbrev=cf4f745 > git.commit.time=29.09.2015 @ 23\:19\:52 UTC > The below sequence of steps reproduces the issue > 1. Create the cache file > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata > dfs.`/drill/testdata/metadata_caching/lineitem`; > +---+-+ > | ok | summary > | > +---+-+ > | true | Successfully updated metadata for table > /drill/testdata/metadata_caching/lineitem. | > +---+-+ > 1 row selected (1.558 seconds) > {code} > 2. Move the directory > {code} > hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/ > {code} > 3. Now run a query on top of it > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit > 1; > Error: SYSTEM ERROR: FileNotFoundException: Requested file > maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist. > [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] > (state=,code=0) > {code} > This is obvious given the fact that we are storing absolute file paths in the > cache file -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3867) Metadata Caching : Moving a directory which contains a cache file causes subsequent queries to fail
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940804#comment-14940804 ] Steven Phillips commented on DRILL-3867: See https://github.com/apache/drill/pull/186 > Metadata Caching : Moving a directory which contains a cache file causes > subsequent queries to fail > --- > > Key: DRILL-3867 > URL: https://issues.apache.org/jira/browse/DRILL-3867 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Assignee: Steven Phillips > Fix For: 1.2.0 > > > git.commit.id.abbrev=cf4f745 > git.commit.time=29.09.2015 @ 23\:19\:52 UTC > The below sequence of steps reproduces the issue > 1. Create the cache file > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata > dfs.`/drill/testdata/metadata_caching/lineitem`; > +---+-+ > | ok | summary > | > +---+-+ > | true | Successfully updated metadata for table > /drill/testdata/metadata_caching/lineitem. | > +---+-+ > 1 row selected (1.558 seconds) > {code} > 2. Move the directory > {code} > hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/ > {code} > 3. Now run a query on top of it > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit > 1; > Error: SYSTEM ERROR: FileNotFoundException: Requested file > maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist. > [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] > (state=,code=0) > {code} > This is obvious given the fact that we are storing absolute file paths in the > cache file -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3887) Parquet metadata cache not being used
[ https://issues.apache.org/jira/browse/DRILL-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940803#comment-14940803 ] Steven Phillips commented on DRILL-3887: It was a detail in the code that I missed. I added the field "usedCache", which will show up in the physical plan. There is a unit test that tests this, and this can also be used by qa for functional testing. > Parquet metadata cache not being used > - > > Key: DRILL-3887 > URL: https://issues.apache.org/jira/browse/DRILL-3887 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips >Priority: Critical > > The fix for DRILL-3788 causes a directory to be expanded to its list of files > early in the query, and this change causes the ParquetGroupScan to no longer > use the parquet metadata file, even when it is there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3820) Nested Directories : Metadata Cache in a directory stores information from sub-directories as well creating security issues
[ https://issues.apache.org/jira/browse/DRILL-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941735#comment-14941735 ] Steven Phillips commented on DRILL-3820: My initial thought was to simply set the permissions to 700 for the metadata file. But that would cause problems when there is impersonation, as the impersonated user would not be able to read the metadata file. I actually think the best approach is to have the REFRESH command run as the user who gave the command, not the drill process user. That way, only a user who has permission to read all of the subdirectories and files, as well as write to all of the directories, will be able to run the REFRESH command. The metadata file should have the same owner and permissions as the directory it is placed in. It should be documented that running this command will expose some amount of metadata in all underlying directories to anyone who has permission to read the top level directory. This will at the very least prevent someone from exploiting the REFRESH command in order to access metadata in a directory that they don't have permission to read. > Nested Directories : Metadata Cache in a directory stores information from > sub-directories as well creating security issues > --- > > Key: DRILL-3820 > URL: https://issues.apache.org/jira/browse/DRILL-3820 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Reporter: Rahul Challapalli >Assignee: Steven Phillips >Priority: Critical > Fix For: 1.2.0 > > > git.commit.id.abbrev=3c89b30 > User A has access to lineitem folder and its subfolders > User B had access to lineitem folder but not its sub-folders. > Now when User A runs the "refresh table metadata lineitem" command, the cache > file gets created under lineitem folder. This file contains information from > the underlying sub-directories as well. > Now User B can download this file and get access to information which he > should not be seeing in the first place. 
> This can be very easily reproducible if impersonation is enabled on the > cluster. > Let me know if you need more information to reproduce this issue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
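The comment above suggests the metadata file should inherit the owner and permissions of the directory it is placed in. A minimal sketch of that idea with the standard `java.nio.file` API follows; Drill itself would go through the Hadoop `FileSystem` API, so this helper is purely a hypothetical illustration of the permission-mirroring step, not Drill code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

public class MetadataFilePerms {
    // Hypothetical helper: make the cache file inherit its parent
    // directory's POSIX permissions and owner, so only users who can
    // already read the directory can read the metadata file.
    static void mirrorParentPermissions(Path cacheFile) throws IOException {
        Path dir = cacheFile.getParent();
        Set<PosixFilePermission> perms =
                EnumSet.copyOf(Files.getPosixFilePermissions(dir));
        // A plain file should not be executable even though the directory is.
        perms.remove(PosixFilePermission.OWNER_EXECUTE);
        perms.remove(PosixFilePermission.GROUP_EXECUTE);
        perms.remove(PosixFilePermission.OTHERS_EXECUTE);
        Files.setPosixFilePermissions(cacheFile, perms);
        Files.setOwner(cacheFile, Files.getOwner(dir));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lineitem");
        Files.setPosixFilePermissions(dir,
                PosixFilePermissions.fromString("rwxr-x---"));
        Path cache = Files.createFile(dir.resolve(".drill.parquet_metadata"));
        mirrorParentPermissions(cache);
        System.out.println(PosixFilePermissions.toString(
                Files.getPosixFilePermissions(cache))); // rw-r-----
    }
}
```

Note this alone does not close the hole described in the issue: a user who can read the top-level directory still sees metadata for subdirectories they cannot read, which is why the comment also proposes running REFRESH as the issuing user.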
[jira] [Updated] (DRILL-3820) Nested Directories : Metadata Cache in a directory stores information from sub-directories as well creating security issues
[ https://issues.apache.org/jira/browse/DRILL-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3820: --- Assignee: Aman Sinha (was: Steven Phillips) > Nested Directories : Metadata Cache in a directory stores information from > sub-directories as well creating security issues > --- > > Key: DRILL-3820 > URL: https://issues.apache.org/jira/browse/DRILL-3820 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Reporter: Rahul Challapalli >Assignee: Aman Sinha >Priority: Critical > Fix For: 1.2.0 > > > git.commit.id.abbrev=3c89b30 > User A has access to lineitem folder and its subfolders > User B had access to lineitem folder but not its sub-folders. > Now when User A runs the "refresh table metadata lineitem" command, the cache > file gets created under lineitem folder. This file contains information from > the underlying sub-directories as well. > Now User B can download this file and get access to information which he > should not be seeing in the first place. > This can be very easily reproducible if impersonation is enabled on the > cluster. > Let me know if you need more information to reproduce this issue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3820) Nested Directories : Metadata Cache in a directory stores information from sub-directories as well creating security issues
[ https://issues.apache.org/jira/browse/DRILL-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3820: --- Fix Version/s: (was: 1.2.0) 1.3.0 > Nested Directories : Metadata Cache in a directory stores information from > sub-directories as well creating security issues > --- > > Key: DRILL-3820 > URL: https://issues.apache.org/jira/browse/DRILL-3820 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Reporter: Rahul Challapalli >Assignee: Aman Sinha >Priority: Critical > Fix For: 1.3.0 > > > git.commit.id.abbrev=3c89b30 > User A has access to lineitem folder and its subfolders > User B had access to lineitem folder but not its sub-folders. > Now when User A runs the "refresh table metadata lineitem" command, the cache > file gets created under lineitem folder. This file contains information from > the underlying sub-directories as well. > Now User B can download this file and get access to information which he > should not be seeing in the first place. > This can be very easily reproducible if impersonation is enabled on the > cluster. > Let me know if you need more information to reproduce this issue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3867) Metadata Caching : Moving a directory which contains a cache file causes subsequent queries to fail
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3867: --- Fix Version/s: (was: 1.2.0) 1.3.0 > Metadata Caching : Moving a directory which contains a cache file causes > subsequent queries to fail > --- > > Key: DRILL-3867 > URL: https://issues.apache.org/jira/browse/DRILL-3867 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Assignee: Mehant Baid > Fix For: 1.3.0 > > > git.commit.id.abbrev=cf4f745 > git.commit.time=29.09.2015 @ 23\:19\:52 UTC > The below sequence of steps reproduces the issue > 1. Create the cache file > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata > dfs.`/drill/testdata/metadata_caching/lineitem`; > +---+-+ > | ok | summary > | > +---+-+ > | true | Successfully updated metadata for table > /drill/testdata/metadata_caching/lineitem. | > +---+-+ > 1 row selected (1.558 seconds) > {code} > 2. Move the directory > {code} > hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/ > {code} > 3. Now run a query on top of it > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit > 1; > Error: SYSTEM ERROR: FileNotFoundException: Requested file > maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist. > [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] > (state=,code=0) > {code} > This is obvious given the fact that we are storing absolute file paths in the > cache file -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3887) Parquet metadata cache not being used
[ https://issues.apache.org/jira/browse/DRILL-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips resolved DRILL-3887. Resolution: Fixed > Parquet metadata cache not being used > - > > Key: DRILL-3887 > URL: https://issues.apache.org/jira/browse/DRILL-3887 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Mehant Baid >Priority: Critical > > The fix for DRILL-3788 causes a directory to be expanded to its list of files > early in the query, and this change causes the ParquetGroupScan to no longer > use the parquet metadata file, even when it is there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3887) Parquet metadata cache not being used
[ https://issues.apache.org/jira/browse/DRILL-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942007#comment-14942007 ] Steven Phillips commented on DRILL-3887: Fixed by 1cfd4c2 > Parquet metadata cache not being used > - > > Key: DRILL-3887 > URL: https://issues.apache.org/jira/browse/DRILL-3887 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Mehant Baid >Priority: Critical > > The fix for DRILL-3788 causes a directory to be expanded to its list of files > early in the query, and this change causes the ParquetGroupScan to no longer > use the parquet metadata file, even when it is there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3229) Create a new EmbeddedVector
[ https://issues.apache.org/jira/browse/DRILL-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939640#comment-14939640 ] Steven Phillips commented on DRILL-3229: i) In this first iteration, Union types will be enabled with an option, and they will be created in Json Reader and Mongo reader automatically if the option is enabled. Everything will be a Union type in this case. A future patch will work on promoting from non-union once it is necessary to promote. ii) Your understanding is correct. One change from the earlier comment: there is no "bits" vector. The underlying primitive type vectors will have their own "bits" for tracking nulls. The type vector with a value of zero will also indicate null. Without going into much detail at this point, I can answer the next paragraph of questions by saying that this patch will allow reading of any valid json. It also has a more literal representation of the json, e.g. null values will be treated as null, instead of empty maps/lists. The patch also includes functions for inspecting the type of a field, which can be used with case statements to handle the data based on which type it is. Though it may be somewhat cumbersome, with these tools you should be able to run almost any query against dynamic json data. This will generally involve using introspection and case statements to remove the Union types early in the query. Future work will eliminate the need for this in many cases. One notable exception is that flatten is not supported in this initial patch. 
> Create a new EmbeddedVector > --- > > Key: DRILL-3229 > URL: https://issues.apache.org/jira/browse/DRILL-3229 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Jacques Nadeau >Assignee: Steven Phillips > Fix For: Future > > > Embedded Vector will leverage a binary encoding for holding information about > type for each individual field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
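The layout described in (ii) — a per-row type id alongside per-type value vectors, with type id 0 indicating null — can be sketched as a toy model. This is an illustration of the idea only, not Drill's actual UnionVector, and the type-id constants and accessors are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the union layout described above: a per-row type id
// (0 = null) selects which typed value list holds the row's value.
public class ToyUnionVector {
    static final int NULL = 0, BIGINT = 1, VARCHAR = 2;

    private final List<Integer> typeIds = new ArrayList<>();
    private final List<Long> bigints = new ArrayList<>();
    private final List<String> varchars = new ArrayList<>();

    void addNull()     { typeIds.add(NULL);    bigints.add(0L); varchars.add(null); }
    void add(long v)   { typeIds.add(BIGINT);  bigints.add(v);  varchars.add(null); }
    void add(String v) { typeIds.add(VARCHAR); bigints.add(0L); varchars.add(v); }

    int typeOf(int row) { return typeIds.get(row); }

    Object get(int row) {
        switch (typeIds.get(row)) {      // the case-statement pattern
            case BIGINT:  return bigints.get(row);
            case VARCHAR: return varchars.get(row);
            default:      return null;   // type id 0 indicates null
        }
    }
}
```

A query over such a column would use the type-inspection functions with a CASE expression in just this way, collapsing the Union to a single type early in the query, as the comment suggests.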
[jira] [Updated] (DRILL-3820) Nested Directories : Metadata Cache in a directory stores information from sub-directories as well creating security issues
[ https://issues.apache.org/jira/browse/DRILL-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3820: --- Assignee: (was: Steven Phillips) > Nested Directories : Metadata Cache in a directory stores information from > sub-directories as well creating security issues > --- > > Key: DRILL-3820 > URL: https://issues.apache.org/jira/browse/DRILL-3820 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Reporter: Rahul Challapalli >Priority: Critical > Fix For: 1.2.0 > > > git.commit.id.abbrev=3c89b30 > User A has access to lineitem folder and its subfolders > User B had access to lineitem folder but not its sub-folders. > Now when User A runs the "refresh table metadata lineitem" command, the cache > file gets created under lineitem folder. This file contains information from > the underlying sub-directories as well. > Now User B can download this file and get access to information which he > should not be seeing in the first place. > This can be very easily reproducible if impersonation is enabled on the > cluster. > Let me know if you need more information to reproduce this issue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3468) CTAS IOB
[ https://issues.apache.org/jira/browse/DRILL-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3468: --- Assignee: (was: Steven Phillips) > CTAS IOB > > > Key: DRILL-3468 > URL: https://issues.apache.org/jira/browse/DRILL-3468 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.2.0 >Reporter: Khurram Faraaz >Priority: Critical > > I am seeing a IOB when I use same table name in CTAS, after deleting the > previously create parquet file. > {code} > 0: jdbc:drill:schema=dfs.tmp> CREATE TABLE tbl_allData AS SELECT > CAST(columns[0] as INT ), CAST(columns[1] as BIGINT ), CAST(columns[2] as > CHAR(2) ), CAST(columns[3] as VARCHAR(52) ), CAST(columns[4] as TIMESTAMP ), > CAST(columns[5] as DATE ), CAST(columns[6] as BOOLEAN ), CAST(columns[7] as > DOUBLE), CAST( columns[8] as TIME) FROM `allData.csv`; > +---++ > | Fragment | Number of records written | > +---++ > | 0_0 | 11196 | > +---++ > 1 row selected (1.864 seconds) > {code} > Remove the parquet file that was created by the above CTAS. > {code} > [root@centos-01 aggregates]# hadoop fs -ls /tmp/tbl_allData > Found 1 items > -rwxr-xr-x 3 mapr mapr 397868 2015-07-07 21:08 > /tmp/tbl_allData/0_0_0.parquet > [root@centos-01 aggregates]# hadoop fs -rm /tmp/tbl_allData/0_0_0.parquet > 15/07/07 21:10:47 INFO Configuration.deprecation: io.bytes.per.checksum is > deprecated. Instead, use dfs.bytes-per-checksum > 15/07/07 21:10:47 INFO fs.TrashPolicyDefault: Namenode trash configuration: > Deletion interval = 0 minutes, Emptier interval = 0 minutes. > Deleted /tmp/tbl_allData/0_0_0.parquet > {code} > I see a IOB when I CTAS with same table name as the one that was removed in > the above step. 
> {code} > 0: jdbc:drill:schema=dfs.tmp> CREATE TABLE tbl_allData AS SELECT > CAST(columns[0] as INT ), CAST(columns[1] as BIGINT ), CAST(columns[2] as > CHAR(2) ), CAST(columns[3] as VARCHAR(52) ), CAST(columns[4] as TIMESTAMP ), > CAST(columns[5] as DATE ), CAST(columns[6] as BOOLEAN ), CAST(columns[7] as > DOUBLE), CAST( columns[8] as TIME) FROM `lessData.csv`; > Error: SYSTEM ERROR: IndexOutOfBoundsException: Index: 0, Size: 0 > [Error Id: 6d6df8e9-699c-4475-8ad3-183c0a91dc99 on centos-02.qa.lab:31010] > (state=,code=0) > {code} > stack trace from drillbit.log > {code} > org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception > during fragment initialization: Failure while trying to check if a table or > view with given name [tbl_allData] already exists in schema [dfs.tmp]: Index: > 0, Size: 0 > at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:253) > [drill-java-exec-1.1.0.jar:1.1.0] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_45] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_45] > at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] > Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Failure > while trying to check if a table or view with given name [tbl_allData] > already exists in schema [dfs.tmp]: Index: 0, Size: 0 > at > org.apache.drill.exec.planner.sql.handlers.SqlHandlerUtil.getTableFromSchema(SqlHandlerUtil.java:222) > ~[drill-java-exec-1.1.0.jar:1.1.0] > at > org.apache.drill.exec.planner.sql.handlers.CreateTableHandler.getPlan(CreateTableHandler.java:88) > ~[drill-java-exec-1.1.0.jar:1.1.0] > at > org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:178) > ~[drill-java-exec-1.1.0.jar:1.1.0] > at > org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:903) > [drill-java-exec-1.1.0.jar:1.1.0] > at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:242) 
> [drill-java-exec-1.1.0.jar:1.1.0] > ... 3 common frames omitted > Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:635) ~[na:1.7.0_45] > at java.util.ArrayList.get(ArrayList.java:411) ~[na:1.7.0_45] > at > org.apache.drill.exec.store.dfs.FileSelection.getFirstPath(FileSelection.java:100) > ~[drill-java-exec-1.1.0.jar:1.1.0] > at > org.apache.drill.exec.store.dfs.BasicFormatMatcher.isReadable(BasicFormatMatcher.java:75) > ~[drill-java-exec-1.1.0.jar:1.1.0] > at > org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create(WorkspaceSchemaFactory.java:303) > ~[drill-java-exec-1.1.0.jar:1.1.0] > at >
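The trace above bottoms out in `FileSelection.getFirstPath` calling `ArrayList.get(0)` on an empty list: the table directory still exists, but its only parquet file was deleted. A defensive guard of the following shape avoids the IOB — this is a hypothetical stand-in written for illustration, not the actual Drill fix:

```java
import java.util.Collections;
import java.util.List;

public class FileSelectionSketch {
    // Hypothetical stand-in for FileSelection.getFirstPath(): return null
    // for an empty selection instead of letting ArrayList.get(0) throw,
    // so callers such as BasicFormatMatcher.isReadable can report
    // "no readable files" rather than crash the query.
    static String getFirstPath(List<String> files) {
        return files.isEmpty() ? null : files.get(0);
    }

    public static void main(String[] args) {
        // Directory still exists, but the parquet file was removed.
        System.out.println(getFirstPath(Collections.emptyList()));          // null
        System.out.println(getFirstPath(List.of("/tmp/tbl_allData/0_0_0.parquet")));
    }
}
```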
[jira] [Updated] (DRILL-2475) Handle IterOutcome.NONE correctly in operators
[ https://issues.apache.org/jira/browse/DRILL-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-2475: --- Assignee: (was: Steven Phillips) > Handle IterOutcome.NONE correctly in operators > -- > > Key: DRILL-2475 > URL: https://issues.apache.org/jira/browse/DRILL-2475 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 0.8.0 >Reporter: Venki Korukanti > Fix For: 1.2.0 > > > Currently not all operators are handling the NONE (with no OK_NEW_SCHEMA) > correctly. This JIRA is to go through the operators and check if it handling > the NONE correctly or not and modify accordingly. > (from DRILL-2453) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2975) Extended Json : Time type reporting data which is dependent on the system on which it ran
[ https://issues.apache.org/jira/browse/DRILL-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-2975: --- Assignee: (was: Steven Phillips) > Extended Json : Time type reporting data which is dependent on the system on > which it ran > - > > Key: DRILL-2975 > URL: https://issues.apache.org/jira/browse/DRILL-2975 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JSON >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Priority: Critical > Fix For: 1.3.0 > > > git.commit.id.abbrev=3b19076 > Data : > {code} > { > "int_col" : {"$numberLong": 1}, > "date_col" : {"$dateDay": "2012-05-22"}, > "time_col" : {"$time": "19:20:30.45Z"} > } > {code} > System 1 : > {code} > 0: jdbc:drill:schema=dfs_eea> select time_col from `extended_json/data1.json` > d; > ++ > | time_col | > ++ > | 19:20:30.450 | > ++ > {code} > System 2 : > {code} > 0: jdbc:drill:schema=dfs.drillTestDirComplexP> select time_col from > `temp.json`; > ++ > | time_col | > ++ > | 11:20:30.450 | > ++ > {code} > The above results are inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2385) count on complex objects failed with missing function implementation
[ https://issues.apache.org/jira/browse/DRILL-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-2385: --- Assignee: (was: Steven Phillips) > count on complex objects failed with missing function implementation > > > Key: DRILL-2385 > URL: https://issues.apache.org/jira/browse/DRILL-2385 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 0.8.0 >Reporter: Chun Chang >Priority: Minor > Fix For: 1.4.0 > > > #Wed Mar 04 01:23:42 EST 2015 > git.commit.id.abbrev=71b6bfe > Have a complex type looks like the following: > {code} > 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.sia from > `complex.json` t limit 1; > ++ > |sia | > ++ > | [1,11,101,1001] | > ++ > {code} > A count on the complex type will fail with missing function implementation: > {code} > 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.gbyi, count(t.sia) > countsia from `complex.json` t group by t.gbyi; > Query failed: RemoteRpcException: Failure while running fragment., Schema is > currently null. You must call buildSchema(SelectionVectorMode) before this > container can return a schema. [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on > qa-node119.qa.lab:31010 ] > [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on qa-node119.qa.lab:31010 ] > Error: exception while executing query: Failure while executing query. > (state=,code=0) > {code} > drillbit.log > {code} > 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] ERROR > o.a.drill.exec.ops.FragmentContext - Fragment Context received failure. > org.apache.drill.exec.exception.SchemaChangeException: Failure while > materializing expression. > Error in expression at index 0. Error: Missing function implementation: > [count(BIGINT-REPEATED)]. Full expression: null. 
> at > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregatorInternal(HashAggBatch.java:210) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregator(HashAggBatch.java:158) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema(HashAggBatch.java:101) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:130) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:114) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:121) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_45] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_45] > at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] > 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] WARN > 
o.a.d.e.w.fragment.FragmentExecutor - Error while initializing or executing > fragment > java.lang.NullPointerException: Schema is currently null. You must call > buildSchema(SelectionVectorMode) before this container can return a schema. > at > com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208) > ~[guava-14.0.1.jar:na] > at > org.apache.drill.exec.record.VectorContainer.getSchema(VectorContainer.java:261) > ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.getSchema(AbstractRecordBatch.java:155) > ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at >
[jira] [Updated] (DRILL-1681) select with limit on directory with csv files takes quite long to terminate
[ https://issues.apache.org/jira/browse/DRILL-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-1681: --- Assignee: (was: Steven Phillips) > select with limit on directory with csv files takes quite long to terminate > --- > > Key: DRILL-1681 > URL: https://issues.apache.org/jira/browse/DRILL-1681 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Text & CSV >Reporter: Suresh Ollala >Priority: Minor > Fix For: 1.3.0 > > > query like select * from `/drill/data` limit 100 takes quite long to > terminate, about 20+ seconds. > /drill/data includes overall 1100 csv files, all in single directory. > select * from `/drill/data/d2.csv` limit 100; terminates in 0.2 seconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2428) Drill Build failed : git.properties isn't a file.
[ https://issues.apache.org/jira/browse/DRILL-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-2428: --- Assignee: (was: Steven Phillips) > Drill Build failed : git.properties isn't a file. > - > > Key: DRILL-2428 > URL: https://issues.apache.org/jira/browse/DRILL-2428 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Reporter: Praveen >Priority: Critical > Fix For: Future > > > I am build the Drill from source . i am getting the following error. > Applied patch provide for the same issue. but not working . Can you please > provide the solution. > ties to archive location: apache-drill-0.7.0-SNAPSHOT/git.properties > [INFO] > > [INFO] Reactor Summary: > [INFO] > [INFO] Apache Drill Root POM . SUCCESS [ 10.186 s] > [INFO] Drill Protocol SUCCESS [ 7.479 s] > [INFO] Common (Logical Plan, Base expressions) ... SUCCESS [ 10.150 s] > [INFO] contrib/Parent Pom SUCCESS [ 2.490 s] > [INFO] contrib/data/Parent Pom ... SUCCESS [ 0.302 s] > [INFO] contrib/data/tpch-sample-data . SUCCESS [ 3.259 s] > [INFO] exec/Parent Pom ... SUCCESS [ 3.465 s] > [INFO] exec/Java Execution Engine SUCCESS [02:12 min] > [INFO] contrib/hive-storage-plugin/Parent Pom SUCCESS [ 2.250 s] > [INFO] contrib/hive-storage-plugin/hive-exec-shaded .. SUCCESS [ 32.738 s] > [INFO] contrib/hive-storage-plugin/core .. SUCCESS [ 9.415 s] > [INFO] exec/JDBC Driver using dependencies ... SUCCESS [ 7.383 s] > [INFO] JDBC JAR with all dependencies SUCCESS [01:47 min] > [INFO] exec/Drill expression interpreter . SUCCESS [ 20.441 s] > [INFO] contrib/mongo-storage-plugin .. SUCCESS [ 7.914 s] > [INFO] contrib/hbase-storage-plugin .. SUCCESS [ 8.501 s] > [INFO] Packaging and Distribution Assembly ... FAILURE [ 2.770 s] > [INFO] contrib/sqlline ... 
SKIPPED > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 06:09 min > [INFO] Finished at: 2015-03-11T16:22:49+05:30 > [INFO] Final Memory: 69M/526M > [INFO] > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-assembly-plugin:2. > 4:single (distro-assembly) on project distribution: Failed to create > assembly: E > rror adding file to archive: > D:\drill\drill-0.7.0\distribution\target\classes\gi > t.properties isn't a file. -> [Help 1] > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute > goal o > rg.apache.maven.plugins:maven-assembly-plugin:2.4:single (distro-assembly) on > pr > oject distribution: Failed to create assembly: Error adding file to archive: > D:\ > drill\drill-0.7.0\distribution\target\classes\git.properties isn't a file. > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor > .java:216) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor > .java:153) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor > .java:145) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProje > ct(LifecycleModuleBuilder.java:108) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProje > ct(LifecycleModuleBuilder.java:76) > at > org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThre > adedBuilder.build(SingleThreadedBuilder.java:51) > at > org.apache.maven.lifecycle.internal.LifecycleStarter.execute(Lifecycl > eStarter.java:116) > at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:361) > at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155) > at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584) > at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:213) > at org.apache.maven.cli.MavenCli.main(MavenCli.java:157) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl. 
> java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces > sorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at >
[jira] [Commented] (DRILL-3867) Metadata Caching : Moving a directory which contains a cache file causes subsequent queries to fail
[ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940523#comment-14940523 ] Steven Phillips commented on DRILL-3867: We should store the paths relative to the directory containing the metadata file. > Metadata Caching : Moving a directory which contains a cache file causes > subsequent queries to fail > --- > > Key: DRILL-3867 > URL: https://issues.apache.org/jira/browse/DRILL-3867 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Assignee: Steven Phillips > Fix For: 1.2.0 > > > git.commit.id.abbrev=cf4f745 > git.commit.time=29.09.2015 @ 23\:19\:52 UTC > The below sequence of steps reproduces the issue > 1. Create the cache file > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata > dfs.`/drill/testdata/metadata_caching/lineitem`; > +---+-+ > | ok | summary > | > +---+-+ > | true | Successfully updated metadata for table > /drill/testdata/metadata_caching/lineitem. | > +---+-+ > 1 row selected (1.558 seconds) > {code} > 2. Move the directory > {code} > hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/ > {code} > 3. Now run a query on top of it > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit > 1; > Error: SYSTEM ERROR: FileNotFoundException: Requested file > maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist. > [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] > (state=,code=0) > {code} > This is obvious given the fact that we are storing absolute file paths in the > cache file -- This message was sent by Atlassian JIRA (v6.3.4#6332)
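The suggested fix (store paths relative to the directory containing the metadata file) can be sketched with `java.nio.file`; the directory layout below is illustrative, taken from the reproduction above, and this is not Drill's actual cache-writing code:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: store cache entries relative to the directory that holds the cache
// file, so "hadoop fs -mv" of the whole tree does not invalidate them.
public class RelativeCachePaths {
    public static void main(String[] args) {
        Path cacheDir = Paths.get("/drill/testdata/metadata_caching/lineitem");
        Path dataFile = cacheDir.resolve("2006/1/part-0.parquet");

        // What would be written into the cache file: a relative path.
        Path relative = cacheDir.relativize(dataFile);
        System.out.println(relative); // 2006/1/part-0.parquet

        // After the directory is moved, resolve against its new location.
        Path movedDir = Paths.get("/drill/lineitem");
        System.out.println(movedDir.resolve(relative)); // /drill/lineitem/2006/1/part-0.parquet
    }
}
```

Because only the resolution base changes, a moved table keeps working without rerunning `refresh table metadata`.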
[jira] [Assigned] (DRILL-3820) Nested Directories : Metadata Cache in a directory stores information from sub-directories as well creating security issues
[ https://issues.apache.org/jira/browse/DRILL-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips reassigned DRILL-3820: -- Assignee: Steven Phillips > Nested Directories : Metadata Cache in a directory stores information from > sub-directories as well creating security issues > --- > > Key: DRILL-3820 > URL: https://issues.apache.org/jira/browse/DRILL-3820 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Reporter: Rahul Challapalli >Assignee: Steven Phillips >Priority: Critical > Fix For: 1.2.0 > > > git.commit.id.abbrev=3c89b30 > User A has access to lineitem folder and its subfolders > User B had access to lineitem folder but not its sub-folders. > Now when User A runs the "refresh table metadata lineitem" command, the cache > file gets created under lineitem folder. This file contains information from > the underlying sub-directories as well. > Now User B can download this file and get access to information which he > should not be seeing in the first place. > This can be very easily reproducible if impersonation is enabled on the > cluster. > Let me know if you need more information to reproduce this issue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-3887) Parquet metadata cache not being used
[ https://issues.apache.org/jira/browse/DRILL-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips reassigned DRILL-3887: -- Assignee: Steven Phillips > Parquet metadata cache not being used > - > > Key: DRILL-3887 > URL: https://issues.apache.org/jira/browse/DRILL-3887 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips >Priority: Critical > > The fix for DRILL-3788 causes a directory to be expanded to its list of files > early in the query, and this change causes the ParquetGroupScan to no longer > use the parquet metadata file, even when it is there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3887) Parquet metadata cache not being used
Steven Phillips created DRILL-3887: -- Summary: Parquet metadata cache not being used Key: DRILL-3887 URL: https://issues.apache.org/jira/browse/DRILL-3887 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips Priority: Critical The fix for DRILL-3788 causes a directory to be expanded to its list of files early in the query, and this change causes the ParquetGroupScan to no longer use the parquet metadata file, even when it is there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3844) HOME and END keys do not work in drill console
[ https://issues.apache.org/jira/browse/DRILL-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933684#comment-14933684 ] Steven Phillips commented on DRILL-3844: They are working fine on my system (Mac OS Yosemite). What system are you using? The drill console is based on sqlline, which is based on jline, so it seems it could be related to this: https://github.com/jline/jline2/issues/54 > HOME and END keys do not work in drill console > -- > > Key: DRILL-3844 > URL: https://issues.apache.org/jira/browse/DRILL-3844 > Project: Apache Drill > Issue Type: Bug > Components: Client - CLI >Affects Versions: 1.1.0, 1.2.0 >Reporter: Philip Deegan > > Is there a reason for this? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3229) Create a new EmbeddedVector
[ https://issues.apache.org/jira/browse/DRILL-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804637#comment-14804637 ] Steven Phillips commented on DRILL-3229: Basic design outline: A Union type represents a field where the type can vary between records. The data for a field of type Union will be stored in a UnionVector. h4. UnionVector Internally uses a MapVector to hold the vectors for the various types. The types include all of the MinorTypes, including List and Map. For example, the internal MapVector will have a subfield named "bigInt", which will refer to a NullableBigIntVector. In addition to the vectors corresponding to the minor types, there will be two additional fields, both represented by UInt1Vectors. These are "bits" and "types", which will represent the nullability and types of the underlying data. The "bits" vector will work the same way it works in other nullable vectors. The "types" vector will store the number corresponding to the value of the MinorType as defined in the protobuf definition. There will be mutator methods for setting null and type. h4. UnionWriter The UnionWriter implements and overrides all of the methods of FieldWriter. It holds field writers corresponding to each of the types included in the underlying UnionVector, and delegates the method calls for each type to the corresponding writer.
For example, the BigIntWriter interface: {code} public interface BigIntWriter extends BaseWriter { public void write(BigIntHolder h); public void writeBigInt(long value); } {code} UnionWriter overrides these methods: {code} @Override public void writeBigInt(long value) { data.getMutator().setType(idx(), MinorType.BIGINT); data.getMutator().setNotNull(idx()); getBigIntWriter().setPosition(idx()); getBigIntWriter().writeBigInt(value); } @Override public void write(BigIntHolder h) { data.getMutator().setType(idx(), MinorType.BIGINT); data.getMutator().setNotNull(idx()); getBigIntWriter().setPosition(idx()); getBigIntWriter().write(h); } {code} This requires users of the interface to go through the UnionWriter, rather than using the underlying BigIntWriter directly. Otherwise, the "types" and "bits" vectors would not get set correctly. h4. UnionReader Much the same as the UnionWriter, the UnionReader overrides the methods of FieldReader, and delegates to a corresponding specific FieldReader implementation depending on which type the current value is. h4. UnionListVector UnionListVector extends BaseRepeatedVector. It works much the same as other Repeated vectors; there is a data vector and an offset vector. The data vector in this case is a UnionVector. h4. UnionListWriter The UnionListWriter overrides all FieldWriter methods. When starting a new list, the startList() method is called. This calls the startNewValue(int index) method of the underlying UnionListVector.Mutator. Subsequent calls to the ListWriter methods (such as bigint()) return the UnionListWriter itself, and calls to write are handled by calling the appropriate method on the underlying UnionListVector.Mutator, which handles updating the offset vector. In the case that the map() method is called (i.e. repeated map), the UnionListWriter is itself returned, but a state variable is updated to indicate that it should operate as a MapWriter.
While in MapWriter mode, calls to the MapWriter methods will also return the UnionListWriter itself, but will also update the field indicating what the name of the current field is. Subsequent writes to the ScalarWriter methods will write to the underlying UnionVector using the UnionWriter interface. For example, {code} UnionListWriter list; ... list.startList(); list.map().bigInt("a").writeBigInt(1); {code} This code first indicates that a new list is starting. By doing this, the offset vector is correctly set. Calling map() sets the internal state of the writer to "MAP". bigInt("a") sets the current field of the writer to "a", and writeBigInt(1) writes the value 1 to the underlying UnionVector. Another example: {code} MapWriter mapWriter = list.map().map("a") {code} In this case, the final call to map("a") delegates to the underlying UnionWriter, and returns a new MapWriter, with the position set according to the current offset. > Create a new EmbeddedVector > --- > > Key: DRILL-3229 > URL: https://issues.apache.org/jira/browse/DRILL-3229 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter:
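The "types"/"bits" bookkeeping that the UnionWriter must keep in sync with every write can be modeled with a toy sketch. This is plain Java illustrating the idea only; none of these classes or type ids are Drill's actual vector implementations:

```java
// Toy model of a union column: parallel "types" and "bits" arrays plus one
// storage array per supported type. Not Drill's UnionVector.
public class ToyUnionColumn {
    static final byte TYPE_BIGINT = 1, TYPE_VARCHAR = 2; // hypothetical type ids

    final byte[] types = new byte[16];    // which type each record holds
    final byte[] bits = new byte[16];     // 1 = not null, as in other nullable vectors
    final long[] bigInts = new long[16];
    final String[] varChars = new String[16];

    // Mirrors UnionWriter.writeBigInt: record type and nullability, then delegate.
    public void writeBigInt(int index, long value) {
        types[index] = TYPE_BIGINT;
        bits[index] = 1;
        bigInts[index] = value;
    }

    public void writeVarChar(int index, String value) {
        types[index] = TYPE_VARCHAR;
        bits[index] = 1;
        varChars[index] = value;
    }

    public Object get(int index) {
        if (bits[index] == 0) return null; // never written
        return types[index] == TYPE_BIGINT ? (Object) bigInts[index] : varChars[index];
    }
}
```

Writing through the union-level methods is what keeps `types` and `bits` consistent, which is why callers must not use the underlying per-type writer directly.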
[jira] [Assigned] (DRILL-3228) Implement Embedded Type
[ https://issues.apache.org/jira/browse/DRILL-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips reassigned DRILL-3228: -- Assignee: Steven Phillips (was: Jacques Nadeau) > Implement Embedded Type > --- > > Key: DRILL-3228 > URL: https://issues.apache.org/jira/browse/DRILL-3228 > Project: Apache Drill > Issue Type: Task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Jacques Nadeau >Assignee: Steven Phillips > Fix For: 1.3.0 > > > An Umbrella for the implementation of Embedded types within Drill. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-3229) Create a new EmbeddedVector
[ https://issues.apache.org/jira/browse/DRILL-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips reassigned DRILL-3229: -- Assignee: Steven Phillips (was: Jacques Nadeau) > Create a new EmbeddedVector > --- > > Key: DRILL-3229 > URL: https://issues.apache.org/jira/browse/DRILL-3229 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Codegen, Execution - Data Types, Execution - > Relational Operators, Functions - Drill >Reporter: Jacques Nadeau >Assignee: Steven Phillips > Fix For: Future > > > Embedded Vector will leverage a binary encoding for holding information about > type for each individual field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3788) Partition Pruning not taking place with metadata caching when we have ~20k files
[ https://issues.apache.org/jira/browse/DRILL-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746645#comment-14746645 ] Steven Phillips commented on DRILL-3788: I am a bit confused. This jira seems to be related to directory-based partition pruning, not single-valued column based pruning. As far as I know they should both be working, though. I would have to use a debugger to find out why it's failing. > Partition Pruning not taking place with metadata caching when we have ~20k > files > > > Key: DRILL-3788 > URL: https://issues.apache.org/jira/browse/DRILL-3788 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Assignee: Aman Sinha >Priority: Critical > Fix For: 1.2.0 > > Attachments: plan.txt > > > git.commit.id.abbrev=240a455 > Partition Pruning did not take place for the below query after I executed the > "refresh table metadata command" > {code} > explain plan for > select > l_returnflag, > l_linestatus > from > `lineitem/2006/1` > where > dir0=1 or dir0=2 > {code} > The logs did not indicate that "pruning did not take place" > Before executing the refresh table metadata command, partition pruning did > take effect > I am not attaching the data set as it is larger than 10MB. Reach out to me if > you need more information -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3180) Apache Drill JDBC storage plugin to query rdbms systems such as MySQL and Netezza from Apache Drill
[ https://issues.apache.org/jira/browse/DRILL-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742880#comment-14742880 ] Steven Phillips commented on DRILL-3180: +1 > Apache Drill JDBC storage plugin to query rdbms systems such as MySQL and > Netezza from Apache Drill > --- > > Key: DRILL-3180 > URL: https://issues.apache.org/jira/browse/DRILL-3180 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.0.0 >Reporter: Magnus Pierre >Assignee: Jacques Nadeau > Labels: Drill, JDBC, plugin > Fix For: 1.3.0 > > Attachments: patch.diff, pom.xml, storage-mpjdbc.zip > > Original Estimate: 1m > Remaining Estimate: 1m > > I have developed the base code for a JDBC storage-plugin for Apache Drill. > The code is primitive but constitutes a good starting point for further > coding. Today it provides primitive support for SELECT against RDBMS with > JDBC. > The goal is to provide complete SELECT support against RDBMS with push down > capabilities. > Currently the code is using standard JDBC classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3767) SchemaPath.getCompoundPath(String...strings) reverses its input array
[ https://issues.apache.org/jira/browse/DRILL-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741328#comment-14741328 ] Steven Phillips commented on DRILL-3767: I think the side effect should be removed, rather than documented. > SchemaPath.getCompoundPath(String...strings) reverses it's input array > -- > > Key: DRILL-3767 > URL: https://issues.apache.org/jira/browse/DRILL-3767 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Codegen >Reporter: Deneche A. Hakim >Assignee: Deneche A. Hakim >Priority: Minor > Fix For: 1.2.0 > > > If you pass an array of strings to {{SchemaPath.getCompoundPath()}}, the > input array will be reversed. This side effect is *undocumented* and has led > to at least one known bug DRILL-3758 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
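Removing the side effect, as suggested, amounts to reversing a defensive copy instead of mutating the caller's array. A minimal sketch (`reversedCopy` is a hypothetical helper, not Drill's actual `getCompoundPath` code):

```java
import java.util.Arrays;
import java.util.Collections;

public class CompoundPathSketch {
    // Reverse a copy so the caller's array is left untouched.
    public static String[] reversedCopy(String... strings) {
        String[] copy = Arrays.copyOf(strings, strings.length);
        // Arrays.asList returns a fixed-size view backed by the copy,
        // so reversing the list reverses the copy in place.
        Collections.reverse(Arrays.asList(copy));
        return copy;
    }

    public static void main(String[] args) {
        String[] input = {"a", "b", "c"};
        String[] out = reversedCopy(input);
        System.out.println(Arrays.toString(out));   // [c, b, a]
        System.out.println(Arrays.toString(input)); // [a, b, c] -- unchanged
    }
}
```

With the copy in place, callers like the one in DRILL-3758 could no longer be broken by the hidden mutation.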
[jira] [Commented] (DRILL-3723) RemoteServiceSet.getServiceSetWithFullCache() ignores arguments
[ https://issues.apache.org/jira/browse/DRILL-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720716#comment-14720716 ] Steven Phillips commented on DRILL-3723: That's leftover from the days when Drill had a distributed cache. When that was removed, we should have removed the WithFullCache method, as it no longer has any meaning. RemoteServiceSet.getServiceSetWithFullCache() ignores arguments --- Key: DRILL-3723 URL: https://issues.apache.org/jira/browse/DRILL-3723 Project: Apache Drill Issue Type: Bug Components: Execution - RPC Affects Versions: 1.1.0 Reporter: Andrew Assignee: Jacques Nadeau Priority: Minor Fix For: 1.2.0 RemoteServiceSet.getServiceSetWithFullCache() ignores both of its arguments and is therefore functionally equivalent to getLocalServiceSet(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2743) Parquet file metadata caching
[ https://issues.apache.org/jira/browse/DRILL-2743?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14710078#comment-14710078 ] Steven Phillips commented on DRILL-2743: No, this case is not dealt with, so it is possible for the locations to get out of date. This won't cause any wrong results, but could give non-optimal performance. The only workaround is to manually rerun the refresh metadata command. Parquet file metadata caching - Key: DRILL-2743 URL: https://issues.apache.org/jira/browse/DRILL-2743 Project: Apache Drill Issue Type: New Feature Components: Storage - Parquet Reporter: Steven Phillips Assignee: Steven Phillips Fix For: 1.2.0 Attachments: DRILL-2743.patch, drill.parquet_metadata To run a query against parquet files, we have to first recursively search the directory tree for all of the files, get the block locations for each file, and read the footer from each file, and this is done during the planning phase. When there are many files, this can result in a very large delay in running the query, and it does not scale. However, there isn't really any need to read the footers during planning, if we instead treat each parquet file as a single work unit, all we need to know are the block locations for the file, the number of rows, and the columns. We should store only the information which we need for planning in a file located in the top directory for a given parquet table, and then we can delay reading of the footers until execution time, which can be done in parallel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2743) Parquet file metadata caching
[ https://issues.apache.org/jira/browse/DRILL-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14704132#comment-14704132 ] Steven Phillips commented on DRILL-2743: They can come from any source. Parquet file metadata caching - Key: DRILL-2743 URL: https://issues.apache.org/jira/browse/DRILL-2743 Project: Apache Drill Issue Type: New Feature Components: Storage - Parquet Reporter: Steven Phillips Assignee: Aman Sinha Fix For: 1.2.0 Attachments: DRILL-2743.patch, drill.parquet_metadata To run a query against parquet files, we have to first recursively search the directory tree for all of the files, get the block locations for each file, and read the footer from each file, and this is done during the planning phase. When there are many files, this can result in a very large delay in running the query, and it does not scale. However, there isn't really any need to read the footers during planning, if we instead treat each parquet file as a single work unit, all we need to know are the block locations for the file, the number of rows, and the columns. We should store only the information which we need for planning in a file located in the top directory for a given parquet table, and then we can delay reading of the footers until execution time, which can be done in parallel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2743) Parquet file metadata caching
[ https://issues.apache.org/jira/browse/DRILL-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14704027#comment-14704027 ] Steven Phillips commented on DRILL-2743: 1. Currently, there is no log message, but I could add one. 2. I am not sure what you mean by change anything, but the case of both files and directories is handled. 3. I don't think there will be changes to the format, but I can't guarantee that. I also expect there to be changes to the format in future releases. 4. Those permissions will allow anyone to read the file. I do see a potential problem, though. Currently, if a change is detected to the underlying files, the metadata is updated automatically when a query is run. If the user doesn't have write permission, this will cause a failure. Parquet file metadata caching - Key: DRILL-2743 URL: https://issues.apache.org/jira/browse/DRILL-2743 Project: Apache Drill Issue Type: New Feature Components: Storage - Parquet Reporter: Steven Phillips Assignee: Aman Sinha Fix For: 1.2.0 Attachments: DRILL-2743.patch, drill.parquet_metadata To run a query against parquet files, we have to first recursively search the directory tree for all of the files, get the block locations for each file, and read the footer from each file, and this is done during the planning phase. When there are many files, this can result in a very large delay in running the query, and it does not scale. However, there isn't really any need to read the footers during planning, if we instead treat each parquet file as a single work unit, all we need to know are the block locations for the file, the number of rows, and the columns. We should store only the information which we need for planning in a file located in the top directory for a given parquet table, and then we can delay reading of the footers until execution time, which can be done in parallel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2743) Parquet file metadata caching
[ https://issues.apache.org/jira/browse/DRILL-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-2743: --- Assignee: Aman Sinha (was: Steven Phillips) Parquet file metadata caching - Key: DRILL-2743 URL: https://issues.apache.org/jira/browse/DRILL-2743 Project: Apache Drill Issue Type: New Feature Components: Storage - Parquet Reporter: Steven Phillips Assignee: Aman Sinha Fix For: 1.2.0 Attachments: DRILL-2743.patch, drill.parquet_metadata To run a query against parquet files, we have to first recursively search the directory tree for all of the files, get the block locations for each file, and read the footer from each file, and this is done during the planning phase. When there are many files, this can result in a very large delay in running the query, and it does not scale. However, there isn't really any need to read the footers during planning, if we instead treat each parquet file as a single work unit, all we need to know are the block locations for the file, the number of rows, and the columns. We should store only the information which we need for planning in a file located in the top directory for a given parquet table, and then we can delay reading of the footers until execution time, which can be done in parallel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3353) Non data-type related schema changes errors
[ https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3353: --- Assignee: Hanifi Gunes (was: Steven Phillips) Non data-type related schema changes errors --- Key: DRILL-3353 URL: https://issues.apache.org/jira/browse/DRILL-3353 Project: Apache Drill Issue Type: Bug Components: Storage - JSON Affects Versions: 1.0.0 Reporter: Oscar Bernal Assignee: Hanifi Gunes Fix For: 1.2.0 Attachments: i-bfbc0a5c-ios-PulsarEvent-2015-06-23_19.json.zip I'm having trouble querying a data set with varying schema for a nested object fields. The majority of my data for a specific type of record has the following nested data: {code} attributes:{daysSinceInstall:0,destination:none,logged:no,nth:1,type:organic,wearable:no}} {code} Among those records (hundreds of them) I have only two with a slightly different schema: {code} attributes:{adSet:Teste-Adwords-Engagement-Branch-iOS-230615-adset,campaign:Teste-Adwords-Engagement-Branch-iOS-230615,channel:Adwords,daysSinceInstall:0,destination:none,logged:no,nth:4,type:branch,wearable:no}} {code} When trying to query the new fields, my queries fail: With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = true;{code} {noformat} 0: jdbc:drill:zk=local select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.event.attributes.ad = 'Teste-FB-Engagement-Puro-iOS-230615'; Error: SYSTEM ERROR: java.lang.NumberFormatException: Teste-FB-Engagement-Puro-iOS-230615 Fragment 0:0 [Error Id: 22d37a65-7dd0-4661-bbfc-7a50bbee9388 on ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0) {noformat} With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = false;`{code} {noformat} 0: jdbc:drill:zk=local select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE'; Error: DATA_READ ERROR: Error parsing JSON - You tried to 
write a Bit type when you are using a ValueWriter of type NullableVarCharWriterImpl. File file.json Record 35 Fragment 0:0 [Error Id: 5746e3e9-48c0-44b1-8e5f-7c94e7c64d0f on ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0) {noformat} If I try to extract all attributes from those events, Drill will only return a subset of the fields, ignoring the others. {noformat} 0: jdbc:drill:zk=local select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.type ='Opens App'; ++ | EXPR$0 | ++ | {logged:no,wearable:no,type:} | | {logged:no,wearable:no,type:} | | {logged:no,wearable:no,type:} | | {logged:no,wearable:no,type:}| | {logged:no,wearable:no,type:} | ++ {noformat} What I find strange is that I have thousands of records in the same file with different schema for different record types and all other queries seem run well. Is there something about how Drill infers schema that I might be missing here? Does it infer based on a sample % of the data and fail for records that were not taken into account while inferring schema? I suspect I wouldn't have this error if I had 100's of records with that other schema inside the file, but I can't find anything in the docs or code to support that hypothesis. Perhaps it's just a bug? Is it expected? Troubleshooting guide seems to mention something about this but it's very vague in implying Drill doesn't fully support schema changes. I thought that was for data type changes mostly, for which there are other well documented issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3150) Error when filtering non-existent field with a string
[ https://issues.apache.org/jira/browse/DRILL-3150?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14622954#comment-14622954 ] Steven Phillips commented on DRILL-3150: I actually think the correct thing is to use VARBINARY as the default. It's true that comparing a non-numeric string to a valid integer field would fail, but that's ok. Our rules for implicit cast require casting the VARCHAR to NUMERIC, since NUMERIC types have a higher precedence. So doing a comparison between a non-numeric string and a numeric type should fail. In that case, it is necessary to explicitly cast the int as a string. I actually filed DRILL-3477 the other day, without realizing this issue was here. Error when filtering non-existent field with a string - Key: DRILL-3150 URL: https://issues.apache.org/jira/browse/DRILL-3150 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.0.0 Reporter: Adam Gilmore Assignee: Parth Chandra Priority: Critical Fix For: 1.2.0 Attachments: DRILL-3150.1.patch.txt The following query throws an exception: {code} select count(*) from cp.`employee.json` where `blah` = 'test' {code} blah does not exist as a field in the JSON. The expected behaviour would be to filter out all rows as that field is not present (thus cannot equal the string 'test'). Instead, the following exception occurs: {code} org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: test Fragment 0:0 [Error Id: 5d6c9a82-8f87-41b2-a496-67b360302b76 on ip-10-1-50-208.ec2.internal:31010] {code} Apart from the fact that the real error message is hidden, the issue is that we're trying to cast the varchar to int ('test' to an int). This seems to be because the projection out of the scan when a field is not found becomes INT:OPTIONAL. The filter should not fail on this - if the varchar fails to convert to an int, the filter should simply not allow any records through.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
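The precedence argument in the comment above can be seen in miniature: because NUMERIC outranks VARCHAR, the string side is the one converted, so a non-numeric literal fails rather than simply comparing unequal. A plain-Java analogue (not Drill's cast machinery; `numericCompare` is a hypothetical helper):

```java
public class ImplicitCastSketch {
    // Compare an integer column value against a string literal under
    // numeric-precedence rules: the string is cast to the numeric side.
    static boolean numericCompare(long columnValue, String literal) {
        return columnValue == Long.parseLong(literal); // throws if non-numeric
    }

    public static void main(String[] args) {
        System.out.println(numericCompare(10, "10")); // true
        try {
            numericCompare(10, "test");
        } catch (NumberFormatException e) {
            // Mirrors the query erroring out instead of filtering the row.
            System.out.println("cast failed for 'test'");
        }
    }
}
```

Defaulting the unknown column to VARBINARY instead of INT would sidestep this path, at the cost of requiring an explicit cast to compare against a real integer field.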
[jira] [Commented] (DRILL-3353) Non data-type related schema changes errors
[ https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanelfocusedCommentId=14621537#comment-14621537 ] Steven Phillips commented on DRILL-3353: There are several issues here. 1. {code} Error: DATA_READ ERROR: Error parsing JSON - You tried to write a Bit type when you are using a ValueWriter of type NullableVarCharWriterImpl. {code} This is due to the fact that in one of the records, the boolean value true has quotes around it. Thus, it is parsed as a string. Drill does not currently support changing the type of a specific field. See DRILL-3228 and DRILL-3229 for future work that will enhance our flexibility in this regard. The current workaround for this is to set all_text_mode to true, which you already know. 2. {code} Error: SYSTEM ERROR: java.lang.NumberFormatException: Teste-FB-Engagement-Puro-iOS-230615 {code} This is due to a problem with implicit cast and null fields. I filed DRILL-3477 for this issue. 3. Missing fields This is due to some bugs in Drill's processing of complex data that occurs in some operations when new fields are added. I will be posting a fix for this shortly. Non data-type related schema changes errors --- Key: DRILL-3353 URL: https://issues.apache.org/jira/browse/DRILL-3353 Project: Apache Drill Issue Type: Bug Components: Storage - JSON Affects Versions: 1.0.0 Reporter: Oscar Bernal Assignee: Steven Phillips Fix For: 1.2.0 Attachments: i-bfbc0a5c-ios-PulsarEvent-2015-06-23_19.json.zip I'm having trouble querying a data set with varying schema for nested object fields.
The majority of my data for a specific type of record has the following nested data: {code} attributes:{daysSinceInstall:0,destination:none,logged:no,nth:1,type:organic,wearable:no}} {code} Among those records (hundreds of them) I have only two with a slightly different schema: {code} attributes:{adSet:Teste-Adwords-Engagement-Branch-iOS-230615-adset,campaign:Teste-Adwords-Engagement-Branch-iOS-230615,channel:Adwords,daysSinceInstall:0,destination:none,logged:no,nth:4,type:branch,wearable:no}} {code} When trying to query the new fields, my queries fail: With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = true;{code} {noformat} 0: jdbc:drill:zk=local select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.event.attributes.ad = 'Teste-FB-Engagement-Puro-iOS-230615'; Error: SYSTEM ERROR: java.lang.NumberFormatException: Teste-FB-Engagement-Puro-iOS-230615 Fragment 0:0 [Error Id: 22d37a65-7dd0-4661-bbfc-7a50bbee9388 on ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0) {noformat} With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = false;`{code} {noformat} 0: jdbc:drill:zk=local select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE'; Error: DATA_READ ERROR: Error parsing JSON - You tried to write a Bit type when you are using a ValueWriter of type NullableVarCharWriterImpl. File file.json Record 35 Fragment 0:0 [Error Id: 5746e3e9-48c0-44b1-8e5f-7c94e7c64d0f on ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0) {noformat} If I try to extract all attributes from those events, Drill will only return a subset of the fields, ignoring the others. 
{noformat}
0: jdbc:drill:zk=local> select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.type = 'Opens App';
+--------------------------------+
|             EXPR$0             |
+--------------------------------+
| {logged:no,wearable:no,type:}  |
| {logged:no,wearable:no,type:}  |
| {logged:no,wearable:no,type:}  |
| {logged:no,wearable:no,type:}  |
| {logged:no,wearable:no,type:}  |
+--------------------------------+
{noformat}
What I find strange is that I have thousands of records in the same file with different schemas for different record types, and all other queries seem to run well. Is there something about how Drill infers schema that I might be missing here? Does it infer based on a sample percentage of the data and fail for records that were not taken into account while inferring the schema? I suspect I wouldn't have this error if I had hundreds of records with that other schema inside the file, but I can't find anything in the docs or code to support that hypothesis. Perhaps it's just a bug? Is it expected? The troubleshooting guide seems to mention something about this, but it's very vague in implying Drill doesn't fully support schema changes. I thought that was
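The quoted-boolean failure described in point 1 of the comment above can be reproduced outside Drill. The following is a minimal Python sketch (ordinary `json` parsing standing in for Drill's JSON reader; the field name mirrors the data above):

```python
import json

# One record has a real boolean; the other quotes it, so any JSON parser
# sees a string. Drill's value writers are typed per field, so a Bit
# (boolean) writer cannot accept a string value for the same field.
rec_a = json.loads('{"attributes": {"logged": true}}')
rec_b = json.loads('{"attributes": {"logged": "true"}}')

types = {type(r["attributes"]["logged"]) for r in (rec_a, rec_b)}
assert types == {bool, str}  # same field, two types -> schema change

# all_text_mode sidesteps the conflict by reading every scalar as text,
# which is roughly equivalent to:
as_text = [str(r["attributes"]["logged"]).lower() for r in (rec_a, rec_b)]
assert as_text == ["true", "true"]  # uniform type, no writer conflict
```

This is why all_text_mode works as a stopgap: a single text type can absorb every record, at the cost of losing the native boolean/numeric types.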
[jira] [Created] (DRILL-3487) MaterializedField equality doesn't check if nested fields are equal
Steven Phillips created DRILL-3487:
--------------------------------------

Summary: MaterializedField equality doesn't check if nested fields are equal
Key: DRILL-3487
URL: https://issues.apache.org/jira/browse/DRILL-3487
Project: Apache Drill
Issue Type: Bug
Components: Metadata
Reporter: Steven Phillips
Assignee: Hanifi Gunes

In several places, we use BatchSchema.equals() to determine if two schemas are the same. A BatchSchema is a set of MaterializedField objects. But ever since DRILL-1872, the child fields are no longer checked. What this means, essentially, is that BatchSchema.equals() is not valid for determining schema changes if the batch contains any nested fields.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
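The shallow-equality pitfall described in DRILL-3487 can be illustrated with a small Python sketch. `Field`, its attributes, and `deep_eq` are hypothetical stand-ins, not Drill's actual `MaterializedField` API; the point is only that equality ignoring children misses nested schema changes:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Field:
    """Hypothetical analogue of a MaterializedField: a named, typed
    field that may contain nested child fields (e.g. a MAP)."""
    name: str
    type: str
    children: List["Field"] = field(default_factory=list)

    # Shallow equality, mirroring the post-DRILL-1872 behavior:
    # children are deliberately not compared.
    def __eq__(self, other):
        return (self.name, self.type) == (other.name, other.type)

    # What a schema-change check actually needs: recurse into children.
    def deep_eq(self, other):
        return (self == other
                and len(self.children) == len(other.children)
                and all(a.deep_eq(b)
                        for a, b in zip(self.children, other.children)))

old = Field("attributes", "MAP", [Field("logged", "VARCHAR")])
new = Field("attributes", "MAP", [Field("logged", "VARCHAR"),
                                  Field("adSet", "VARCHAR")])

assert old == new            # shallow check: nested change goes unnoticed
assert not old.deep_eq(new)  # deep comparison catches the new child field
```

A batch whose map column gained a child would thus pass a shallow `equals()` check, which is exactly why operators relying on it fail to detect the schema change.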
[jira] [Created] (DRILL-3477) Using IntVector for null expressions causes problems with implicit cast
Steven Phillips created DRILL-3477:
--------------------------------------

Summary: Using IntVector for null expressions causes problems with implicit cast
Key: DRILL-3477
URL: https://issues.apache.org/jira/browse/DRILL-3477
Project: Apache Drill
Issue Type: Bug
Reporter: Steven Phillips
Assignee: Steven Phillips

See DRILL-3353, for example. A simple example is this:
{code}
select * from t where a = 's';
{code}
If the first batch scanned from table t does not contain the column a, the expression materializer in Project defaults to Nullable Int as the type. The Filter then sees an Equals expression between a VarChar and an Int type, so it does an implicit cast. Implicit cast rules give Int higher precedence, so the literal 's' is cast to Int, which ends up throwing a NumberFormatException. In the class ResolverTypePrecedence, we see that the Null type has the lowest precedence, which makes sense. But since we don't currently have an implementation for NullVector, we should materialize the Null type as the vector with the lowest possible precedence, which is VarBinary. My suggestion is that we use VarBinary as the default type in ExpressionMaterializer instead of Int.
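The failure mode and the proposed fix can be sketched in Python (the function names are hypothetical; `ValueError` plays the role of Java's `NumberFormatException`, and byte-string comparison approximates a VarBinary comparison):

```python
def compare_with_int_precedence(column_value, literal):
    # Int outranks VarChar, so the VarChar literal is cast to Int.
    # For a non-numeric literal this blows up before any comparison.
    return column_value == int(literal)

def compare_with_varbinary_precedence(column_value, literal):
    # With VarBinary materialized for the missing column, both sides
    # are compared as bytes: no numeric parse, no exception.
    left = b"" if column_value is None else str(column_value).encode()
    return left == str(literal).encode()

# Column `a` is absent from the first batch, so its values are null.
caught = None
try:
    compare_with_int_precedence(None, "s")
except ValueError as e:        # analogue of NumberFormatException
    caught = e
assert caught is not None

# The VarBinary route simply finds no match instead of erroring out.
assert compare_with_varbinary_precedence(None, "s") is False
```

This mirrors the suggestion above: defaulting the unknown column to the lowest-precedence type means the literal is never forced through a numeric cast.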
[jira] [Commented] (DRILL-3477) Using IntVector for null expressions causes problems with implicit cast
[ https://issues.apache.org/jira/browse/DRILL-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619762#comment-14619762 ] Steven Phillips commented on DRILL-3477: I was thinking that might be somewhat involved, but I guess it could be pretty simple: just an implementation that contains no buffers, always returns null when accessed, and cannot be written to.
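The NullVector idea from the comment above can be sketched in a few lines. This is a hypothetical Python analogue, not Drill's Java vector API: no backing buffers, every read yields null, and writes are rejected.

```python
class NullVector:
    """Sketch of a buffer-free vector: only a count, no data."""

    def __init__(self, value_count=0):
        self.value_count = value_count   # metadata only, no buffers

    def get_object(self, index):
        if not 0 <= index < self.value_count:
            raise IndexError(index)
        return None                      # every access yields null

    def set(self, index, value):
        raise TypeError("NullVector cannot be written to")

v = NullVector(value_count=3)
assert all(v.get_object(i) is None for i in range(3))

rejected = False
try:
    v.set(0, 42)
except TypeError:
    rejected = True
assert rejected
```

Because no memory is allocated per value, such a vector is essentially free regardless of row count, which is what makes it attractive for representing an all-null, type-unknown column.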
[jira] [Commented] (DRILL-3477) Using IntVector for null expressions causes problems with implicit cast
[ https://issues.apache.org/jira/browse/DRILL-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619772#comment-14619772 ] Steven Phillips commented on DRILL-3477: I haven't run the test yet, just posting now to get some feedback on the idea. Are there places in the code that are expecting it to be an IntVector? I thought it was a somewhat arbitrary choice, and that using a different type wouldn't cause any additional problems.
[jira] [Updated] (DRILL-3477) Using IntVector for null expressions causes problems with implicit cast
[ https://issues.apache.org/jira/browse/DRILL-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips updated DRILL-3477: Assignee: Jinfeng Ni (was: Steven Phillips)