Re: Does s3 plugin support AWS S3 signature version 4 ?
Any updates on this? Since we migrated to AWS Mumbai, we are not able to connect S3 and Drill.

On 04-Apr-2017 11:02 PM, "Shankar Mane" wrote:
> Quick question here:
>
> Does the s3 plugin support S3 signature version 4?
>
> FYI: the s3 plugin works when the region supports both the v2 and v4
> signatures. It seems problematic for regions (e.g. ap-south-1) which
> only support the v4 signature version.
>
> regards,
> shankar
>
[GitHub] drill pull request #838: DRILL-5512: Standardize error handling in ScanBatch
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/838#discussion_r117590009

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
@@ -173,9 +174,8 @@ public IterOutcome next() {
         currentReader.allocate(mutator.fieldVectorMap());
       } catch (OutOfMemoryException e) {
-        logger.debug("Caught Out of Memory Exception", e);
         clearFieldVectorMap();
-        return IterOutcome.OUT_OF_MEMORY;
+        throw UserException.memoryError(e).build(logger);
--- End diff --

I am not sure if this specific line change is required, so please correct me if I am wrong. Thinking out loud..

There are three places in ScanBatch where OutOfMemoryException is handled. Since OutOfMemoryException is an unchecked exception, I could not quickly find all the calls that trigger the exception in this method.

The [first case](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java#L175) and [second case](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java#L215) are similar in that `reader.allocate(...)` fails. So although there is no unwind logic, it seems to me this case is correctly handled: no records have been read, so there is nothing to unwind. Say this triggers spilling in sort; then the query could complete successfully if allocate succeeds the next time (and so on). Am I following this logic correctly?

But this does not seem to be the case, as [TestOutOfMemoryOutcome](https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/TestOutOfMemoryOutcome.java#L65) triggers an OutOfMemoryException during the ["next" allocation](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java#L172), and all tests are expected to fail.
And then, there is the [third case](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java#L247), which is a general catch (e.g. `reader.next()` throws OutOfMemoryException). And as you mentioned, readers cannot unwind, so that correctly fails the fragment.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
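The trade-off discussed in this thread can be made concrete with a minimal, standalone sketch (plain Java; `IterOutcome` here is a toy enum, not Drill's actual `IterOutcome`/`UserException` API): signaling out-of-memory through a return status forces every caller to check and propagate it, while throwing an unchecked exception lets one top-level handler, the fragment executor in Drill's case, unwind and fail the fragment.

```java
// Toy model of the two error-signaling styles compared in the review.
// Names are illustrative only, not Drill's real classes.
public class ErrorStyles {
    enum IterOutcome { OK, OUT_OF_MEMORY }

    // Style 1: signal OOM through the return value; every caller must
    // remember to check the status and unwind by hand.
    static IterOutcome nextWithStatus(boolean allocationFails) {
        if (allocationFails) {
            return IterOutcome.OUT_OF_MEMORY;
        }
        return IterOutcome.OK;
    }

    // Style 2: throw an unchecked exception; a single executor-level
    // handler catches it and fails the whole fragment.
    static void nextWithException(boolean allocationFails) {
        if (allocationFails) {
            throw new RuntimeException("One or more nodes ran out of memory");
        }
    }

    public static void main(String[] args) {
        System.out.println(nextWithStatus(true));
        try {
            nextWithException(true);
        } catch (RuntimeException e) {
            // Stands in for FragmentExecutor's error handling path.
            System.out.println("executor handles: " + e.getMessage());
        }
    }
}
```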
[GitHub] drill pull request #805: Drill-4139: Exception while trying to prune partiti...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/805#discussion_r117582970

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---
@@ -508,21 +516,32 @@ public void populatePruningVector(ValueVector v, int index, SchemaPath column, S
         NullableVarBinaryVector varBinaryVector = (NullableVarBinaryVector) v;
         Object s = partitionValueMap.get(f).get(column);
         byte[] bytes;
-        if (s instanceof Binary) {
-          bytes = ((Binary) s).getBytes();
-        } else if (s instanceof String) {
-          bytes = ((String) s).getBytes();
-        } else if (s instanceof byte[]) {
-          bytes = (byte[]) s;
+        if (s == null) {
+          varBinaryVector.getMutator().setNull(index);
+          return;
         } else {
-          throw new UnsupportedOperationException("Unable to create column data for type: " + type);
+          bytes = getBytes(type, s);
         }
         varBinaryVector.getMutator().setSafe(index, bytes, 0, bytes.length);
         return;
       }
       case DECIMAL18: {
         NullableDecimal18Vector decimalVector = (NullableDecimal18Vector) v;
-        Long value = (Long) partitionValueMap.get(f).get(column);
+        Object s = partitionValueMap.get(f).get(column);
--- End diff --

If the patch also changes DECIMAL18's partition pruning, please update the title of the JIRA to reflect that change.
[GitHub] drill pull request #805: Drill-4139: Exception while trying to prune partiti...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/805#discussion_r117583804

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java ---
@@ -181,6 +185,71 @@ else if (parquetTableMetadata instanceof Metadata.ParquetTableMetadata_v2 &&
   }

   /**
+   * Checks whether the metadata file was created by a Drill version older than
+   * the one that changed the serialization of BINARY values, and if so assigns
+   * byte arrays to min/max values obtained from the deserialized string.
+   *
+   * @param parquetTableMetadata table metadata that should be corrected
+   */
+  public static void correctBinaryInMetadataCache(Metadata.ParquetTableMetadataBase parquetTableMetadata) {
+    if (hasOldBinarySerialization(parquetTableMetadata)) {
+      Set names = Sets.newHashSet();
+      if (parquetTableMetadata instanceof Metadata.ParquetTableMetadata_v2) {
+        for (Metadata.ColumnTypeMetadata_v2 columnTypeMetadata :
+            ((Metadata.ParquetTableMetadata_v2) parquetTableMetadata).columnTypeInfo.values()) {
+          if (columnTypeMetadata.primitiveType == PrimitiveTypeName.BINARY) {
+            names.add(Arrays.asList(columnTypeMetadata.name));
+          }
+        }
+      }
+      for (Metadata.ParquetFileMetadata file : parquetTableMetadata.getFiles()) {
+        // Drill has only ever written a single row group per file, only need to correct the statistics
+        // on the first row group
+        Metadata.RowGroupMetadata rowGroupMetadata = file.getRowGroups().get(0);
--- End diff --

It's true that parquet files created by Drill have a single row group per file. But the metadata file could be created from parquet files from other sources, so that assumption may not hold.
[GitHub] drill pull request #805: Drill-4139: Exception while trying to prune partiti...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/805#discussion_r117585451

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java ---
@@ -398,10 +399,115 @@ public void testDrill4877() throws Exception {
   }

+  @Test // DRILL-4139
+  public void testBooleanPartitionPruning() throws Exception {
+    final String boolPartitionTable = "dfs_test.tmp.`interval_bool_partition`";
+    try {
+      test("create table %s partition by (col_bln) as " +
+          "select * from cp.`parquet/alltypes_required.parquet`", boolPartitionTable);
+      test("refresh table metadata %s", boolPartitionTable);
--- End diff --

We probably want to cover the case where the metadata cache file is not created (run the query before calling "refresh table metadata").
[GitHub] drill pull request #805: Drill-4139: Exception while trying to prune partiti...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/805#discussion_r117584586

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java ---
@@ -181,6 +185,71 @@ else if (parquetTableMetadata instanceof Metadata.ParquetTableMetadata_v2 &&
   }

   /**
+   * Checks whether the metadata file was created by a Drill version older than
+   * the one that changed the serialization of BINARY values, and if so assigns
+   * byte arrays to min/max values obtained from the deserialized string.
+   *
+   * @param parquetTableMetadata table metadata that should be corrected
+   */
+  public static void correctBinaryInMetadataCache(Metadata.ParquetTableMetadataBase parquetTableMetadata) {
+    if (hasOldBinarySerialization(parquetTableMetadata)) {
+      Set names = Sets.newHashSet();
+      if (parquetTableMetadata instanceof Metadata.ParquetTableMetadata_v2) {
--- End diff --

Any reason you only check v2 here?
[GitHub] drill pull request #805: Drill-4139: Exception while trying to prune partiti...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/805#discussion_r117583114

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---
@@ -508,21 +516,32 @@ public void populatePruningVector(ValueVector v, int index, SchemaPath column, S
         NullableVarBinaryVector varBinaryVector = (NullableVarBinaryVector) v;
         Object s = partitionValueMap.get(f).get(column);
         byte[] bytes;
-        if (s instanceof Binary) {
-          bytes = ((Binary) s).getBytes();
-        } else if (s instanceof String) {
-          bytes = ((String) s).getBytes();
-        } else if (s instanceof byte[]) {
-          bytes = (byte[]) s;
+        if (s == null) {
+          varBinaryVector.getMutator().setNull(index);
+          return;
         } else {
-          throw new UnsupportedOperationException("Unable to create column data for type: " + type);
+          bytes = getBytes(type, s);
         }
         varBinaryVector.getMutator().setSafe(index, bytes, 0, bytes.length);
         return;
       }
       case DECIMAL18: {
         NullableDecimal18Vector decimalVector = (NullableDecimal18Vector) v;
-        Long value = (Long) partitionValueMap.get(f).get(column);
+        Object s = partitionValueMap.get(f).get(column);
+        byte[] bytes;
+        if (s == null) {
+          decimalVector.getMutator().setNull(index);
+          return;
+        } else if (s instanceof Integer) {
+          decimalVector.getMutator().setSafe(index, (Integer) s);
+          return;
+        } else if (s instanceof Long) {
+          decimalVector.getMutator().setSafe(index, (Long) s);
+          return;
+        } else {
+          bytes = getBytes(type, s);
--- End diff --

For DECIMAL18, under what kind of scenario would we get bytes from the partitionValueMap?
[GitHub] drill pull request #805: Drill-4139: Exception while trying to prune partiti...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/805#discussion_r117584380

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java ---
@@ -181,6 +185,71 @@ else if (parquetTableMetadata instanceof Metadata.ParquetTableMetadata_v2 &&
   }

   /**
+   * Checks whether the metadata file was created by a Drill version older than
+   * the one that changed the serialization of BINARY values
--- End diff --

Can you explain a bit more about why the serialization of BINARY values is wrong in the prior version? Does it happen only in the metadata cache file?
[GitHub] drill pull request #805: Drill-4139: Exception while trying to prune partiti...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/805#discussion_r117578486

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---
@@ -548,23 +567,57 @@ public void populatePruningVector(ValueVector v, int index, SchemaPath column, S
         NullableVarCharVector varCharVector = (NullableVarCharVector) v;
         Object s = partitionValueMap.get(f).get(column);
         byte[] bytes;
-        if (s instanceof String) { // if the metadata was read from a JSON cache file it may be a string type
-          bytes = ((String) s).getBytes();
-        } else if (s instanceof Binary) {
-          bytes = ((Binary) s).getBytes();
-        } else if (s instanceof byte[]) {
-          bytes = (byte[]) s;
+        if (s == null) {
+          varCharVector.getMutator().setNull(index);
+          return;
         } else {
-          throw new UnsupportedOperationException("Unable to create column data for type: " + type);
+          bytes = getBytes(type, s);
         }
         varCharVector.getMutator().setSafe(index, bytes, 0, bytes.length);
         return;
       }
+      case INTERVAL: {
+        NullableIntervalVector intervalVector = (NullableIntervalVector) v;
+        Object s = partitionValueMap.get(f).get(column);
+        byte[] bytes;
+        if (s == null) {
+          intervalVector.getMutator().setNull(index);
+          return;
+        } else {
+          bytes = getBytes(type, s);
+        }
+        intervalVector.getMutator().setSafe(index, 1,
+            ParquetReaderUtility.getIntFromLEBytes(bytes, 0),
+            ParquetReaderUtility.getIntFromLEBytes(bytes, 4),
+            ParquetReaderUtility.getIntFromLEBytes(bytes, 8));
+        return;
+      }
       default:
         throw new UnsupportedOperationException("Unsupported type: " + type);
     }
   }

+  /**
+   * Returns the sequence of bytes received from {@code Object source}.
+   *
+   * @param type the column type
+   * @param source the source of the bytes sequence
+   * @return bytes sequence obtained from {@code Object source}
+   */
+  private byte[] getBytes(MinorType type, Object source) {
+    byte[] bytes;
+    if (source instanceof String) { // if the metadata was read from a JSON cache file it may be a string type
+      bytes = Base64.decodeBase64(((String) source).getBytes());
--- End diff --

Any reason you call Base64.decodeBase64 instead of calling String.getBytes()?
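The distinction behind this question can be shown with a small standalone sketch (using only `java.util.Base64`, not the commons-codec `Base64` in the patch or Drill's metadata classes): for arbitrary binary data such as Parquet BINARY min/max statistics, round-tripping through a JSON string and back with `String.getBytes()` is lossy, because bytes that are not valid UTF-8 get replaced; Base64 encoding/decoding preserves the bytes exactly.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class BinaryRoundTrip {
    public static void main(String[] args) {
        // Arbitrary binary value; 0xFF and 0x80 are not valid UTF-8 on their own.
        byte[] original = {0x00, (byte) 0xFF, 0x7F, (byte) 0x80, 0x0A};

        // Lossy path: treat the raw bytes as text when writing the JSON cache.
        // Malformed byte sequences become U+FFFD replacement characters.
        String asText = new String(original, StandardCharsets.UTF_8);
        byte[] textRoundTrip = asText.getBytes(StandardCharsets.UTF_8);

        // Safe path: Base64-encode the bytes into the JSON string.
        String asBase64 = Base64.getEncoder().encodeToString(original);
        byte[] base64RoundTrip = Base64.getDecoder().decode(asBase64);

        System.out.println("text round trip ok:   " + Arrays.equals(original, textRoundTrip));
        System.out.println("base64 round trip ok: " + Arrays.equals(original, base64RoundTrip));
    }
}
```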
Re: Upgrading Netty
Could this have been addressed in later versions of Netty? So far, initial performance numbers show no impact on performance from the upgrade to version 4.0.48. Scale- and concurrency-related tests, along with longevity tests, are in progress.

~ Kunal

From: Parth Chandra
Sent: Friday, May 19, 2017 2:29:50 PM
To: dev@drill.apache.org
Subject: Re: Upgrading Netty

Looks like the specific issue I was referring to was addressed in Netty 4.0.29 [1]. The comment for the commit is a little concerning:

Result: ThreadPoolCache is now also usable and so gives performance improvements when allocation and deallocation thread are different. Performance when using same thread for allocation and deallocation is noticable worse then before.

We might want to do a performance run to make sure things are no worse than before.

[1] https://github.com/netty/netty/commit/f765053ae740e300a6b696840d7dfe5de32afeb3

On Mon, May 15, 2017 at 5:46 PM, Parth Chandra wrote:
> The per-thread allocation cache in Netty causes unbounded memory growth in
> Drill because we pass the ownership of a buffer from one thread to another.
> The version we use has a fix for the Drill use case where Netty will no
> longer add a buffer to its per-thread cache if the buffer was allocated by
> a thread which is different from the thread freeing the buffer.
> This fix was reversed in a subsequent release, and the latest version has
> the same issue.
> There might have been a fix in Netty for this in some other place which I
> am not aware of (perhaps they removed it altogether, as Paul seems to have
> seen).
> AFAIK, we do not have a direct reference to that code in Drill's
> allocator. If you try to upgrade and hit an issue, post it here.
> If you are able to upgrade the Netty version, then run a longevity test to
> make sure there is no 'leaking' of memory from one thread to another.
>
> On Mon, May 15, 2017 at 4:08 PM, Paul Rogers wrote:
>
>> As it turns out, Drill makes clever use of the internal details of the
>> Netty memory allocator. But that code changed significantly in the last
>> couple of years. When I attempted to upgrade, I found that the private
>> features of Netty that the Drill allocator uses no longer exist in the
>> latest Netty.
>>
>> So, someone will need to understand what that part of the Drill allocator
>> does and design an alternative integration.
>>
>> The particular issue seems to be that Netty had a per-thread allocation
>> cache which seems to not exist in the latest version.
>>
>> - Paul
>>
>>> On May 15, 2017, at 3:58 PM, Sudheesh Katkam wrote:
>>>
>>> Hi all,
>>>
>>> As part of working on DRILL-5431 [1], I found a bug in Netty [2], which
>>> is due to be fixed in 4.0.48 [3]. Drill is currently using 4.0.27 [4]. Does
>>> anyone foresee issues with upgrading to the latest version of Netty? I
>>> noticed Apache Arrow upgraded to 4.0.41 [5].
>>>
>>> Thank you,
>>> Sudheesh
>>>
>>> [1] https://issues.apache.org/jira/browse/DRILL-5431
>>> [2] https://github.com/netty/netty/issues/6709
>>> [3] https://github.com/netty/netty/pull/6713
>>> [4] https://github.com/apache/drill/blob/master/pom.xml#L550
>>> [5] https://github.com/apache/arrow/commit/3487c2f0cdc2297a80ba3525c192745313b3da48
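The failure mode Parth describes, allocate on one thread, free on another, can be illustrated with a toy per-thread free-buffer cache (plain Java `ThreadLocal`, nothing Netty-specific): if the freeing thread caches buffers that it will never itself allocate, its cache only ever grows.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a per-thread buffer cache, illustrating why releasing a
// buffer on a thread other than the allocating one can strand memory.
// This is an illustration of the idea only, not Netty's actual allocator.
public class ThreadCacheDemo {
    static final ThreadLocal<Deque<byte[]>> CACHE =
            ThreadLocal.withInitial(ArrayDeque::new);

    static byte[] allocate(int size) {
        byte[] cached = CACHE.get().poll();      // reuse if this thread has one
        return (cached != null && cached.length == size) ? cached : new byte[size];
    }

    static void release(byte[] buf) {
        CACHE.get().push(buf);                   // cached on the *releasing* thread
    }

    public static void main(String[] args) throws InterruptedException {
        // Short-lived producer threads allocate; the main thread releases.
        final byte[][] handoff = new byte[1][];
        for (int i = 0; i < 3; i++) {
            Thread producer = new Thread(() -> handoff[0] = allocate(1024));
            producer.start();
            producer.join();
            release(handoff[0]);                 // lands in main's cache
        }
        // Main never allocates, so its cache only grows: 3 buffers stranded.
        System.out.println("buffers stranded in main's cache: " + CACHE.get().size());
    }
}
```

The fix Parth mentions, refusing to cache a buffer freed by a thread other than its allocator, would amount to a thread-identity check in `release`.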
[GitHub] drill pull request #838: DRILL-5512: Standardize error handling in ScanBatch
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/838#discussion_r117580289

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
@@ -173,9 +174,8 @@ public IterOutcome next() {
         currentReader.allocate(mutator.fieldVectorMap());
       } catch (OutOfMemoryException e) {
-        logger.debug("Caught Out of Memory Exception", e);
         clearFieldVectorMap();
-        return IterOutcome.OUT_OF_MEMORY;
+        throw UserException.memoryError(e).build(logger);
--- End diff --

Good question. Yes: since the `FragmentExecutor` already handles errors and unwinds, we just exploit that (existing, working) path instead of the (also existing, but harder-to-keep-working) path of `fail`/`STOP`. The (managed) external sort, for example, reports all its errors via exceptions; it does not use `fail`/`STOP`. The fragment executor recovers just fine.
Re: Upgrading Netty
Looks like the specific issue I was referring to was addressed in Netty 4.0.29 [1]. The comment for the commit is a little concerning:

Result: ThreadPoolCache is now also usable and so gives performance improvements when allocation and deallocation thread are different. Performance when using same thread for allocation and deallocation is noticable worse then before.

We might want to do a performance run to make sure things are no worse than before.

[1] https://github.com/netty/netty/commit/f765053ae740e300a6b696840d7dfe5de32afeb3

On Mon, May 15, 2017 at 5:46 PM, Parth Chandra wrote:
> The per thread allocation cache in Netty causes unbounded memory growth in
> Drill because we pass the ownership of a buffer from one thread to another.
> The version we use has a fix for the Drill use case where Netty will no
> longer add a buffer to its per thread cache if the buffer was allocated by
> a thread which is different from the thread freeing the buffer.
> This fix was reversed in a subsequent release and the latest version has
> the same issue.
> There might have been a fix in Netty for this in some other place which I
> am not aware of (perhaps they removed it altogether as Paul seems to have
> seen).
> AFAIK, we do not have a direct reference to that code in Drill's
> allocator. If you try to upgrade and hit an issue, post it here.
> If you are able to upgrade the Netty version, then run a longevity test to
> make sure there is no 'leaking' of memory from one thread to another.
>
> On Mon, May 15, 2017 at 4:08 PM, Paul Rogers wrote:
>
>> As it turns out, Drill makes clever use of the internal details of the
>> Netty memory allocator. But, that code changed significantly in the last
>> couple of years. When I attempted to upgrade, I found that the private
>> features of Netty that the Drill allocator uses no longer exist in the
>> latest Netty.
>>
>> So, someone will need to understand what that part of the Drill allocator
>> does and design an alternative integration.
>>
>> The particular issue seems to be that Netty had a per-thread allocation
>> cache which seems to not exist in the latest version.
>>
>> - Paul
>>
>>> On May 15, 2017, at 3:58 PM, Sudheesh Katkam wrote:
>>>
>>> Hi all,
>>>
>>> As part of working on DRILL-5431 [1], I found a bug in Netty [2], which
>>> is due to be fixed in 4.0.48 [3]. Drill is currently using 4.0.27 [4]. Does
>>> anyone foresee issues with upgrading to the latest version of Netty? I
>>> noticed Apache Arrow upgraded to 4.0.41 [5].
>>>
>>> Thank you,
>>> Sudheesh
>>>
>>> [1] https://issues.apache.org/jira/browse/DRILL-5431
>>> [2] https://github.com/netty/netty/issues/6709
>>> [3] https://github.com/netty/netty/pull/6713
>>> [4] https://github.com/apache/drill/blob/master/pom.xml#L550
>>> [5] https://github.com/apache/arrow/commit/3487c2f0cdc2297a80ba3525c192745313b3da48
[GitHub] drill pull request #838: DRILL-5512: Standardize error handling in ScanBatch
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/838#discussion_r117576158

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
@@ -173,9 +174,8 @@ public IterOutcome next() {
         currentReader.allocate(mutator.fieldVectorMap());
       } catch (OutOfMemoryException e) {
-        logger.debug("Caught Out of Memory Exception", e);
         clearFieldVectorMap();
-        return IterOutcome.OUT_OF_MEMORY;
+        throw UserException.memoryError(e).build(logger);
--- End diff --

Makes sense. Asking the question a different way: to avoid regressions, should the prerequisite changes ([DRILL-5211](https://issues.apache.org/jira/browse/DRILL-5211) and pertinent tasks) be committed before this patch? Or, since _the readers do not correctly handle the case_ anyway, will there be no difference?
[GitHub] drill pull request #838: DRILL-5512: Standardize error handling in ScanBatch
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/838#discussion_r117576060

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
@@ -213,17 +213,16 @@ public IterOutcome next() {
       try {
         currentReader.allocate(mutator.fieldVectorMap());
       } catch (OutOfMemoryException e) {
-        logger.debug("Caught OutOfMemoryException");
         clearFieldVectorMap();
-        return IterOutcome.OUT_OF_MEMORY;
+        throw UserException.memoryError(e).build(logger);
       }
       addImplicitVectors();
     } catch (ExecutionSetupException e) {
-      this.context.fail(e);
--- End diff --

Sounds good.
[GitHub] drill pull request #818: DRILL-5140: Fix CompileException in run-time genera...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/818#discussion_r117573874

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/ClassGenerator.java ---
@@ -77,10 +81,43 @@
   private final CodeGenerator codeGenerator;

   public final JDefinedClass clazz;
-  private final LinkedList[] blocks;
+  private final JCodeModel model;
   private final OptionSet optionManager;

+  private ClassGenerator innerClassGenerator;
+  private LinkedList[] blocks;
+  private LinkedList[] oldBlocks;
+
+  /**
+   * Assumed that field has 3 indexes within the constant pull: index of the CONSTANT_Fieldref_info +
--- End diff --

I'm not entirely sure the calculation is correct in terms of the number of entries per field in the constant pool of a class. Per the JVM spec, each class field has a CONSTANT_Fieldref_info (1 entry), which has a class_index and a name_and_type_index. The class_index points to a CONSTANT_Class_info, which is shared across all the class fields. The name_and_type_index points to a CONSTANT_NameAndType_info (1 entry), which in turn points to a name (1 entry) and a descriptor (1 entry). Therefore, for each class field, at least 4 entries are required in the constant pool. Similarly, we get 4 entries for each method.

Besides fields and methods, we also have to take constant literals into account: int, float, string, and other constants. Since we apply source-code copy for built-in functions/UDFs, it's hard to figure out exactly how many constants are used in the generated class.

Given the above reasons, I'm not sure it makes sense to try to come up with a formula to estimate the maximum number of fields a generated class could have. If the estimate cannot be accurate anyway, what if we just provide a ballpark estimate and use a 'magic' number here?
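The per-field accounting in the comment can be made concrete as back-of-envelope arithmetic (illustrative only, not Drill's actual code; the reserve for methods and literals is a made-up "magic" margin of the kind the comment suggests). The class-file `constant_pool_count` is a u2, so the pool holds at most 65,535 entries; at roughly 4 entries per unique field (Fieldref + NameAndType + name Utf8 + descriptor Utf8, with the Class_info shared), a crude bound on fields per generated class follows:

```java
// Back-of-envelope bound on how many generated fields fit before the
// constant pool fills up. All figures are illustrative assumptions.
public class ConstantPoolEstimate {
    static final int MAX_POOL_ENTRIES = 65_535;  // u2 constant_pool_count limit
    static final int ENTRIES_PER_FIELD = 4;      // Fieldref + NameAndType + 2 Utf8
    static final int RESERVE = 10_000;           // made-up margin for methods, literals

    static int maxFields() {
        return (MAX_POOL_ENTRIES - RESERVE) / ENTRIES_PER_FIELD;
    }

    public static void main(String[] args) {
        System.out.println("rough max fields per generated class: " + maxFields());
    }
}
```

Since Utf8 name/descriptor entries can be shared between members, this undercounts capacity; it is only the kind of ballpark figure the comment argues for.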
[jira] [Created] (DRILL-5529) Repeated vectors missing "fill empties" logic
Paul Rogers created DRILL-5529:
--
Summary: Repeated vectors missing "fill empties" logic
Key: DRILL-5529
URL: https://issues.apache.org/jira/browse/DRILL-5529
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Paul Rogers
Assignee: Paul Rogers
Fix For: 1.11.0

Consider the Drill {{OptionalVarCharVector}} type. This vector is composed of three buffers (also called vectors):

* Is-set (bit) vector: contains 1 if the value is set, 0 if it is null.
* Data vector: effectively a byte array in which each value is packed one after another.
* Offset vector, in which the entry for each row points to the first byte of the value in the data vector.

Suppose we have the values "foo", null, "bar". Then, the vectors contain:

{code}
Is-Set: [1 0 1]
Offsets: [0 3 3 6]
Data: [f o o b a r]
{code}

(Yes, there is one more offset entry than rows.)

Suppose that the code creating the vector writes values for rows 1 and 3, but omits 2 (it is null, which is the default). How do we get that required value of 3 in the entry for row 2? The answer is that the logic for setting a value keeps track of the last write position and "backfills" missing offset values:

{code}
public void setSafe(int index, ByteBuffer value, int start, int length) {
  if (index > lastSet + 1) {
    fillEmpties(index);
  }
  ...
{code}

So, when we write the value for row 3 ("bar") we back-fill the missing offset for row 2. So far so good.

We can now generalize. We must do the same trick any time we use a vector that uses an offset vector. There are three other cases:

* Required variable-width vectors (where a missing value is the same as an empty string).
* A repeated fixed-width vector.
* A repeated variable-width vector (which has *two* offset vectors).

The problem is, none of these actually provide the required code. The caller must implement its own back-fill logic, else the offset vectors become corrupted.

Consider the required {{VarCharVector}}:

{code}
protected void set(int index, byte[] bytes, int start, int length) {
  assert index >= 0;
  final int currentOffset = offsetVector.getAccessor().get(index);
  offsetVector.getMutator().set(index + 1, currentOffset + length);
  data.setBytes(currentOffset, bytes, start, length);
}
{code}

As a result of this omission, any client which skips null values will corrupt offset vectors. Consider an example: "try", "foo", "", "bar". We omit writing record 2 (the empty string). Desired result:

{code}
Data: [t r y f o o b a r]
Offsets: [0 3 6 6 9]
{code}

Actual result:

{code}
Data: [t r y f o o b a r]
Offsets: [0 3 6 0 9]
{code}

The result is that we compute the width of field 2 as -6, not 3. The value of the empty field is 9, not 0.

A similar issue arises with repeated vectors. Consider {{RepeatedVarCharVector}}:

{code}
public void addSafe(int index, byte[] bytes, int start, int length) {
  final int nextOffset = offsets.getAccessor().get(index+1);
  values.getMutator().setSafe(nextOffset, bytes, start, length);
  offsets.getMutator().setSafe(index+1, nextOffset+1);
}
{code}

Consider this example: (\["a", "b"], \[ ], \["d", "e"]). Expected:

{code}
Array Offset: [0 2 2 4]
Value Offset: [0 1 2 3 4]
Data: [a b d e]
{code}

Actual:

{code}
Array Offset: [0 2 0 4]
Value Offset: [0 1 2 3 4]
Data: [a b d e]
{code}

The entry for the (unwritten) position 2 is missing. This bug may be the root cause of several other issues found recently. (Potentially DRILL-5470 -- need to verify.)

Two resolutions are possible:

* Require that client code write all values, backfilling empty or null values as needed.
* Generalize the mutators to back-fill in all cases, not just the nullable VarChar.

A related issue occurs when a reader fails to do a "final fill" at the end of a batch (DRILL-5487).

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
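The back-fill logic the JIRA asks for can be sketched with a standalone offset-vector writer (plain Java arrays, not Drill's vector classes): when a row is skipped, writing the next row must first copy the last written end-offset into every skipped slot so the omitted rows read as zero-width.

```java
import java.util.Arrays;

// Standalone sketch of the "fill empties" idea for a variable-width vector:
// back-fill skipped offset slots so omitted rows read as empty rather than
// corrupting later offsets. This mirrors the technique, not Drill's code.
public class OffsetBackfill {
    final StringBuilder data = new StringBuilder();
    final int[] offsets;      // offsets[i+1] = end of row i; one extra entry
    int lastSet = -1;         // index of the last row actually written

    OffsetBackfill(int rowCount) {
        offsets = new int[rowCount + 1];
    }

    void set(int index, String value) {
        // Back-fill: rows (lastSet+1 .. index-1) were skipped; give them
        // zero width by repeating the previous end offset.
        for (int i = lastSet + 1; i < index; i++) {
            offsets[i + 1] = offsets[i];
        }
        data.append(value);
        offsets[index + 1] = offsets[index] + value.length();
        lastSet = index;
    }

    public static void main(String[] args) {
        OffsetBackfill v = new OffsetBackfill(4);
        v.set(0, "try");
        v.set(1, "foo");
        // row 2 skipped (empty string)
        v.set(3, "bar");
        System.out.println(Arrays.toString(v.offsets)); // [0, 3, 6, 6, 9]
        System.out.println(v.data);                     // tryfoobar
    }
}
```

This reproduces the JIRA's desired offsets [0 3 6 6 9] for "try", "foo", (skipped), "bar"; without the back-fill loop, slot 3 would stay 0, giving the corrupted [0 3 6 0 9].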
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations Paul! Really well deserved! Kind regards Vitalii On Fri, May 19, 2017 at 6:31 PM, Parth Chandrawrote: > I thinks it's time to put a link to Paul's wiki in the Apache Drill web > site. > > On Fri, May 19, 2017 at 11:16 AM, Sudheesh Katkam > wrote: > > > Forgot to mention, not many developers know about this: > > https://github.com/paul-rogers/drill/wiki > > > > So thank you Paul, for that informative wiki, and all your contributions. > > > > On May 19, 2017, at 10:50 AM, Paul Rogers > r...@mapr.com>> wrote: > > > > Thanks everyone! > > > > - Paul > > > > On May 19, 2017, at 10:30 AM, Kunal Khatua > u...@mapr.com>> wrote: > > > > Congratulations, Paul !! Thank you for your contributions! > > > > > > From: Khurram Faraaz > > > Sent: Friday, May 19, 2017 10:07:09 AM > > To: dev > > Subject: Re: [ANNOUNCE] New Committer: Paul Rogers > > > > Congratulations, Paul! > > > > > > From: Bridget Bevens > > > Sent: Friday, May 19, 2017 10:29:29 PM > > To: dev > > Subject: Re: [ANNOUNCE] New Committer: Paul Rogers > > > > Congratulations, Paul! > > > > > > From: Jinfeng Ni > > > Sent: Friday, May 19, 2017 9:57:35 AM > > To: dev > > Subject: Re: [ANNOUNCE] New Committer: Paul Rogers > > > > Congratulations, Paul! > > > > > > On Fri, May 19, 2017 at 9:36 AM, Aman Bawa > mapr.com>> wrote: > > > > Congratulations, Paul! > > > > On 5/19/17, 8:22 AM, "Aman Sinha" mansi...@apache.org>> wrote: > > > > The Project Management Committee (PMC) for Apache Drill has invited > > Paul > > Rogers to become a committer, and we are pleased to announce that he > > has > > accepted. > > > > Paul has a long list of contributions that have touched many aspects > > of the > > product. > > > > Welcome Paul, and thank you for your contributions. Keep up the good > > work ! > > > > - Aman > > > > (on behalf of the Apache Drill PMC) > > > > > > > > > > > > >
Re: [ANNOUNCE] New Committer: Paul Rogers
I think it's time to put a link to Paul's wiki in the Apache Drill web site.
Re: [ANNOUNCE] New Committer: Paul Rogers
Forgot to mention, not many developers know about this: https://github.com/paul-rogers/drill/wiki So thank you Paul, for that informative wiki, and all your contributions.
DrillTextRecordReader -- still used?
Hi All, Drill has two text readers: the (RFC 4180) compliant version, and DrillTextRecordReader. It seems that the compliant one is newer and is selected by a session option, exec.storage.enable_new_text_reader, which defaults to true. Do we know of any users that set this option to false to use the old version? I ask because I am retrofitting the “compliant” version to limit vector sizes (DRILL-5211). I wonder if, rather than retrofitting the old reader, we can just retire it altogether. Do we have enough confidence in the “compliant” version that we can retire DrillTextRecordReader? Thanks, - Paul
[GitHub] drill pull request #838: DRILL-5512: Standardize error handling in ScanBatch
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/838#discussion_r117536637

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
@@ -173,9 +174,8 @@ public IterOutcome next() {
         currentReader.allocate(mutator.fieldVectorMap());
       } catch (OutOfMemoryException e) {
-        logger.debug("Caught Out of Memory Exception", e);
         clearFieldVectorMap();
-        return IterOutcome.OUT_OF_MEMORY;
+        throw UserException.memoryError(e).build(logger);
--- End diff --

As it turns out, the idea of the OUT_OF_MEMORY return code works better in theory than in practice. No reader correctly handles this case. Let's say we have three columns (a, b, c), and that column c needs to double its vector but hits OOM. No reader has the internal state needed to hold onto the value for c, unwind the call stack, and then, on the next next() call, rewind back to the point of writing c into the in-flight row. Moving forward, we want to take a broader approach to memory: budget sufficient memory that readers can work, and modify the mutators to enforce batch size limits so that the reader operates within its budget. As we move to that approach, the OUT_OF_MEMORY status will be retired. The JIRA mentions another JIRA that holds a spec for all this; something we discussed six months ago, but did not have time to implement then. This all merits a complete discussion; maybe we can discuss the overall approach in that other JIRA. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #838: DRILL-5512: Standardize error handling in ScanBatch
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/838#discussion_r117535728

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
@@ -213,17 +213,16 @@ public IterOutcome next() {
       try {
         currentReader.allocate(mutator.fieldVectorMap());
       } catch (OutOfMemoryException e) {
-        logger.debug("Caught OutOfMemoryException");
         clearFieldVectorMap();
-        return IterOutcome.OUT_OF_MEMORY;
+        throw UserException.memoryError(e).build(logger);
       }
       addImplicitVectors();
     } catch (ExecutionSetupException e) {
-      this.context.fail(e);
--- End diff --

Throwing an exception, it turns out, does exactly the same thing: it cancels the query and causes the fragment executor to cascade close() calls to all the operators (record batches) in the fragment tree. Some code kills the query by throwing an exception; other code calls the fail method and bubbles up STOP. But since the proper way to handle STOP is to unwind the stack, STOP is equivalent to throwing an exception. The idea is, rather than have two ways to clean up, let's standardize on one. Since we must handle unchecked exceptions in any case, the exception-based solution is the logical choice for standardization.
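The contrast between the two cleanup paths can be sketched as follows. This is a minimal, self-contained illustration with hypothetical names -- not Drill's actual ScanBatch or UserException classes:

```java
// Sketch of the two error-handling styles discussed above.
// All class and method names are hypothetical stand-ins.
public class ErrorPathDemo {

    static class OutOfMemoryException extends RuntimeException {}

    // Stand-in for a UserException-style unchecked exception that
    // unwinds the stack until the fragment executor catches it once.
    static class UserException extends RuntimeException {
        UserException(Throwable cause) { super(cause); }
    }

    enum IterOutcome { OK, OUT_OF_MEMORY }

    static void allocate(boolean failWithOom) {
        if (failWithOom) throw new OutOfMemoryException();
    }

    // Status-code style: every caller up the operator tree must check
    // for OUT_OF_MEMORY and propagate it, keeping enough state to
    // resume later -- which, per the discussion above, no reader does.
    static IterOutcome nextStatusStyle(boolean failWithOom) {
        try {
            allocate(failWithOom);
            return IterOutcome.OK;
        } catch (OutOfMemoryException e) {
            return IterOutcome.OUT_OF_MEMORY;
        }
    }

    // Exception style: the throw unwinds the stack in one step, and
    // cleanup happens in a single place (close() cascaded by the executor).
    static void nextExceptionStyle(boolean failWithOom) {
        try {
            allocate(failWithOom);
        } catch (OutOfMemoryException e) {
            throw new UserException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nextStatusStyle(true)); // OUT_OF_MEMORY
        try {
            nextExceptionStyle(true);
        } catch (UserException e) {
            System.out.println("query failed: "
                + e.getCause().getClass().getSimpleName());
        }
    }
}
```

The sketch shows why standardizing on the exception path simplifies the code: the status-code variant must thread its return value through every intermediate caller, while the exception variant needs exactly one catch site.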
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations, Paul !! Thank you for your contributions!
[GitHub] drill pull request #838: DRILL-5512: Standardize error handling in ScanBatch
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/838#discussion_r117532070 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java --- @@ -213,17 +213,16 @@ public IterOutcome next() { try { currentReader.allocate(mutator.fieldVectorMap()); } catch (OutOfMemoryException e) { -logger.debug("Caught OutOfMemoryException"); clearFieldVectorMap(); -return IterOutcome.OUT_OF_MEMORY; +throw UserException.memoryError(e).build(logger); } addImplicitVectors(); } catch (ExecutionSetupException e) { - this.context.fail(e); --- End diff -- This call triggers query failure (stopping the fragment, notifying the Foreman, and cancelling other fragments, etc.). What is the flow after this change? Similar changes below.
[GitHub] drill pull request #838: DRILL-5512: Standardize error handling in ScanBatch
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/838#discussion_r117531676 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java --- @@ -173,9 +174,8 @@ public IterOutcome next() { currentReader.allocate(mutator.fieldVectorMap()); } catch (OutOfMemoryException e) { -logger.debug("Caught Out of Memory Exception", e); clearFieldVectorMap(); -return IterOutcome.OUT_OF_MEMORY; +throw UserException.memoryError(e).build(logger); --- End diff -- The non-managed external sort spills to disk in case it receives this outcome. I do not know if there are other operators that handle this outcome. Are all the pre-requisite changes (to handle this change) already committed?
[GitHub] drill issue #842: DRILL-5523: Revert if condition in UnionAllRecordBatch cha...
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/842 +1
[GitHub] drill issue #830: DRILL-5498: Improve handling of CSV column headers
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/830 Commits squashed.
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations, Paul!
Re: [ANNOUNCE] New Committer: Paul Rogers
Congrats, Paul!!
[GitHub] drill issue #832: DRILL-5504: Vector validator to diagnose offset vector iss...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/832 Commits squashed.
[GitHub] drill issue #830: DRILL-5498: Improve handling of CSV column headers
Github user sudheeshkatkam commented on the issue: https://github.com/apache/drill/pull/830 +1 Please squash the commits.
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations, Paul!
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations, Paul!
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations, Paul!
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations Paul and thank you for your contributions to the project. Parth
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations, Paul!
Re: [ANNOUNCE] New Committer: Paul Rogers
Congrats, Paul!
Re: [ANNOUNCE] New Committer: Paul Rogers
Congrats Paul! Thank you for your contributions!
[jira] [Created] (DRILL-5528) Sorting 19GB data with 14GB memory in a single fragment takes ~150 minutes
Rahul Challapalli created DRILL-5528:

Summary: Sorting 19GB data with 14GB memory in a single fragment takes ~150 minutes
Key: DRILL-5528
URL: https://issues.apache.org/jira/browse/DRILL-5528
Project: Apache Drill
Issue Type: Bug
Components: Execution - Relational Operators
Affects Versions: 1.10.0
Reporter: Rahul Challapalli
Assignee: Paul Rogers

Configuration:
{code}
git.commit.id.abbrev=1e0a14c
DRILL_MAX_DIRECT_MEMORY="32G"
DRILL_MAX_HEAP="4G"
{code}
Based on the runtime of the query below, I suspect there is a performance bottleneck somewhere:
{code}
[root@qa-node190 external-sort]# /opt/drill/bin/sqlline -u jdbc:drill:zk=10.10.100.190:5181
apache drill 1.11.0-SNAPSHOT
"start your sql engine"
0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET `exec.sort.disable_managed` = false;
+-------+-------------------------------------+
|  ok   |               summary               |
+-------+-------------------------------------+
| true  | exec.sort.disable_managed updated.  |
+-------+-------------------------------------+
1 row selected (0.975 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> alter session set `planner.width.max_per_node` = 1;
+-------+--------------------------------------+
|  ok   |               summary                |
+-------+--------------------------------------+
| true  | planner.width.max_per_node updated.  |
+-------+--------------------------------------+
1 row selected (0.371 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> alter session set `planner.disable_exchanges` = true;
+-------+-------------------------------------+
|  ok   |               summary               |
+-------+-------------------------------------+
| true  | planner.disable_exchanges updated.  |
+-------+-------------------------------------+
1 row selected (0.292 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> alter session set `planner.memory.max_query_memory_per_node` = 14106127360;
+-------+----------------------------------------------------+
|  ok   |                      summary                       |
+-------+----------------------------------------------------+
| true  | planner.memory.max_query_memory_per_node updated.  |
+-------+----------------------------------------------------+
1 row selected (0.316 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select * from dfs.`/drill/testdata/resource-manager/250wide.tbl` order by columns[0]) d where d.columns[0] = 'ljdfhwuehnoiueyf';
+---------+
| EXPR$0  |
+---------+
| 0       |
+---------+
1 row selected (8530.719 seconds)
{code}
I attached the logs and profile files. The data is too large to attach to the JIRA.
Reach out to me if you need any more information.
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations Paul. Well Deserved.
[GitHub] drill issue #832: DRILL-5504: Vector validator to diagnose offset vector iss...
Github user sudheeshkatkam commented on the issue: https://github.com/apache/drill/pull/832 +1 Please squash the commits.
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations Paul and thank you for your contributions! Gautam
Re: [ANNOUNCE] New Committer: Paul Rogers
Congrats Paul!
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations Paul! Well deserved!
[jira] [Created] (DRILL-5527) Support for querying slowly changing dimensions of HBase/MapR-DB tables on TIMESTAMP/TIMERANGE/VERSION
Alan Fischer e Silva created DRILL-5527: --- Summary: Support for querying slowly changing dimensions of HBase/MapR-DB tables on TIMESTAMP/TIMERANGE/VERSION Key: DRILL-5527 URL: https://issues.apache.org/jira/browse/DRILL-5527 Project: Apache Drill Issue Type: New Feature Components: Storage - HBase Affects Versions: 1.10.0 Reporter: Alan Fischer e Silva HBase and MapR-DB support versioning of cell values via timestamp, but today a Drill query only returns the most recent version of a cell. Being able to query an HBase/MapR-DB cell by its version, timestamp, or time range would be a major improvement to the HBase storage plugin, in order to support slowly changing dimensions.
Re: [ANNOUNCE] New Committer: Paul Rogers
Congrats Paul!!
[ANNOUNCE] New Committer: Paul Rogers
The Project Management Committee (PMC) for Apache Drill has invited Paul Rogers to become a committer, and we are pleased to announce that he has accepted. Paul has a long list of contributions that have touched many aspects of the product. Welcome Paul, and thank you for your contributions. Keep up the good work ! - Aman (on behalf of the Apache Drill PMC)
[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/809#discussion_r117423249 --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp --- @@ -234,6 +233,37 @@ void DrillClientImpl::Close() { } /* + * Write bytesToWrite length data bytes pointed by dataPtr. It handles EINTR error + * occurred during write_some sys call and does a retry on that. + * + * Parameters: + * dataPtr - in param - Pointer to data bytes to write on socket. + * bytesToWrite - in param - Length of data bytes to write from dataPtr. + * errorCode- out param - Error code set by boost. + */ +void DrillClientImpl::doWriteToSocket(const char* dataPtr, size_t bytesToWrite, +boost::system::error_code& errorCode) { +if(0 == bytesToWrite) { --- End diff -- Not really, since write_some will set the proper error code in that case. The handling for `bytesToWrite == 0` was done since that's a success case and we didn't want to call write_some on it.
[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/809#discussion_r117423292 --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp --- @@ -364,7 +395,41 @@ connectionStatus_t DrillClientImpl::recvHandshake(){ return CONN_SUCCESS; } -void DrillClientImpl::handleHandshake(ByteBuf_t _buf, +/* + * Read bytesToRead length data bytes from socket into inBuf. It handles EINTR error + * occurred during read_some sys call and does a retry on that. + * + * Parameters: + * inBuf- out param - Pointer to buffer to read data into from socket. + * bytesToRead - in param - Length of data bytes to read from socket. + * errorCode- out param - Error code set by boost. + */ +void DrillClientImpl::doReadFromSocket(ByteBuf_t inBuf, size_t bytesToRead, + boost::system::error_code& errorCode) { + +// Check if bytesToRead is zero +if(0 == bytesToRead) { --- End diff -- Same as above.
[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...
Github user bitblender commented on a diff in the pull request: https://github.com/apache/drill/pull/809#discussion_r117412544 --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp --- @@ -364,7 +395,41 @@ connectionStatus_t DrillClientImpl::recvHandshake(){ return CONN_SUCCESS; } -void DrillClientImpl::handleHandshake(ByteBuf_t _buf, +/* + * Read bytesToRead length data bytes from socket into inBuf. It handles EINTR error + * occurred during read_some sys call and does a retry on that. + * + * Parameters: + * inBuf- out param - Pointer to buffer to read data into from socket. + * bytesToRead - in param - Length of data bytes to read from socket. + * errorCode- out param - Error code set by boost. + */ +void DrillClientImpl::doReadFromSocket(ByteBuf_t inBuf, size_t bytesToRead, + boost::system::error_code& errorCode) { + +// Check if bytesToRead is zero +if(0 == bytesToRead) { --- End diff -- Does a NULL inBuf have to be handled?
[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...
Github user bitblender commented on a diff in the pull request: https://github.com/apache/drill/pull/809#discussion_r117412175 --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp --- @@ -234,6 +233,37 @@ void DrillClientImpl::Close() { } /* + * Write bytesToWrite length data bytes pointed by dataPtr. It handles EINTR error + * occurred during write_some sys call and does a retry on that. + * + * Parameters: + * dataPtr - in param - Pointer to data bytes to write on socket. + * bytesToWrite - in param - Length of data bytes to write from dataPtr. + * errorCode- out param - Error code set by boost. + */ +void DrillClientImpl::doWriteToSocket(const char* dataPtr, size_t bytesToWrite, +boost::system::error_code& errorCode) { +if(0 == bytesToWrite) { --- End diff -- Should you check for a NULL dataPtr?