Re: Does s3 plugin support AWS S3 signature version 4 ?
Any updates on this? Since we migrated to AWS Mumbai, we are not able to connect S3 and Drill.

On 04-Apr-2017 11:02 PM, "Shankar Mane" wrote:
> Quick question here:
>
> Does the s3 plugin support S3 signature version 4?
>
> FYI: the s3 plugin works when the region supports both the v2 and v4
> signatures. It seems problematic for regions (e.g. ap-south-1) which
> only support the v4 signature version.
>
> regards,
> shankar
>
[GitHub] drill pull request #838: DRILL-5512: Standardize error handling in ScanBatch
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/838#discussion_r117590009

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
@@ -173,9 +174,8 @@ public IterOutcome next() {
         currentReader.allocate(mutator.fieldVectorMap());
       } catch (OutOfMemoryException e) {
-        logger.debug("Caught Out of Memory Exception", e);
         clearFieldVectorMap();
-        return IterOutcome.OUT_OF_MEMORY;
+        throw UserException.memoryError(e).build(logger);
--- End diff --

I am not sure if this specific line change is required, so please correct me if I am wrong. Thinking out loud..

There are three places in ScanBatch where OutOfMemoryException is handled. Since OutOfMemoryException is an unchecked exception, I could not quickly find all the calls that trigger the exception in this method.

The [first case](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java#L175) and [second case](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java#L215) are similar in that `reader.allocate(...)` fails. So although there is no unwind logic, it seems to me this case is correctly handled: no records have been read, so there is nothing to unwind. Say this triggers spilling in sort; then the query could complete successfully if allocate succeeds the next time (and so on). Am I following this logic correctly?

But this does not seem to be the case, as [TestOutOfMemoryOutcome](https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/TestOutOfMemoryOutcome.java#L65) triggers an OutOfMemoryException during the ["next" allocation](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java#L172), and all tests are expected to fail.
And then, there is the [third case](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java#L247), which is a general catch (e.g. `reader.next()` throws OutOfMemoryException). And as you mentioned, readers cannot unwind, so that correctly fails the fragment.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
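The trade-off discussed in this thread can be made concrete with a minimal, standalone sketch (plain Java; `IterOutcome` here is a toy enum, not Drill's actual `IterOutcome`/`UserException` API): signaling out-of-memory through a return status forces every caller to check and propagate it, while throwing an unchecked exception lets one top-level handler, the fragment executor in Drill's case, unwind and fail the fragment.

```java
// Toy model of the two error-signaling styles compared in the review.
// Names are illustrative only, not Drill's real classes.
public class ErrorStyles {
    enum IterOutcome { OK, OUT_OF_MEMORY }

    // Style 1: signal OOM through the return value; every caller must
    // remember to check the status and unwind by hand.
    static IterOutcome nextWithStatus(boolean allocationFails) {
        if (allocationFails) {
            return IterOutcome.OUT_OF_MEMORY;
        }
        return IterOutcome.OK;
    }

    // Style 2: throw an unchecked exception; a single executor-level
    // handler catches it and fails the whole fragment.
    static void nextWithException(boolean allocationFails) {
        if (allocationFails) {
            throw new RuntimeException("One or more nodes ran out of memory");
        }
    }

    public static void main(String[] args) {
        System.out.println(nextWithStatus(true));
        try {
            nextWithException(true);
        } catch (RuntimeException e) {
            // Stands in for FragmentExecutor's error handling path.
            System.out.println("executor handles: " + e.getMessage());
        }
    }
}
```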
[GitHub] drill pull request #805: Drill-4139: Exception while trying to prune partiti...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/805#discussion_r117582970

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---
@@ -508,21 +516,32 @@ public void populatePruningVector(ValueVector v, int index, SchemaPath column, S
         NullableVarBinaryVector varBinaryVector = (NullableVarBinaryVector) v;
         Object s = partitionValueMap.get(f).get(column);
         byte[] bytes;
-        if (s instanceof Binary) {
-          bytes = ((Binary) s).getBytes();
-        } else if (s instanceof String) {
-          bytes = ((String) s).getBytes();
-        } else if (s instanceof byte[]) {
-          bytes = (byte[]) s;
+        if (s == null) {
+          varBinaryVector.getMutator().setNull(index);
+          return;
         } else {
-          throw new UnsupportedOperationException("Unable to create column data for type: " + type);
+          bytes = getBytes(type, s);
         }
         varBinaryVector.getMutator().setSafe(index, bytes, 0, bytes.length);
         return;
       }
       case DECIMAL18: {
         NullableDecimal18Vector decimalVector = (NullableDecimal18Vector) v;
-        Long value = (Long) partitionValueMap.get(f).get(column);
+        Object s = partitionValueMap.get(f).get(column);
--- End diff --

If the patch also changes DECIMAL18's partition pruning, please update the title of the JIRA to reflect that change.
[GitHub] drill pull request #805: Drill-4139: Exception while trying to prune partiti...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/805#discussion_r117583804

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java ---
@@ -181,6 +185,71 @@ else if (parquetTableMetadata instanceof Metadata.ParquetTableMetadata_v2 &&
   }

   /**
+   * Checks whether the metadata file was created by a Drill version older than
+   * the one that changed the serialization of BINARY values, and if so assigns
+   * byte arrays to min/max values obtained from the deserialized string.
+   *
+   * @param parquetTableMetadata table metadata that should be corrected
+   */
+  public static void correctBinaryInMetadataCache(Metadata.ParquetTableMetadataBase parquetTableMetadata) {
+    if (hasOldBinarySerialization(parquetTableMetadata)) {
+      Set names = Sets.newHashSet();
+      if (parquetTableMetadata instanceof Metadata.ParquetTableMetadata_v2) {
+        for (Metadata.ColumnTypeMetadata_v2 columnTypeMetadata :
+            ((Metadata.ParquetTableMetadata_v2) parquetTableMetadata).columnTypeInfo.values()) {
+          if (columnTypeMetadata.primitiveType == PrimitiveTypeName.BINARY) {
+            names.add(Arrays.asList(columnTypeMetadata.name));
+          }
+        }
+      }
+      for (Metadata.ParquetFileMetadata file : parquetTableMetadata.getFiles()) {
+        // Drill has only ever written a single row group per file, only need to correct the statistics
+        // on the first row group
+        Metadata.RowGroupMetadata rowGroupMetadata = file.getRowGroups().get(0);
--- End diff --

It's true that parquet files created by Drill have a single row group per file. But the metadata file could be created from parquet files from other sources, so that assumption may not hold.
[GitHub] drill pull request #805: Drill-4139: Exception while trying to prune partiti...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/805#discussion_r117585451

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java ---
@@ -398,10 +399,115 @@ public void testDrill4877() throws Exception {
   }

+  @Test // DRILL-4139
+  public void testBooleanPartitionPruning() throws Exception {
+    final String boolPartitionTable = "dfs_test.tmp.`interval_bool_partition`";
+    try {
+      test("create table %s partition by (col_bln) as " +
+          "select * from cp.`parquet/alltypes_required.parquet`", boolPartitionTable);
+      test("refresh table metadata %s", boolPartitionTable);
--- End diff --

We probably want to cover the case where the metadata cache file is not created (run the query before calling "refresh table metadata").
[GitHub] drill pull request #805: Drill-4139: Exception while trying to prune partiti...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/805#discussion_r117584586

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java ---
@@ -181,6 +185,71 @@ else if (parquetTableMetadata instanceof Metadata.ParquetTableMetadata_v2 &&
   }

   /**
+   * Checks whether the metadata file was created by a Drill version older than
+   * the one that changed the serialization of BINARY values, and if so assigns
+   * byte arrays to min/max values obtained from the deserialized string.
+   *
+   * @param parquetTableMetadata table metadata that should be corrected
+   */
+  public static void correctBinaryInMetadataCache(Metadata.ParquetTableMetadataBase parquetTableMetadata) {
+    if (hasOldBinarySerialization(parquetTableMetadata)) {
+      Set names = Sets.newHashSet();
+      if (parquetTableMetadata instanceof Metadata.ParquetTableMetadata_v2) {
--- End diff --

Any reason you only check v2 here?
[GitHub] drill pull request #805: Drill-4139: Exception while trying to prune partiti...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/805#discussion_r117583114

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---
@@ -508,21 +516,32 @@ public void populatePruningVector(ValueVector v, int index, SchemaPath column, S
         NullableVarBinaryVector varBinaryVector = (NullableVarBinaryVector) v;
         Object s = partitionValueMap.get(f).get(column);
         byte[] bytes;
-        if (s instanceof Binary) {
-          bytes = ((Binary) s).getBytes();
-        } else if (s instanceof String) {
-          bytes = ((String) s).getBytes();
-        } else if (s instanceof byte[]) {
-          bytes = (byte[]) s;
+        if (s == null) {
+          varBinaryVector.getMutator().setNull(index);
+          return;
         } else {
-          throw new UnsupportedOperationException("Unable to create column data for type: " + type);
+          bytes = getBytes(type, s);
         }
         varBinaryVector.getMutator().setSafe(index, bytes, 0, bytes.length);
         return;
       }
       case DECIMAL18: {
         NullableDecimal18Vector decimalVector = (NullableDecimal18Vector) v;
-        Long value = (Long) partitionValueMap.get(f).get(column);
+        Object s = partitionValueMap.get(f).get(column);
+        byte[] bytes;
+        if (s == null) {
+          decimalVector.getMutator().setNull(index);
+          return;
+        } else if (s instanceof Integer) {
+          decimalVector.getMutator().setSafe(index, (Integer) s);
+          return;
+        } else if (s instanceof Long) {
+          decimalVector.getMutator().setSafe(index, (Long) s);
+          return;
+        } else {
+          bytes = getBytes(type, s);
--- End diff --

For DECIMAL18, under what kind of scenario would we get bytes from the partitionValueMap?
[GitHub] drill pull request #805: Drill-4139: Exception while trying to prune partiti...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/805#discussion_r117584380

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java ---
@@ -181,6 +185,71 @@ else if (parquetTableMetadata instanceof Metadata.ParquetTableMetadata_v2 &&
   }

   /**
+   * Checks whether the metadata file was created by a Drill version older than
+   * the one that changed the serialization of BINARY values
--- End diff --

Can you explain a bit more about why the serialization of BINARY values is wrong in the prior version? Does it happen only in the metadata cache file?
[GitHub] drill pull request #805: Drill-4139: Exception while trying to prune partiti...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/805#discussion_r117578486

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---
@@ -548,23 +567,57 @@ public void populatePruningVector(ValueVector v, int index, SchemaPath column, S
         NullableVarCharVector varCharVector = (NullableVarCharVector) v;
         Object s = partitionValueMap.get(f).get(column);
         byte[] bytes;
-        if (s instanceof String) { // if the metadata was read from a JSON cache file it may be a string type
-          bytes = ((String) s).getBytes();
-        } else if (s instanceof Binary) {
-          bytes = ((Binary) s).getBytes();
-        } else if (s instanceof byte[]) {
-          bytes = (byte[]) s;
+        if (s == null) {
+          varCharVector.getMutator().setNull(index);
+          return;
         } else {
-          throw new UnsupportedOperationException("Unable to create column data for type: " + type);
+          bytes = getBytes(type, s);
         }
         varCharVector.getMutator().setSafe(index, bytes, 0, bytes.length);
         return;
       }
+      case INTERVAL: {
+        NullableIntervalVector intervalVector = (NullableIntervalVector) v;
+        Object s = partitionValueMap.get(f).get(column);
+        byte[] bytes;
+        if (s == null) {
+          intervalVector.getMutator().setNull(index);
+          return;
+        } else {
+          bytes = getBytes(type, s);
+        }
+        intervalVector.getMutator().setSafe(index, 1,
+            ParquetReaderUtility.getIntFromLEBytes(bytes, 0),
+            ParquetReaderUtility.getIntFromLEBytes(bytes, 4),
+            ParquetReaderUtility.getIntFromLEBytes(bytes, 8));
+        return;
+      }
       default:
         throw new UnsupportedOperationException("Unsupported type: " + type);
     }
   }

+  /**
+   * Returns the sequence of bytes received from {@code Object source}.
+   *
+   * @param type the column type
+   * @param source the source of the bytes sequence
+   * @return bytes sequence obtained from {@code Object source}
+   */
+  private byte[] getBytes(MinorType type, Object source) {
+    byte[] bytes;
+    if (source instanceof String) { // if the metadata was read from a JSON cache file it may be a string type
+      bytes = Base64.decodeBase64(((String) source).getBytes());
--- End diff --

Any reason you call Base64.decodeBase64 instead of calling String.getBytes()?
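The distinction behind this question can be shown with a small standalone sketch (using only `java.util.Base64`, not the commons-codec `Base64` in the patch or Drill's metadata classes): for arbitrary binary data such as Parquet BINARY min/max statistics, round-tripping through a JSON string and back with `String.getBytes()` is lossy, because bytes that are not valid UTF-8 get replaced; Base64 encoding/decoding preserves the bytes exactly.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class BinaryRoundTrip {
    public static void main(String[] args) {
        // Arbitrary binary value; 0xFF and 0x80 are not valid UTF-8 on their own.
        byte[] original = {0x00, (byte) 0xFF, 0x7F, (byte) 0x80, 0x0A};

        // Lossy path: treat the raw bytes as text when writing the JSON cache.
        // Malformed byte sequences become U+FFFD replacement characters.
        String asText = new String(original, StandardCharsets.UTF_8);
        byte[] textRoundTrip = asText.getBytes(StandardCharsets.UTF_8);

        // Safe path: Base64-encode the bytes into the JSON string.
        String asBase64 = Base64.getEncoder().encodeToString(original);
        byte[] base64RoundTrip = Base64.getDecoder().decode(asBase64);

        System.out.println("text round trip ok:   " + Arrays.equals(original, textRoundTrip));
        System.out.println("base64 round trip ok: " + Arrays.equals(original, base64RoundTrip));
    }
}
```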
Re: Upgrading Netty
Could this have been addressed in later versions of Netty? So far, initial performance numbers show no impact on performance from the upgrade to version 4.0.48. Scale- and concurrency-related tests, along with longevity tests, are in progress.

~ Kunal

From: Parth Chandra
Sent: Friday, May 19, 2017 2:29:50 PM
To: dev@drill.apache.org
Subject: Re: Upgrading Netty

Looks like the specific issue I was referring to was addressed in Netty 4.0.29 [1]. The comment for the commit is a little concerning:

Result: ThreadPoolCache is now also usable and so gives performance improvements when allocation and deallocation thread are different. Performance when using same thread for allocation and deallocation is noticable worse then before.

We might want to do a performance run to make sure things are no worse than before.

[1] https://github.com/netty/netty/commit/f765053ae740e300a6b696840d7dfe5de32afeb3

On Mon, May 15, 2017 at 5:46 PM, Parth Chandra wrote:
> The per-thread allocation cache in Netty causes unbounded memory growth in
> Drill because we pass the ownership of a buffer from one thread to another.
> The version we use has a fix for the Drill use case where Netty will no
> longer add a buffer to its per-thread cache if the buffer was allocated by
> a thread which is different from the thread freeing the buffer.
> This fix was reversed in a subsequent release, and the latest version has
> the same issue.
> There might have been a fix in Netty for this in some other place which I
> am not aware of (perhaps they removed it altogether, as Paul seems to have
> seen).
> AFAIK, we do not have a direct reference to that code in Drill's
> allocator. If you try to upgrade and hit an issue, post it here.
> If you are able to upgrade the Netty version, then run a longevity test to
> make sure there is no 'leaking' of memory from one thread to another.
>
> On Mon, May 15, 2017 at 4:08 PM, Paul Rogers wrote:
>
>> As it turns out, Drill makes clever use of the internal details of the
>> Netty memory allocator. But that code changed significantly in the last
>> couple of years. When I attempted to upgrade, I found that the private
>> features of Netty that the Drill allocator uses no longer exist in the
>> latest Netty.
>>
>> So, someone will need to understand what that part of the Drill allocator
>> does and design an alternative integration.
>>
>> The particular issue seems to be that Netty had a per-thread allocation
>> cache which seems to not exist in the latest version.
>>
>> - Paul
>>
>>> On May 15, 2017, at 3:58 PM, Sudheesh Katkam wrote:
>>>
>>> Hi all,
>>>
>>> As part of working on DRILL-5431 [1], I found a bug in Netty [2], which
>>> is due to be fixed in 4.0.48 [3]. Drill is currently using 4.0.27 [4]. Does
>>> anyone foresee issues with upgrading to the latest version of Netty? I
>>> noticed Apache Arrow upgraded to 4.0.41 [5].
>>>
>>> Thank you,
>>> Sudheesh
>>>
>>> [1] https://issues.apache.org/jira/browse/DRILL-5431
>>> [2] https://github.com/netty/netty/issues/6709
>>> [3] https://github.com/netty/netty/pull/6713
>>> [4] https://github.com/apache/drill/blob/master/pom.xml#L550
>>> [5] https://github.com/apache/arrow/commit/3487c2f0cdc2297a80ba3525c192745313b3da48
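The failure mode Parth describes, allocate on one thread, free on another, can be illustrated with a toy per-thread free-buffer cache (plain Java `ThreadLocal`, nothing Netty-specific): if the freeing thread caches buffers that it will never itself allocate, its cache only ever grows.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a per-thread buffer cache, illustrating why releasing a
// buffer on a thread other than the allocating one can strand memory.
// This is an illustration of the idea only, not Netty's actual allocator.
public class ThreadCacheDemo {
    static final ThreadLocal<Deque<byte[]>> CACHE =
            ThreadLocal.withInitial(ArrayDeque::new);

    static byte[] allocate(int size) {
        byte[] cached = CACHE.get().poll();      // reuse if this thread has one
        return (cached != null && cached.length == size) ? cached : new byte[size];
    }

    static void release(byte[] buf) {
        CACHE.get().push(buf);                   // cached on the *releasing* thread
    }

    public static void main(String[] args) throws InterruptedException {
        // Short-lived producer threads allocate; the main thread releases.
        final byte[][] handoff = new byte[1][];
        for (int i = 0; i < 3; i++) {
            Thread producer = new Thread(() -> handoff[0] = allocate(1024));
            producer.start();
            producer.join();
            release(handoff[0]);                 // lands in main's cache
        }
        // Main never allocates, so its cache only grows: 3 buffers stranded.
        System.out.println("buffers stranded in main's cache: " + CACHE.get().size());
    }
}
```

The fix Parth mentions, refusing to cache a buffer freed by a thread other than its allocator, would amount to a thread-identity check in `release`.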
[GitHub] drill pull request #838: DRILL-5512: Standardize error handling in ScanBatch
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/838#discussion_r117580289

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
@@ -173,9 +174,8 @@ public IterOutcome next() {
         currentReader.allocate(mutator.fieldVectorMap());
       } catch (OutOfMemoryException e) {
-        logger.debug("Caught Out of Memory Exception", e);
         clearFieldVectorMap();
-        return IterOutcome.OUT_OF_MEMORY;
+        throw UserException.memoryError(e).build(logger);
--- End diff --

Good question. Yes: since the `FragmentExecutor` already handles errors and unwinds, we just exploit that (existing, working) path instead of the (also existing, but harder-to-keep-working) path of `fail`/`STOP`. The (managed) external sort, for example, reports all its errors via exceptions; it does not use `fail`/`STOP`. The fragment executor recovers just fine.
Re: Upgrading Netty
Looks like the specific issue I was referring to was addressed in Netty 4.0.29 [1]. The comment for the commit is a little concerning:

Result: ThreadPoolCache is now also usable and so gives performance improvements when allocation and deallocation thread are different. Performance when using same thread for allocation and deallocation is noticable worse then before.

We might want to do a performance run to make sure things are no worse than before.

[1] https://github.com/netty/netty/commit/f765053ae740e300a6b696840d7dfe5de32afeb3

On Mon, May 15, 2017 at 5:46 PM, Parth Chandra wrote:
> The per thread allocation cache in Netty causes unbounded memory growth in
> Drill because we pass the ownership of a buffer from one thread to another.
> The version we use has a fix for the Drill use case where Netty will no
> longer add a buffer to its per thread cache if the buffer was allocated by
> a thread which is different from the thread freeing the buffer.
> This fix was reversed in a subsequent release and the latest version has
> the same issue.
> There might have been a fix in Netty for this in some other place which I
> am not aware of (perhaps they removed it altogether as Paul seems to have
> seen).
> AFAIK, we do not have a direct reference to that code in Drill's
> allocator. If you try to upgrade and hit an issue, post it here.
> If you are able to upgrade the Netty version, then run a longevity test to
> make sure there is no 'leaking' of memory from one thread to another.
>
> On Mon, May 15, 2017 at 4:08 PM, Paul Rogers wrote:
>
>> As it turns out, Drill makes clever use of the internal details of the
>> Netty memory allocator. But, that code changed significantly in the last
>> couple of years. When I attempted to upgrade, I found that the private
>> features of Netty that the Drill allocator uses no longer exist in the
>> latest Netty.
>>
>> So, someone will need to understand what that part of the Drill allocator
>> does and design an alternative integration.
>>
>> The particular issue seems to be that Netty had a per-thread allocation
>> cache which seems to not exist in the latest version.
>>
>> - Paul
>>
>>> On May 15, 2017, at 3:58 PM, Sudheesh Katkam wrote:
>>>
>>> Hi all,
>>>
>>> As part of working on DRILL-5431 [1], I found a bug in Netty [2], which
>>> is due to be fixed in 4.0.48 [3]. Drill is currently using 4.0.27 [4]. Does
>>> anyone foresee issues with upgrading to the latest version of Netty? I
>>> noticed Apache Arrow upgraded to 4.0.41 [5].
>>>
>>> Thank you,
>>> Sudheesh
>>>
>>> [1] https://issues.apache.org/jira/browse/DRILL-5431
>>> [2] https://github.com/netty/netty/issues/6709
>>> [3] https://github.com/netty/netty/pull/6713
>>> [4] https://github.com/apache/drill/blob/master/pom.xml#L550
>>> [5] https://github.com/apache/arrow/commit/3487c2f0cdc2297a80ba3525c192745313b3da48
[GitHub] drill pull request #838: DRILL-5512: Standardize error handling in ScanBatch
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/838#discussion_r117576158

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
@@ -173,9 +174,8 @@ public IterOutcome next() {
         currentReader.allocate(mutator.fieldVectorMap());
       } catch (OutOfMemoryException e) {
-        logger.debug("Caught Out of Memory Exception", e);
         clearFieldVectorMap();
-        return IterOutcome.OUT_OF_MEMORY;
+        throw UserException.memoryError(e).build(logger);
--- End diff --

Makes sense. Asking the question a different way: to avoid regressions, should the prerequisite changes ([DRILL-5211](https://issues.apache.org/jira/browse/DRILL-5211) and pertinent tasks) be committed before this patch? Or, since _the readers do not correctly handle the case_ anyway, will there be no difference?
[GitHub] drill pull request #838: DRILL-5512: Standardize error handling in ScanBatch
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/838#discussion_r117576060

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
@@ -213,17 +213,16 @@ public IterOutcome next() {
       try {
         currentReader.allocate(mutator.fieldVectorMap());
       } catch (OutOfMemoryException e) {
-        logger.debug("Caught OutOfMemoryException");
         clearFieldVectorMap();
-        return IterOutcome.OUT_OF_MEMORY;
+        throw UserException.memoryError(e).build(logger);
       }
       addImplicitVectors();
     } catch (ExecutionSetupException e) {
-      this.context.fail(e);
--- End diff --

Sounds good.
[GitHub] drill pull request #818: DRILL-5140: Fix CompileException in run-time genera...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/818#discussion_r117573874

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/ClassGenerator.java ---
@@ -77,10 +81,43 @@
   private final CodeGenerator codeGenerator;

   public final JDefinedClass clazz;
-  private final LinkedList[] blocks;
+  private final JCodeModel model;
   private final OptionSet optionManager;

+  private ClassGenerator innerClassGenerator;
+  private LinkedList[] blocks;
+  private LinkedList[] oldBlocks;
+
+  /**
+   * Assumed that field has 3 indexes within the constant pull: index of the CONSTANT_Fieldref_info +
--- End diff --

I'm not entirely sure the calculation is correct in terms of the number of entries per field in the constant pool of a class. Per the JVM spec, each class field has a CONSTANT_Fieldref_info (1 entry), which has a class_index and a name_and_type_index. The class_index points to a CONSTANT_Class_info, which is shared across all the class fields. The name_and_type_index points to a CONSTANT_NameAndType_info (1 entry), which in turn points to a name (1 entry) and a descriptor (1 entry). Therefore, for each class field, at least 4 entries are required in the constant pool. Similarly, we get 4 entries for each method.

Besides fields and methods, we also have to take constant literals into account: int, float, string, and other constants. Since we apply source-code copy for built-in functions/UDFs, it's hard to figure out exactly how many constants are used in the generated class.

Given the above reasons, I'm not sure it makes sense to try to come up with a formula to estimate the maximum number of fields a generated class could have. If the estimate cannot be accurate anyway, what if we just provide a ballpark estimate and use a 'magic' number here?
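The per-field accounting in the comment can be made concrete as back-of-envelope arithmetic (illustrative only, not Drill's actual code; the reserve for methods and literals is a made-up "magic" margin of the kind the comment suggests). The class-file `constant_pool_count` is a u2, so the pool holds at most 65,535 entries; at roughly 4 entries per unique field (Fieldref + NameAndType + name Utf8 + descriptor Utf8, with the Class_info shared), a crude bound on fields per generated class follows:

```java
// Back-of-envelope bound on how many generated fields fit before the
// constant pool fills up. All figures are illustrative assumptions.
public class ConstantPoolEstimate {
    static final int MAX_POOL_ENTRIES = 65_535;  // u2 constant_pool_count limit
    static final int ENTRIES_PER_FIELD = 4;      // Fieldref + NameAndType + 2 Utf8
    static final int RESERVE = 10_000;           // made-up margin for methods, literals

    static int maxFields() {
        return (MAX_POOL_ENTRIES - RESERVE) / ENTRIES_PER_FIELD;
    }

    public static void main(String[] args) {
        System.out.println("rough max fields per generated class: " + maxFields());
    }
}
```

Since Utf8 name/descriptor entries can be shared between members, this undercounts capacity; it is only the kind of ballpark figure the comment argues for.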
[jira] [Created] (DRILL-5529) Repeated vectors missing "fill empties" logic
Paul Rogers created DRILL-5529:
--
Summary: Repeated vectors missing "fill empties" logic
Key: DRILL-5529
URL: https://issues.apache.org/jira/browse/DRILL-5529
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Paul Rogers
Assignee: Paul Rogers
Fix For: 1.11.0

Consider the Drill {{OptionalVarCharVector}} type. This vector is composed of three buffers (also called vectors):

* Is-set (bit) vector: contains 1 if the value is set, 0 if it is null.
* Data vector: effectively a byte array in which each value is packed one after another.
* Offset vector, in which the entry for each row points to the first byte of the value in the data vector.

Suppose we have the values "foo", null, "bar". Then, the vectors contain:

{code}
Is-Set: [1 0 1]
Offsets: [0 3 3 6]
Data: [f o o b a r]
{code}

(Yes, there is one more offset entry than rows.)

Suppose that the code creating the vector writes values for rows 1 and 3, but omits 2 (it is null, which is the default). How do we get that required value of 3 in the entry for row 2? The answer is that the logic for setting a value keeps track of the last write position and "backfills" missing offset values:

{code}
public void setSafe(int index, ByteBuffer value, int start, int length) {
  if (index > lastSet + 1) {
    fillEmpties(index);
  }
  ...
{code}

So, when we write the value for row 3 ("bar") we back-fill the missing offset for row 2. So far so good.

We can now generalize. We must do the same trick any time we use a vector that uses an offset vector. There are three other cases:

* Required variable-width vectors (where a missing value is the same as an empty string).
* A repeated fixed-width vector.
* A repeated variable-width vector (which has *two* offset vectors).

The problem is, none of these actually provide the required code. The caller must implement its own back-fill logic, else the offset vectors become corrupted.

Consider the required {{VarCharVector}}:

{code}
protected void set(int index, byte[] bytes, int start, int length) {
  assert index >= 0;
  final int currentOffset = offsetVector.getAccessor().get(index);
  offsetVector.getMutator().set(index + 1, currentOffset + length);
  data.setBytes(currentOffset, bytes, start, length);
}
{code}

As a result of this omission, any client which skips null values will corrupt offset vectors. Consider an example: "try", "foo", "", "bar". We omit writing record 2 (the empty string). Desired result:

{code}
Data: [t r y f o o b a r]
Offsets: [0 3 6 6 9]
{code}

Actual result:

{code}
Data: [t r y f o o b a r]
Offsets: [0 3 6 0 9]
{code}

The result is that we compute the width of field 2 as -6, not 3. The value of the empty field is 9, not 0.

A similar issue arises with repeated vectors. Consider {{RepeatedVarCharVector}}:

{code}
public void addSafe(int index, byte[] bytes, int start, int length) {
  final int nextOffset = offsets.getAccessor().get(index+1);
  values.getMutator().setSafe(nextOffset, bytes, start, length);
  offsets.getMutator().setSafe(index+1, nextOffset+1);
}
{code}

Consider this example: (\["a", "b"], \[ ], \["d", "e"]). Expected:

{code}
Array Offset: [0 2 2 4]
Value Offset: [0 1 2 3 4]
Data: [a b d e]
{code}

Actual:

{code}
Array Offset: [0 2 0 4]
Value Offset: [0 1 2 3 4]
Data: [a b d e]
{code}

The entry for the (unwritten) position 2 is missing. This bug may be the root cause of several other issues found recently. (Potentially DRILL-5470 -- need to verify.)

Two resolutions are possible:

* Require that client code write all values, backfilling empty or null values as needed.
* Generalize the mutators to back-fill in all cases, not just the nullable VarChar.

A related issue occurs when a reader fails to do a "final fill" at the end of a batch (DRILL-5487).

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
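The back-fill logic the JIRA asks for can be sketched with a standalone offset-vector writer (plain Java arrays, not Drill's vector classes): when a row is skipped, writing the next row must first copy the last written end-offset into every skipped slot so the omitted rows read as zero-width.

```java
import java.util.Arrays;

// Standalone sketch of the "fill empties" idea for a variable-width vector:
// back-fill skipped offset slots so omitted rows read as empty rather than
// corrupting later offsets. This mirrors the technique, not Drill's code.
public class OffsetBackfill {
    final StringBuilder data = new StringBuilder();
    final int[] offsets;      // offsets[i+1] = end of row i; one extra entry
    int lastSet = -1;         // index of the last row actually written

    OffsetBackfill(int rowCount) {
        offsets = new int[rowCount + 1];
    }

    void set(int index, String value) {
        // Back-fill: rows (lastSet+1 .. index-1) were skipped; give them
        // zero width by repeating the previous end offset.
        for (int i = lastSet + 1; i < index; i++) {
            offsets[i + 1] = offsets[i];
        }
        data.append(value);
        offsets[index + 1] = offsets[index] + value.length();
        lastSet = index;
    }

    public static void main(String[] args) {
        OffsetBackfill v = new OffsetBackfill(4);
        v.set(0, "try");
        v.set(1, "foo");
        // row 2 skipped (empty string)
        v.set(3, "bar");
        System.out.println(Arrays.toString(v.offsets)); // [0, 3, 6, 6, 9]
        System.out.println(v.data);                     // tryfoobar
    }
}
```

This reproduces the JIRA's desired offsets [0 3 6 6 9] for "try", "foo", (skipped), "bar"; without the back-fill loop, slot 3 would stay 0, giving the corrupted [0 3 6 0 9].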
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations Paul! Really well deserved! Kind regards Vitalii On Fri, May 19, 2017 at 6:31 PM, Parth Chandrawrote: > I thinks it's time to put a link to Paul's wiki in the Apache Drill web > site. > > On Fri, May 19, 2017 at 11:16 AM, Sudheesh Katkam > wrote: > > > Forgot to mention, not many developers know about this: > > https://github.com/paul-rogers/drill/wiki > > > > So thank you Paul, for that informative wiki, and all your contributions. > > > > On May 19, 2017, at 10:50 AM, Paul Rogers > r...@mapr.com>> wrote: > > > > Thanks everyone! > > > > - Paul > > > > On May 19, 2017, at 10:30 AM, Kunal Khatua > u...@mapr.com>> wrote: > > > > Congratulations, Paul !! Thank you for your contributions! > > > > > > From: Khurram Faraaz > > > Sent: Friday, May 19, 2017 10:07:09 AM > > To: dev > > Subject: Re: [ANNOUNCE] New Committer: Paul Rogers > > > > Congratulations, Paul! > > > > > > From: Bridget Bevens > > > Sent: Friday, May 19, 2017 10:29:29 PM > > To: dev > > Subject: Re: [ANNOUNCE] New Committer: Paul Rogers > > > > Congratulations, Paul! > > > > > > From: Jinfeng Ni > > > Sent: Friday, May 19, 2017 9:57:35 AM > > To: dev > > Subject: Re: [ANNOUNCE] New Committer: Paul Rogers > > > > Congratulations, Paul! > > > > > > On Fri, May 19, 2017 at 9:36 AM, Aman Bawa > mapr.com>> wrote: > > > > Congratulations, Paul! > > > > On 5/19/17, 8:22 AM, "Aman Sinha" mansi...@apache.org>> wrote: > > > > The Project Management Committee (PMC) for Apache Drill has invited > > Paul > > Rogers to become a committer, and we are pleased to announce that he > > has > > accepted. > > > > Paul has a long list of contributions that have touched many aspects > > of the > > product. > > > > Welcome Paul, and thank you for your contributions. Keep up the good > > work ! > > > > - Aman > > > > (on behalf of the Apache Drill PMC) > > > > > > > > > > > > >
Re: [ANNOUNCE] New Committer: Paul Rogers
I think it's time to put a link to Paul's wiki in the Apache Drill web site.
Re: [ANNOUNCE] New Committer: Paul Rogers
Forgot to mention, not many developers know about this: https://github.com/paul-rogers/drill/wiki So thank you Paul, for that informative wiki, and all your contributions.
DrillTextRecordReader -- still used?
Hi All, Drill has two text readers: the (RFC 4180) compliant version, and DrillTextRecordReader. It seems that the compliant one is newer and is selected by a session option, exec.storage.enable_new_text_reader, which defaults to true. Do we know of any users that set this option to false to use the old version? I ask because I am retrofitting the “compliant” version to limit vector sizes (DRILL-5211). I wonder if, rather than retrofitting the old reader, we can just retire it altogether. Do we have enough confidence in the “compliant” version that we can retire DrillTextRecordReader? Thanks, - Paul
[GitHub] drill pull request #838: DRILL-5512: Standardize error handling in ScanBatch
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/838#discussion_r117536637

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
@@ -173,9 +174,8 @@ public IterOutcome next() {
         currentReader.allocate(mutator.fieldVectorMap());
       } catch (OutOfMemoryException e) {
-        logger.debug("Caught Out of Memory Exception", e);
         clearFieldVectorMap();
-        return IterOutcome.OUT_OF_MEMORY;
+        throw UserException.memoryError(e).build(logger);
--- End diff --

As it turns out, the idea of the OUT_OF_MEMORY return code works better in theory than in practice. No reader correctly handles this case. Let's say we have three columns (a, b, c), and that column c needs to double its vector but hits OOM. No reader has the internal state needed to hold onto the value for c, unwind the call stack, and then, on the next next() call, rewind back to the point of writing c into the in-flight row. Moving forward, we want to take a broader approach to memory: budget sufficient memory that readers can work, and modify the mutators to enforce batch size limits so that the reader operates within its budget. As we move to that approach, the OUT_OF_MEMORY status will be retired. The JIRA mentions another JIRA that holds a spec for all this; something we discussed six months ago, but did not have time to implement then. This all merits a complete discussion; maybe we can discuss the overall approach in that other JIRA. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #838: DRILL-5512: Standardize error handling in ScanBatch
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/838#discussion_r117535728

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
@@ -213,17 +213,16 @@ public IterOutcome next() {
       try {
         currentReader.allocate(mutator.fieldVectorMap());
       } catch (OutOfMemoryException e) {
-        logger.debug("Caught OutOfMemoryException");
         clearFieldVectorMap();
-        return IterOutcome.OUT_OF_MEMORY;
+        throw UserException.memoryError(e).build(logger);
       }
       addImplicitVectors();
     } catch (ExecutionSetupException e) {
-      this.context.fail(e);
--- End diff --

Throwing an exception, it turns out, does exactly the same thing: it cancels the query and causes the fragment executor to cascade close() calls to all the operators (record batches) in the fragment tree. Some code kills the query by throwing an exception; other code calls the fail method and bubbles up STOP. But since the proper way to handle STOP is to unwind the stack, STOP is equivalent to throwing an exception. The idea is, rather than have two ways to clean up, let's standardize on one. Since we must handle unchecked exceptions in any case, the exception-based solution is the logical choice for standardization.
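The contrast between the two cleanup paths can be sketched as follows. This is a minimal, self-contained illustration with hypothetical names -- not Drill's actual ScanBatch or UserException classes:

```java
// Sketch of the two error-handling styles discussed above.
// All class and method names are hypothetical stand-ins.
public class ErrorPathDemo {

    static class OutOfMemoryException extends RuntimeException {}

    // Stand-in for a UserException-style unchecked exception that
    // unwinds the stack until the fragment executor catches it once.
    static class UserException extends RuntimeException {
        UserException(Throwable cause) { super(cause); }
    }

    enum IterOutcome { OK, OUT_OF_MEMORY }

    static void allocate(boolean failWithOom) {
        if (failWithOom) throw new OutOfMemoryException();
    }

    // Status-code style: every caller up the operator tree must check
    // for OUT_OF_MEMORY and propagate it, keeping enough state to
    // resume later -- which, per the discussion above, no reader does.
    static IterOutcome nextStatusStyle(boolean failWithOom) {
        try {
            allocate(failWithOom);
            return IterOutcome.OK;
        } catch (OutOfMemoryException e) {
            return IterOutcome.OUT_OF_MEMORY;
        }
    }

    // Exception style: the throw unwinds the stack in one step, and
    // cleanup happens in a single place (close() cascaded by the executor).
    static void nextExceptionStyle(boolean failWithOom) {
        try {
            allocate(failWithOom);
        } catch (OutOfMemoryException e) {
            throw new UserException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nextStatusStyle(true)); // OUT_OF_MEMORY
        try {
            nextExceptionStyle(true);
        } catch (UserException e) {
            System.out.println("query failed: "
                + e.getCause().getClass().getSimpleName());
        }
    }
}
```

The sketch shows why standardizing on the exception path simplifies the code: the status-code variant must thread its return value through every intermediate caller, while the exception variant needs exactly one catch site.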
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations, Paul !! Thank you for your contributions!
[GitHub] drill pull request #838: DRILL-5512: Standardize error handling in ScanBatch
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/838#discussion_r117532070 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java --- @@ -213,17 +213,16 @@ public IterOutcome next() { try { currentReader.allocate(mutator.fieldVectorMap()); } catch (OutOfMemoryException e) { -logger.debug("Caught OutOfMemoryException"); clearFieldVectorMap(); -return IterOutcome.OUT_OF_MEMORY; +throw UserException.memoryError(e).build(logger); } addImplicitVectors(); } catch (ExecutionSetupException e) { - this.context.fail(e); --- End diff -- This call triggers query failure (stopping the fragment, notifying the Foreman, and cancelling other fragments, etc.). What is the flow after this change? Similar changes below.
[GitHub] drill pull request #838: DRILL-5512: Standardize error handling in ScanBatch
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/838#discussion_r117531676 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java --- @@ -173,9 +174,8 @@ public IterOutcome next() { currentReader.allocate(mutator.fieldVectorMap()); } catch (OutOfMemoryException e) { -logger.debug("Caught Out of Memory Exception", e); clearFieldVectorMap(); -return IterOutcome.OUT_OF_MEMORY; +throw UserException.memoryError(e).build(logger); --- End diff -- The non-managed external sort spills to disk in case it receives this outcome. I do not know if there are other operators that handle this outcome. Are all the pre-requisite changes (to handle this change) already committed?
[GitHub] drill issue #842: DRILL-5523: Revert if condition in UnionAllRecordBatch cha...
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/842 +1
[GitHub] drill issue #830: DRILL-5498: Improve handling of CSV column headers
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/830 Commits squashed.
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations, Paul!
Re: [ANNOUNCE] New Committer: Paul Rogers
Congrats, Paul!!
[GitHub] drill issue #832: DRILL-5504: Vector validator to diagnose offset vector iss...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/832 Commits squashed.
[GitHub] drill issue #830: DRILL-5498: Improve handling of CSV column headers
Github user sudheeshkatkam commented on the issue: https://github.com/apache/drill/pull/830 +1 Please squash the commits.
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations, Paul!
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations, Paul!
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations, Paul!
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations Paul and thank you for your contributions to the project. Parth
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations, Paul!
Re: [ANNOUNCE] New Committer: Paul Rogers
Congrats, Paul!
Re: [ANNOUNCE] New Committer: Paul Rogers
Congrats Paul! Thank you for your contributions!
[jira] [Created] (DRILL-5528) Sorting 19GB data with 14GB memory in a single fragment takes ~150 minutes
Rahul Challapalli created DRILL-5528:

Summary: Sorting 19GB data with 14GB memory in a single fragment takes ~150 minutes
Key: DRILL-5528
URL: https://issues.apache.org/jira/browse/DRILL-5528
Project: Apache Drill
Issue Type: Bug
Components: Execution - Relational Operators
Affects Versions: 1.10.0
Reporter: Rahul Challapalli
Assignee: Paul Rogers

Configuration:
{code}
git.commit.id.abbrev=1e0a14c
DRILL_MAX_DIRECT_MEMORY="32G"
DRILL_MAX_HEAP="4G"
{code}
Based on the runtime of the query below, I suspect there is a performance bottleneck somewhere:
{code}
[root@qa-node190 external-sort]# /opt/drill/bin/sqlline -u jdbc:drill:zk=10.10.100.190:5181
apache drill 1.11.0-SNAPSHOT
"start your sql engine"
0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET `exec.sort.disable_managed` = false;
+-------+-------------------------------------+
|  ok   |               summary               |
+-------+-------------------------------------+
| true  | exec.sort.disable_managed updated.  |
+-------+-------------------------------------+
1 row selected (0.975 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> alter session set `planner.width.max_per_node` = 1;
+-------+--------------------------------------+
|  ok   |               summary                |
+-------+--------------------------------------+
| true  | planner.width.max_per_node updated.  |
+-------+--------------------------------------+
1 row selected (0.371 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> alter session set `planner.disable_exchanges` = true;
+-------+-------------------------------------+
|  ok   |               summary               |
+-------+-------------------------------------+
| true  | planner.disable_exchanges updated.  |
+-------+-------------------------------------+
1 row selected (0.292 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> alter session set `planner.memory.max_query_memory_per_node` = 14106127360;
+-------+----------------------------------------------------+
|  ok   |                      summary                       |
+-------+----------------------------------------------------+
| true  | planner.memory.max_query_memory_per_node updated.  |
+-------+----------------------------------------------------+
1 row selected (0.316 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select * from dfs.`/drill/testdata/resource-manager/250wide.tbl` order by columns[0]) d where d.columns[0] = 'ljdfhwuehnoiueyf';
+---------+
| EXPR$0  |
+---------+
| 0       |
+---------+
1 row selected (8530.719 seconds)
{code}
I attached the logs and profile files. The data is too large to attach to the JIRA.
Reach out to me if you need any more information.
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations Paul. Well Deserved.
[GitHub] drill issue #832: DRILL-5504: Vector validator to diagnose offset vector iss...
Github user sudheeshkatkam commented on the issue: https://github.com/apache/drill/pull/832 +1 Please squash the commits.
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations Paul and thank you for your contributions! Gautam
Re: [ANNOUNCE] New Committer: Paul Rogers
Congrats Paul!
Re: [ANNOUNCE] New Committer: Paul Rogers
Congratulations Paul! Well deserved!
[jira] [Created] (DRILL-5527) Support for querying slowly changing dimensions of HBase/MapR-DB tables on TIMESTAMP/TIMERANGE/VERSION
Alan Fischer e Silva created DRILL-5527: --- Summary: Support for querying slowly changing dimensions of HBase/MapR-DB tables on TIMESTAMP/TIMERANGE/VERSION Key: DRILL-5527 URL: https://issues.apache.org/jira/browse/DRILL-5527 Project: Apache Drill Issue Type: New Feature Components: Storage - HBase Affects Versions: 1.10.0 Reporter: Alan Fischer e Silva HBase and MapR-DB support versioning of cell values via timestamp, but today a Drill query only returns the most recent version of a cell. Being able to query an HBase/MapR-DB cell by its version, timestamp, or time range would be a major improvement to the HBase storage plugin, in order to support slowly changing dimensions.
Re: [ANNOUNCE] New Committer: Paul Rogers
Congrats Paul!!
[ANNOUNCE] New Committer: Paul Rogers
The Project Management Committee (PMC) for Apache Drill has invited Paul Rogers to become a committer, and we are pleased to announce that he has accepted. Paul has a long list of contributions that have touched many aspects of the product. Welcome Paul, and thank you for your contributions. Keep up the good work ! - Aman (on behalf of the Apache Drill PMC)
[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/809#discussion_r117423249 --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp --- @@ -234,6 +233,37 @@ void DrillClientImpl::Close() { } /* + * Write bytesToWrite length data bytes pointed by dataPtr. It handles EINTR error + * occurred during write_some sys call and does a retry on that. + * + * Parameters: + * dataPtr - in param - Pointer to data bytes to write on socket. + * bytesToWrite - in param - Length of data bytes to write from dataPtr. + * errorCode- out param - Error code set by boost. + */ +void DrillClientImpl::doWriteToSocket(const char* dataPtr, size_t bytesToWrite, +boost::system::error_code& errorCode) { +if(0 == bytesToWrite) { --- End diff -- Not really, since write_some will set the proper error code in that case. The handling for `bytesToWrite == 0` was done since that's a success case and we didn't want to call write_some on it.
[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/809#discussion_r117423292 --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp --- @@ -364,7 +395,41 @@ connectionStatus_t DrillClientImpl::recvHandshake(){ return CONN_SUCCESS; } -void DrillClientImpl::handleHandshake(ByteBuf_t _buf, +/* + * Read bytesToRead length data bytes from socket into inBuf. It handles EINTR error + * occurred during read_some sys call and does a retry on that. + * + * Parameters: + * inBuf- out param - Pointer to buffer to read data into from socket. + * bytesToRead - in param - Length of data bytes to read from socket. + * errorCode- out param - Error code set by boost. + */ +void DrillClientImpl::doReadFromSocket(ByteBuf_t inBuf, size_t bytesToRead, + boost::system::error_code& errorCode) { + +// Check if bytesToRead is zero +if(0 == bytesToRead) { --- End diff -- Same as above.
[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...
Github user bitblender commented on a diff in the pull request: https://github.com/apache/drill/pull/809#discussion_r117412544 --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp --- @@ -364,7 +395,41 @@ connectionStatus_t DrillClientImpl::recvHandshake(){ return CONN_SUCCESS; } -void DrillClientImpl::handleHandshake(ByteBuf_t _buf, +/* + * Read bytesToRead length data bytes from socket into inBuf. It handles EINTR error + * occurred during read_some sys call and does a retry on that. + * + * Parameters: + * inBuf- out param - Pointer to buffer to read data into from socket. + * bytesToRead - in param - Length of data bytes to read from socket. + * errorCode- out param - Error code set by boost. + */ +void DrillClientImpl::doReadFromSocket(ByteBuf_t inBuf, size_t bytesToRead, + boost::system::error_code& errorCode) { + +// Check if bytesToRead is zero +if(0 == bytesToRead) { --- End diff -- Does a NULL inBuf have to be handled?
[GitHub] drill pull request #809: Drill-4335: C++ client changes for supporting encry...
Github user bitblender commented on a diff in the pull request: https://github.com/apache/drill/pull/809#discussion_r117412175 --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp --- @@ -234,6 +233,37 @@ void DrillClientImpl::Close() { } /* + * Write bytesToWrite length data bytes pointed by dataPtr. It handles EINTR error + * occurred during write_some sys call and does a retry on that. + * + * Parameters: + * dataPtr - in param - Pointer to data bytes to write on socket. + * bytesToWrite - in param - Length of data bytes to write from dataPtr. + * errorCode- out param - Error code set by boost. + */ +void DrillClientImpl::doWriteToSocket(const char* dataPtr, size_t bytesToWrite, +boost::system::error_code& errorCode) { +if(0 == bytesToWrite) { --- End diff -- Should you check for a NULL dataPtr?