[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/518
  
Ran some tests. The results look good. In particular, files with nested 
structures produced the correct results. Since it was the nested structure case 
that had me a bit worried, looks like the code is good to go.

+1 (non-binding)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #509: DRILL-4618: Fix hive function loader not correctly take ra...

2016-10-06 Thread Ben-Zvi
Github user Ben-Zvi commented on the issue:

https://github.com/apache/drill/pull/509
  
Tested and checked ... LGTM



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #604: DRILL-4862: binary_string should use another buffer as out...

2016-10-06 Thread gparai
Github user gparai commented on the issue:

https://github.com/apache/drill/pull/604
  
+1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #604: DRILL-4862: binary_string should use another buffer...

2016-10-06 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/604#discussion_r82315506
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
 ---
@@ -1540,15 +1540,16 @@ public void eval() {
   public static class BinaryString implements DrillSimpleFunc {
 @Param  VarCharHolder in;
 @Output VarBinaryHolder out;
+@Inject DrillBuf buffer;
 
 @Override
 public void setup() {}
 
 @Override
 public void eval() {
-  out.buffer = in.buffer;
-  out.start = in.start;
-  out.end = 
org.apache.drill.common.util.DrillStringUtils.parseBinaryString(in.buffer, 
in.start, in.end);
+  out.buffer = buffer.reallocIfNeeded(in.end - in.start);
+  out.start = out.end = 0;
+  out.end = 
org.apache.drill.common.util.DrillStringUtils.parseBinaryString(in.buffer, 
in.start, in.end, out.buffer);
   out.buffer.setIndex(out.start, out.end);
--- End diff --

Yes, we need to set readerIndex and writerIndex. Otherwise if there is 
another function consume the output of this function, it will hit error.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #600: DRILL-4373: Drill and Hive have incompatible timest...

2016-10-06 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/600#discussion_r82314071
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 ---
@@ -45,4 +53,34 @@ public static int getIntFromLEBytes(byte[] input, int 
start) {
 }
 return out;
   }
+
+  /**
+   * Utilities for converting from parquet INT96 binary (impala, hive 
timestamp)
+   * to date time value. This utilizes the Joda library.
+   */
+  public static class NanoTimeUtils {
+
+public static final long NANOS_PER_DAY = TimeUnit.DAYS.toNanos(1);
+public static final long NANOS_PER_HOUR = TimeUnit.HOURS.toNanos(1);
+public static final long NANOS_PER_MINUTE = 
TimeUnit.MINUTES.toNanos(1);
+public static final long NANOS_PER_SECOND = 
TimeUnit.SECONDS.toNanos(1);
+public static final long NANOS_PER_MILLISECOND =  
TimeUnit.MILLISECONDS.toNanos(1);
+
+  /**
+   * @param binaryTimeStampValue
+   *  hive, impala timestamp values with nanoseconds precision
+   *  are stored in parquet Binary as INT96
+   *
+   * @return  the number of milliseconds since January 1, 1970, 00:00:00 
GMT
+   *  represented by @param binaryTimeStampValue .
+   */
+public static long getDateTimeValueFromBinary(Binary 
binaryTimeStampValue) {
+  NanoTime nt = NanoTime.fromBinary(binaryTimeStampValue);
+  int julianDay = nt.getJulianDay();
+  long nanosOfDay = nt.getTimeOfDayNanos();
+  return DateTimeUtils.fromJulianDay(julianDay-0.5d) + 
nanosOfDay/NANOS_PER_MILLISECOND;
--- End diff --

1.  I would recommend not using Joda. Do the calculations directly, like in 
ConvertFromImpalaTimestamp. Joda uses non-standard, hence  confusing, 
terminology. What Joda calls and uses as JulianDay, is actually Julian Date. 
Seems like you have identified this discrepancy and adjusted for it by 
subtracting 0.5 from _julianDay_. 

Note: (I guess you have already figured this out) : The actual code and 
the Joda code in the comment, in ConvertFromImpalaTimestamp, are inconsistent. 
Took me a day to figure out the reason behind this ! A bug should be opened to 
delete the comment. 

2. Can you please also leave a comment stating that 2440588 is the JDN for 
the Unix Epoch.

3. Please leave a comment stating that the order of the calls to get 
_julianDay_ and _nanosOfDay_ matters. You can do this by just stating how 
timestamps are stored in INT96 i.e 32-bit JDN followed by 64-bit nanosOfDay.

4. Consistent(single or none) spacing for binary operators (+-/) used here 
would be nice. Single spacing would be preferable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #496: Update for MapR profile

2016-10-06 Thread pwong-mapr
Github user pwong-mapr closed the pull request at:

https://github.com/apache/drill/pull/496


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #496: Update for MapR profile

2016-10-06 Thread pwong-mapr
Github user pwong-mapr commented on the issue:

https://github.com/apache/drill/pull/496
  
Not sure why this PR is still open - go ahead and delete it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #487: remove useless text

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/487
  
+1 (non-binding)
A very simple change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #478: DRILL-4581: Minor fixes to drill startup scripts for linux

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/478
  
Replaced by later commit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #478: DRILL-4581: Minor fixes to drill startup scripts fo...

2016-10-06 Thread paul-rogers
Github user paul-rogers closed the pull request at:

https://github.com/apache/drill/pull/478


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #573: Drill 4858

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/573#discussion_r82305362
  
--- Diff: exec/java-exec/src/main/codegen/templates/TypeHelper.java ---
@@ -68,8 +68,23 @@ public static JType getHolderType(JCodeModel model, 
MinorType type, DataMode mod
 case UNION:
   return model._ref(UnionHolder.class);
 case MAP:
+switch (mode) {
+  case REQUIRED:
--- End diff --

Perhaps combine the case statements to avoid repeated code:

case REQUIRED:
case OPTIONAL:
return model...



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #573: Drill 4858

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/573
  
Unit tests? If this repeated maps have never been enabled, how do we know 
that the code actually works for this case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #593: DRILL-3178 csv reader should allow newlines inside ...

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/593#discussion_r82303401
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextReader.java
 ---
@@ -231,33 +231,34 @@ private void parseQuotedValue(byte prev) throws 
IOException {
 final TextInput input = this.input;
 final byte quote = this.quote;
 
-ch = input.nextChar();
+try {
+  input.setMonitorForNewLine(false);
--- End diff --

Seems an overly complex way to do the parsing. Is there any reason we want 
to capture the original newline character rather than the normalized one?

If we need to capture the original one, then a cleaner way to do that is to 
keep track of the start & end position of the current token (character), and 
provide a method to return that block as a string. Then, scan for a close 
quote, reading characters & special-casing any newlines.

If we want to include newlines in quoted strings sometimes, but not other 
times, then the check logic can be a bit more complex.

But, the proposed solution of making newlines not be newlines seems a bit 
odd...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #593: DRILL-3178 csv reader should allow newlines inside ...

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/593#discussion_r82303872
  
--- Diff: exec/java-exec/src/test/resources/store/text/WithQuotedCrLf.tbl 
---
@@ -0,0 +1,6 @@
+"a
+1"|a|a
+a|"a
+2"|a
+a|a|"a
+3"
--- End diff --

Is there an issue with git converting Windows-style newlines (\r\n) into 
Unix-style (\n) when this file is checked in & out? Will that mess up the test? 
Should the test generate this file to handle this particular special case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #593: DRILL-3178 csv reader should allow newlines inside ...

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/593#discussion_r82304834
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextReader.java
 ---
@@ -231,33 +231,34 @@ private void parseQuotedValue(byte prev) throws 
IOException {
 final TextInput input = this.input;
 final byte quote = this.quote;
 
-ch = input.nextChar();
+try {
+  input.setMonitorForNewLine(false);
+  ch = input.nextChar();
 
-while (!(prev == quote && (ch == delimiter || ch == newLine || 
isWhite(ch {
-  if (ch != quote) {
-if (prev == quote) { // unescaped quote detected
-  if (parseUnescapedQuotes) {
-output.append(quote);
-output.append(ch);
-parseQuotedValue(ch);
-break;
-  } else {
-throw new TextParsingException(
-context,
-"Unescaped quote character '"
-+ quote
-+ "' inside quoted value of CSV field. To allow 
unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser 
settings. Cannot parse CSV input.");
+  while (!(prev == quote && (ch == delimiter || ch == newLine || 
isWhite(ch {
+if (ch != quote) {
+  if (prev == quote) { // unescaped quote detected
+if (parseUnescapedQuotes) {
+  output.append(quote);
+  output.append(ch);
+  parseQuotedValue(ch);
+  break;
+} else {
+  throw new TextParsingException(context, "Unescaped quote 
character '" + quote + "' inside quoted value of CSV field. To allow unescaped 
quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot 
parse CSV input.");
+}
   }
+  output.append(ch);
+  prev = ch;
+} else if (prev == quoteEscape) {
+  output.append(quote);
+  prev = NULL_BYTE;
+} else {
+  prev = ch;
 }
-output.append(ch);
-prev = ch;
-  } else if (prev == quoteEscape) {
-output.append(quote);
-prev = NULL_BYTE;
-  } else {
-prev = ch;
+ch = input.nextChar();
   }
-  ch = input.nextChar();
+} finally {
--- End diff --

I see why it is done in finally. However, as noted above, I'm not sure that 
pushing this kind of flag into the getChar function is the optimal approach...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #593: DRILL-3178 csv reader should allow newlines inside ...

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/593#discussion_r82296690
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextInput.java
 ---
@@ -88,6 +88,11 @@
   private boolean endFound = false;
 
   /**
+   * Switch for enabling/disabling new line detection
--- End diff --

Explain a bit more? Presumably, we already "monitor" and "detect" new lines 
in some way. What, specifically does this add? Presumably, it sets the mode to 
enable new line detection within quotes (the title of the Jira entry)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #595: DRILL-4203: Parquet File. Date is stored wrongly

2016-10-06 Thread vdiravka
Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/595
  
Thanks @jaltekruse.
@parthchandra Could you please take a look?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82266459
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -688,4 +802,142 @@ public DrillBuf getBuffer() {
   return null;
 }
   }
+
+  /**
+   * Return a new {@link DrillClient.Builder Drill client builder}.
+   * @return a new builder
+   */
+  public static Builder newBuilder() {
+return new Builder();
+  }
+
+  /**
+   * Helper class to construct a {@link DrillClient Drill client}.
+   */
+  public static class Builder {
+
+private DrillConfig config;
+private BufferAllocator allocator;
+private ClusterCoordinator clusterCoordinator;
+private EventLoopGroup eventLoopGroup;
+private ExecutorService executor;
+
+// defaults
+private boolean supportComplexTypes = true;
+private boolean isDirectConnection = false;
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client.
+ *
+ * @param drillConfig drill configuration
+ * @return this builder
+ */
+public Builder setConfig(DrillConfig drillConfig) {
+  this.config = drillConfig;
+  return this;
+}
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client based on 
the given file.
--- End diff --

Better to set this based on a File (or, for the trendy, Path) object, since 
that is the Java standard for local files.

Does this replace the default class-path config? Or, is this added to the 
defaults?

Can I use this with the above? Or, can I set either the config directory OR 
via a file? If one or the other, should we check that case and throw an 
IllegalStateException (unchecked) or the like?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82267594
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -688,4 +802,142 @@ public DrillBuf getBuffer() {
   return null;
 }
   }
+
+  /**
+   * Return a new {@link DrillClient.Builder Drill client builder}.
+   * @return a new builder
+   */
+  public static Builder newBuilder() {
+return new Builder();
+  }
+
+  /**
+   * Helper class to construct a {@link DrillClient Drill client}.
+   */
+  public static class Builder {
+
+private DrillConfig config;
+private BufferAllocator allocator;
+private ClusterCoordinator clusterCoordinator;
+private EventLoopGroup eventLoopGroup;
+private ExecutorService executor;
+
+// defaults
+private boolean supportComplexTypes = true;
+private boolean isDirectConnection = false;
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client.
+ *
+ * @param drillConfig drill configuration
+ * @return this builder
+ */
+public Builder setConfig(DrillConfig drillConfig) {
+  this.config = drillConfig;
+  return this;
+}
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client based on 
the given file.
+ *
+ * @param fileName configuration file name
+ * @return this builder
+ */
+public Builder setConfigFromFile(final String fileName) {
+  this.config = DrillConfig.create(fileName);
+  return this;
+}
+
+/**
+ * Sets the {@link BufferAllocator buffer allocator} to be used by 
this client.
+ * If this is not set, an allocator will be created based on the 
configuration.
+ *
+ * If this is set, the caller is responsible for closing the given 
allocator.
+ *
+ * @param allocator buffer allocator
+ * @return this builder
+ */
+public Builder setAllocator(final BufferAllocator allocator) {
+  this.allocator = allocator;
+  return this;
+}
+
+/**
+ * Sets the {@link ClusterCoordinator cluster coordinator} that this 
client
+ * registers with. If this is not set and the this client does not use 
a
+ * {@link #setDirectConnection direct connection}, a cluster 
coordinator will
+ * be created based on the configuration.
+ *
+ * If this is set, the caller is responsible for closing the given 
coordinator.
+ *
+ * @param clusterCoordinator cluster coordinator
+ * @return this builder
+ */
+public Builder setClusterCoordinator(final ClusterCoordinator 
clusterCoordinator) {
+  this.clusterCoordinator = clusterCoordinator;
+  return this;
+}
+
+/**
+ * Sets the event loop group that to be used by the client. If this is 
not set,
+ * an event loop group will be created based on the configuration.
+ *
+ * If this is set, the caller is responsible for closing the given 
event loop group.
+ *
+ * @param eventLoopGroup event loop group
+ * @return this builder
+ */
+public Builder setEventLoopGroup(final EventLoopGroup eventLoopGroup) {
+  this.eventLoopGroup = eventLoopGroup;
+  return this;
+}
+
+/**
+ * Sets the executor service to be used by the client. If this is not 
set,
+ * an executor will be created based on the configuration.
+ *
+ * If this is set, the caller is responsible for closing the given 
executor.
+ *
+ * @param executor executor service
+ * @return this builder
+ */
+public Builder setExecutorService(final ExecutorService executor) {
+  this.executor = executor;
+  return this;
+}
+
+/**
+ * Sets whether the application is willing to accept complex types 
(Map, Arrays)
--- End diff --

Default value?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82267107
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -688,4 +802,142 @@ public DrillBuf getBuffer() {
   return null;
 }
   }
+
+  /**
+   * Return a new {@link DrillClient.Builder Drill client builder}.
+   * @return a new builder
+   */
+  public static Builder newBuilder() {
+return new Builder();
+  }
+
+  /**
+   * Helper class to construct a {@link DrillClient Drill client}.
+   */
+  public static class Builder {
+
+private DrillConfig config;
+private BufferAllocator allocator;
+private ClusterCoordinator clusterCoordinator;
+private EventLoopGroup eventLoopGroup;
+private ExecutorService executor;
+
+// defaults
+private boolean supportComplexTypes = true;
+private boolean isDirectConnection = false;
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client.
+ *
+ * @param drillConfig drill configuration
+ * @return this builder
+ */
+public Builder setConfig(DrillConfig drillConfig) {
+  this.config = drillConfig;
+  return this;
+}
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client based on 
the given file.
+ *
+ * @param fileName configuration file name
+ * @return this builder
+ */
+public Builder setConfigFromFile(final String fileName) {
+  this.config = DrillConfig.create(fileName);
+  return this;
+}
+
+/**
+ * Sets the {@link BufferAllocator buffer allocator} to be used by 
this client.
+ * If this is not set, an allocator will be created based on the 
configuration.
+ *
+ * If this is set, the caller is responsible for closing the given 
allocator.
+ *
+ * @param allocator buffer allocator
+ * @return this builder
+ */
+public Builder setAllocator(final BufferAllocator allocator) {
+  this.allocator = allocator;
+  return this;
+}
+
+/**
+ * Sets the {@link ClusterCoordinator cluster coordinator} that this 
client
--- End diff --

When would I use this rather than letting Drill create it from the config? 
Would I do this if I have multiple Drill clients in a single app? Or, will the 
underlying code know to share the same coordinator?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82266904
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -688,4 +802,142 @@ public DrillBuf getBuffer() {
   return null;
 }
   }
+
+  /**
+   * Return a new {@link DrillClient.Builder Drill client builder}.
+   * @return a new builder
+   */
+  public static Builder newBuilder() {
+return new Builder();
+  }
+
+  /**
+   * Helper class to construct a {@link DrillClient Drill client}.
+   */
+  public static class Builder {
+
+private DrillConfig config;
+private BufferAllocator allocator;
+private ClusterCoordinator clusterCoordinator;
+private EventLoopGroup eventLoopGroup;
+private ExecutorService executor;
+
+// defaults
+private boolean supportComplexTypes = true;
+private boolean isDirectConnection = false;
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client.
+ *
+ * @param drillConfig drill configuration
+ * @return this builder
+ */
+public Builder setConfig(DrillConfig drillConfig) {
+  this.config = drillConfig;
+  return this;
+}
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client based on 
the given file.
+ *
+ * @param fileName configuration file name
+ * @return this builder
+ */
+public Builder setConfigFromFile(final String fileName) {
+  this.config = DrillConfig.create(fileName);
+  return this;
+}
+
+/**
+ * Sets the {@link BufferAllocator buffer allocator} to be used by 
this client.
+ * If this is not set, an allocator will be created based on the 
configuration.
--- End diff --

But, what if I don't provide a config? Or, create my own? What do I have to 
set in that config to make it work?

More to the point, WHEN do I have to provide my own root? When I do 
multiple clients in a single app? If so, can we provide a short example showing 
how this works?

Even easier, can the builder itself create a static allocator that it 
shares across connections?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82267967
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -688,4 +802,142 @@ public DrillBuf getBuffer() {
   return null;
 }
   }
+
+  /**
+   * Return a new {@link DrillClient.Builder Drill client builder}.
+   * @return a new builder
+   */
+  public static Builder newBuilder() {
+return new Builder();
+  }
+
+  /**
+   * Helper class to construct a {@link DrillClient Drill client}.
+   */
+  public static class Builder {
+
+private DrillConfig config;
+private BufferAllocator allocator;
+private ClusterCoordinator clusterCoordinator;
+private EventLoopGroup eventLoopGroup;
+private ExecutorService executor;
+
+// defaults
+private boolean supportComplexTypes = true;
+private boolean isDirectConnection = false;
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client.
+ *
+ * @param drillConfig drill configuration
+ * @return this builder
+ */
+public Builder setConfig(DrillConfig drillConfig) {
+  this.config = drillConfig;
+  return this;
+}
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client based on 
the given file.
+ *
+ * @param fileName configuration file name
+ * @return this builder
+ */
+public Builder setConfigFromFile(final String fileName) {
+  this.config = DrillConfig.create(fileName);
+  return this;
+}
+
+/**
+ * Sets the {@link BufferAllocator buffer allocator} to be used by 
this client.
+ * If this is not set, an allocator will be created based on the 
configuration.
+ *
+ * If this is set, the caller is responsible for closing the given 
allocator.
+ *
+ * @param allocator buffer allocator
+ * @return this builder
+ */
+public Builder setAllocator(final BufferAllocator allocator) {
+  this.allocator = allocator;
+  return this;
+}
+
+/**
+ * Sets the {@link ClusterCoordinator cluster coordinator} that this 
client
+ * registers with. If this is not set and the this client does not use 
a
+ * {@link #setDirectConnection direct connection}, a cluster 
coordinator will
+ * be created based on the configuration.
+ *
+ * If this is set, the caller is responsible for closing the given 
coordinator.
+ *
+ * @param clusterCoordinator cluster coordinator
+ * @return this builder
+ */
+public Builder setClusterCoordinator(final ClusterCoordinator 
clusterCoordinator) {
+  this.clusterCoordinator = clusterCoordinator;
+  return this;
+}
+
+/**
+ * Sets the event loop group that to be used by the client. If this is 
not set,
+ * an event loop group will be created based on the configuration.
+ *
+ * If this is set, the caller is responsible for closing the given 
event loop group.
+ *
+ * @param eventLoopGroup event loop group
+ * @return this builder
+ */
+public Builder setEventLoopGroup(final EventLoopGroup eventLoopGroup) {
+  this.eventLoopGroup = eventLoopGroup;
+  return this;
+}
+
+/**
+ * Sets the executor service to be used by the client. If this is not 
set,
+ * an executor will be created based on the configuration.
+ *
+ * If this is set, the caller is responsible for closing the given 
executor.
+ *
+ * @param executor executor service
+ * @return this builder
+ */
+public Builder setExecutorService(final ExecutorService executor) {
+  this.executor = executor;
+  return this;
+}
+
+/**
+ * Sets whether the application is willing to accept complex types 
(Map, Arrays)
+ * in the returned result set. Default is {@code true}. If set to 
{@code false},
+ * the complex types are returned as JSON encoded VARCHAR type.
+ *
+ * @param supportComplexTypes if client accepts complex types
+ * @return this builder
+ */
+public Builder setSupportsComplexTypes(final boolean 
supportComplexTypes) {
+  this.supportComplexTypes = supportComplexTypes;
+  return this;
+}
+
+/**
+ * Sets whether the client will connect directly to the drillbit 
instead of going
--- End diff --

Here (or in the class comment), explain how to use this with an embedded 
Drillbit. There is no ZK in that case. Would I use a direct 

[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82266574
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -688,4 +802,142 @@ public DrillBuf getBuffer() {
   return null;
 }
   }
+
+  /**
+   * Return a new {@link DrillClient.Builder Drill client builder}.
+   * @return a new builder
+   */
+  public static Builder newBuilder() {
+return new Builder();
+  }
+
+  /**
+   * Helper class to construct a {@link DrillClient Drill client}.
+   */
+  public static class Builder {
+
+private DrillConfig config;
+private BufferAllocator allocator;
+private ClusterCoordinator clusterCoordinator;
+private EventLoopGroup eventLoopGroup;
+private ExecutorService executor;
+
+// defaults
+private boolean supportComplexTypes = true;
+private boolean isDirectConnection = false;
+
+/**
+ * Sets the {@link DrillConfig configuration} for this client.
+ *
+ * @param drillConfig drill configuration
+ * @return this builder
+ */
+public Builder setConfig(DrillConfig drillConfig) {
--- End diff --

DrillConfig objects are pretty complex. Should we accept the underlying 
Config (TypeSafe object) instead/in addition?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82269228
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/auth/DrillUserPrincipal.java
 ---
@@ -118,8 +118,11 @@ public AnonDrillUserPrincipal(final DrillbitContext 
drillbitContext) {
 public DrillClient getDrillClient() throws IOException {
   try {
 // Create a DrillClient
-drillClient = new DrillClient(drillbitContext.getConfig(),
-drillbitContext.getClusterCoordinator(), 
drillbitContext.getAllocator());
+drillClient = DrillClient.newBuilder()
--- End diff --

Is the context something we should expose as part of our changes:

DrillbitContext context = ...
DrillClient client = DrillClient.builder( )
.fromContext( context )
.build( );

Doing the above might simplify test code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82268707
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/auth/AbstractDrillLoginService.java
 ---
@@ -46,8 +46,11 @@ protected DrillClient createDrillClient(final String 
userName, final String pass
 
 try {
   // Create a DrillClient
-  drillClient = new DrillClient(drillbitContext.getConfig(),
-  drillbitContext.getClusterCoordinator(), 
drillbitContext.getAllocator());
+  drillClient = DrillClient.newBuilder()
+  .setConfig(drillbitContext.getConfig())
+  .setClusterCoordinator(drillbitContext.getClusterCoordinator())
+  .setAllocator(drillbitContext.getAllocator())
+  .build();
--- End diff --

Should the builder be extended to also (optionally) do connect? Or, return 
a connect builder?

Here I'm thinking that it would be helpful to have a single builder gather 
things like the user name property and the connection string (which would seem 
to be a DrillClient parameter but is actually a connection property.)

That is, either:

DrillClient.builder( ) ...
   .withUser( userName )
   .connectTo( "myHost", 1234 )
  .build( );

Or

DrillClient.builder( ) ...
   .buildClient( )
   .withUser( userName )
   .connectTo( "myHost", 1234 )
   .connect( );



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82295775
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -89,79 +88,175 @@
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Strings;
 import com.google.common.util.concurrent.AbstractCheckedFuture;
+import com.google.common.util.concurrent.MoreExecutors;
 import com.google.common.util.concurrent.SettableFuture;
 
 /**
  * Thin wrapper around a UserClient that handles connect/close and 
transforms
  * String into ByteBuf.
+ *
+ * To create non-default objects, use {@link DrillClient.Builder the 
builder class}.
+ * E.g.
+ * 
+ *   DrillClient client = DrillClient.newBuilder()
+ *   .setConfig(...)
+ *   .setIsDirectConnection(true)
+ *   .build();
+ * 
+ *
+ * Except for {@link #runQuery} and {@link #cancelQuery}, this class is 
generally not thread safe.
  */
 public class DrillClient implements Closeable, ConnectionThrottle {
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DrillClient.class);
 
   private static final ObjectMapper objectMapper = new ObjectMapper();
   private final DrillConfig config;
-  private UserClient client;
-  private UserProperties props = null;
-  private volatile ClusterCoordinator clusterCoordinator;
-  private volatile boolean connected = false;
   private final BufferAllocator allocator;
-  private int reconnectTimes;
-  private int reconnectDelay;
-  private boolean supportComplexTypes;
-  private final boolean ownsZkConnection;
+  private final EventLoopGroup eventLoopGroup;
+  private final ExecutorService executor;
+  private final boolean isDirectConnection;
+  private final int reconnectTimes;
+  private final int reconnectDelay;
+
+  // TODO: clusterCoordinator should be initialized in the constructor.
+  // Currently, initialization is tightly coupled with #connect.
+  private ClusterCoordinator clusterCoordinator;
+
+  // checks if this client owns these resources (used when closing)
   private final boolean ownsAllocator;
-  private final boolean isDirectConnection; // true if the connection 
bypasses zookeeper and connects directly to a drillbit
-  private EventLoopGroup eventLoopGroup;
-  private ExecutorService executor;
+  private final boolean ownsZkConnection;
+  private final boolean ownsEventLoopGroup;
+  private final boolean ownsExecutor;
+
+  // once #setSupportComplexTypes() is removed, make this final
+  private boolean supportComplexTypes;
+
+  private UserClient client;
+  private UserProperties props;
+  private boolean connected;
 
-  public DrillClient() throws OutOfMemoryException {
-this(DrillConfig.create(), false);
+  public DrillClient() {
+this(newBuilder());
   }
 
-  public DrillClient(boolean isDirect) throws OutOfMemoryException {
-this(DrillConfig.create(), isDirect);
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(boolean isDirect) {
+this(newBuilder()
+.setDirectConnection(isDirect));
   }
 
-  public DrillClient(String fileName) throws OutOfMemoryException {
-this(DrillConfig.create(fileName), false);
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(String fileName) {
+this(newBuilder()
+.setConfigFromFile(fileName));
   }
 
-  public DrillClient(DrillConfig config) throws OutOfMemoryException {
-this(config, null, false);
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(DrillConfig config) {
+this(newBuilder()
+.setConfig(config));
   }
 
-  public DrillClient(DrillConfig config, boolean isDirect)
-  throws OutOfMemoryException {
-this(config, null, isDirect);
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(DrillConfig config, boolean isDirect) {
+this(newBuilder()
+.setConfig(config)
+.setDirectConnection(isDirect));
   }
 
-  public DrillClient(DrillConfig config, ClusterCoordinator coordinator)
-throws OutOfMemoryException {
-this(config, coordinator, null, false);
+  /**
+   * @deprecated Create a DrillClient using {@link DrillClient.Builder}.
+   */
+  @Deprecated
+  public DrillClient(DrillConfig 

[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82265866
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -688,4 +802,142 @@ public DrillBuf getBuffer() {
   return null;
 }
   }
+
+  /**
+   * Return a new {@link DrillClient.Builder Drill client builder}.
+   * @return a new builder
+   */
+  public static Builder newBuilder() {
+return new Builder();
+  }
+
+  /**
+   * Helper class to construct a {@link DrillClient Drill client}.
+   */
+  public static class Builder {
--- End diff --

Seems useful/visible enough to justify being a top-level class, with its 
own file. That is, any reason not to say:

DrillClient client = new ClientBuilder( ) ... .build( );

Rather than:

DrillClient client = DrillClient.builder( ) ... .build( );

Because the Builder becomes the gateway to Drill, having extensive JavaDoc 
comments would be very helpful. See suggestions below.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/565#discussion_r82269525
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java
 ---
@@ -41,7 +41,11 @@
   public void oneBitOneExchangeOneEntryRun() throws Exception{
 RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet();
 
-try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient 
client = new DrillClient(CONFIG, serviceSet.getCoordinator());){
+try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet);
--- End diff --

This pattern show up over and over. Should it just be moved into a function 
rather than as a duplicated "code injection" in a zillion files?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #604: DRILL-4862: binary_string should use another buffer...

2016-10-06 Thread gparai
Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/604#discussion_r82294640
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
 ---
@@ -1540,15 +1540,16 @@ public void eval() {
   public static class BinaryString implements DrillSimpleFunc {
 @Param  VarCharHolder in;
 @Output VarBinaryHolder out;
+@Inject DrillBuf buffer;
 
 @Override
 public void setup() {}
 
 @Override
 public void eval() {
-  out.buffer = in.buffer;
-  out.start = in.start;
-  out.end = 
org.apache.drill.common.util.DrillStringUtils.parseBinaryString(in.buffer, 
in.start, in.end);
+  out.buffer = buffer.reallocIfNeeded(in.end - in.start);
+  out.start = out.end = 0;
+  out.end = 
org.apache.drill.common.util.DrillStringUtils.parseBinaryString(in.buffer, 
in.start, in.end, out.buffer);
   out.buffer.setIndex(out.start, out.end);
--- End diff --

Do we still need to set the index?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #604: DRILL-4862: binary_string should use another buffer...

2016-10-06 Thread gparai
Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/604#discussion_r82293808
  
--- Diff: 
common/src/main/java/org/apache/drill/common/util/DrillStringUtils.java ---
@@ -160,10 +160,9 @@ private static void appendByte(StringBuilder result, 
byte b) {
*
* @return Index in the byte buffer just after the last written byte.
*/
-  public static int parseBinaryString(ByteBuf str, int strStart, int 
strEnd) {
-int length = (strEnd - strStart);
-int dstEnd = strStart;
-for (int i = strStart; i < strStart+length ; i++) {
+  public static int parseBinaryString(ByteBuf str, int strStart, int 
strEnd, ByteBuf out) {
+int dstEnd = 0;
+for (int i = strStart; i < strEnd ; i++) {
--- End diff --

Please change to `strEnd;`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #600: DRILL-4373: Drill and Hive have incompatible timestamp rep...

2016-10-06 Thread vdiravka
Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/600
  
@bitblender Sorry about this. That was hidden `\u` symbols.
Fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #604: DRILL-4862: binary_string should use another buffer...

2016-10-06 Thread gparai
Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/604#discussion_r82293523
  
--- Diff: 
common/src/main/java/org/apache/drill/common/util/DrillStringUtils.java ---
@@ -160,10 +160,9 @@ private static void appendByte(StringBuilder result, 
byte b) {
*
--- End diff --

Please update the comment - it no longer does in-place parsing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #595: DRILL-4203: Parquet File. Date is stored wrongly

2016-10-06 Thread jaltekruse
Github user jaltekruse commented on a diff in the pull request:

https://github.com/apache/drill/pull/595#discussion_r82291973
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java
 ---
@@ -123,6 +123,7 @@ public ScanBatch getBatch(FragmentContext context, 
HiveDrillNativeParquetSubScan
   // in the first row group
   ParquetReaderUtility.DateCorruptionStatus containsCorruptDates =
   ParquetReaderUtility.detectCorruptDates(parquetMetadata, 
config.getColumns(), true);
+  logger.info(containsCorruptDates.toString());
--- End diff --

Sounds good


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #595: DRILL-4203: Parquet File. Date is stored wrongly

2016-10-06 Thread jaltekruse
Github user jaltekruse commented on a diff in the pull request:

https://github.com/apache/drill/pull/595#discussion_r82292165
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 ---
@@ -184,17 +201,15 @@ public static DateCorruptionStatus 
detectCorruptDates(ParquetMetadata footer,
   if (parsedCreatedByVersion.hasSemanticVersion()) {
 SemanticVersion semVer = 
parsedCreatedByVersion.getSemanticVersion();
 String pre = semVer.pre + "";
-if (semVer != null && semVer.major == 1 && semVer.minor == 8 
&& semVer.patch == 1 && pre.contains("drill")) {
--- End diff --

Sorry I should have looked through the rest of the changes, thank you for 
fixing this


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #595: DRILL-4203: Parquet File. Date is stored wrongly

2016-10-06 Thread jaltekruse
Github user jaltekruse commented on a diff in the pull request:

https://github.com/apache/drill/pull/595#discussion_r82292241
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java
 ---
@@ -107,6 +107,7 @@ public ScanBatch getBatch(FragmentContext context, 
ParquetRowGroupScan rowGroupS
 boolean autoCorrectCorruptDates = 
rowGroupScan.formatConfig.autoCorrectCorruptDates;
 ParquetReaderUtility.DateCorruptionStatus containsCorruptDates = 
ParquetReaderUtility.detectCorruptDates(footers.get(e.getPath()), 
rowGroupScan.getColumns(),
 autoCorrectCorruptDates);
+logger.info(containsCorruptDates.toString());
--- End diff --

addressed, in other similar comment, this is good


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #595: DRILL-4203: Parquet File. Date is stored wrongly

2016-10-06 Thread vdiravka
Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/595#discussion_r82287507
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 ---
@@ -184,17 +201,15 @@ public static DateCorruptionStatus 
detectCorruptDates(ParquetMetadata footer,
   if (parsedCreatedByVersion.hasSemanticVersion()) {
 SemanticVersion semVer = 
parsedCreatedByVersion.getSemanticVersion();
 String pre = semVer.pre + "";
-if (semVer != null && semVer.major == 1 && semVer.minor == 8 
&& semVer.patch == 1 && pre.contains("drill")) {
--- End diff --

If `semVer `will be null we get NPE above, where `semVer.pre` is called. 
It happened with tests on our regression test framework. That's why I added 
`hasSemanticVersion()` above. So if this method return `true`, `semVer `cann't 
be `NULL` according `org.apache.parquet.SemanticVersion` and the following 
`semVer!=null` checking is redundant.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #595: DRILL-4203: Parquet File. Date is stored wrongly

2016-10-06 Thread vdiravka
Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/595#discussion_r82290736
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java
 ---
@@ -123,6 +123,7 @@ public ScanBatch getBatch(FragmentContext context, 
HiveDrillNativeParquetSubScan
   // in the first row group
   ParquetReaderUtility.DateCorruptionStatus containsCorruptDates =
   ParquetReaderUtility.detectCorruptDates(parquetMetadata, 
config.getColumns(), true);
+  logger.info(containsCorruptDates.toString());
--- End diff --

But it is not a regular Boolean value, `DateCorruptionStatus` is an enum 
with overrided toString method. For example how log file looks after querying 
hive parquet file with correct date values:

`2016-10-06 23:32:30,598 [280920f1-e362-136e-0fdd-24779fef2c4a:frag:0:0] 
INFO  o.a.d.e.s.p.ParquetScanBatchCreator - It is determined from metadata that 
the date values are definitely CORRECT`

Is it enought?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [HANGOUT] Topics for 10/04/16

2016-10-06 Thread Laurent Goujon
For the PR I didn't squash the changes into one single commit to make it
easier to review, but let me know if you would prefer the changes or pull
requests organized differently to speed up the review...

On Thu, Oct 6, 2016 at 11:50 AM, Parth Chandra 
wrote:

> I see, so it is not the prepare query where we see the improvement. I
> didn't realise that the BI tools can make so many information schema
> queries.
>
> I haven't been able to go thru the PR in detail. There is extensive
> refactoring that always makes it hard to tell the exact changes.  (Make me
> nervous as we don't have _any_ tests to test for memory leaks, concurrency
> and error conditions). I'll try to test some of these conditions out.
>
> Re DRILL-4280 - I was basing my comment on the PR comment.  I'm assuming
> Sudheesh will follow up and address this.
>
>
> On Wed, Oct 5, 2016 at 1:37 PM, Jacques Nadeau  wrote:
>
> > -user (since this is really a dev discussion)
> >
> > Looking at DRILL-4280, I see the comment about C++ changes but don't see
> > them there or in the linked squash branch. Maybe I'm missing something
> > obvious?
> >
> > The key performance benefits from the provided patch are about all the
> > other interrogations that Tableau (and other BI tools) makes against the
> > ODBC metadata interfaces (not prepare). With the new driver, those items
> > can be answered directly (such as list of schemas) rather than through
> ODBC
> > driver generated queries. I've commonly seen situations where even a
> simple
> > tableau workflow can take 7-10s even in the situation that the actual
> > "work" query (and associated limit 0 query) is quite short (<1s). This
> has
> > been especially problematic in situations where you're interacting with a
> > large number of sources since those information schema queries are not
> > always optimal to what is actually needed.
> >
> > With regards to benchmarks, we haven't done anything formal but you
> should
> > be able to easily do a comparison and see the improvements, especially
> when
> > dealing with larger catalogs.
> >
> >
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Wed, Oct 5, 2016 at 9:49 AM, Parth Chandra 
> > wrote:
> >
> > > Yup. I agree that we need to make sure that both clients are in sync. I
> > > believe DRILL-4280's PR refers to making changes in both APIs as well.
> > >
> > > Do you have a sense of how these changes give us a performance boost?
> As
> > > far as I can see, the APIs result in nearly the same code path being
> > > executed, with the difference being that the limit 0 query is now
> > submitted
> > > by the server instead of the client.
> > >
> > > I don't know much about the tweaking of performance for various BI
> tools;
> > > is there something that Tableau et al do different? I don't see how,
> > since
> > > the the ODBC/JDBC interface remains the same. Just trying to understand
> > > this.
> > >
> > > Anyway, any performance gain is wonderful. Do you have any numbers to
> > > share?
> > >
> > >
> > > On Tue, Oct 4, 2016 at 10:29 AM, Jacques Nadeau 
> > > wrote:
> > >
> > > > Both the C++ and the JDBC changes are updates that leverage a number
> of
> > > > pre-existing APIs already on the server. Our initial evaluations, we
> > have
> > > > already seen substantially improved BI tool performance with the
> > proposed
> > > > changes (with no additional server side changes). Are you seeing
> > > something
> > > > different? If you haven't yet looked at the changes in that light, I
> > > > suggest you do.
> > > >
> > > > If anything, I'm more concerned about client feature proposals that
> > don't
> > > > cover both the C++ and Java client. For example, I think we should be
> > > > cautious about merging something like DRILL-4280. We should be
> cautious
> > > > about introducing new server APIs unless there is a concrete plan
> > around
> > > > support in all clients.
> > > >
> > > > So I agree with the spirit of your ask: change proposals should be
> > > > "complete". However, I don't think it reasonably applies to the
> changes
> > > > proposed by Laurent. His changes "complete" the already introduced
> > > metadata
> > > > and prepare apis the server exposes. It provides an improved BI user
> > > > experience. It also introduces unit tests in the C++ client,
> something
> > > that
> > > > was previously sorely missing.
> > > >
> > > >
> > > >
> > > > --
> > > > Jacques Nadeau
> > > > CTO and Co-Founder, Dremio
> > > >
> > > > On Tue, Oct 4, 2016 at 9:47 AM, Parth Chandra  >
> > > > wrote:
> > > >
> > > > > Hi guys,
> > > > >
> > > > >   I won't be able to join the hangout but it would be good to
> discuss
> > > the
> > > > > plan for the related backend changes.
> > > > >
> > > > >   As I mentioned before I would like to see a concrete proposal for
> > the
> > > > > backend that will accompany these changes. Without that, 

[GitHub] drill pull request #595: DRILL-4203: Parquet File. Date is stored wrongly

2016-10-06 Thread jaltekruse
Github user jaltekruse commented on a diff in the pull request:

https://github.com/apache/drill/pull/595#discussion_r82274813
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 ---
@@ -184,17 +201,15 @@ public static DateCorruptionStatus 
detectCorruptDates(ParquetMetadata footer,
   if (parsedCreatedByVersion.hasSemanticVersion()) {
 SemanticVersion semVer = 
parsedCreatedByVersion.getSemanticVersion();
 String pre = semVer.pre + "";
-if (semVer != null && semVer.major == 1 && semVer.minor == 8 
&& semVer.patch == 1 && pre.contains("drill")) {
--- End diff --

I don't know if you are guarding against a null value for semVer at a 
higher level, but I'm pretty sure the reason I added it is that older versions 
of parquet files can be lacking this field. I assume with all of the tests 
cases that were added we should be okay, but it might be better to keep it in 
defensively. If this field isn't set in the metadata, it shouldn't make it an 
invalid file, but if we get an NPE it will stop query processing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #595: DRILL-4203: Parquet File. Date is stored wrongly

2016-10-06 Thread jaltekruse
Github user jaltekruse commented on a diff in the pull request:

https://github.com/apache/drill/pull/595#discussion_r82273282
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -918,18 +916,22 @@ public void setMax(Object max) {
 @JsonProperty public ConcurrentHashMap columnTypeInfo;
 @JsonProperty List files;
 @JsonProperty List directories;
-@JsonProperty String drillVersion;
--- End diff --

This sounds reasonable, having both is okay.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #595: DRILL-4203: Parquet File. Date is stored wrongly

2016-10-06 Thread jaltekruse
Github user jaltekruse commented on a diff in the pull request:

https://github.com/apache/drill/pull/595#discussion_r82272848
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java
 ---
@@ -123,6 +123,7 @@ public ScanBatch getBatch(FragmentContext context, 
HiveDrillNativeParquetSubScan
   // in the first row group
   ParquetReaderUtility.DateCorruptionStatus containsCorruptDates =
   ParquetReaderUtility.detectCorruptDates(parquetMetadata, 
config.getColumns(), true);
+  logger.info(containsCorruptDates.toString());
--- End diff --

not a very useful logging statement to check in, should at least contain a 
message about what this value is. This will just print a boolean value. If it 
was just for local debugging it can be removed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #574: DRILL-4726: Dynamic UDFs support

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/574#discussion_r82262083
  
--- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
@@ -194,10 +194,12 @@ drill.exec: {
   udf: {
 retry-attempts: 5,
 directory: {
-  base: ${drill.home}"/"${drill.exec.zk.root}"/udf",
-  staging: ${drill.exec.udf.directory.base}"/staging",
-  registry: ${drill.exec.udf.directory.base}"/registry",
-  tmp: ${drill.exec.udf.directory.base}"/tmp"
+  base: ${drill.exec.zk.root}"/udf",
+  local: ${drill.exec.udf.directory.base}"/local",
--- End diff --

How is this used? The path given by local seems relative:

drill/udf

Maybe include a simple comment to describe each config setting.

Otherwise, I like how this has come together!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #574: DRILL-4726: Dynamic UDFs support

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/574#discussion_r82260717
  
--- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
@@ -45,7 +45,7 @@ drill.client: {
   supports-complex-types: true
 }
 
-drill.home: "/tmp"
+drill.dfs-home: "/tmp"
--- End diff --

On a standard HDFS setup, is there a /tmp folder? All the examples suggest 
that stuff often goes into "/user/something" such as "/user/drill".

Also, if "/tmp" behaves like a Linux /tmp, might something come along and 
clean up the UDFs during a long Drill run?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #574: DRILL-4726: Dynamic UDFs support

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/574#discussion_r82251309
  
--- Diff: distribution/src/resources/drill-config.sh ---
@@ -324,6 +324,21 @@ if [ -n "$DRILL_CLASSPATH" ]; then
   CP="$CP:$DRILL_CLASSPATH"
 fi
 
+# If tmp dir is given, it must exist.
+if [ -n "$DRILL_TMP_DIR" ]; then
+  if [[ ! -d "$DRILL_TMP_DIR" ]]; then
+fatal_error "temporary dir does not exist:" $DRILL_TMP_DIR
--- End diff --

Capitalize "Temporary".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #574: DRILL-4726: Dynamic UDFs support

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/574#discussion_r82252246
  
--- Diff: distribution/src/resources/sqlline.bat ---
@@ -114,6 +114,10 @@ if "test%DRILL_LOG_DIR%" == "test" (
   set DRILL_LOG_DIR=%DRILL_HOME%\log
 )
 
+if "test%DRILL_TMP_DIR%" == "test" (
+  set DRILL_TMP_DIR=\tmp
--- End diff --

On Windows, I think the standard is to use %TEMP%. There generally is no 
directory called "\tmp", sometimes there is a "\temp". But %TEMP% is the 
accepted practice.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #574: DRILL-4726: Dynamic UDFs support

2016-10-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/574#discussion_r82259602
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionImplementationRegistry.java
 ---
@@ -378,38 +399,33 @@ private Path getLocalUdfDir() {
   }
 
   /**
-   * First tries to get drill conf directory value from system properties,
+   * First tries to get drill temporary directory value from system 
properties,
* if value is missing, checks environment properties.
* Throws exception is value is null.
-   * @return drill conf dir path
+   * @return drill temporary directory path
*/
-  private String getConfDir() {
-String drillConfDir = "DRILL_CONF_DIR";
-String value = System.getProperty(drillConfDir);
+  private String getTmpDir() {
--- End diff --

Can we be more forgiving here?

1. Use DRILL_TMP_DIR, if set.
2. Use a config file setting, if set.
3. Use Google's Files.createTempDir( ) which "Atomically creates a new 
directory somewhere beneath the system's temporary directory (as defined by the 
java.io.tmpdir system property)"

For most users, the choice in 3 should work fine. It would only be folks 
who have special needs that would set one of the other two properties.

Then, we can use a TypeSafe trick to combine 1 and 2. Define the config 
property something like this:

drill.temp-dir: "${DRILL_TMP_DIR"

Now, you just have to check drill.temp-dir in your function. If not set, 
use the Files approach as a default.

The nice thing about the Files approach is that each Drillbit will have a 
different directory (if two happen to be running (with different ports) at the 
same time.)

I wonder, however, does the Files temp directory get deleted on Drillbit 
exit?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [HANGOUT] Topics for 10/04/16

2016-10-06 Thread Parth Chandra
I see, so it is not the prepare query where we see the improvement. I
didn't realise that the BI tools can make so many information schema
queries.

I haven't been able to go thru the PR in detail. There is extensive
refactoring that always makes it hard to tell the exact changes.  (Make me
nervous as we don't have _any_ tests to test for memory leaks, concurrency
and error conditions). I'll try to test some of these conditions out.

Re DRILL-4280 - I was basing my comment on the PR comment.  I'm assuming
Sudheesh will follow up and address this.


On Wed, Oct 5, 2016 at 1:37 PM, Jacques Nadeau  wrote:

> -user (since this is really a dev discussion)
>
> Looking at DRILL-4280, I see the comment about C++ changes but don't see
> them there or in the linked squash branch. Maybe I'm missing something
> obvious?
>
> The key performance benefits from the provided patch are about all the
> other interrogations that Tableau (and other BI tools) makes against the
> ODBC metadata interfaces (not prepare). With the new driver, those items
> can be answered directly (such as list of schemas) rather than through ODBC
> driver generated queries. I've commonly seen situations where even a simple
> tableau workflow can take 7-10s even in the situation that the actual
> "work" query (and associated limit 0 query) is quite short (<1s). This has
> been especially problematic in situations where you're interacting with a
> large number of sources since those information schema queries are not
> always optimal to what is actually needed.
>
> With regards to benchmarks, we haven't done anything formal but you should
> be able to easily do a comparison and see the improvements, especially when
> dealing with larger catalogs.
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Wed, Oct 5, 2016 at 9:49 AM, Parth Chandra 
> wrote:
>
> > Yup. I agree that we need to make sure that both clients are in sync. I
> > believe DRILL-4280's PR refers to making changes in both APIs as well.
> >
> > Do you have a sense of how these changes give us a performance boost? As
> > far as I can see, the APIs result in nearly the same code path being
> > executed, with the difference being that the limit 0 query is now
> submitted
> > by the server instead of the client.
> >
> > I don't know much about the tweaking of performance for various BI tools;
> > is there something that Tableau et al do different? I don't see how,
> since
> > the the ODBC/JDBC interface remains the same. Just trying to understand
> > this.
> >
> > Anyway, any performance gain is wonderful. Do you have any numbers to
> > share?
> >
> >
> > On Tue, Oct 4, 2016 at 10:29 AM, Jacques Nadeau 
> > wrote:
> >
> > > Both the C++ and the JDBC changes are updates that leverage a number of
> > > pre-existing APIs already on the server. Our initial evaluations, we
> have
> > > already seen substantially improved BI tool performance with the
> proposed
> > > changes (with no additional server side changes). Are you seeing
> > something
> > > different? If you haven't yet looked at the changes in that light, I
> > > suggest you do.
> > >
> > > If anything, I'm more concerned about client feature proposals that
> don't
> > > cover both the C++ and Java client. For example, I think we should be
> > > cautious about merging something like DRILL-4280. We should be cautious
> > > about introducing new server APIs unless there is a concrete plan
> around
> > > support in all clients.
> > >
> > > So I agree with the spirit of your ask: change proposals should be
> > > "complete". However, I don't think it reasonably applies to the changes
> > > proposed by Laurent. His changes "complete" the already introduced
> > metadata
> > > and prepare apis the server exposes. It provides an improved BI user
> > > experience. It also introduces unit tests in the C++ client, something
> > that
> > > was previously sorely missing.
> > >
> > >
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Tue, Oct 4, 2016 at 9:47 AM, Parth Chandra 
> > > wrote:
> > >
> > > > Hi guys,
> > > >
> > > >   I won't be able to join the hangout but it would be good to discuss
> > the
> > > > plan for the related backend changes.
> > > >
> > > >   As I mentioned before I would like to see a concrete proposal for
> the
> > > > backend that will accompany these changes. Without that, I feel there
> > is
> > > no
> > > > point to adding so much new code.
> > > >
> > > > Thanks
> > > >
> > > > Parth
> > > >
> > > >
> > > > On Mon, Oct 3, 2016 at 7:52 PM, Laurent Goujon 
> > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'm currently working on improving metadata support for both the
> JDBC
> > > > > driver and the C++ connector, more specifically the following
> JIRAs:
> > > > >
> > > > > DRILL-4853: Update C++ protobuf source files
> > > > > DRILL-4420: Server-side metadata and