[GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/518 Ran some tests. The results look good. In particular, files with nested structures produced the correct results. Since it was the nested structure case that had me a bit worried, looks like the code is good to go. +1 (non-binding) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #509: DRILL-4618: Fix hive function loader not correctly take ra...
Github user Ben-Zvi commented on the issue: https://github.com/apache/drill/pull/509 Tested and checked ... LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #604: DRILL-4862: binary_string should use another buffer as out...
Github user gparai commented on the issue: https://github.com/apache/drill/pull/604 +1 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #604: DRILL-4862: binary_string should use another buffer...
Github user chunhui-shi commented on a diff in the pull request: https://github.com/apache/drill/pull/604#discussion_r82315506 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java --- @@ -1540,15 +1540,16 @@ public void eval() { public static class BinaryString implements DrillSimpleFunc { @Param VarCharHolder in; @Output VarBinaryHolder out; +@Inject DrillBuf buffer; @Override public void setup() {} @Override public void eval() { - out.buffer = in.buffer; - out.start = in.start; - out.end = org.apache.drill.common.util.DrillStringUtils.parseBinaryString(in.buffer, in.start, in.end); + out.buffer = buffer.reallocIfNeeded(in.end - in.start); + out.start = out.end = 0; + out.end = org.apache.drill.common.util.DrillStringUtils.parseBinaryString(in.buffer, in.start, in.end, out.buffer); out.buffer.setIndex(out.start, out.end); --- End diff -- Yes, we need to set readerIndex and writerIndex. Otherwise if there is another function consume the output of this function, it will hit error. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #600: DRILL-4373: Drill and Hive have incompatible timest...
Github user bitblender commented on a diff in the pull request: https://github.com/apache/drill/pull/600#discussion_r82314071 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java --- @@ -45,4 +53,34 @@ public static int getIntFromLEBytes(byte[] input, int start) { } return out; } + + /** + * Utilities for converting from parquet INT96 binary (impala, hive timestamp) + * to date time value. This utilizes the Joda library. + */ + public static class NanoTimeUtils { + +public static final long NANOS_PER_DAY = TimeUnit.DAYS.toNanos(1); +public static final long NANOS_PER_HOUR = TimeUnit.HOURS.toNanos(1); +public static final long NANOS_PER_MINUTE = TimeUnit.MINUTES.toNanos(1); +public static final long NANOS_PER_SECOND = TimeUnit.SECONDS.toNanos(1); +public static final long NANOS_PER_MILLISECOND = TimeUnit.MILLISECONDS.toNanos(1); + + /** + * @param binaryTimeStampValue + * hive, impala timestamp values with nanoseconds precision + * are stored in parquet Binary as INT96 + * + * @return the number of milliseconds since January 1, 1970, 00:00:00 GMT + * represented by @param binaryTimeStampValue . + */ +public static long getDateTimeValueFromBinary(Binary binaryTimeStampValue) { + NanoTime nt = NanoTime.fromBinary(binaryTimeStampValue); + int julianDay = nt.getJulianDay(); + long nanosOfDay = nt.getTimeOfDayNanos(); + return DateTimeUtils.fromJulianDay(julianDay-0.5d) + nanosOfDay/NANOS_PER_MILLISECOND; --- End diff -- 1. I would recommend not using Joda. Do the calculations directly, like in ConvertFromImpalaTimestamp. Joda uses non-standard, hence confusing, terminology. What Joda calls and uses as JulianDay, is actually Julian Date. Seems like you have identified this discrepancy and adjusted for it by subtracting 0.5 from _julianDay_. Note: (I guess you have already figured this out) : The actual code and the Joda code in the comment, in ConvertFromImpalaTimestamp, are inconsistent. Took me a day to figure out the reason behind this ! A bug should be opened to delete the comment. 2. Can you please also leave a comment stating that 2440588 is the JDN for the Unix Epoch. 3. Please leave a comment stating that the order of the calls to get _julianDay_ and _nanosOfDay_ matters. You can do this by just stating how timestamps are stored in INT96 i.e 32-bit JDN followed by 64-bit nanosOfDay. 4. Consistent(single or none) spacing for binary operators (+-/) used here would be nice. Single spacing would be preferable. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #496: Update for MapR profile
Github user pwong-mapr closed the pull request at: https://github.com/apache/drill/pull/496 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #496: Update for MapR profile
Github user pwong-mapr commented on the issue: https://github.com/apache/drill/pull/496 Not sure why this PR is still open - go ahead and delete it --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #487: remove useless text
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/487 +1 (non-binding) A very simple change. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #478: DRILL-4581: Minor fixes to drill startup scripts for linux
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/478 Replaced by later commit. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #478: DRILL-4581: Minor fixes to drill startup scripts fo...
Github user paul-rogers closed the pull request at: https://github.com/apache/drill/pull/478 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #573: Drill 4858
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/573#discussion_r82305362 --- Diff: exec/java-exec/src/main/codegen/templates/TypeHelper.java --- @@ -68,8 +68,23 @@ public static JType getHolderType(JCodeModel model, MinorType type, DataMode mod case UNION: return model._ref(UnionHolder.class); case MAP: +switch (mode) { + case REQUIRED: --- End diff -- Perhaps combine the case statements to avoid repeated code: case REQUIRED: case OPTIONAL: return model... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #573: Drill 4858
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/573 Unit tests? If this repeated maps have never been enabled, how do we know that the code actually works for this case? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #593: DRILL-3178 csv reader should allow newlines inside ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/593#discussion_r82303401 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextReader.java --- @@ -231,33 +231,34 @@ private void parseQuotedValue(byte prev) throws IOException { final TextInput input = this.input; final byte quote = this.quote; -ch = input.nextChar(); +try { + input.setMonitorForNewLine(false); --- End diff -- Seems an overly complex way to do the parsing. Is there any reason we want to capture the original newline character rather than the normalized one? If we need to capture the original one, then a cleaner way to do that is to keep track of the start & end position of the current token (character), and provide a method to return that block as a string. Then, scan for a close quote, reading characters & special-casing any newlines. If we want to include newlines in quoted strings sometimes, but not other times, then the check logic can be a bit more complex. But, the proposed solution of making newlines not be newlines seems a bit odd... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #593: DRILL-3178 csv reader should allow newlines inside ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/593#discussion_r82303872 --- Diff: exec/java-exec/src/test/resources/store/text/WithQuotedCrLf.tbl --- @@ -0,0 +1,6 @@ +"a +1"|a|a +a|"a +2"|a +a|a|"a +3" --- End diff -- Is there an issue with git converting Windows-style newlines (\r\n) into Unix-style (\n) when this file is checked in & out? Will that mess up the test? Should the test generate this file to handle this particular special case? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #593: DRILL-3178 csv reader should allow newlines inside ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/593#discussion_r82304834 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextReader.java --- @@ -231,33 +231,34 @@ private void parseQuotedValue(byte prev) throws IOException { final TextInput input = this.input; final byte quote = this.quote; -ch = input.nextChar(); +try { + input.setMonitorForNewLine(false); + ch = input.nextChar(); -while (!(prev == quote && (ch == delimiter || ch == newLine || isWhite(ch { - if (ch != quote) { -if (prev == quote) { // unescaped quote detected - if (parseUnescapedQuotes) { -output.append(quote); -output.append(ch); -parseQuotedValue(ch); -break; - } else { -throw new TextParsingException( -context, -"Unescaped quote character '" -+ quote -+ "' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input."); + while (!(prev == quote && (ch == delimiter || ch == newLine || isWhite(ch { +if (ch != quote) { + if (prev == quote) { // unescaped quote detected +if (parseUnescapedQuotes) { + output.append(quote); + output.append(ch); + parseQuotedValue(ch); + break; +} else { + throw new TextParsingException(context, "Unescaped quote character '" + quote + "' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input."); +} } + output.append(ch); + prev = ch; +} else if (prev == quoteEscape) { + output.append(quote); + prev = NULL_BYTE; +} else { + prev = ch; } -output.append(ch); -prev = ch; - } else if (prev == quoteEscape) { -output.append(quote); -prev = NULL_BYTE; - } else { -prev = ch; +ch = input.nextChar(); } - ch = input.nextChar(); +} finally { --- End diff -- I see why it is done in finally. However, as noted above, I'm not sure that pushing this kind of flag into the getChar function is the optimal approach... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #593: DRILL-3178 csv reader should allow newlines inside ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/593#discussion_r82296690 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/TextInput.java --- @@ -88,6 +88,11 @@ private boolean endFound = false; /** + * Switch for enabling/disabling new line detection --- End diff -- Explain a bit more? Presumably, we already "monitor" and "detect" new lines in some way. What, specifically does this add? Presumably, it sets the mode to enable new line detection within quotes (the title of the Jira entry)? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #595: DRILL-4203: Parquet File. Date is stored wrongly
Github user vdiravka commented on the issue: https://github.com/apache/drill/pull/595 Thanks @jaltekruse. @parthchandra Could you please take a look? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/565#discussion_r82266459 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java --- @@ -688,4 +802,142 @@ public DrillBuf getBuffer() { return null; } } + + /** + * Return a new {@link DrillClient.Builder Drill client builder}. + * @return a new builder + */ + public static Builder newBuilder() { +return new Builder(); + } + + /** + * Helper class to construct a {@link DrillClient Drill client}. + */ + public static class Builder { + +private DrillConfig config; +private BufferAllocator allocator; +private ClusterCoordinator clusterCoordinator; +private EventLoopGroup eventLoopGroup; +private ExecutorService executor; + +// defaults +private boolean supportComplexTypes = true; +private boolean isDirectConnection = false; + +/** + * Sets the {@link DrillConfig configuration} for this client. + * + * @param drillConfig drill configuration + * @return this builder + */ +public Builder setConfig(DrillConfig drillConfig) { + this.config = drillConfig; + return this; +} + +/** + * Sets the {@link DrillConfig configuration} for this client based on the given file. --- End diff -- Better to set this based on a File (or, for the trendy, Path) object, since that is the Java standard for local files. Does this replace the default class-path config? Or, is this added to the defaults? Can I use this with the above? Or, can I set either the config directory OR via a file? If one or the other, should we check that case and throw an IllegalStateException (unchecked) or the like? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/565#discussion_r82267594 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java --- @@ -688,4 +802,142 @@ public DrillBuf getBuffer() { return null; } } + + /** + * Return a new {@link DrillClient.Builder Drill client builder}. + * @return a new builder + */ + public static Builder newBuilder() { +return new Builder(); + } + + /** + * Helper class to construct a {@link DrillClient Drill client}. + */ + public static class Builder { + +private DrillConfig config; +private BufferAllocator allocator; +private ClusterCoordinator clusterCoordinator; +private EventLoopGroup eventLoopGroup; +private ExecutorService executor; + +// defaults +private boolean supportComplexTypes = true; +private boolean isDirectConnection = false; + +/** + * Sets the {@link DrillConfig configuration} for this client. + * + * @param drillConfig drill configuration + * @return this builder + */ +public Builder setConfig(DrillConfig drillConfig) { + this.config = drillConfig; + return this; +} + +/** + * Sets the {@link DrillConfig configuration} for this client based on the given file. + * + * @param fileName configuration file name + * @return this builder + */ +public Builder setConfigFromFile(final String fileName) { + this.config = DrillConfig.create(fileName); + return this; +} + +/** + * Sets the {@link BufferAllocator buffer allocator} to be used by this client. + * If this is not set, an allocator will be created based on the configuration. + * + * If this is set, the caller is responsible for closing the given allocator. + * + * @param allocator buffer allocator + * @return this builder + */ +public Builder setAllocator(final BufferAllocator allocator) { + this.allocator = allocator; + return this; +} + +/** + * Sets the {@link ClusterCoordinator cluster coordinator} that this client + * registers with. If this is not set and the this client does not use a + * {@link #setDirectConnection direct connection}, a cluster coordinator will + * be created based on the configuration. + * + * If this is set, the caller is responsible for closing the given coordinator. + * + * @param clusterCoordinator cluster coordinator + * @return this builder + */ +public Builder setClusterCoordinator(final ClusterCoordinator clusterCoordinator) { + this.clusterCoordinator = clusterCoordinator; + return this; +} + +/** + * Sets the event loop group that to be used by the client. If this is not set, + * an event loop group will be created based on the configuration. + * + * If this is set, the caller is responsible for closing the given event loop group. + * + * @param eventLoopGroup event loop group + * @return this builder + */ +public Builder setEventLoopGroup(final EventLoopGroup eventLoopGroup) { + this.eventLoopGroup = eventLoopGroup; + return this; +} + +/** + * Sets the executor service to be used by the client. If this is not set, + * an executor will be created based on the configuration. + * + * If this is set, the caller is responsible for closing the given executor. + * + * @param executor executor service + * @return this builder + */ +public Builder setExecutorService(final ExecutorService executor) { + this.executor = executor; + return this; +} + +/** + * Sets whether the application is willing to accept complex types (Map, Arrays) --- End diff -- Default value? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/565#discussion_r82267107 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java --- @@ -688,4 +802,142 @@ public DrillBuf getBuffer() { return null; } } + + /** + * Return a new {@link DrillClient.Builder Drill client builder}. + * @return a new builder + */ + public static Builder newBuilder() { +return new Builder(); + } + + /** + * Helper class to construct a {@link DrillClient Drill client}. + */ + public static class Builder { + +private DrillConfig config; +private BufferAllocator allocator; +private ClusterCoordinator clusterCoordinator; +private EventLoopGroup eventLoopGroup; +private ExecutorService executor; + +// defaults +private boolean supportComplexTypes = true; +private boolean isDirectConnection = false; + +/** + * Sets the {@link DrillConfig configuration} for this client. + * + * @param drillConfig drill configuration + * @return this builder + */ +public Builder setConfig(DrillConfig drillConfig) { + this.config = drillConfig; + return this; +} + +/** + * Sets the {@link DrillConfig configuration} for this client based on the given file. + * + * @param fileName configuration file name + * @return this builder + */ +public Builder setConfigFromFile(final String fileName) { + this.config = DrillConfig.create(fileName); + return this; +} + +/** + * Sets the {@link BufferAllocator buffer allocator} to be used by this client. + * If this is not set, an allocator will be created based on the configuration. + * + * If this is set, the caller is responsible for closing the given allocator. + * + * @param allocator buffer allocator + * @return this builder + */ +public Builder setAllocator(final BufferAllocator allocator) { + this.allocator = allocator; + return this; +} + +/** + * Sets the {@link ClusterCoordinator cluster coordinator} that this client --- End diff -- When would I use this rather than letting Drill create it from the config? Would I do this if I have multiple Drill clients in a single app? Or, will the underlying code know to share the same coordinator? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/565#discussion_r82266904 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java --- @@ -688,4 +802,142 @@ public DrillBuf getBuffer() { return null; } } + + /** + * Return a new {@link DrillClient.Builder Drill client builder}. + * @return a new builder + */ + public static Builder newBuilder() { +return new Builder(); + } + + /** + * Helper class to construct a {@link DrillClient Drill client}. + */ + public static class Builder { + +private DrillConfig config; +private BufferAllocator allocator; +private ClusterCoordinator clusterCoordinator; +private EventLoopGroup eventLoopGroup; +private ExecutorService executor; + +// defaults +private boolean supportComplexTypes = true; +private boolean isDirectConnection = false; + +/** + * Sets the {@link DrillConfig configuration} for this client. + * + * @param drillConfig drill configuration + * @return this builder + */ +public Builder setConfig(DrillConfig drillConfig) { + this.config = drillConfig; + return this; +} + +/** + * Sets the {@link DrillConfig configuration} for this client based on the given file. + * + * @param fileName configuration file name + * @return this builder + */ +public Builder setConfigFromFile(final String fileName) { + this.config = DrillConfig.create(fileName); + return this; +} + +/** + * Sets the {@link BufferAllocator buffer allocator} to be used by this client. + * If this is not set, an allocator will be created based on the configuration. --- End diff -- But, what if I don't provide a config? Or, create my own? What do I have to set in that config to make it work? More to the point, WHEN do I have to provide my own root? When I do multiple clients in a single app? If so, can we provide a short example showing how this works? Even easier, can the builder itself create a static allocator that it shares across connections? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/565#discussion_r82267967 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java --- @@ -688,4 +802,142 @@ public DrillBuf getBuffer() { return null; } } + + /** + * Return a new {@link DrillClient.Builder Drill client builder}. + * @return a new builder + */ + public static Builder newBuilder() { +return new Builder(); + } + + /** + * Helper class to construct a {@link DrillClient Drill client}. + */ + public static class Builder { + +private DrillConfig config; +private BufferAllocator allocator; +private ClusterCoordinator clusterCoordinator; +private EventLoopGroup eventLoopGroup; +private ExecutorService executor; + +// defaults +private boolean supportComplexTypes = true; +private boolean isDirectConnection = false; + +/** + * Sets the {@link DrillConfig configuration} for this client. + * + * @param drillConfig drill configuration + * @return this builder + */ +public Builder setConfig(DrillConfig drillConfig) { + this.config = drillConfig; + return this; +} + +/** + * Sets the {@link DrillConfig configuration} for this client based on the given file. + * + * @param fileName configuration file name + * @return this builder + */ +public Builder setConfigFromFile(final String fileName) { + this.config = DrillConfig.create(fileName); + return this; +} + +/** + * Sets the {@link BufferAllocator buffer allocator} to be used by this client. + * If this is not set, an allocator will be created based on the configuration. + * + * If this is set, the caller is responsible for closing the given allocator. + * + * @param allocator buffer allocator + * @return this builder + */ +public Builder setAllocator(final BufferAllocator allocator) { + this.allocator = allocator; + return this; +} + +/** + * Sets the {@link ClusterCoordinator cluster coordinator} that this client + * registers with. If this is not set and the this client does not use a + * {@link #setDirectConnection direct connection}, a cluster coordinator will + * be created based on the configuration. + * + * If this is set, the caller is responsible for closing the given coordinator. + * + * @param clusterCoordinator cluster coordinator + * @return this builder + */ +public Builder setClusterCoordinator(final ClusterCoordinator clusterCoordinator) { + this.clusterCoordinator = clusterCoordinator; + return this; +} + +/** + * Sets the event loop group that to be used by the client. If this is not set, + * an event loop group will be created based on the configuration. + * + * If this is set, the caller is responsible for closing the given event loop group. + * + * @param eventLoopGroup event loop group + * @return this builder + */ +public Builder setEventLoopGroup(final EventLoopGroup eventLoopGroup) { + this.eventLoopGroup = eventLoopGroup; + return this; +} + +/** + * Sets the executor service to be used by the client. If this is not set, + * an executor will be created based on the configuration. + * + * If this is set, the caller is responsible for closing the given executor. + * + * @param executor executor service + * @return this builder + */ +public Builder setExecutorService(final ExecutorService executor) { + this.executor = executor; + return this; +} + +/** + * Sets whether the application is willing to accept complex types (Map, Arrays) + * in the returned result set. Default is {@code true}. If set to {@code false}, + * the complex types are returned as JSON encoded VARCHAR type. + * + * @param supportComplexTypes if client accepts complex types + * @return this builder + */ +public Builder setSupportsComplexTypes(final boolean supportComplexTypes) { + this.supportComplexTypes = supportComplexTypes; + return this; +} + +/** + * Sets whether the client will connect directly to the drillbit instead of going --- End diff -- Here (or in the class comment), explain how to use this with an embedded Drillbit. There is no ZK in that case. Would I use a direct
[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/565#discussion_r82266574 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java --- @@ -688,4 +802,142 @@ public DrillBuf getBuffer() { return null; } } + + /** + * Return a new {@link DrillClient.Builder Drill client builder}. + * @return a new builder + */ + public static Builder newBuilder() { +return new Builder(); + } + + /** + * Helper class to construct a {@link DrillClient Drill client}. + */ + public static class Builder { + +private DrillConfig config; +private BufferAllocator allocator; +private ClusterCoordinator clusterCoordinator; +private EventLoopGroup eventLoopGroup; +private ExecutorService executor; + +// defaults +private boolean supportComplexTypes = true; +private boolean isDirectConnection = false; + +/** + * Sets the {@link DrillConfig configuration} for this client. + * + * @param drillConfig drill configuration + * @return this builder + */ +public Builder setConfig(DrillConfig drillConfig) { --- End diff -- DrillConfig objects are pretty complex. Should we accept the underlying Config (TypeSafe object) instead/in addition? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/565#discussion_r82269228 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/auth/DrillUserPrincipal.java --- @@ -118,8 +118,11 @@ public AnonDrillUserPrincipal(final DrillbitContext drillbitContext) { public DrillClient getDrillClient() throws IOException { try { // Create a DrillClient -drillClient = new DrillClient(drillbitContext.getConfig(), -drillbitContext.getClusterCoordinator(), drillbitContext.getAllocator()); +drillClient = DrillClient.newBuilder() --- End diff -- Is the context something we should expose as part of our changes: DrillbitContext context = ... DrillClient client = DrillClient.builder( ) .fromContext( context ) .build( ); Doing the above might simplify test code. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/565#discussion_r82268707 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/auth/AbstractDrillLoginService.java --- @@ -46,8 +46,11 @@ protected DrillClient createDrillClient(final String userName, final String pass try { // Create a DrillClient - drillClient = new DrillClient(drillbitContext.getConfig(), - drillbitContext.getClusterCoordinator(), drillbitContext.getAllocator()); + drillClient = DrillClient.newBuilder() + .setConfig(drillbitContext.getConfig()) + .setClusterCoordinator(drillbitContext.getClusterCoordinator()) + .setAllocator(drillbitContext.getAllocator()) + .build(); --- End diff -- Should the builder be extended to also (optionally) do connect? Or, return a connect builder? Here I'm thinking that it would be helpful to have a single builder gather things like the user name property and the connection string (which would seem to be a DrillClient parameter but is actually a connection property.) That is, either: DrillClient.builder( ) ... .withUser( userName ) .connectTo( "myHost", 1234 ) .build( ); Or DrillClient.builder( ) ... .buildClient( ) .withUser( userName ) .connectTo( "myHost", 1234 ) .connect( ); --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/565#discussion_r82295775 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java --- @@ -89,79 +88,175 @@ import com.google.common.annotations.VisibleForTesting; import com.google.common.base.Strings; import com.google.common.util.concurrent.AbstractCheckedFuture; +import com.google.common.util.concurrent.MoreExecutors; import com.google.common.util.concurrent.SettableFuture; /** * Thin wrapper around a UserClient that handles connect/close and transforms * String into ByteBuf. + * + * To create non-default objects, use {@link DrillClient.Builder the builder class}. + * E.g. + * + * DrillClient client = DrillClient.newBuilder() + * .setConfig(...) + * .setIsDirectConnection(true) + * .build(); + * + * + * Except for {@link #runQuery} and {@link #cancelQuery}, this class is generally not thread safe. */ public class DrillClient implements Closeable, ConnectionThrottle { private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DrillClient.class); private static final ObjectMapper objectMapper = new ObjectMapper(); private final DrillConfig config; - private UserClient client; - private UserProperties props = null; - private volatile ClusterCoordinator clusterCoordinator; - private volatile boolean connected = false; private final BufferAllocator allocator; - private int reconnectTimes; - private int reconnectDelay; - private boolean supportComplexTypes; - private final boolean ownsZkConnection; + private final EventLoopGroup eventLoopGroup; + private final ExecutorService executor; + private final boolean isDirectConnection; + private final int reconnectTimes; + private final int reconnectDelay; + + // TODO: clusterCoordinator should be initialized in the constructor. + // Currently, initialization is tightly coupled with #connect. + private ClusterCoordinator clusterCoordinator; + + // checks if this client owns these resources (used when closing) private final boolean ownsAllocator; - private final boolean isDirectConnection; // true if the connection bypasses zookeeper and connects directly to a drillbit - private EventLoopGroup eventLoopGroup; - private ExecutorService executor; + private final boolean ownsZkConnection; + private final boolean ownsEventLoopGroup; + private final boolean ownsExecutor; + + // once #setSupportComplexTypes() is removed, make this final + private boolean supportComplexTypes; + + private UserClient client; + private UserProperties props; + private boolean connected; - public DrillClient() throws OutOfMemoryException { -this(DrillConfig.create(), false); + public DrillClient() { +this(newBuilder()); } - public DrillClient(boolean isDirect) throws OutOfMemoryException { -this(DrillConfig.create(), isDirect); + /** + * @deprecated Create a DrillClient using {@link DrillClient.Builder}. + */ + @Deprecated + public DrillClient(boolean isDirect) { +this(newBuilder() +.setDirectConnection(isDirect)); } - public DrillClient(String fileName) throws OutOfMemoryException { -this(DrillConfig.create(fileName), false); + /** + * @deprecated Create a DrillClient using {@link DrillClient.Builder}. + */ + @Deprecated + public DrillClient(String fileName) { +this(newBuilder() +.setConfigFromFile(fileName)); } - public DrillClient(DrillConfig config) throws OutOfMemoryException { -this(config, null, false); + /** + * @deprecated Create a DrillClient using {@link DrillClient.Builder}. + */ + @Deprecated + public DrillClient(DrillConfig config) { +this(newBuilder() +.setConfig(config)); } - public DrillClient(DrillConfig config, boolean isDirect) - throws OutOfMemoryException { -this(config, null, isDirect); + /** + * @deprecated Create a DrillClient using {@link DrillClient.Builder}. + */ + @Deprecated + public DrillClient(DrillConfig config, boolean isDirect) { +this(newBuilder() +.setConfig(config) +.setDirectConnection(isDirect)); } - public DrillClient(DrillConfig config, ClusterCoordinator coordinator) -throws OutOfMemoryException { -this(config, coordinator, null, false); + /** + * @deprecated Create a DrillClient using {@link DrillClient.Builder}. + */ + @Deprecated + public DrillClient(DrillConfig
[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/565#discussion_r82265866 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java --- @@ -688,4 +802,142 @@ public DrillBuf getBuffer() { return null; } } + + /** + * Return a new {@link DrillClient.Builder Drill client builder}. + * @return a new builder + */ + public static Builder newBuilder() { +return new Builder(); + } + + /** + * Helper class to construct a {@link DrillClient Drill client}. + */ + public static class Builder { --- End diff -- Seems useful/visible enough to justify being a top-level class, with its own file. That is, any reason not to say: DrillClient client = new ClientBuilder( ) ... .build( ); Rather than: DrillClient client = DrillClient.builder( ) ... .build( ); Because the Builder becomes the gateway to Drill, having extensive JavaDoc comments would be very helpful. See suggestions below. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #565: DRILL-4841: Use server event loop for web clients
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/565#discussion_r82269525 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestDistributedFragmentRun.java --- @@ -41,7 +41,11 @@ public void oneBitOneExchangeOneEntryRun() throws Exception{ RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet(); -try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); DrillClient client = new DrillClient(CONFIG, serviceSet.getCoordinator());){ +try(Drillbit bit1 = new Drillbit(CONFIG, serviceSet); --- End diff -- This pattern show up over and over. Should it just be moved into a function rather than as a duplicated "code injection" in a zillion files? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #604: DRILL-4862: binary_string should use another buffer...
Github user gparai commented on a diff in the pull request: https://github.com/apache/drill/pull/604#discussion_r82294640 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java --- @@ -1540,15 +1540,16 @@ public void eval() { public static class BinaryString implements DrillSimpleFunc { @Param VarCharHolder in; @Output VarBinaryHolder out; +@Inject DrillBuf buffer; @Override public void setup() {} @Override public void eval() { - out.buffer = in.buffer; - out.start = in.start; - out.end = org.apache.drill.common.util.DrillStringUtils.parseBinaryString(in.buffer, in.start, in.end); + out.buffer = buffer.reallocIfNeeded(in.end - in.start); + out.start = out.end = 0; + out.end = org.apache.drill.common.util.DrillStringUtils.parseBinaryString(in.buffer, in.start, in.end, out.buffer); out.buffer.setIndex(out.start, out.end); --- End diff -- Do we still need to set the index? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #604: DRILL-4862: binary_string should use another buffer...
Github user gparai commented on a diff in the pull request: https://github.com/apache/drill/pull/604#discussion_r82293808 --- Diff: common/src/main/java/org/apache/drill/common/util/DrillStringUtils.java --- @@ -160,10 +160,9 @@ private static void appendByte(StringBuilder result, byte b) { * * @return Index in the byte buffer just after the last written byte. */ - public static int parseBinaryString(ByteBuf str, int strStart, int strEnd) { -int length = (strEnd - strStart); -int dstEnd = strStart; -for (int i = strStart; i < strStart+length ; i++) { + public static int parseBinaryString(ByteBuf str, int strStart, int strEnd, ByteBuf out) { +int dstEnd = 0; +for (int i = strStart; i < strEnd ; i++) { --- End diff -- Please change to `strEnd;` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #600: DRILL-4373: Drill and Hive have incompatible timestamp rep...
Github user vdiravka commented on the issue: https://github.com/apache/drill/pull/600 @bitblender Sorry about this. That was hidden `\u` symbols. Fixed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #604: DRILL-4862: binary_string should use another buffer...
Github user gparai commented on a diff in the pull request: https://github.com/apache/drill/pull/604#discussion_r82293523 --- Diff: common/src/main/java/org/apache/drill/common/util/DrillStringUtils.java --- @@ -160,10 +160,9 @@ private static void appendByte(StringBuilder result, byte b) { * --- End diff -- Please update the comment - it no longer does in-place parsing. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #595: DRILL-4203: Parquet File. Date is stored wrongly
Github user jaltekruse commented on a diff in the pull request: https://github.com/apache/drill/pull/595#discussion_r82291973 --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java --- @@ -123,6 +123,7 @@ public ScanBatch getBatch(FragmentContext context, HiveDrillNativeParquetSubScan // in the first row group ParquetReaderUtility.DateCorruptionStatus containsCorruptDates = ParquetReaderUtility.detectCorruptDates(parquetMetadata, config.getColumns(), true); + logger.info(containsCorruptDates.toString()); --- End diff -- Sounds good --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #595: DRILL-4203: Parquet File. Date is stored wrongly
Github user jaltekruse commented on a diff in the pull request: https://github.com/apache/drill/pull/595#discussion_r82292165 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java --- @@ -184,17 +201,15 @@ public static DateCorruptionStatus detectCorruptDates(ParquetMetadata footer, if (parsedCreatedByVersion.hasSemanticVersion()) { SemanticVersion semVer = parsedCreatedByVersion.getSemanticVersion(); String pre = semVer.pre + ""; -if (semVer != null && semVer.major == 1 && semVer.minor == 8 && semVer.patch == 1 && pre.contains("drill")) { --- End diff -- Sorry I should have looked through the rest of the changes, thank you for fixing this --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #595: DRILL-4203: Parquet File. Date is stored wrongly
Github user jaltekruse commented on a diff in the pull request: https://github.com/apache/drill/pull/595#discussion_r82292241 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java --- @@ -107,6 +107,7 @@ public ScanBatch getBatch(FragmentContext context, ParquetRowGroupScan rowGroupS boolean autoCorrectCorruptDates = rowGroupScan.formatConfig.autoCorrectCorruptDates; ParquetReaderUtility.DateCorruptionStatus containsCorruptDates = ParquetReaderUtility.detectCorruptDates(footers.get(e.getPath()), rowGroupScan.getColumns(), autoCorrectCorruptDates); +logger.info(containsCorruptDates.toString()); --- End diff -- addressed, in other similar comment, this is good --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #595: DRILL-4203: Parquet File. Date is stored wrongly
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/595#discussion_r82287507 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java --- @@ -184,17 +201,15 @@ public static DateCorruptionStatus detectCorruptDates(ParquetMetadata footer, if (parsedCreatedByVersion.hasSemanticVersion()) { SemanticVersion semVer = parsedCreatedByVersion.getSemanticVersion(); String pre = semVer.pre + ""; -if (semVer != null && semVer.major == 1 && semVer.minor == 8 && semVer.patch == 1 && pre.contains("drill")) { --- End diff -- If `semVer `will be null we get NPE above, where `semVer.pre` is called. It happened with tests on our regression test framework. That's why I added `hasSemanticVersion()` above. So if this method return `true`, `semVer `cann't be `NULL` according `org.apache.parquet.SemanticVersion` and the following `semVer!=null` checking is redundant. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #595: DRILL-4203: Parquet File. Date is stored wrongly
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/595#discussion_r82290736 --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java --- @@ -123,6 +123,7 @@ public ScanBatch getBatch(FragmentContext context, HiveDrillNativeParquetSubScan // in the first row group ParquetReaderUtility.DateCorruptionStatus containsCorruptDates = ParquetReaderUtility.detectCorruptDates(parquetMetadata, config.getColumns(), true); + logger.info(containsCorruptDates.toString()); --- End diff -- But it is not a regular Boolean value, `DateCorruptionStatus` is an enum with overrided toString method. For example how log file looks after querying hive parquet file with correct date values: `2016-10-06 23:32:30,598 [280920f1-e362-136e-0fdd-24779fef2c4a:frag:0:0] INFO o.a.d.e.s.p.ParquetScanBatchCreator - It is determined from metadata that the date values are definitely CORRECT` Is it enought? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Re: [HANGOUT] Topics for 10/04/16
For the PR I didn't squash the changes into one single commit to make it easier to review, but let me know if you would prefer the changes or pull requests organized differently to speed up the review... On Thu, Oct 6, 2016 at 11:50 AM, Parth Chandrawrote: > I see, so it is not the prepare query where we see the improvement. I > didn't realise that the BI tools can make so many information schema > queries. > > I haven't been able to go thru the PR in detail. There is extensive > refactoring that always makes it hard to tell the exact changes. (Make me > nervous as we don't have _any_ tests to test for memory leaks, concurrency > and error conditions). I'll try to test some of these conditions out. > > Re DRILL-4280 - I was basing my comment on the PR comment. I'm assuming > Sudheesh will follow up and address this. > > > On Wed, Oct 5, 2016 at 1:37 PM, Jacques Nadeau wrote: > > > -user (since this is really a dev discussion) > > > > Looking at DRILL-4280, I see the comment about C++ changes but don't see > > them there or in the linked squash branch. Maybe I'm missing something > > obvious? > > > > The key performance benefits from the provided patch are about all the > > other interrogations that Tableau (and other BI tools) makes against the > > ODBC metadata interfaces (not prepare). With the new driver, those items > > can be answered directly (such as list of schemas) rather than through > ODBC > > driver generated queries. I've commonly seen situations where even a > simple > > tableau workflow can take 7-10s even in the situation that the actual > > "work" query (and associated limit 0 query) is quite short (<1s). This > has > > been especially problematic in situations where you're interacting with a > > large number of sources since those information schema queries are not > > always optimal to what is actually needed. > > > > With regards to benchmarks, we haven't done anything formal but you > should > > be able to easily do a comparison and see the improvements, especially > when > > dealing with larger catalogs. > > > > > > > > -- > > Jacques Nadeau > > CTO and Co-Founder, Dremio > > > > On Wed, Oct 5, 2016 at 9:49 AM, Parth Chandra > > wrote: > > > > > Yup. I agree that we need to make sure that both clients are in sync. I > > > believe DRILL-4280's PR refers to making changes in both APIs as well. > > > > > > Do you have a sense of how these changes give us a performance boost? > As > > > far as I can see, the APIs result in nearly the same code path being > > > executed, with the difference being that the limit 0 query is now > > submitted > > > by the server instead of the client. > > > > > > I don't know much about the tweaking of performance for various BI > tools; > > > is there something that Tableau et al do different? I don't see how, > > since > > > the the ODBC/JDBC interface remains the same. Just trying to understand > > > this. > > > > > > Anyway, any performance gain is wonderful. Do you have any numbers to > > > share? > > > > > > > > > On Tue, Oct 4, 2016 at 10:29 AM, Jacques Nadeau > > > wrote: > > > > > > > Both the C++ and the JDBC changes are updates that leverage a number > of > > > > pre-existing APIs already on the server. Our initial evaluations, we > > have > > > > already seen substantially improved BI tool performance with the > > proposed > > > > changes (with no additional server side changes). Are you seeing > > > something > > > > different? If you haven't yet looked at the changes in that light, I > > > > suggest you do. > > > > > > > > If anything, I'm more concerned about client feature proposals that > > don't > > > > cover both the C++ and Java client. For example, I think we should be > > > > cautious about merging something like DRILL-4280. We should be > cautious > > > > about introducing new server APIs unless there is a concrete plan > > around > > > > support in all clients. > > > > > > > > So I agree with the spirit of your ask: change proposals should be > > > > "complete". However, I don't think it reasonably applies to the > changes > > > > proposed by Laurent. His changes "complete" the already introduced > > > metadata > > > > and prepare apis the server exposes. It provides an improved BI user > > > > experience. It also introduces unit tests in the C++ client, > something > > > that > > > > was previously sorely missing. > > > > > > > > > > > > > > > > -- > > > > Jacques Nadeau > > > > CTO and Co-Founder, Dremio > > > > > > > > On Tue, Oct 4, 2016 at 9:47 AM, Parth Chandra > > > > > wrote: > > > > > > > > > Hi guys, > > > > > > > > > > I won't be able to join the hangout but it would be good to > discuss > > > the > > > > > plan for the related backend changes. > > > > > > > > > > As I mentioned before I would like to see a concrete proposal for > > the > > > > > backend that will accompany these changes. Without that,
[GitHub] drill pull request #595: DRILL-4203: Parquet File. Date is stored wrongly
Github user jaltekruse commented on a diff in the pull request: https://github.com/apache/drill/pull/595#discussion_r82274813 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java --- @@ -184,17 +201,15 @@ public static DateCorruptionStatus detectCorruptDates(ParquetMetadata footer, if (parsedCreatedByVersion.hasSemanticVersion()) { SemanticVersion semVer = parsedCreatedByVersion.getSemanticVersion(); String pre = semVer.pre + ""; -if (semVer != null && semVer.major == 1 && semVer.minor == 8 && semVer.patch == 1 && pre.contains("drill")) { --- End diff -- I don't know if you are guarding against a null value for semVer at a higher level, but I'm pretty sure the reason I added it is that older versions of parquet files can be lacking this field. I assume with all of the tests cases that were added we should be okay, but it might be better to keep it in defensively. If this field isn't set in the metadata, it shouldn't make it an invalid file, but if we get an NPE it will stop query processing. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #595: DRILL-4203: Parquet File. Date is stored wrongly
Github user jaltekruse commented on a diff in the pull request: https://github.com/apache/drill/pull/595#discussion_r82273282 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java --- @@ -918,18 +916,22 @@ public void setMax(Object max) { @JsonProperty public ConcurrentHashMapcolumnTypeInfo; @JsonProperty List files; @JsonProperty List directories; -@JsonProperty String drillVersion; --- End diff -- This sounds reasonable, having both is okay. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #595: DRILL-4203: Parquet File. Date is stored wrongly
Github user jaltekruse commented on a diff in the pull request: https://github.com/apache/drill/pull/595#discussion_r82272848 --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java --- @@ -123,6 +123,7 @@ public ScanBatch getBatch(FragmentContext context, HiveDrillNativeParquetSubScan // in the first row group ParquetReaderUtility.DateCorruptionStatus containsCorruptDates = ParquetReaderUtility.detectCorruptDates(parquetMetadata, config.getColumns(), true); + logger.info(containsCorruptDates.toString()); --- End diff -- not a very useful logging statement to check in, should at least contain a message about what this value is. This will just print a boolean value. If it was just for local debugging it can be removed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #574: DRILL-4726: Dynamic UDFs support
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/574#discussion_r82262083 --- Diff: exec/java-exec/src/main/resources/drill-module.conf --- @@ -194,10 +194,12 @@ drill.exec: { udf: { retry-attempts: 5, directory: { - base: ${drill.home}"/"${drill.exec.zk.root}"/udf", - staging: ${drill.exec.udf.directory.base}"/staging", - registry: ${drill.exec.udf.directory.base}"/registry", - tmp: ${drill.exec.udf.directory.base}"/tmp" + base: ${drill.exec.zk.root}"/udf", + local: ${drill.exec.udf.directory.base}"/local", --- End diff -- How is this used? The path given by local seems relative: drill/udf Maybe include a simple comment to describe each config setting. Otherwise, I like how this has come together! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #574: DRILL-4726: Dynamic UDFs support
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/574#discussion_r82260717 --- Diff: exec/java-exec/src/main/resources/drill-module.conf --- @@ -45,7 +45,7 @@ drill.client: { supports-complex-types: true } -drill.home: "/tmp" +drill.dfs-home: "/tmp" --- End diff -- On a standard HDFS setup, is there a /tmp folder? All the examples suggest that stuff often goes into "/user/something" such as "/user/drill". Also, if "/tmp" behaves like a Linux /tmp, might something come along and clean up the UDFs during a long Drill run? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #574: DRILL-4726: Dynamic UDFs support
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/574#discussion_r82251309 --- Diff: distribution/src/resources/drill-config.sh --- @@ -324,6 +324,21 @@ if [ -n "$DRILL_CLASSPATH" ]; then CP="$CP:$DRILL_CLASSPATH" fi +# If tmp dir is given, it must exist. +if [ -n "$DRILL_TMP_DIR" ]; then + if [[ ! -d "$DRILL_TMP_DIR" ]]; then +fatal_error "temporary dir does not exist:" $DRILL_TMP_DIR --- End diff -- Capitalize "Temporary". --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #574: DRILL-4726: Dynamic UDFs support
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/574#discussion_r82252246 --- Diff: distribution/src/resources/sqlline.bat --- @@ -114,6 +114,10 @@ if "test%DRILL_LOG_DIR%" == "test" ( set DRILL_LOG_DIR=%DRILL_HOME%\log ) +if "test%DRILL_TMP_DIR%" == "test" ( + set DRILL_TMP_DIR=\tmp --- End diff -- On Windows, I think the standard is to use %TEMP%. There generally is no directory called "\tmp", sometimes there is a "\temp". But %TEMP% is the accepted practice. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #574: DRILL-4726: Dynamic UDFs support
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/574#discussion_r82259602 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionImplementationRegistry.java --- @@ -378,38 +399,33 @@ private Path getLocalUdfDir() { } /** - * First tries to get drill conf directory value from system properties, + * First tries to get drill temporary directory value from system properties, * if value is missing, checks environment properties. * Throws exception is value is null. - * @return drill conf dir path + * @return drill temporary directory path */ - private String getConfDir() { -String drillConfDir = "DRILL_CONF_DIR"; -String value = System.getProperty(drillConfDir); + private String getTmpDir() { --- End diff -- Can we be more forgiving here? 1. Use DRILL_TMP_DIR, if set. 2. Use a config file setting, if set. 3. Use Google's Files.createTempDir( ) which "Atomically creates a new directory somewhere beneath the system's temporary directory (as defined by the java.io.tmpdir system property)" For most users, the choice in 3 should work fine. It would only be folks who have special needs that would set one of the other two properties. Then, we can use a TypeSafe trick to combine 1 and 2. Define the config property something like this: drill.temp-dir: "${DRILL_TMP_DIR" Now, you just have to check drill.temp-dir in your function. If not set, use the Files approach as a default. The nice thing about the Files approach is that each Drillbit will have a different directory (if two happen to be running (with different ports) at the same time.) I wonder, however, does the Files temp directory get deleted on Drillbit exit? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Re: [HANGOUT] Topics for 10/04/16
I see, so it is not the prepare query where we see the improvement. I didn't realise that the BI tools can make so many information schema queries. I haven't been able to go thru the PR in detail. There is extensive refactoring that always makes it hard to tell the exact changes. (Make me nervous as we don't have _any_ tests to test for memory leaks, concurrency and error conditions). I'll try to test some of these conditions out. Re DRILL-4280 - I was basing my comment on the PR comment. I'm assuming Sudheesh will follow up and address this. On Wed, Oct 5, 2016 at 1:37 PM, Jacques Nadeauwrote: > -user (since this is really a dev discussion) > > Looking at DRILL-4280, I see the comment about C++ changes but don't see > them there or in the linked squash branch. Maybe I'm missing something > obvious? > > The key performance benefits from the provided patch are about all the > other interrogations that Tableau (and other BI tools) makes against the > ODBC metadata interfaces (not prepare). With the new driver, those items > can be answered directly (such as list of schemas) rather than through ODBC > driver generated queries. I've commonly seen situations where even a simple > tableau workflow can take 7-10s even in the situation that the actual > "work" query (and associated limit 0 query) is quite short (<1s). This has > been especially problematic in situations where you're interacting with a > large number of sources since those information schema queries are not > always optimal to what is actually needed. > > With regards to benchmarks, we haven't done anything formal but you should > be able to easily do a comparison and see the improvements, especially when > dealing with larger catalogs. > > > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Wed, Oct 5, 2016 at 9:49 AM, Parth Chandra > wrote: > > > Yup. I agree that we need to make sure that both clients are in sync. I > > believe DRILL-4280's PR refers to making changes in both APIs as well. > > > > Do you have a sense of how these changes give us a performance boost? As > > far as I can see, the APIs result in nearly the same code path being > > executed, with the difference being that the limit 0 query is now > submitted > > by the server instead of the client. > > > > I don't know much about the tweaking of performance for various BI tools; > > is there something that Tableau et al do different? I don't see how, > since > > the the ODBC/JDBC interface remains the same. Just trying to understand > > this. > > > > Anyway, any performance gain is wonderful. Do you have any numbers to > > share? > > > > > > On Tue, Oct 4, 2016 at 10:29 AM, Jacques Nadeau > > wrote: > > > > > Both the C++ and the JDBC changes are updates that leverage a number of > > > pre-existing APIs already on the server. Our initial evaluations, we > have > > > already seen substantially improved BI tool performance with the > proposed > > > changes (with no additional server side changes). Are you seeing > > something > > > different? If you haven't yet looked at the changes in that light, I > > > suggest you do. > > > > > > If anything, I'm more concerned about client feature proposals that > don't > > > cover both the C++ and Java client. For example, I think we should be > > > cautious about merging something like DRILL-4280. We should be cautious > > > about introducing new server APIs unless there is a concrete plan > around > > > support in all clients. > > > > > > So I agree with the spirit of your ask: change proposals should be > > > "complete". However, I don't think it reasonably applies to the changes > > > proposed by Laurent. His changes "complete" the already introduced > > metadata > > > and prepare apis the server exposes. It provides an improved BI user > > > experience. It also introduces unit tests in the C++ client, something > > that > > > was previously sorely missing. > > > > > > > > > > > > -- > > > Jacques Nadeau > > > CTO and Co-Founder, Dremio > > > > > > On Tue, Oct 4, 2016 at 9:47 AM, Parth Chandra > > > wrote: > > > > > > > Hi guys, > > > > > > > > I won't be able to join the hangout but it would be good to discuss > > the > > > > plan for the related backend changes. > > > > > > > > As I mentioned before I would like to see a concrete proposal for > the > > > > backend that will accompany these changes. Without that, I feel there > > is > > > no > > > > point to adding so much new code. > > > > > > > > Thanks > > > > > > > > Parth > > > > > > > > > > > > On Mon, Oct 3, 2016 at 7:52 PM, Laurent Goujon > > > wrote: > > > > > > > > > Hi, > > > > > > > > > > I'm currently working on improving metadata support for both the > JDBC > > > > > driver and the C++ connector, more specifically the following > JIRAs: > > > > > > > > > > DRILL-4853: Update C++ protobuf source files > > > > > DRILL-4420: Server-side metadata and