[jira] [Commented] (ORC-345) Create a Decimal64StatisticsImpl
[ https://issues.apache.org/jira/browse/ORC-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461332#comment-16461332 ]

ASF GitHub Bot commented on ORC-345:

Github user omalley commented on the issue:

    https://github.com/apache/orc/pull/252

    Ok, I updated this pull request:
    * I changed updateDecimal64 to take a value and scale.
    * I check for more overflows in Decimal64ColumnStatisticsImpl.
    * I added a test that exercises the new code.

> Create a Decimal64StatisticsImpl
> --------------------------------
>
>                 Key: ORC-345
>                 URL: https://issues.apache.org/jira/browse/ORC-345
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Major
>
> We should create a fast path for handling decimal statistics where
> precision <= 18, so that the values can be handled as longs.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
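The overflow checks mentioned above can be sketched in isolation. This is a hedged illustration, not the actual Decimal64ColumnStatisticsImpl: the class and accessor names below are hypothetical, and the (value, scale) signature of updateDecimal64 is the only detail taken from the pull request.

```java
// Minimal sketch: track min/max/sum of short decimals as unscaled longs,
// and stop tracking the sum once it overflows a signed 64-bit long.
// The scale is assumed constant for the column, so unscaled values
// compare directly; the real ORC implementation handles more cases.
class Decimal64StatsSketch {
  private long min = Long.MAX_VALUE;
  private long max = Long.MIN_VALUE;
  private long sum = 0;
  private boolean sumOverflowed = false;

  // value is the unscaled long: 12345 with scale 2 means 123.45
  void updateDecimal64(long value, int scale) {
    if (value < min) { min = value; }
    if (value > max) { max = value; }
    if (!sumOverflowed) {
      try {
        sum = Math.addExact(sum, value);  // throws on long overflow
      } catch (ArithmeticException e) {
        sumOverflowed = true;             // sum is no longer meaningful
      }
    }
  }

  long getMin() { return min; }
  long getMax() { return max; }
  boolean hasSum() { return !sumOverflowed; }
  long getSum() { return sum; }
}
```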
[jira] [Commented] (ORC-357) Use orc::InputStream in getTimezoneByFilename
[ https://issues.apache.org/jira/browse/ORC-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461350#comment-16461350 ]

ASF GitHub Bot commented on ORC-357:

GitHub user rip-nsk opened a pull request:

    https://github.com/apache/orc/pull/263

    ORC-357: [C++] Use orc::InputStream in getTimezoneByFilename

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rip-nsk/orc _ORC-357

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/orc/pull/263.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #263

commit 07fe9ed11a1443bef245723237a3013b2df266e9
Author: rip-nsk
Date:   2018-05-02T17:29:58Z

    Use orc::InputStream in getTimezoneByFilename

> Use orc::InputStream in getTimezoneByFilename
> ---------------------------------------------
>
>                 Key: ORC-357
>                 URL: https://issues.apache.org/jira/browse/ORC-357
>             Project: ORC
>          Issue Type: Improvement
>          Components: C++
>            Reporter: rip.nsk
>            Priority: Minor
>
> This simplifies the function and makes it portable.
[jira] [Resolved] (ORC-352) Update, cleanup and add support of MSVC to ThirdpartyToolchain
[ https://issues.apache.org/jira/browse/ORC-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved ORC-352.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.5.0

I committed this. Thanks, RIP!

> Update, cleanup and add support of MSVC to ThirdpartyToolchain
> --------------------------------------------------------------
>
>                 Key: ORC-352
>                 URL: https://issues.apache.org/jira/browse/ORC-352
>             Project: ORC
>          Issue Type: Improvement
>          Components: C++
>            Reporter: rip.nsk
>            Assignee: rip.nsk
>            Priority: Major
>             Fix For: 1.5.0
[jira] [Resolved] (ORC-356) fix and extend Adaptor (.cc/.hh)
[ https://issues.apache.org/jira/browse/ORC-356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved ORC-356.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.5.0

I committed this. Thanks, RIP!

> fix and extend Adaptor (.cc/.hh)
> --------------------------------
>
>                 Key: ORC-356
>                 URL: https://issues.apache.org/jira/browse/ORC-356
>             Project: ORC
>          Issue Type: Improvement
>          Components: C++
>            Reporter: rip.nsk
>            Assignee: rip.nsk
>            Priority: Major
>             Fix For: 1.5.0
>
> Currently C09Adapter.cc is excluded from the build (and includes the
> non-existent C09Adapter.hh).
> I'll fix this and extend it with adaptors to support the windows/msvc build.
[jira] [Commented] (ORC-357) Use orc::InputStream in getTimezoneByFilename
[ https://issues.apache.org/jira/browse/ORC-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461360#comment-16461360 ]

ASF GitHub Bot commented on ORC-357:

Github user rip-nsk commented on a diff in the pull request:

    https://github.com/apache/orc/pull/263#discussion_r185577344

--- Diff: c++/src/Timezone.cc ---
@@ -698,40 +694,15 @@ namespace orc {
     if (itr != timezoneCache.end()) {
       return *(itr->second).get();
     }
-    int in = open(filename.c_str(), O_RDONLY);
-    if (in == -1) {
-      std::stringstream buffer;
-      buffer << "failed to open " << filename << " - " << strerror(errno);
-      throw TimezoneError(buffer.str());
-    }
-    struct stat fileInfo;
-    if (fstat(in, &fileInfo) == -1) {
-      std::stringstream buffer;
-      buffer << "failed to stat " << filename << " - " << strerror(errno);
-      throw TimezoneError(buffer.str());
-    }
-    if ((fileInfo.st_mode & S_IFMT) != S_IFREG) {
-      std::stringstream buffer;
-      buffer << "non-file in tzfile reader " << filename;
-      throw TimezoneError(buffer.str());
-    }
-    size_t size = static_cast<size_t>(fileInfo.st_size);
-    std::vector<char> buffer(size);
-    size_t posn = 0;
-    while (posn < size) {
-      ssize_t ret = read(in, &buffer[posn], size - posn);
-      if (ret == -1) {
-        throw TimezoneError(std::string("Failure to read timezone file ") +
-                            filename + " - " + strerror(errno));
-      }
-      posn += static_cast<size_t>(ret);
-    }
-    if (close(in) == -1) {
-      std::stringstream err;
-      err << "failed to close " << filename << " - " << strerror(errno);
-      throw TimezoneError(err.str());
+    try {
+      ORC_UNIQUE_PTR<InputStream> file = readFile(filename);
--- End diff --

    'auto' is more suitable here
[jira] [Resolved] (ORC-354) Restore the benchmarks after clarification from apache legal
[ https://issues.apache.org/jira/browse/ORC-354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved ORC-354.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.5.0

> Restore the benchmarks after clarification from apache legal
> ------------------------------------------------------------
>
>                 Key: ORC-354
>                 URL: https://issues.apache.org/jira/browse/ORC-354
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Major
>             Fix For: 1.5.0
>
> Ok, we can add back the benchmarks as long as the component isn't required
> for users and we don't distribute the binaries.
[jira] [Commented] (ORC-354) Restore the benchmarks after clarification from apache legal
[ https://issues.apache.org/jira/browse/ORC-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461351#comment-16461351 ]

ASF GitHub Bot commented on ORC-354:

Github user omalley closed the pull request at:

    https://github.com/apache/orc/pull/262
[jira] [Commented] (ORC-341) Support time zone as a parameter for Java reader and writer
[ https://issues.apache.org/jira/browse/ORC-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461644#comment-16461644 ]

ASF GitHub Bot commented on ORC-341:

Github user jcamachor commented on the issue:

    https://github.com/apache/orc/pull/249

    I have been testing the patch from Hive and everything seems to be
    working as expected. I have rebased the patch and merged both commits.
    Also, I had to extend my changes to the newly created `WriterImplV2`.
    @omalley, @wgtmac, could you take a final look and merge if it is OK?
    Thanks

> Support time zone as a parameter for Java reader and writer
> -----------------------------------------------------------
>
>                 Key: ORC-341
>                 URL: https://issues.apache.org/jira/browse/ORC-341
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: Jesus Camacho Rodriguez
>            Priority: Major
>
> Currently, the time zone is hardcoded as the system default time zone, and
> ORC applies a displacement between timestamp values read/written based on
> the time zone.
> This issue aims at adding the option to pass the time zone as a parameter
> to the reader/writer.
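The "displacement" the issue description refers to — the same wall-clock text mapping to different epoch instants depending on which time zone interprets it — can be demonstrated with plain JDK classes. This standalone demo is illustrative only and uses no ORC APIs:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Parse one wall-clock string under two time zones: the resulting epoch
// instants differ by exactly the zone offset, which is the displacement
// a timezone-aware reader/writer has to account for.
class TimezoneDisplacementDemo {
  static long parseInZone(String text, String zoneId) {
    SimpleDateFormat fmt =
        new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
    fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
    try {
      return fmt.parse(text).getTime();  // epoch millis
    } catch (ParseException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    long utc = parseInZone("2018-05-02 12:00:00", "UTC");
    long plus2 = parseInZone("2018-05-02 12:00:00", "GMT+02:00");
    // Noon in GMT+02:00 is 10:00 UTC, two hours earlier as an instant.
    System.out.println(utc - plus2);  // prints 7200000
  }
}
```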
[jira] [Commented] (ORC-345) Create a Decimal64StatisticsImpl
[ https://issues.apache.org/jira/browse/ORC-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461667#comment-16461667 ]

ASF GitHub Bot commented on ORC-345:

Github user omalley commented on the issue:

    https://github.com/apache/orc/pull/252

    Ok, with this and ORC-344, the difference in writing small vs large
    decimals to a null file system is huge:

    Benchmark           (precision)  Mode  Cnt       Score      Error  Units
    DecimalBench.write            8  avgt   30   36914.791 ±  866.408  us/op
    DecimalBench.write           19  avgt   30  240789.318 ± 5146.880  us/op

    So I'm seeing a 6x speed up when using precision = 8 with the new code
    path instead of precision = 19 and the old code path.
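The precision <= 18 cutoff behind this fast path follows from the range of a signed 64-bit long (Long.MAX_VALUE is about 9.22 × 10^18): every 18-digit unscaled value fits, while a 19-digit value can overflow. A standalone check of that boundary (the class name is ours, for illustration):

```java
import java.math.BigInteger;

// Why precision 18 is the cutoff for the long-based fast path:
// every 18-digit unscaled value fits in a signed 64-bit long,
// but 19-digit values can exceed Long.MAX_VALUE.
class Decimal64RangeCheck {
  static boolean fitsInLong(int precision) {
    // Largest unscaled value at this precision: 10^precision - 1.
    BigInteger largest =
        BigInteger.TEN.pow(precision).subtract(BigInteger.ONE);
    return largest.bitLength() <= 63;  // representable as a signed long
  }

  public static void main(String[] args) {
    System.out.println(fitsInLong(18));  // true
    System.out.println(fitsInLong(19));  // false
  }
}
```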
[jira] [Assigned] (ORC-357) Use orc::InputStream in getTimezoneByFilename
[ https://issues.apache.org/jira/browse/ORC-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

rip.nsk reassigned ORC-357:
---------------------------
    Assignee: rip.nsk
[jira] [Commented] (ORC-345) Create a Decimal64StatisticsImpl
[ https://issues.apache.org/jira/browse/ORC-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461783#comment-16461783 ]

ASF GitHub Bot commented on ORC-345:

Github user prasanthj commented on a diff in the pull request:

    https://github.com/apache/orc/pull/252#discussion_r185673790

--- Diff: java/bench/src/java/org/apache/orc/bench/DecimalBench.java ---
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.bench;
+
+import com.google.gson.JsonStreamParser;
+import org.apache.avro.file.DataFileReader;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.io.DatumReader;
+import org.apache.avro.mapred.FsInput;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.TrackingLocalFileSystem;
+import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.Decimal64ColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport;
+import org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper;
+import org.apache.hadoop.io.ArrayWritable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.orc.CompressionKind;
+import org.apache.orc.OrcFile;
+import org.apache.orc.Reader;
+import org.apache.orc.RecordReader;
+import org.apache.orc.TypeDescription;
+import org.apache.orc.Writer;
+import org.apache.orc.bench.convert.BatchReader;
+import org.apache.orc.bench.convert.GenerateVariants;
+import org.apache.orc.bench.convert.csv.CsvReader;
+import org.apache.parquet.hadoop.ParquetInputFormat;
+import org.openjdk.jmh.annotations.AuxCounters;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Level;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.TearDown;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.net.URI;
+import java.nio.charset.StandardCharsets;
+import java.util.concurrent.TimeUnit;
+
+@BenchmarkMode(Mode.AverageTime)
+@Warmup(iterations=2, time=30, timeUnit = TimeUnit.SECONDS)
+@Measurement(iterations=10, time=30, timeUnit = TimeUnit.SECONDS)
+@State(Scope.Thread)
+@OutputTimeUnit(TimeUnit.MICROSECONDS)
+@Fork(2)
+public class DecimalBench {
+
+  private static final String ROOT_ENVIRONMENT_NAME = "bench.root.dir";
+  private static final Path root;
+  static {
+    String value = System.getProperty(ROOT_ENVIRONMENT_NAME);
+    root = value == null ? null : new Path(value);
+  }
+
+  /**
+   * Abstract out whether we are writing short or long decimals
+   */
+  interface Loader {
+    /**
+     * Load the data from the values array into the ColumnVector.
+     * @param vector the output
+     * @param values the intput
+     * @param offset the first input value
+     *
[jira] [Commented] (ORC-345) Create a Decimal64StatisticsImpl
[ https://issues.apache.org/jira/browse/ORC-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461784#comment-16461784 ]

ASF GitHub Bot commented on ORC-345:

Github user prasanthj commented on a diff in the pull request:

    https://github.com/apache/orc/pull/252#discussion_r185673722

--- Diff: java/bench/src/java/org/apache/orc/bench/DecimalBench.java ---
@@ -0,0 +1,270 @@
[quoted excerpt identical to the previous comment]
[jira] [Commented] (ORC-345) Create a Decimal64StatisticsImpl
[ https://issues.apache.org/jira/browse/ORC-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461785#comment-16461785 ]

ASF GitHub Bot commented on ORC-345:

Github user prasanthj commented on a diff in the pull request:

    https://github.com/apache/orc/pull/252#discussion_r185673866

--- Diff: java/bench/src/java/org/apache/orc/bench/DecimalBench.java ---
@@ -0,0 +1,270 @@
[quoted excerpt identical to the previous comment]
[jira] [Commented] (ORC-341) Support time zone as a parameter for Java reader and writer
[ https://issues.apache.org/jira/browse/ORC-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461890#comment-16461890 ]

ASF GitHub Bot commented on ORC-341:

Github user wgtmac commented on a diff in the pull request:

    https://github.com/apache/orc/pull/249#discussion_r185690848

--- Diff: java/core/src/java/org/apache/orc/impl/writer/TimestampTreeWriter.java ---
@@ -54,9 +57,20 @@ public TimestampTreeWriter(int columnId,
     if (rowIndexPosition != null) {
       recordPosition(rowIndexPosition);
     }
-    this.localTimezone = TimeZone.getDefault();
-    // for unit tests to set different time zones
-    this.baseEpochSecsLocalTz = Timestamp.valueOf(BASE_TIMESTAMP_STRING).getTime() / MILLIS_PER_SECOND;
+    if (writer.isUseUTCTimestamp()) {
+      this.localTimezone = TimeZone.getTimeZone("UTC");
--- End diff --

    We'd better change its name to this.writeTimezone to avoid confusion in
    the future. Same for localDateFormat and baseEpochSecsLocalTz below.
[jira] [Commented] (ORC-341) Support time zone as a parameter for Java reader and writer
[ https://issues.apache.org/jira/browse/ORC-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461892#comment-16461892 ]

ASF GitHub Bot commented on ORC-341:

Github user wgtmac commented on a diff in the pull request:

    https://github.com/apache/orc/pull/249#discussion_r185691649

--- Diff: java/core/src/java/org/apache/orc/impl/writer/TimestampTreeWriter.java ---
@@ -28,7 +28,9 @@
 import org.apache.orc.impl.SerializationUtils;

 import java.io.IOException;
-import java.sql.Timestamp;
+import java.text.DateFormat;
+import java.text.ParseException;
+import java.text.SimpleDateFormat;
 import java.util.TimeZone;

 public class TimestampTreeWriter extends TreeWriterBase {
--- End diff --

    We should also change the writeBatch function below. The input
    vector.isUTC may be true while writer.isUseUTCTimestamp() is false, and
    vice versa. In this case, we need to convert them to the correct writer
    timezone.
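The conversion requested above — re-anchoring a stored instant so its wall-clock reading is preserved when the interpreting zone changes (e.g. from UTC to the writer's zone) — amounts to shifting by the difference of the zone offsets. The helper below is a hypothetical standalone sketch of that arithmetic, not ORC's actual conversion code:

```java
import java.util.TimeZone;

// Shift epoch millis so the wall-clock reading stays the same when the
// interpreting zone changes from 'from' to 'to'. Offsets are sampled at
// the original instant, which is exact for fixed-offset zones; real
// implementations must be more careful around DST transitions.
class WallClockShift {
  static long shift(long millis, TimeZone from, TimeZone to) {
    return millis + from.getOffset(millis) - to.getOffset(millis);
  }

  public static void main(String[] args) {
    TimeZone utc = TimeZone.getTimeZone("UTC");
    TimeZone plus2 = TimeZone.getTimeZone("GMT+02:00");
    // Epoch 0 reads as 1970-01-01 00:00 in UTC; the shifted instant
    // reads as 1970-01-01 00:00 in GMT+02:00, i.e. two hours earlier.
    System.out.println(shift(0L, utc, plus2));  // prints -7200000
  }
}
```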
[jira] [Commented] (ORC-341) Support time zone as a parameter for Java reader and writer
[ https://issues.apache.org/jira/browse/ORC-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461891#comment-16461891 ]

ASF GitHub Bot commented on ORC-341:

Github user wgtmac commented on a diff in the pull request:

    https://github.com/apache/orc/pull/249#discussion_r185691194

--- Diff: java/core/src/java/org/apache/orc/impl/TreeReaderFactory.java ---
@@ -990,6 +1007,10 @@ public void nextVector(ColumnVector previousVector,
       TimestampColumnVector result = (TimestampColumnVector) previousVector;
       super.nextVector(previousVector, isNull, batchSize);

+      if (context.isUseUTCTimestamp()) {
+        result.setIsUTC(true);
--- End diff --

    result.setIsUTC(context.isUseUTCTimestamp());

    Just in case result is in UTC but context.isUseUTCTimestamp() is false.