[jira] [Commented] (ORC-345) Create a Decimal64StatisticsImpl

2018-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461332#comment-16461332
 ] 

ASF GitHub Bot commented on ORC-345:


Github user omalley commented on the issue:

https://github.com/apache/orc/pull/252
  
Ok, I updated this pull request:
* I changed updateDecimal64 to take a value and scale.
* I check for more overflows in Decimal64ColumnStatisticsImpl.
* I added a test that tests the new code.
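
For readers following the thread, the fast path described in these bullets can be sketched roughly as follows. This is an illustration only and not the code in the pull request: the class name `Decimal64StatsSketch`, its fields, and the `update(long value, int valueScale)` signature are assumptions made for the example. It keeps min, max, and sum as unscaled longs at a fixed column scale, rescales each incoming value, and flags the sum as overflowed instead of letting it wrap.

```java
/** Hypothetical sketch only; not the real Decimal64ColumnStatisticsImpl. */
final class Decimal64StatsSketch {
  private final int scale;              // column scale, fixed when the column is created
  private long min = Long.MAX_VALUE;    // unscaled minimum seen so far
  private long max = Long.MIN_VALUE;    // unscaled maximum seen so far
  private long sum = 0;                 // unscaled running sum
  private boolean sumOverflowed = false;
  private long count = 0;

  Decimal64StatsSketch(int scale) {
    this.scale = scale;
  }

  /** Record one unscaled value that was written at valueScale. */
  void update(long value, int valueScale) {
    long v = value;
    try {
      // Scale up to the column scale, failing loudly instead of wrapping.
      for (int s = valueScale; s < scale; s++) {
        v = Math.multiplyExact(v, 10L);
      }
    } catch (ArithmeticException e) {
      throw new IllegalArgumentException(
          "value does not fit in 64 bits at scale " + scale, e);
    }
    // Scale down to the column scale; this sketch simply truncates toward zero.
    for (int s = valueScale; s > scale; s--) {
      v /= 10;
    }
    count++;
    min = Math.min(min, v);
    max = Math.max(max, v);
    if (!sumOverflowed) {
      try {
        sum = Math.addExact(sum, v);
      } catch (ArithmeticException e) {
        sumOverflowed = true;           // the sum is no longer meaningful
      }
    }
  }

  long getMin() { return min; }
  long getMax() { return max; }
  long getSum() { return sum; }
  long getCount() { return count; }
  boolean isSumOverflowed() { return sumOverflowed; }

  public static void main(String[] args) {
    Decimal64StatsSketch stats = new Decimal64StatsSketch(2);
    stats.update(1234, 2);              // 12.34
    stats.update(5, 0);                 // 5.00
    stats.update(-17, 1);               // -1.70
    System.out.println(stats.getMin() + " " + stats.getMax() + " " + stats.getSum());
  }
}
```

Passing the scale with every value, as the first bullet describes, lets the statistics object normalize inputs itself; the overflow flag is one possible way to handle a sum that no longer fits in 64 bits.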


> Create a Decimal64StatisticsImpl
> 
>
> Key: ORC-345
> URL: https://issues.apache.org/jira/browse/ORC-345
> Project: ORC
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>
> We should create a fast path for handling decimal statistics where precision 
> <= 18 where the values can be handled as longs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ORC-357) Use orc::InputStream in getTimezoneByFilename

2018-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461350#comment-16461350
 ] 

ASF GitHub Bot commented on ORC-357:


GitHub user rip-nsk opened a pull request:

https://github.com/apache/orc/pull/263

ORC-357: [C++] Use orc::InputStream in getTimezoneByFilename



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rip-nsk/orc _ORC-357

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/orc/pull/263.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #263


commit 07fe9ed11a1443bef245723237a3013b2df266e9
Author: rip-nsk 
Date:   2018-05-02T17:29:58Z

Use orc::InputStream in getTimezoneByFilename




> Use orc::InputStream in getTimezoneByFilename
> -
>
> Key: ORC-357
> URL: https://issues.apache.org/jira/browse/ORC-357
> Project: ORC
>  Issue Type: Improvement
>  Components: C++
>Reporter: rip.nsk
>Priority: Minor
>
> This simplifies the function and makes it portable.





[jira] [Resolved] (ORC-352) Update, cleanup and add support of MSVC to ThirdpartyToolchain

2018-05-02 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/ORC-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved ORC-352.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

I committed this. Thanks, RIP!

> Update, cleanup and add support of MSVC to ThirdpartyToolchain
> --
>
> Key: ORC-352
> URL: https://issues.apache.org/jira/browse/ORC-352
> Project: ORC
>  Issue Type: Improvement
>  Components: C++
>Reporter: rip.nsk
>Assignee: rip.nsk
>Priority: Major
> Fix For: 1.5.0
>
>






[jira] [Resolved] (ORC-356) fix and extend Adaptor (.cc/.hh)

2018-05-02 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/ORC-356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved ORC-356.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

I committed this. Thanks, RIP!

> fix and extend Adaptor (.cc/.hh)
> 
>
> Key: ORC-356
> URL: https://issues.apache.org/jira/browse/ORC-356
> Project: ORC
>  Issue Type: Improvement
>  Components: C++
>Reporter: rip.nsk
>Assignee: rip.nsk
>Priority: Major
> Fix For: 1.5.0
>
>
> Currently C09Adapter.cc is excluded from the build (and includes a nonexistent 
> C09Adapter.hh).
> I'll fix this and extend it with adapters to support the Windows/MSVC build.





[jira] [Commented] (ORC-357) Use orc::InputStream in getTimezoneByFilename

2018-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461360#comment-16461360
 ] 

ASF GitHub Bot commented on ORC-357:


Github user rip-nsk commented on a diff in the pull request:

https://github.com/apache/orc/pull/263#discussion_r185577344
  
--- Diff: c++/src/Timezone.cc ---
@@ -698,40 +694,15 @@ namespace orc {
 if (itr != timezoneCache.end()) {
   return *(itr->second).get();
 }
-int in = open(filename.c_str(), O_RDONLY);
-if (in == -1) {
-  std::stringstream buffer;
-  buffer << "failed to open " << filename << " - " << strerror(errno);
-  throw TimezoneError(buffer.str());
-}
-struct stat fileInfo;
-if (fstat(in, &fileInfo) == -1) {
-  std::stringstream buffer;
-  buffer << "failed to stat " << filename << " - " << strerror(errno);
-  throw TimezoneError(buffer.str());
-}
-if ((fileInfo.st_mode & S_IFMT) != S_IFREG) {
-  std::stringstream buffer;
-  buffer << "non-file in tzfile reader " << filename;
-  throw TimezoneError(buffer.str());
-}
-size_t size = static_cast<size_t>(fileInfo.st_size);
-std::vector buffer(size);
-size_t posn = 0;
-while (posn < size) {
-  ssize_t ret = read(in, &buffer[posn], size - posn);
-  if (ret == -1) {
-throw TimezoneError(std::string("Failure to read timezone file ") +
-filename + " - " + strerror(errno));
-  }
-  posn += static_cast<size_t>(ret);
-}
-if (close(in) == -1) {
-  std::stringstream err;
-  err << "failed to close " << filename << " - " << strerror(errno);
-  throw TimezoneError(err.str());
+try {
+  ORC_UNIQUE_PTR<InputStream> file = readFile(filename);
--- End diff --

'auto' is more suitable here


> Use orc::InputStream in getTimezoneByFilename
> -
>
> Key: ORC-357
> URL: https://issues.apache.org/jira/browse/ORC-357
> Project: ORC
>  Issue Type: Improvement
>  Components: C++
>Reporter: rip.nsk
>Priority: Minor
>
> This simplifies the function and makes it portable.





[jira] [Resolved] (ORC-354) Restore the benchmarks after clarification from apache legal

2018-05-02 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/ORC-354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved ORC-354.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

> Restore the benchmarks after clarification from apache legal
> 
>
> Key: ORC-354
> URL: https://issues.apache.org/jira/browse/ORC-354
> Project: ORC
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
> Fix For: 1.5.0
>
>
> Ok, we can add back the benchmarks as long as the component isn't required 
> for users and we don't distribute the binaries.





[jira] [Commented] (ORC-354) Restore the benchmarks after clarification from apache legal

2018-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461351#comment-16461351
 ] 

ASF GitHub Bot commented on ORC-354:


Github user omalley closed the pull request at:

https://github.com/apache/orc/pull/262


> Restore the benchmarks after clarification from apache legal
> 
>
> Key: ORC-354
> URL: https://issues.apache.org/jira/browse/ORC-354
> Project: ORC
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>
> Ok, we can add back the benchmarks as long as the component isn't required 
> for users and we don't distribute the binaries.





[jira] [Commented] (ORC-341) Support time zone as a parameter for Java reader and writer

2018-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461644#comment-16461644
 ] 

ASF GitHub Bot commented on ORC-341:


Github user jcamachor commented on the issue:

https://github.com/apache/orc/pull/249
  
I have been testing the patch from Hive and everything seems to be working 
as expected.

I have rebased the patch and merged both commits. Also, I had to extend my 
changes to the newly created ```WriterImplV2```.

@omalley, @wgtmac, could you take a final look and merge if it is OK? 
Thanks.


> Support time zone as a parameter for Java reader and writer
> ---
>
> Key: ORC-341
> URL: https://issues.apache.org/jira/browse/ORC-341
> Project: ORC
>  Issue Type: Improvement
>Reporter: Jesus Camacho Rodriguez
>Priority: Major
>
> Currently, time zone is hardcoded as the system default time zone and ORC 
> applies displacement between timestamp values read/written based on time zone.
> This issue aims at adding the option to pass the time zone as a parameter to 
> the reader/writer.





[jira] [Commented] (ORC-345) Create a Decimal64StatisticsImpl

2018-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461667#comment-16461667
 ] 

ASF GitHub Bot commented on ORC-345:


Github user omalley commented on the issue:

https://github.com/apache/orc/pull/252
  
Ok, with this and ORC-344, the difference in writing small vs large 
decimals to a null file system is huge:

Benchmark           (precision)  Mode  Cnt       Score      Error  Units
DecimalBench.write            8  avgt   30   36914.791 ±  866.408  us/op
DecimalBench.write           19  avgt   30  240789.318 ± 5146.880  us/op

So I'm seeing roughly a 6.5x speedup when using precision = 8 with the new code path 
instead of precision = 19 with the old code path.
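
As a rough check of the numbers above, the ratio of the two reported scores is about 6.5, and precision 19 is the first precision whose full range of unscaled values no longer fits in a signed 64-bit long. The small sketch below works both out; the class and method names (`DecimalBenchRatioSketch`, `speedup`, `fitsInLong`) are made up for this illustration and are not part of ORC or the benchmark.

```java
/** Hypothetical helper for checking the benchmark arithmetic; not ORC code. */
final class DecimalBenchRatioSketch {

  /** Ratio of the slower average time to the faster one. */
  static double speedup(double slowMicros, double fastMicros) {
    return slowMicros / fastMicros;
  }

  /**
   * True if every unscaled value with the given number of decimal digits
   * fits in a signed 64-bit long, i.e. if 10^precision does not overflow.
   */
  static boolean fitsInLong(int precision) {
    long bound = 1;
    try {
      for (int i = 0; i < precision; i++) {
        bound = Math.multiplyExact(bound, 10L);
      }
    } catch (ArithmeticException e) {
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    // Scores copied from the benchmark table, in microseconds per operation.
    System.out.printf("speedup = %.2f%n", speedup(240789.318, 36914.791));
    System.out.println("precision 18 fits in long: " + fitsInLong(18));
    System.out.println("precision 19 fits in long: " + fitsInLong(19));
  }
}
```

The comparison mixes two effects, the smaller precision and the new code path, so the ratio says how far apart the two configurations are rather than how much either factor contributes on its own.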


> Create a Decimal64StatisticsImpl
> 
>
> Key: ORC-345
> URL: https://issues.apache.org/jira/browse/ORC-345
> Project: ORC
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>
> We should create a fast path for handling decimal statistics where precision 
> <= 18 where the values can be handled as longs.





[jira] [Assigned] (ORC-357) Use orc::InputStream in getTimezoneByFilename

2018-05-02 Thread rip.nsk (JIRA)

 [ 
https://issues.apache.org/jira/browse/ORC-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rip.nsk reassigned ORC-357:
---

Assignee: rip.nsk

> Use orc::InputStream in getTimezoneByFilename
> -
>
> Key: ORC-357
> URL: https://issues.apache.org/jira/browse/ORC-357
> Project: ORC
>  Issue Type: Improvement
>  Components: C++
>Reporter: rip.nsk
>Assignee: rip.nsk
>Priority: Minor
>
> This simplifies the function and makes it portable.





[jira] [Commented] (ORC-345) Create a Decimal64StatisticsImpl

2018-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461783#comment-16461783
 ] 

ASF GitHub Bot commented on ORC-345:


Github user prasanthj commented on a diff in the pull request:

https://github.com/apache/orc/pull/252#discussion_r185673790
  
--- Diff: java/bench/src/java/org/apache/orc/bench/DecimalBench.java ---
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.bench;
+
+import com.google.gson.JsonStreamParser;
+import org.apache.avro.file.DataFileReader;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.io.DatumReader;
+import org.apache.avro.mapred.FsInput;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.TrackingLocalFileSystem;
+import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.Decimal64ColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport;
+import 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper;
+import org.apache.hadoop.io.ArrayWritable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.orc.CompressionKind;
+import org.apache.orc.OrcFile;
+import org.apache.orc.Reader;
+import org.apache.orc.RecordReader;
+import org.apache.orc.TypeDescription;
+import org.apache.orc.Writer;
+import org.apache.orc.bench.convert.BatchReader;
+import org.apache.orc.bench.convert.GenerateVariants;
+import org.apache.orc.bench.convert.csv.CsvReader;
+import org.apache.parquet.hadoop.ParquetInputFormat;
+import org.openjdk.jmh.annotations.AuxCounters;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Level;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.TearDown;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.net.URI;
+import java.nio.charset.StandardCharsets;
+import java.util.concurrent.TimeUnit;
+
+@BenchmarkMode(Mode.AverageTime)
+@Warmup(iterations=2, time=30, timeUnit = TimeUnit.SECONDS)
+@Measurement(iterations=10, time=30, timeUnit = TimeUnit.SECONDS)
+@State(Scope.Thread)
+@OutputTimeUnit(TimeUnit.MICROSECONDS)
+@Fork(2)
+public class DecimalBench {
+
+  private static final String ROOT_ENVIRONMENT_NAME = "bench.root.dir";
+  private static final Path root;
+  static {
+String value = System.getProperty(ROOT_ENVIRONMENT_NAME);
+root = value == null ? null : new Path(value);
+  }
+
+  /**
+   * Abstract out whether we are writing short or long decimals
+   */
+  interface Loader {
+/**
+ * Load the data from the values array into the ColumnVector.
+ * @param vector the output
+ * @param values the intput
+ * @param offset the first input value
+ * 

[jira] [Commented] (ORC-345) Create a Decimal64StatisticsImpl

2018-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461784#comment-16461784
 ] 

ASF GitHub Bot commented on ORC-345:


Github user prasanthj commented on a diff in the pull request:

https://github.com/apache/orc/pull/252#discussion_r185673722
  
--- Diff: java/bench/src/java/org/apache/orc/bench/DecimalBench.java ---

[jira] [Commented] (ORC-345) Create a Decimal64StatisticsImpl

2018-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461785#comment-16461785
 ] 

ASF GitHub Bot commented on ORC-345:


Github user prasanthj commented on a diff in the pull request:

https://github.com/apache/orc/pull/252#discussion_r185673866
  
--- Diff: java/bench/src/java/org/apache/orc/bench/DecimalBench.java ---

[jira] [Commented] (ORC-341) Support time zone as a parameter for Java reader and writer

2018-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461890#comment-16461890
 ] 

ASF GitHub Bot commented on ORC-341:


Github user wgtmac commented on a diff in the pull request:

https://github.com/apache/orc/pull/249#discussion_r185690848
  
--- Diff: 
java/core/src/java/org/apache/orc/impl/writer/TimestampTreeWriter.java ---
@@ -54,9 +57,20 @@ public TimestampTreeWriter(int columnId,
 if (rowIndexPosition != null) {
   recordPosition(rowIndexPosition);
 }
-this.localTimezone = TimeZone.getDefault();
-// for unit tests to set different time zones
-this.baseEpochSecsLocalTz = 
Timestamp.valueOf(BASE_TIMESTAMP_STRING).getTime() / MILLIS_PER_SECOND;
+if (writer.isUseUTCTimestamp()) {
+  this.localTimezone = TimeZone.getTimeZone("UTC");
--- End diff --

We'd better change its name to this.writeTimezone to avoid confusion in the 
future.
Same for localDateFormat and baseEpochSecsLocalTz below.


> Support time zone as a parameter for Java reader and writer
> ---
>
> Key: ORC-341
> URL: https://issues.apache.org/jira/browse/ORC-341
> Project: ORC
>  Issue Type: Improvement
>Reporter: Jesus Camacho Rodriguez
>Priority: Major
>
> Currently, time zone is hardcoded as the system default time zone and ORC 
> applies displacement between timestamp values read/written based on time zone.
> This issue aims at adding the option to pass the time zone as a parameter to 
> the reader/writer.





[jira] [Commented] (ORC-341) Support time zone as a parameter for Java reader and writer

2018-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461892#comment-16461892
 ] 

ASF GitHub Bot commented on ORC-341:


Github user wgtmac commented on a diff in the pull request:

https://github.com/apache/orc/pull/249#discussion_r185691649
  
--- Diff: 
java/core/src/java/org/apache/orc/impl/writer/TimestampTreeWriter.java ---
@@ -28,7 +28,9 @@
 import org.apache.orc.impl.SerializationUtils;
 
 import java.io.IOException;
-import java.sql.Timestamp;
+import java.text.DateFormat;
+import java.text.ParseException;
+import java.text.SimpleDateFormat;
 import java.util.TimeZone;
 
 public class TimestampTreeWriter extends TreeWriterBase {
--- End diff --

We should also change the writeBatch function below.

The input vector.isUTC may be true while writer.isUseUTCTimestamp() is 
false, or vice versa. In that case, we need to convert the values to the 
correct writer timezone.
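
A minimal sketch of the kind of conversion meant here, under stated assumptions: the class `WallClockShiftSketch` and both method names are hypothetical, and the rule used (keep the wall-clock reading, change the zone it is interpreted in) is one plausible reading of the comment rather than the behavior of the pull request. Only `java.util.TimeZone` from the JDK is used.

```java
import java.util.TimeZone;

/** Hypothetical sketch of reconciling vector and writer time zones. */
final class WallClockShiftSketch {

  /**
   * Return the epoch millis that show, in zone {@code to}, the same
   * wall-clock time that {@code millis} shows in zone {@code from}.
   */
  static long shiftWallClock(TimeZone from, TimeZone to, long millis) {
    // Wall-clock reading in the source zone, expressed as UTC-style millis.
    long wallClock = millis + from.getOffset(millis);
    // Two-step estimate of the target offset; it may still be off for
    // wall-clock times that fall right on a DST transition.
    long guess = wallClock - to.getOffset(wallClock);
    return wallClock - to.getOffset(guess);
  }

  /** Convert only when the vector and the writer disagree about UTC. */
  static long toWriterMillis(boolean vectorIsUTC, boolean writerUsesUTC,
                             TimeZone local, long millis) {
    if (vectorIsUTC == writerUsesUTC) {
      return millis;                      // already in the writer's zone
    }
    TimeZone utc = TimeZone.getTimeZone("UTC");
    return vectorIsUTC
        ? shiftWallClock(utc, local, millis)
        : shiftWallClock(local, utc, millis);
  }

  public static void main(String[] args) {
    TimeZone local = TimeZone.getTimeZone("GMT+02:00");
    System.out.println(toWriterMillis(true, false, local, 0L));
  }
}
```

Checking the two flags for equality first keeps the common case, where the vector and the writer already agree, free of any arithmetic.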


> Support time zone as a parameter for Java reader and writer
> ---
>
> Key: ORC-341
> URL: https://issues.apache.org/jira/browse/ORC-341
> Project: ORC
>  Issue Type: Improvement
>Reporter: Jesus Camacho Rodriguez
>Priority: Major
>
> Currently, time zone is hardcoded as the system default time zone and ORC 
> applies displacement between timestamp values read/written based on time zone.
> This issue aims at adding the option to pass the time zone as a parameter to 
> the reader/writer.





[jira] [Commented] (ORC-341) Support time zone as a parameter for Java reader and writer

2018-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461891#comment-16461891
 ] 

ASF GitHub Bot commented on ORC-341:


Github user wgtmac commented on a diff in the pull request:

https://github.com/apache/orc/pull/249#discussion_r185691194
  
--- Diff: java/core/src/java/org/apache/orc/impl/TreeReaderFactory.java ---
@@ -990,6 +1007,10 @@ public void nextVector(ColumnVector previousVector,
   TimestampColumnVector result = (TimestampColumnVector) 
previousVector;
   super.nextVector(previousVector, isNull, batchSize);
 
+  if (context.isUseUTCTimestamp()) {
+result.setIsUTC(true);
--- End diff --

result.setIsUTC(context.isUseUTCTimestamp());

Just in case result is already marked as UTC but context.isUseUTCTimestamp() is false.
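
To illustrate the concern with a reused vector, here is a small sketch. The `Vec` class and both methods are hypothetical stand-ins, not TimestampColumnVector or the reader code; they only model a single boolean flag that survives between calls.

```java
/** Hypothetical model of a reused vector carrying an isUTC flag. */
final class IsUtcFlagSketch {

  /** Stand-in for a column vector that is reused across batches. */
  static final class Vec {
    boolean isUTC;
  }

  /** Conditional form from the diff: never clears a stale flag. */
  static void setConditionally(Vec v, boolean useUTC) {
    if (useUTC) {
      v.isUTC = true;
    }
  }

  /** Suggested form: the flag always mirrors the reader's setting. */
  static void setAlways(Vec v, boolean useUTC) {
    v.isUTC = useUTC;
  }

  public static void main(String[] args) {
    Vec reused = new Vec();
    reused.isUTC = true;                 // left over from an earlier UTC read
    setConditionally(reused, false);
    System.out.println("conditional: " + reused.isUTC);
    setAlways(reused, false);
    System.out.println("always: " + reused.isUTC);
  }
}
```

With the conditional form, a vector that was previously filled by a UTC reader keeps reporting UTC; the unconditional assignment resets it on every call.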


> Support time zone as a parameter for Java reader and writer
> ---
>
> Key: ORC-341
> URL: https://issues.apache.org/jira/browse/ORC-341
> Project: ORC
>  Issue Type: Improvement
>Reporter: Jesus Camacho Rodriguez
>Priority: Major
>
> Currently, time zone is hardcoded as the system default time zone and ORC 
> applies displacement between timestamp values read/written based on time zone.
> This issue aims at adding the option to pass the time zone as a parameter to 
> the reader/writer.


