[GitHub] orc pull request #277: ORC-372: Enable valgrind for C++ travis-ci tests

2018-06-04 Thread majetideepak
Github user majetideepak commented on a diff in the pull request:

https://github.com/apache/orc/pull/277#discussion_r192919379
  
--- Diff: c++/src/TypeImpl.cc ---
@@ -258,31 +258,34 @@ namespace orc {
 
 case STRUCT: {
   StructVectorBatch *result = new StructVectorBatch(capacity, 
memoryPool);
+  std::unique_ptr return_value = 
std::unique_ptr(result);
--- End diff --

we need `result` of type `StructVectorBatch` to access fields on line 263


---


[GitHub] orc pull request #277: ORC-372: Enable valgrind for C++ travis-ci tests

2018-06-04 Thread wgtmac
Github user wgtmac commented on a diff in the pull request:

https://github.com/apache/orc/pull/277#discussion_r192909966
  
--- Diff: c++/src/TypeImpl.cc ---
@@ -498,54 +506,54 @@ namespace orc {
   }
 
   ORC_UNIQUE_PTR Type::buildTypeFromString(const std::string& input) 
{
-std::vector > res =
+std::vector > > res =
   TypeImpl::parseType(input, 0, input.size());
 if (res.size() != 1) {
   throw std::logic_error("Invalid type string.");
 }
-return ORC_UNIQUE_PTR(res[0].second);
+return std::move(res[0].second);
   }
 
   Type* TypeImpl::parseArrayType(const std::string ,
--- End diff --

is it better to change return type to unique_ptr as well to be consistent? 
same for below.


---


[GitHub] orc pull request #277: ORC-372: Enable valgrind for C++ travis-ci tests

2018-06-04 Thread wgtmac
Github user wgtmac commented on a diff in the pull request:

https://github.com/apache/orc/pull/277#discussion_r192907281
  
--- Diff: c++/src/TypeImpl.cc ---
@@ -258,31 +258,34 @@ namespace orc {
 
 case STRUCT: {
   StructVectorBatch *result = new StructVectorBatch(capacity, 
memoryPool);
+  std::unique_ptr return_value = 
std::unique_ptr(result);
--- End diff --

maybe we can directly merge line 260 and 261 to a single statement. same 
for below.


---


[GitHub] orc pull request #277: ORC-372: Enable valgrind for C++ travis-ci tests

2018-06-04 Thread wgtmac
Github user wgtmac commented on a diff in the pull request:

https://github.com/apache/orc/pull/277#discussion_r192910566
  
--- Diff: c++/src/Compression.cc ---
@@ -282,6 +285,11 @@ DIAGNOSTIC_PUSH
 }
   }
 
+  void ZlibCompressionStream::end() {
+(void)deflateEnd();
+  }
+
+
--- End diff --

remove this line


---


[GitHub] orc pull request #279: ORC-373: Option to disable dictionary encoding

2018-06-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/orc/pull/279


---


[GitHub] orc issue #279: ORC-373: Option to disable dictionary encoding

2018-06-04 Thread prasanthj
Github user prasanthj commented on the issue:

https://github.com/apache/orc/pull/279
  
lgtm


---


[GitHub] orc issue #279: ORC-373: Option to disable dictionary encoding

2018-06-04 Thread omalley
Github user omalley commented on the issue:

https://github.com/apache/orc/pull/279
  
Prasanth, rather than adding the flush count, which is really test code in 
the main source, how about creating a StringTreeWriter and catching the stream 
directly. I'd propose this - 
https://github.com/omalley/orc/blob/8a782d204f071e2bd16bcfd591a54319fadff132/java/core/src/test/org/apache/orc/TestStringDictionary.java#L251


---


[GitHub] orc issue #277: ORC-372: Enable valgrind for C++ travis-ci tests

2018-06-04 Thread majetideepak
Github user majetideepak commented on the issue:

https://github.com/apache/orc/pull/277
  
@wgtmac, @xndai  can you take a look at this patch? ZLIB compression code 
seems to be leaking memory. Thanks!


---


[GitHub] orc issue #276: ORC-373: Option to disable dictionary encoding

2018-06-04 Thread omalley
Github user omalley commented on the issue:

https://github.com/apache/orc/pull/276
  
@prasanthj yeah, in general you should make a dev branch for the pull 
request (eg. prasanthj:orc-373).  Then you can do a force-push to the branch to 
update the PR.


---


[GitHub] orc pull request #273: ORC-343 Enable C++ writer to support RleV2

2018-06-04 Thread majetideepak
Github user majetideepak commented on a diff in the pull request:

https://github.com/apache/orc/pull/273#discussion_r192729045
  
--- Diff: c++/test/TestRleEncoder.cc ---
@@ -0,0 +1,243 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+
+#include "MemoryOutputStream.hh"
+#include "RLEv1.hh"
+
+#include "wrap/orc-proto-wrapper.hh"
+#include "wrap/gtest-wrapper.h"
+
+namespace orc {
+
+  const int DEFAULT_MEM_STREAM_SIZE = 1024 * 1024; // 1M
+
+  void generateData(
+ uint64_t numValues,
+ int64_t start,
+ int64_t delta,
+ bool random,
+ int64_t* data,
+ uint64_t numNulls = 0,
+ char* notNull = nullptr) {
+if (numNulls != 0 && notNull != nullptr) {
+  memset(notNull, 1, numValues);
+  while (numNulls > 0) {
+uint64_t pos = static_cast(std::rand()) % numValues;
+if (notNull[pos]) {
+  notNull[pos] = static_cast(0);
+  --numNulls;
+}
+  }
+}
+
+for (uint64_t i = 0; i < numValues; ++i) {
+  if (notNull == nullptr || notNull[i])
+  {
+if (!random) {
+  data[i] = start + delta * static_cast(i);
+} else {
+  data[i] = std::rand();
+}
+  }
+}
+  }
+
+  void decodeAndVerify(
+   RleVersion version,
+   const MemoryOutputStream& memStream,
+   int64_t * data,
+   uint64_t numValues,
+   const char* notNull,
+   bool isSinged) {
+std::unique_ptr decoder = createRleDecoder(
+std::unique_ptr(new 
SeekableArrayInputStream(
+memStream.getData(),
+memStream.getLength())),
+isSinged, version, *getDefaultPool());
+
+int64_t* decodedData = new int64_t[numValues];
+decoder->next(decodedData, numValues, notNull);
+
+for (uint64_t i = 0; i < numValues; ++i) {
+  if (!notNull || notNull[i]) {
+EXPECT_EQ(data[i], decodedData[i]);
+  }
+}
+
+delete [] decodedData;
+  }
+
+  std::unique_ptr getEncoder(RleVersion version,
+MemoryOutputStream& memStream,
+bool isSigned)
+  {
+MemoryPool * pool = getDefaultPool();
+
+return createRleEncoder(
+std::unique_ptr(
+new BufferedOutputStream(*pool, , 500 * 
1024, 1024)),
+isSigned, version, *pool, true);
--- End diff --

can we template these tests for `alignedBitpacking =  false`?


---


[GitHub] orc issue #276: ORC-373: Option to disable dictionary encoding

2018-06-04 Thread prasanthj
Github user prasanthj commented on the issue:

https://github.com/apache/orc/pull/276
  
@damiencarol added junit tests. my branches got messed up. So had to give 
another PR. #279


---


[GitHub] orc pull request #279: ORC-373: Option to disable dictionary encoding

2018-06-04 Thread prasanthj
GitHub user prasanthj opened a pull request:

https://github.com/apache/orc/pull/279

ORC-373: Option to disable dictionary encoding

disables dictionary encoding at the time of string tree writer 
initialization.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/prasanthj/orc ORC-373

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/orc/pull/279.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #279


commit a2a091b23972523b2c55448808dbb695f19ee4ef
Author: Prasanth Jayachandran 
Date:   2018-06-02T02:24:22Z

ORC-373: Option to disable dictionary encoding

commit 7b8f90223b3d1021eff17b52ce92417ed561548c
Author: Prasanth Jayachandran 
Date:   2018-06-04T06:22:38Z

Added unit test




---


[GitHub] orc pull request #276: ORC-373: Option to disable dictionary encoding

2018-06-04 Thread prasanthj
Github user prasanthj closed the pull request at:

https://github.com/apache/orc/pull/276


---