[jira] [Resolved] (PARQUET-146) make Parquet compile with java 7 instead of java 6

2016-08-15 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PARQUET-146.
---
   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request 231
[https://github.com/apache/parquet-mr/pull/231]

> make Parquet compile with java 7 instead of java 6
> --
>
> Key: PARQUET-146
> URL: https://issues.apache.org/jira/browse/PARQUET-146
> Project: Parquet
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>  Labels: beginner, noob, pick-me-up
> Fix For: 1.9.0
>
>
> Currently Parquet is kept compatible with Java 6; we should remove this constraint and allow the codebase to target Java 7.
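In parquet-mr this amounts to raising the compiler source/target level in the build. A sketch of the kind of pom.xml change involved (property style assumed from standard maven-compiler-plugin conventions, not copied from the actual PR):

```xml
<!-- pom.xml: raise the language level from Java 6 to Java 7 -->
<properties>
  <maven.compiler.source>1.7</maven.compiler.source>
  <maven.compiler.target>1.7</maven.compiler.target>
</properties>
```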



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PARQUET-678) Allow for custom compression codecs

2016-08-15 Thread Steven Anton (JIRA)
Steven Anton created PARQUET-678:


 Summary: Allow for custom compression codecs
 Key: PARQUET-678
 URL: https://issues.apache.org/jira/browse/PARQUET-678
 Project: Parquet
  Issue Type: Wish
Reporter: Steven Anton
Priority: Minor


I understand that the list of accepted compression codecs is explicitly limited 
to uncompressed, snappy, gzip, and lzo (see 
parquet.hadoop.metadata.CompressionCodecName.java). Is there a reason for this, 
or is there an easy workaround? On the surface it seems like an unnecessary 
restriction.

I ask because I have written a custom codec to implement encryption and I'm 
unable to use it with Parquet, which is a real shame because it is the main 
storage format I was hoping to use.

Does anyone have other thoughts on how to implement encryption in Parquet given this limitation?
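The restriction follows from codecs being a closed enum rather than a pluggable registry: any codec name outside the fixed list is rejected before it can ever reach the writer. A self-contained sketch of that pattern (hypothetical names, not the actual parquet-mr source):

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Mirrors the closed-enum pattern of CompressionCodecName: only the
// listed codecs can be named, so a custom encryption "codec" has no
// enum value to map to. (Illustrative sketch, not Parquet source.)
enum class Codec { UNCOMPRESSED, SNAPPY, GZIP, LZO };

Codec CodecFromString(const std::string& name) {
  static const std::map<std::string, Codec> known = {
      {"uncompressed", Codec::UNCOMPRESSED},
      {"snappy", Codec::SNAPPY},
      {"gzip", Codec::GZIP},
      {"lzo", Codec::LZO}};
  auto it = known.find(name);
  if (it == known.end()) {
    // Anything outside the enum is rejected up front; there is no
    // registration hook through which a new codec could be added.
    throw std::invalid_argument("Unknown compression codec: " + name);
  }
  return it->second;
}
```

One possible workaround that sidesteps the codec list entirely is to encrypt beneath the writer (the output stream or the filesystem layer) rather than inside the page-compression step, at the cost of losing the ability to read individual columns without decrypting the whole file.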





[jira] [Commented] (PARQUET-676) MAX_VALUES_PER_LITERAL_RUN causes RLE encoding failure

2016-08-15 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420782#comment-15420782
 ] 

Wes McKinney commented on PARQUET-676:
--

It looks like the implementation of {{LevelEncoder::MaxBufferSize}} may be 
incorrect: it is not resulting in a large enough buffer being allocated. 

https://github.com/apache/parquet-cpp/commit/35c8eb54aadcc18057b25db0cc6fd22239dee908#diff-33bc8df68e71e99c391f762f8e397488

[~xhochy] can you take a look at this to see what might be going wrong?

> MAX_VALUES_PER_LITERAL_RUN causes RLE encoding failure
> --
>
> Key: PARQUET-676
> URL: https://issues.apache.org/jira/browse/PARQUET-676
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
> Environment: Mac OSX
>Reporter: Mark Schaefer
>
> The following code works for NUM_TO_ENCODE <= 400, but fails for larger 
> values with the error:
> Check failed: (encoded) == (num_buffered_values_)
> It appears to be related to how large an RLE buffer is allocated for 
> buffering, which causes Put to fail at levels.cc:78; there doesn't seem to be 
> any recovery from that, or any error indicating what the problem is. I'm 
> assuming MAX_VALUES_PER_LITERAL_RUN is somehow derived from the Parquet spec, 
> but if so, an exception or some other error ought to be generated. This 
> could also be the basis of a writer example.
> // Licensed to the Apache Software Foundation (ASF) under one
> // or more contributor license agreements.  See the NOTICE file
> // distributed with this work for additional information
> // regarding copyright ownership.  The ASF licenses this file
> // to you under the Apache License, Version 2.0 (the
> // "License"); you may not use this file except in compliance
> // with the License.  You may obtain a copy of the License at
> //
> //   http://www.apache.org/licenses/LICENSE-2.0
> //
> // Unless required by applicable law or agreed to in writing,
> // software distributed under the License is distributed on an
> // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> // KIND, either express or implied.  See the License for the
> // specific language governing permissions and limitations
> // under the License.
> #include <cstdio>
> #include <cstring>
> #include <iostream>
> #include <memory>
> #include <string>
> #include <vector>
> #include "parquet/api/writer.h"
> using namespace parquet;
> int main(int argc, char** argv) {
>   if (argc != 2) {
>     std::cerr << "Usage: " << argv[0] << " <output-file>" << std::endl;
>     return -1;
>   }
>   std::string filename = argv[1];
>   try {
>     const int NUM_TO_ENCODE = 400;
>     std::shared_ptr<OutputStream> ostream(new LocalFileOutputStream(filename));
>     parquet::schema::NodeVector fields;
>     parquet::schema::NodePtr schema;
>     fields.push_back(parquet::schema::Int32("id", Repetition::REQUIRED));
>     fields.push_back(parquet::schema::ByteArray("name", Repetition::OPTIONAL));
>     schema = parquet::schema::GroupNode::Make("schema", Repetition::REPEATED,
>                                               fields);
>     std::unique_ptr<ParquetFileWriter> writer = ParquetFileWriter::Open(
>         ostream, std::dynamic_pointer_cast<parquet::schema::GroupNode>(schema));
>     RowGroupWriter* rgBlock = writer->AppendRowGroup(NUM_TO_ENCODE);
>     ColumnWriter* colBlock = rgBlock->NextColumn();
>     Int32Writer* intWriter = static_cast<Int32Writer*>(colBlock);
>     std::vector<int32_t> intbuf;
>     std::vector<int16_t> defbuf;
>     std::vector<ByteArray> strbuf;
>     for (int i = 0; i < NUM_TO_ENCODE; ++i) {
>       intbuf.push_back(i);
>       if (i % 10 == 0) {
>         // Every tenth "name" is null: definition level 0, no value.
>         defbuf.push_back(0);
>       } else {
>         defbuf.push_back(1);
>         uint8_t* buf = new uint8_t[4];
>         ByteArray ba;
>         sprintf((char*)buf, "%d", i);
>         ba.ptr = buf;
>         ba.len = strlen((const char*)ba.ptr);
>         strbuf.push_back(ba);
>       }
>     }
>     intWriter->WriteBatch(intbuf.size(), nullptr, nullptr, intbuf.data());
>     intWriter->Close();
>     colBlock = rgBlock->NextColumn();
>     ByteArrayWriter* strWriter = static_cast<ByteArrayWriter*>(colBlock);
>     std::cerr << "sizes: strings:" << strbuf.size() << " definitions: "
>               << defbuf.size() << std::endl;
>     strWriter->WriteBatch(defbuf.size(), defbuf.data(), nullptr,
>                           strbuf.data());
>     strWriter->Close();
>   } catch (const std::exception& e) {
>     std::cerr << "Parquet error: " << e.what() << std::endl;
>     return -1;
>   }
>   return 0;
> }
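For reference, the worst case for the RLE/bit-packed hybrid encoding is when every value lands in a literal (bit-packed) run, each run paying a header byte on top of its packed payload. A back-of-the-envelope sketch of a conservative buffer bound (my own arithmetic, not the parquet-cpp MaxBufferSize code):

```cpp
#include <cstdint>

// Conservative upper bound on the bytes needed to encode `num_values`
// values of `bit_width` bits with the RLE/bit-packed hybrid, assuming
// (pessimistically) that every 8-value group becomes its own literal
// run with a 1-byte header. A size estimate used to allocate the
// encoder's buffer must never be smaller than what the encoder can
// actually emit; bounds like this one err on the large side.
int64_t MaxRleBufferSize(int64_t num_values, int bit_width) {
  // Number of 8-value bit-packed groups, rounding up.
  int64_t groups = (num_values + 7) / 8;
  // Each group packs 8 values of bit_width bits = bit_width bytes.
  int64_t payload = groups * bit_width;
  // One header byte per group.
  return groups + payload;
}
```

For the 400-value, 1-bit definition-level case in the reproduction above this gives 100 bytes; if the real estimate comes out below what Put actually writes, the `encoded == num_buffered_values_` check fails exactly as reported.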


