[jira] [Assigned] (ARROW-7376) [C++] parquet NaN/null double statistics can result in endless loop

2020-01-13 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-7376:
--

Assignee: Neal Richardson  (was: Francois Saint-Jacques)

> [C++] parquet NaN/null double statistics can result in endless loop
> ---
>
> Key: ARROW-7376
> URL: https://issues.apache.org/jira/browse/ARROW-7376
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 0.15.1
> Reporter: Pierre Belzile
> Assignee: Neal Richardson
> Priority: Critical
> Labels: parquet, pull-request-available
> Fix For: 0.16.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> There is a bug in the double column statistics computation when writing an
> array that contains only NaNs and nulls to Parquet. If the last cell of a
> write group is null, the writer loops endlessly. The line in error is
> [https://github.com/apache/arrow/blob/master/cpp/src/parquet/statistics.cc#L633],
> which checks for NaN but not for null. The code then falls through and loops
> endlessly, so the program appears frozen.
> The following snippet reproduces the problem:
> {noformat}
> TEST(parquet, nans) {
>   /* Create a small parquet structure */
>   std::vector<std::shared_ptr<::arrow::Field>> fields;
>   fields.push_back(::arrow::field("doubles", ::arrow::float64()));
>   std::shared_ptr<::arrow::Schema> schema = ::arrow::schema(std::move(fields));
>
>   std::unique_ptr<::arrow::RecordBatchBuilder> builder;
>   ::arrow::RecordBatchBuilder::Make(schema, ::arrow::default_memory_pool(),
>                                     &builder);
>   builder->GetFieldAs<::arrow::DoubleBuilder>(0)->Append(
>       std::numeric_limits<double>::quiet_NaN());
>   builder->GetFieldAs<::arrow::DoubleBuilder>(0)->AppendNull();
>
>   std::shared_ptr<::arrow::RecordBatch> batch;
>   builder->Flush(&batch);
>   arrow::PrettyPrint(*batch, 0, &std::cout);
>
>   std::shared_ptr<arrow::Table> table;
>   arrow::Table::FromRecordBatches({batch}, &table);
>
>   /* Attempt to write */
>   std::shared_ptr<::arrow::io::FileOutputStream> os;
>   arrow::io::FileOutputStream::Open("/tmp/test.parquet", &os);
>   parquet::WriterProperties::Builder writer_props_bld;
>   // writer_props_bld.disable_statistics("doubles");
>   std::shared_ptr<parquet::WriterProperties> writer_props =
>       writer_props_bld.build();
>   std::shared_ptr<parquet::ArrowWriterProperties> arrow_props =
>       parquet::ArrowWriterProperties::Builder().store_schema()->build();
>   std::unique_ptr<parquet::arrow::FileWriter> writer;
>   parquet::arrow::FileWriter::Open(
>       *table->schema(), arrow::default_memory_pool(), os,
>       writer_props, arrow_props, &writer);
>   writer->WriteTable(*table, 1024);
> }
> {noformat}
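
For illustration only, here is a minimal sketch of a min/max scan that skips both nulls and NaNs and always terminates. It is not the code in statistics.cc and not the fix that was merged: the names MinMax and MinMaxSkippingNullsAndNaNs are hypothetical, and a std::vector<bool> stands in for Arrow's validity bitmap. The point it shows is that a slot should count only if it is valid and not NaN, and that the loop should be bounded by the slot count so that an all-NaN/null input cannot spin forever.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type (not an Arrow/Parquet type).
// has_value is false when no usable value was seen.
struct MinMax {
  bool has_value;
  double min;
  double max;
};

// Hypothetical helper: computes min/max over the slots that are
// both non-null (valid[i] == true) and not NaN.
MinMax MinMaxSkippingNullsAndNaNs(const std::vector<double>& values,
                                  const std::vector<bool>& valid) {
  // Assumption of this sketch: one validity flag per value slot.
  assert(values.size() == valid.size());

  MinMax result = {false, 0.0, 0.0};
  // The index advances on every iteration, so the loop ends after
  // values.size() steps even if every slot is a null or a NaN.
  for (std::size_t i = 0; i < values.size(); ++i) {
    // Check the null flag first: a null slot holds an unspecified value.
    if (!valid[i] || std::isnan(values[i])) continue;
    if (!result.has_value) {
      result.has_value = true;
      result.min = values[i];
      result.max = values[i];
    } else {
      result.min = std::min(result.min, values[i]);
      result.max = std::max(result.max, values[i]);
    }
  }
  return result;
}
```

With the input from the report (one NaN followed by one null), this sketch returns has_value == false, which a caller could treat as "no statistics for this chunk".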



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-7376) [C++] parquet NaN/null double statistics can result in endless loop

2020-01-13 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-7376:
--

Assignee: Francois Saint-Jacques  (was: Neal Richardson)

> [C++] parquet NaN/null double statistics can result in endless loop
> ---
>
> Key: ARROW-7376
> URL: https://issues.apache.org/jira/browse/ARROW-7376
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 0.15.1
> Reporter: Pierre Belzile
> Assignee: Francois Saint-Jacques
> Priority: Critical
> Labels: parquet, pull-request-available
> Fix For: 0.16.0
>
> Time Spent: 1h
> Remaining Estimate: 0h


[jira] [Assigned] (ARROW-7376) [C++] parquet NaN/null double statistics can result in endless loop

2020-01-08 Thread Francois Saint-Jacques (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques reassigned ARROW-7376:
-

Assignee: Francois Saint-Jacques

> [C++] parquet NaN/null double statistics can result in endless loop
> ---
>
> Key: ARROW-7376
> URL: https://issues.apache.org/jira/browse/ARROW-7376
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 0.15.1
> Reporter: Pierre Belzile
> Assignee: Francois Saint-Jacques
> Priority: Major
> Labels: parquet
> Fix For: 0.16.0