[jira] [Updated] (ARROW-2372) ArrowIOError: Invalid argument

2018-04-07 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2372:
---
Fix Version/s: 0.9.1

> ArrowIOError: Invalid argument
> --
>
> Key: ARROW-2372
> URL: https://issues.apache.org/jira/browse/ARROW-2372
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0, 0.9.0
> Environment: Ubuntu 16.04
>Reporter: Kyle Barron
>Priority: Major
> Fix For: 0.9.1
>
>
> I get an ArrowIOError when reading a specific file that was also written by 
> pyarrow. Specifically, the traceback is:
> {code:python}
> >>> import pyarrow.parquet as pq
> >>> pq.ParquetFile('gaz2016zcta5distancemiles.parquet')
>  ---
>  ArrowIOError Traceback (most recent call last)
>   in ()
>  > 1 pf = pq.ParquetFile('gaz2016zcta5distancemiles.parquet')
> ~/local/anaconda3/lib/python3.6/site-packages/pyarrow/parquet.py in 
> __init__(self, source, metadata, common_metadata)
>  62 self.reader = ParquetReader()
>  63 source = _ensure_file(source)
>  ---> 64 self.reader.open(source, metadata=metadata)
>  65 self.common_metadata = common_metadata
>  66 self._nested_paths_by_prefix = self._build_nested_paths()
> _parquet.pyx in pyarrow._parquet.ParquetReader.open()
> error.pxi in pyarrow.lib.check_status()
> ArrowIOError: Arrow error: IOError: [Errno 22] Invalid argument
> {code}
> Here's a reproducible example with the specific file I'm working with. I'm 
> converting a 34 GB csv file to parquet in chunks of roughly 2GB each. To get 
> the source data:
> {code:bash}
> wget https://www.nber.org/distance/2016/gaz/zcta5/gaz2016zcta5distancemiles.csv.zip
> unzip gaz2016zcta5distancemiles.csv.zip{code}
> Then the basic idea from the [pyarrow Parquet 
> documentation|https://arrow.apache.org/docs/python/parquet.html#finer-grained-reading-and-writing]
>  is instantiating the writer class; looping over chunks of the csv and 
> writing them to parquet; then closing the writer object.
>  
> {code:python}
> import numpy as np
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> from pathlib import Path
> zcta_file = Path('gaz2016zcta5distancemiles.csv')
> itr = pd.read_csv(
>     zcta_file,
>     header=0,
>     dtype={'zip1': str, 'zip2': str, 'mi_to_zcta5': np.float64},
>     engine='c',
>     chunksize=64617153)
> schema = pa.schema([
>     pa.field('zip1', pa.string()),
>     pa.field('zip2', pa.string()),
>     pa.field('mi_to_zcta5', pa.float64())])
> writer = pq.ParquetWriter('gaz2016zcta5distancemiles.parquet', schema=schema)
> print(f'Starting conversion')
> i = 0
> for df in itr:
>     i += 1
>     print(f'Finished reading csv block {i}')
>     table = pa.Table.from_pandas(df, preserve_index=False, nthreads=3)
>     writer.write_table(table)
>     print(f'Finished writing parquet block {i}')
> writer.close()
> {code}
> Running this Python script produces the file 
> `gaz2016zcta5distancemiles.parquet`, but just attempting to read the metadata 
> with `pq.ParquetFile()` produces the above exception.
> I tested this with pyarrow 0.8 and pyarrow 0.9. I assume that pandas would 
> complain on import of the csv if the columns in the data were not `string`, 
> `string`, and `float64`, so I think creating the Parquet schema in that way 
> should be fine.
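
One library-independent sanity check for a file like this is to look at the Parquet framing directly. A Parquet file starts and ends with the 4-byte magic `PAR1`, and the 4 bytes before the trailing magic hold the little-endian length of the footer metadata. The sketch below reads only those bytes. It is a diagnostic aid under those assumptions, not a diagnosis of this issue; the helper name `inspect_parquet_footer` is made up for illustration, and the footer length is interpreted here as an unsigned 32-bit value.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"


def inspect_parquet_footer(path):
    """Return (has_valid_magic, footer_length) without parsing any metadata.

    Hypothetical helper: it only looks at the 4-byte header magic and the
    8-byte tail (footer length + trailing magic) of the file.
    """
    size = os.path.getsize(path)
    if size < 12:  # 4 (magic) + 4 (footer length) + 4 (magic)
        return False, None
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(size - 8)
        tail = f.read(8)
    # '<I' = little-endian unsigned 32-bit footer length (an assumption on signedness)
    (footer_length,) = struct.unpack("<I", tail[:4])
    ok = head == PARQUET_MAGIC and tail[4:] == PARQUET_MAGIC
    return ok, footer_length
```

Calling `inspect_parquet_footer('gaz2016zcta5distancemiles.parquet')` would show whether the writer finished the file with a trailing magic and how large the footer claims to be, which may help narrow down where reading fails.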



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429645#comment-16429645
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

kou commented on issue #1852: ARROW-2411: [C++] Add StringBuilder::Append(const 
char **values)
URL: https://github.com/apache/arrow/pull/1852#issuecomment-379524557
 
 
   > "NULL-terminated  C strings" means each C string is terminated by a nul 
char, not that  the array is terminated by a NULL pointer :-)
   
   Thanks for the English lesson. :-)
   It's very helpful to me because I'm not a native English speaker.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Add method to append batches of null-terminated strings to StringBuilder
> --
>
> Key: ARROW-2411
> URL: https://issues.apache.org/jira/browse/ARROW-2411
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GLib
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We should add a method {{StringBuilder::AppendCStrings(const char** values, 
> const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to 
> have fast inserts for these strings. See 
> https://github.com/apache/arrow/pull/1845/files for a use case.





[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429642#comment-16429642
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

kou commented on a change in pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#discussion_r179936915
 
 

 ##
 File path: cpp/src/arrow/builder.cc
 ##
 @@ -1413,6 +1413,41 @@ Status StringBuilder::Append(const 
std::vector<std::string>& values,
    return Status::OK();
  }
 
+Status StringBuilder::Append(const char** values,
+                             int64_t length,
+                             const uint8_t* valid_bytes) {
+  std::size_t total_length = 0;
+  std::vector<std::size_t> value_lengths(length);
+  for (int64_t i = 0; i < length; ++i) {
+    if (values[i]) {
+      auto value_length = strlen(values[i]);
+      value_lengths[i] = value_length;
+      total_length += value_length;
+    }
+  }
+  RETURN_NOT_OK(Reserve(length));
+  RETURN_NOT_OK(value_data_builder_.Reserve(total_length));
+  RETURN_NOT_OK(offsets_builder_.Reserve(length));
+
+  if (valid_bytes) {
+    for (int64_t i = 0; i < length; ++i) {
+      RETURN_NOT_OK(AppendNextOffset());
+      if (valid_bytes[i]) {
+        RETURN_NOT_OK(value_data_builder_.Append(
+            reinterpret_cast<const uint8_t*>(values[i]), value_lengths[i]));
+      }
+    }
+  } else {
+    for (int64_t i = 0; i < length; ++i) {
+      RETURN_NOT_OK(AppendNextOffset());
+      RETURN_NOT_OK(value_data_builder_.Append(
+          reinterpret_cast<const uint8_t*>(values[i]), value_lengths[i]));
 
 Review comment:
   It's interesting.
   I've implemented it as follows:
   
 * If `values[i]` is `NULL` and `valid_bytes` isn't `nullptr`, the value is 
an empty string, not a null value. This respects the `valid_bytes` data.
 * If `values[i]` is `NULL` and `valid_bytes` is `nullptr`, the value is a 
null value.
   
   But users may be confused...
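
To make the two cases above concrete, here is a small Python model of the described null handling. It is only a sketch of the semantics stated in this comment, not Arrow's C++ implementation: the function name `append_c_strings` is hypothetical, `None` stands in for a `NULL` `char*`, and the result uses an Arrow-like layout of `n + 1` offsets, a data buffer, and a per-slot validity list.

```python
def append_c_strings(values, valid_bytes=None):
    """Model of the described null handling; not Arrow's implementation.

    values: list of bytes or None (None stands in for a NULL char*).
    valid_bytes: optional list of ints, non-zero meaning "valid".
    Returns (offsets, data, validity).
    """
    offsets = [0]
    data = bytearray()
    validity = []
    for i, value in enumerate(values):
        if valid_bytes is not None:
            # valid_bytes decides validity, even for a NULL pointer
            is_valid = bool(valid_bytes[i])
        else:
            # without valid_bytes, a NULL pointer means a null value
            is_valid = value is not None
        if is_valid and value is not None:
            data += value
        # A valid slot with a NULL pointer adds no bytes: an empty string.
        offsets.append(len(data))
        validity.append(is_valid)
    return offsets, bytes(data), validity
```

With `values = [b"a", None, b"bc"]` and no `valid_bytes`, the middle slot is null. With `valid_bytes = [1, 1, 0]`, the middle slot becomes a valid empty string and the last slot is null, which is the asymmetry the comment worries may confuse users.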
   




[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429643#comment-16429643
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

kou commented on a change in pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#discussion_r179936937
 
 

 ##
 File path: cpp/src/arrow/array-test.cc
 ##
 @@ -1022,6 +1022,39 @@ TEST_F(TestStringBuilder, TestAppendVector) {
   }
 }
 
+TEST_F(TestStringBuilder, TestAppendCStrings) {
 
 Review comment:
   Exactly.
   I've added one.




[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429638#comment-16429638
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

kou commented on a change in pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#discussion_r179936463
 
 

 ##
 File path: cpp/src/arrow/builder.h
 ##
 @@ -720,6 +720,16 @@ class ARROW_EXPORT StringBuilder : public BinaryBuilder {
 
   Status Append(const std::vector<std::string>& values,
                 const uint8_t* valid_bytes = NULLPTR);
+
+  /// \brief Append a sequence of C strings in one shot
 
 Review comment:
   I was using "C string" to mean "nul-terminated char *".
   
   I changed it to "nul-terminated char*" instead of "C string". It'll be 
clearer.




[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429637#comment-16429637
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

kou commented on a change in pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#discussion_r179936398
 
 

 ##
 File path: cpp/src/arrow/builder.h
 ##
 @@ -720,6 +720,16 @@ class ARROW_EXPORT StringBuilder : public BinaryBuilder {
 
   Status Append(const std::vector<std::string>& values,
                 const uint8_t* valid_bytes = NULLPTR);
+
+  /// \brief Append a sequence of C strings in one shot
+  /// \param[in] values a contiguous C array of C strings
+  /// \param[in] length the number of values to append
+  /// \param[in] valid_bytes an optional sequence of bytes where non-zero
+  /// indicates a valid (non-null) value
+  /// \return Status
+  Status Append(const char** values,
 
 Review comment:
   Umm. We already use `Append()` for scalar append and vector append in other 
builders.
   If we use separate names for append variations, it may be better to also 
change the current vector append methods to `AppendValues()` or something 
similar for consistency.
   
   @xhochy what do you think about this?




[jira] [Commented] (ARROW-2415) [Rust] Fix using references in pattern matching

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429592#comment-16429592
 ] 

ASF GitHub Bot commented on ARROW-2415:
---

waywardmonkeys commented on issue #1851: ARROW-2415: [Rust] Fix clippy 
ref-match-pats warnings.
URL: https://github.com/apache/arrow/pull/1851#issuecomment-379516182
 
 
   I'd planned to do subsequent things separately ... there's already a 
separate issue + PR for the `format!` calls for example. That way, each set of 
changes is independent and handled separately if people disagree on the 
usefulness of a particular check.
   
   Note that when `is_empty` is implemented for `Bitmap`, it should be 
`self.bits.is_empty()` hopefully rather than `self.len != 0` ... but I had that 
queued up for a subsequent issue + PR.




> [Rust] Fix using references in pattern matching
> ---
>
> Key: ARROW-2415
> URL: https://issues.apache.org/jira/browse/ARROW-2415
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Bruce Mitchener
>Priority: Major
>  Labels: pull-request-available
>
> Clippy reports 
> [https://rust-lang-nursery.github.io/rust-clippy/v0.0.191/index.html#match_ref_pats]
>  warnings.





[jira] [Commented] (ARROW-2408) [Rust] It should be possible to get a &mut[T] from Builder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429557#comment-16429557
 ] 

ASF GitHub Bot commented on ARROW-2408:
---

maxim-lian commented on issue #1847: ARROW-2408: [Rust] Remove build warnings
URL: https://github.com/apache/arrow/pull/1847#issuecomment-379506765
 
 
   @xhochy lmk any feedback!




> [Rust] It should be possible to get a &mut[T] from Builder
> -
>
> Key: ARROW-2408
> URL: https://issues.apache.org/jira/browse/ARROW-2408
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I am currently adding Arrow support to the parquet-rs crate and I found a 
> need to get a mutable slice from a Buffer to pass to the parquet column 
> reader methods.
>  





[jira] [Commented] (ARROW-2415) [Rust] Fix using references in pattern matching

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429526#comment-16429526
 ] 

ASF GitHub Bot commented on ARROW-2415:
---

maxim-lian commented on issue #1851: ARROW-2415: [Rust] Fix clippy 
ref-match-pats warnings.
URL: https://github.com/apache/arrow/pull/1851#issuecomment-379492997
 
 
   This is (almost) all the remainder of them:
   
   ```diff
   diff --git a/rust/src/array.rs b/rust/src/array.rs
   index 09f0c950..e6ee0241 100644
   --- a/rust/src/array.rs
   +++ b/rust/src/array.rs
   @@ -93,6 +93,9 @@ impl Array {
pub fn len(&self) -> usize {
self.len as usize
}
   +pub fn is_empty(&self) -> bool {
   +self.len == 0
   +}
}

macro_rules! array_from_primitive {
   @@ -137,8 +140,8 @@ macro_rules! array_from_optional_primitive {
fn from(v: Vec>) -> Self {
let mut null_count = 0;
let mut validity_bitmap = Bitmap::new(v.len());
   -for i in 0..v.len() {
   -if v[i].is_none() {
   +for (i, item) in v.iter().enumerate() {
   +if item.is_none() {
null_count += 1;
validity_bitmap.clear(i);
}
   @@ -192,7 +195,7 @@ impl From>> for Array {
len: v.len() as i32,
null_count: 0,
validity_bitmap: None,
   -data: ArrayData::Struct(v.iter().map(|a| a.clone()).collect()),
   +data: ArrayData::Struct(v.iter().cloned().collect()),
}
}
}
   diff --git a/rust/src/bitmap.rs b/rust/src/bitmap.rs
   index 59c65139..264feb4f 100644
   --- a/rust/src/bitmap.rs
   +++ b/rust/src/bitmap.rs
   @@ -42,6 +42,9 @@ impl Bitmap {
pub fn len(&self) -> i32 {
self.bits.len()
}
   +pub fn is_empty(&self) -> bool {
   +self.len() == 0
   +}

pub fn is_set(&self, i: usize) -> bool {
let byte_offset = i / 8;
   diff --git a/rust/src/buffer.rs b/rust/src/buffer.rs
   index 1f2ec6c8..a4907e61 100644
   --- a/rust/src/buffer.rs
   +++ b/rust/src/buffer.rs
   @@ -38,6 +38,9 @@ impl Buffer {
self.len
}

   +pub fn is_empty(&self) -> bool {
   +self.len == 0
   +}
pub fn data(&self) -> *const T {
self.data
}
   @@ -166,6 +169,7 @@ mod tests {
fn test_buffer_i32() {
let b: Buffer = Buffer::from(vec![1, 2, 3, 4, 5]);
assert_eq!(5, b.len);
   +assert_eq!(false, b.is_empty());
}

#[test]
   diff --git a/rust/src/datatypes.rs b/rust/src/datatypes.rs
   index ac2c2c6e..19e85321 100644
   --- a/rust/src/datatypes.rs
   +++ b/rust/src/datatypes.rs
   @@ -40,17 +40,17 @@ pub enum DataType {
impl DataType {
fn from(json: &Value) -> Result {
//println!("DataType::from({:?})", json);
   -match json {
   -&Value::Object(ref map) => match map.get("name") {
   +match *json {
   +Value::Object(ref map) => match map.get("name") {
Some(s) if s == "bool" => Ok(DataType::Boolean),
Some(s) if s == "utf8" => Ok(DataType::Utf8),
Some(s) if s == "floatingpoint" => match 
map.get("precision") {
Some(p) if p == "HALF" => Ok(DataType::Float16),
Some(p) if p == "SINGLE" => Ok(DataType::Float32),
Some(p) if p == "DOUBLE" => Ok(DataType::Float64),
   -_ => Err(ArrowError::ParseError(format!(
   -"floatingpoint precision missing or invalid"
   -))),
   +_ => Err(ArrowError::ParseError(
   +"floatingpoint precision missing or 
invalid".to_string(),
   +)),
},
Some(s) if s == "int" => match map.get("isSigned") {
Some(&Value::Bool(true)) => match map.get("bitWidth") {
   @@ -59,13 +59,13 @@ impl DataType {
Some(16) => Ok(DataType::Int16),
Some(32) => Ok(DataType::Int32),
Some(64) => Ok(DataType::Int32),
   -_ => Err(ArrowError::ParseError(format!(
   -"int bitWidth missing or invalid"
   -))),
   +_ => Err(ArrowError::ParseError(
   +"int bitWidth missing or 
invalid".to_string(),
   +)),
},
   -_ => Err(ArrowError::ParseError(format!(
   -"int bitWidth missing or invalid"
   -   

[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429429#comment-16429429
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

pitrou commented on issue #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#issuecomment-379478751
 
 
   > This implementation uses not NULL-terminated C strings and the length of 
values instead of NULL-terminated C strings
   
   "NULL-terminated C strings" means each C string is terminated by a nul char, 
not that the array is terminated by a NULL pointer :-)
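
The distinction can be illustrated with Python's `ctypes`, as a rough sketch: a nul-terminated C string is a byte buffer ending in a single `\0`, while a NULL-terminated array is an array of pointers whose last entry is a NULL pointer. The helper names `c_string_bytes` and `count_until_null_pointer` are made up for this example.

```python
import ctypes


def c_string_bytes(text):
    """Return the raw bytes of a C string buffer for `text`, including the nul."""
    return ctypes.create_string_buffer(text).raw


def count_until_null_pointer(pointers):
    """Count char* entries up to (not including) the first NULL pointer.

    ctypes returns None when a c_char_p element is a NULL pointer.
    """
    n = 0
    while pointers[n] is not None:
        n += 1
    return n


# Each string is nul-terminated; the array itself ends with a NULL pointer.
ptrs = (ctypes.c_char_p * 3)(b"a", b"bc", None)
print(c_string_bytes(b"abc"))          # the trailing byte is the nul char
print(count_until_null_pointer(ptrs))  # number of strings before the NULL
```

An interface that takes an explicit `length`, as this pull request does, does not need the sentinel NULL pointer at the end of the array; the individual strings are still nul-terminated so that `strlen` can measure them.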




[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429426#comment-16429426
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

pitrou commented on a change in pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#discussion_r179919647
 
 

 ##
 File path: cpp/src/arrow/builder.cc
 ##
 @@ -1413,6 +1413,41 @@ Status StringBuilder::Append(const 
std::vector<std::string>& values,
    return Status::OK();
  }
 
+Status StringBuilder::Append(const char** values,
+                             int64_t length,
+                             const uint8_t* valid_bytes) {
+  std::size_t total_length = 0;
+  std::vector<std::size_t> value_lengths(length);
+  for (int64_t i = 0; i < length; ++i) {
+    if (values[i]) {
+      auto value_length = strlen(values[i]);
+      value_lengths[i] = value_length;
+      total_length += value_length;
+    }
+  }
+  RETURN_NOT_OK(Reserve(length));
+  RETURN_NOT_OK(value_data_builder_.Reserve(total_length));
+  RETURN_NOT_OK(offsets_builder_.Reserve(length));
+
+  if (valid_bytes) {
+    for (int64_t i = 0; i < length; ++i) {
+      RETURN_NOT_OK(AppendNextOffset());
+      if (valid_bytes[i]) {
+        RETURN_NOT_OK(value_data_builder_.Append(
+            reinterpret_cast<const uint8_t*>(values[i]), value_lengths[i]));
+      }
+    }
+  } else {
+    for (int64_t i = 0; i < length; ++i) {
+      RETURN_NOT_OK(AppendNextOffset());
+      RETURN_NOT_OK(value_data_builder_.Append(
+          reinterpret_cast<const uint8_t*>(values[i]), value_lengths[i]));
 
 Review comment:
   Does this create a non-null array item even if `values[i]` is NULL?




[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429427#comment-16429427
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

pitrou commented on a change in pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#discussion_r179919834
 
 

 ##
 File path: cpp/src/arrow/array-test.cc
 ##
 @@ -1022,6 +1022,39 @@ TEST_F(TestStringBuilder, TestAppendVector) {
   }
 }
 
+TEST_F(TestStringBuilder, TestAppendCStrings) {
 
 Review comment:
   You should also write a test without a `valid_bytes` argument.




[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429425#comment-16429425
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

pitrou commented on a change in pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#discussion_r179919633
 
 

 ##
 File path: cpp/src/arrow/builder.h
 ##
 @@ -720,6 +720,16 @@ class ARROW_EXPORT StringBuilder : public BinaryBuilder {
 
   Status Append(const std::vector<std::string>& values,
                 const uint8_t* valid_bytes = NULLPTR);
+
+  /// \brief Append a sequence of C strings in one shot
 
 Review comment:
   "nul-terminated C strings".




[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429428#comment-16429428
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

pitrou commented on a change in pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#discussion_r179919627
 
 

 ##
 File path: cpp/src/arrow/builder.h
 ##
 @@ -720,6 +720,16 @@ class ARROW_EXPORT StringBuilder : public BinaryBuilder {
 
   Status Append(const std::vector<std::string>& values,
                 const uint8_t* valid_bytes = NULLPTR);
+
+  /// \brief Append a sequence of C strings in one shot
+  /// \param[in] values a contiguous C array of C strings
+  /// \param[in] length the number of values to append
+  /// \param[in] valid_bytes an optional sequence of bytes where non-zero
+  /// indicates a valid (non-null) value
+  /// \return Status
+  Status Append(const char** values,
 
 Review comment:
   I'd rather have it called `AppendCStrings` as @xhochy proposed. Too many 
overloads can create ambiguity.




[jira] [Updated] (ARROW-2416) [C++] Support system libprotobuf

2018-04-07 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2416:
--
Labels: pull-request-available  (was: )

> [C++] Support system libprotobuf
> 
>
> Key: ARROW-2416
> URL: https://issues.apache.org/jira/browse/ARROW-2416
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[jira] [Commented] (ARROW-2416) [C++] Support system libprotobuf

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429424#comment-16429424
 ] 

ASF GitHub Bot commented on ARROW-2416:
---

kou opened a new pull request #1854: ARROW-2416: [C++] Support system 
libprotobuf
URL: https://github.com/apache/arrow/pull/1854
 
 
   




> [C++] Support system libprotobuf
> 
>
> Key: ARROW-2416
> URL: https://issues.apache.org/jira/browse/ARROW-2416
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[jira] [Created] (ARROW-2416) [C++] Support system libprotobuf

2018-04-07 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-2416:
---

 Summary: [C++] Support system libprotobuf
 Key: ARROW-2416
 URL: https://issues.apache.org/jira/browse/ARROW-2416
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Affects Versions: 0.9.0
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
 Fix For: 0.10.0








[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429363#comment-16429363
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

kou opened a new pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852
 
 
   This implementation takes an array of C strings together with the length of 
`values`, rather than a NULL-terminated array of C strings, because other 
builders use the same kind of interface, for example `PrimitiveBuilder::Append()`.
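
A minimal sketch of the batch-append idea described above, assuming a toy builder rather than Arrow's real `StringBuilder`. The names `ToyStringBuilder`, `AppendValues`, `IsNull`, and `Value` are hypothetical and exist only for illustration. The sketch takes an array of C strings with an explicit length and optional validity bytes, and sizes the data buffer once before copying.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a string builder; this is NOT Arrow's StringBuilder.
class ToyStringBuilder {
 public:
  // Append `length` C strings in one call. `valid_bytes` may be nullptr,
  // meaning every slot is valid; otherwise a zero byte marks a null slot.
  void AppendValues(const char** values, int64_t length,
                    const uint8_t* valid_bytes = nullptr) {
    assert(length >= 0);
    // First pass: total byte count, so the data buffer is reserved once.
    std::size_t total = 0;
    for (int64_t i = 0; i < length; ++i) {
      if (valid_bytes == nullptr || valid_bytes[i] != 0) {
        total += std::strlen(values[i]);
      }
    }
    data_.reserve(data_.size() + total);

    // Second pass: copy the bytes and record offsets and validity.
    for (int64_t i = 0; i < length; ++i) {
      const bool valid = (valid_bytes == nullptr || valid_bytes[i] != 0);
      if (valid) {
        data_.append(values[i]);
      }
      validity_.push_back(valid);
      offsets_.push_back(data_.size());
    }
  }

  int64_t length() const { return static_cast<int64_t>(validity_.size()); }

  bool IsNull(int64_t i) const { return !validity_[static_cast<std::size_t>(i)]; }

  std::string Value(int64_t i) const {
    const std::size_t begin = offsets_[static_cast<std::size_t>(i)];
    const std::size_t end = offsets_[static_cast<std::size_t>(i) + 1];
    return data_.substr(begin, end - begin);
  }

 private:
  std::string data_;                     // concatenated string bytes
  std::vector<std::size_t> offsets_{0};  // offsets_[i]..offsets_[i + 1] bounds value i
  std::vector<bool> validity_;           // false marks a null slot
};
```

Null slots never dereference their `values[i]` pointer, so a caller could pass `nullptr` there; whether the real API permits that is not stated in this thread.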




> [C++] Add method to append batches of null-terminated strings to StringBuilder
> --
>
> Key: ARROW-2411
> URL: https://issues.apache.org/jira/browse/ARROW-2411
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GLib
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We should add a method {{StringBuilder::AppendCStrings(const char** values, 
> const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to 
> have fast inserts for these strings. See 
> https://github.com/apache/arrow/pull/1845/files for a use case.





[jira] [Updated] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2411:
--
Labels: pull-request-available  (was: )

> [C++] Add method to append batches of null-terminated strings to StringBuilder
> --
>
> Key: ARROW-2411
> URL: https://issues.apache.org/jira/browse/ARROW-2411
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GLib
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We should add a method {{StringBuilder::AppendCStrings(const char** values, 
> const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to 
> have fast inserts for these strings. See 
> https://github.com/apache/arrow/pull/1845/files for a use case.





[jira] [Commented] (ARROW-2407) [GLib] Add garrow_string_array_builder_append_values()

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429329#comment-16429329
 ] 

ASF GitHub Bot commented on ARROW-2407:
---

kou commented on a change in pull request #1845: ARROW-2407: [GLib] Add 
garrow_string_array_builder_append_values()
URL: https://github.com/apache/arrow/pull/1845#discussion_r179915478
 
 

 ##
 File path: c_glib/arrow-glib/array-builder.cpp
 ##
 @@ -2184,6 +2184,72 @@ 
garrow_string_array_builder_append(GArrowStringArrayBuilder *builder,
   return garrow_error_check(error, status, "[string-array-builder][append]");
 }
 
+/**
+ * garrow_string_array_builder_append_values:
+ * @builder: A #GArrowStringArrayBuilder.
+ * @values: (array length=values_length): The array of
+ *   strings.
+ * @values_length: The length of `values`.
+ * @is_valids: (nullable) (array length=is_valids_length): The array of
+ *   boolean that shows whether the Nth value is valid or not. If the
+ *   Nth `is_valids` is %TRUE, the Nth `values` is valid value. Otherwise
+ *   the Nth value is null value.
+ * @is_valids_length: The length of `is_valids`.
+ * @error: (nullable): Return location for a #GError or %NULL.
+ *
+ * Append multiple values at once. It's efficient than multiple
+ * `append()` and `append_null()` calls.
+ *
+ * Returns: %TRUE on success, %FALSE if there was an error.
+ *
+ * Since: 0.10.0
+ */
+gboolean
+garrow_string_array_builder_append_values(GArrowStringArrayBuilder *builder,
+                                          const gchar **values,
+                                          gint64 values_length,
+                                          const gboolean *is_valids,
+                                          gint64 is_valids_length,
+                                          GError **error)
+{
+  const char *context = "[string-array-builder][append-values]";
+  auto arrow_builder =
+    static_cast<arrow::StringBuilder *>(
+      garrow_array_builder_get_raw(GARROW_ARRAY_BUILDER(builder)));
+
+  if (is_valids_length > 0) {
+    if (values_length != is_valids_length) {
+      g_set_error(error,
+                  GARROW_ERROR,
+                  GARROW_ERROR_INVALID,
+                  "%s: values length and is_valids length must be equal: "
+                  "<%" G_GINT64_FORMAT "> != "
+                  "<%" G_GINT64_FORMAT ">",
+                  context,
+                  values_length,
+                  is_valids_length);
+      return FALSE;
+    }
+  }
+
+  std::vector<std::string> value_vector;
+  if (is_valids_length > 0) {
+    uint8_t valid_bytes[is_valids_length];
+    for (gint64 i = 0; i < values_length; ++i) {
+      value_vector.push_back(std::string(values[i]));
+      valid_bytes[i] = is_valids[i];
+    }
+    auto status = arrow_builder->Append(value_vector, valid_bytes);
 
 Review comment:
   It makes sense.
   I'll create a pull request.




> [GLib] Add garrow_string_array_builder_append_values()
> --
>
> Key: ARROW-2407
> URL: https://issues.apache.org/jira/browse/ARROW-2407
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: GLib
>Affects Versions: 0.9.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[jira] [Commented] (ARROW-2407) [GLib] Add garrow_string_array_builder_append_values()

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429328#comment-16429328
 ] 

ASF GitHub Bot commented on ARROW-2407:
---

kou commented on a change in pull request #1845: ARROW-2407: [GLib] Add 
garrow_string_array_builder_append_values()
URL: https://github.com/apache/arrow/pull/1845#discussion_r179915453
 
 

 ##
 File path: c_glib/arrow-glib/array-builder.cpp
 ##
 @@ -2184,6 +2184,72 @@ 
garrow_string_array_builder_append(GArrowStringArrayBuilder *builder,
   return garrow_error_check(error, status, "[string-array-builder][append]");
 }
 
+/**
+ * garrow_string_array_builder_append_values:
+ * @builder: A #GArrowStringArrayBuilder.
+ * @values: (array length=values_length): The array of
+ *   strings.
+ * @values_length: The length of `values`.
+ * @is_valids: (nullable) (array length=is_valids_length): The array of
+ *   boolean that shows whether the Nth value is valid or not. If the
+ *   Nth `is_valids` is %TRUE, the Nth `values` is valid value. Otherwise
+ *   the Nth value is null value.
+ * @is_valids_length: The length of `is_valids`.
+ * @error: (nullable): Return location for a #GError or %NULL.
+ *
+ * Append multiple values at once. It's efficient than multiple
 
 Review comment:
   Thanks!
   I've fixed it.




> [GLib] Add garrow_string_array_builder_append_values()
> --
>
> Key: ARROW-2407
> URL: https://issues.apache.org/jira/browse/ARROW-2407
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: GLib
>Affects Versions: 0.9.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[jira] [Commented] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429308#comment-16429308
 ] 

ASF GitHub Bot commented on ARROW-2328:
---

Adriandorr commented on a change in pull request #1784: ARROW-2328: [C++] Fixed 
and unit tested feather writing with slice
URL: https://github.com/apache/arrow/pull/1784#discussion_r179914221
 
 

 ##
 File path: cpp/src/arrow/ipc/feather-test.cc
 ##
 @@ -406,6 +406,89 @@ TEST_F(TestTableWriter, PrimitiveNullRoundTrip) {
   }
 }
 
+class TestTableWriterSlice : public TestTableWriter,
+                             public ::testing::WithParamInterface<std::tuple<int, int>> {
+};
+
+TEST_P(TestTableWriterSlice, SliceRoundTrip) {
+  auto p = GetParam();
+  auto start = std::get<0>(p);
+  auto size = std::get<1>(p);
+
+  std::shared_ptr<RecordBatch> batch;
+  ASSERT_OK(MakeIntBatchSized(start * 2, &batch));
+  batch = batch->Slice(start, size);
+
+  ASSERT_OK(writer_->Append("f0", *batch->column(0)));
+  ASSERT_OK(writer_->Append("f1", *batch->column(1)));
+  Finish();
+
+  std::shared_ptr<Column> col;
+  ASSERT_OK(reader_->GetColumn(0, &col));
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(0)));
+  ASSERT_EQ("f0", col->name());
+
+  ASSERT_OK(reader_->GetColumn(1, &col));
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(1)));
+  ASSERT_EQ("f1", col->name());
+}
+
+TEST_P(TestTableWriterSlice, SliceStringsRoundTrip) {
+  auto p = GetParam();
+  auto start = std::get<0>(p);
+  auto size = std::get<1>(p);
+  auto with_nulls = start % 2 == 0;
+  std::shared_ptr<RecordBatch> batch;
+  ASSERT_OK(MakeStringTypesRecordBatch(&batch, with_nulls));
+  batch = batch->Slice(start, size);
+
+  ASSERT_OK(writer_->Append("f0", *batch->column(0)));
+  ASSERT_OK(writer_->Append("f1", *batch->column(1)));
+  Finish();
+
+  std::shared_ptr<Column> col;
+  ASSERT_OK(reader_->GetColumn(0, &col));
+  SCOPED_TRACE(col->data()->chunk(0)->ToString() + "\n" + batch->column(0)->ToString());
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(0)));
+  ASSERT_EQ("f0", col->name());
+
+  ASSERT_OK(reader_->GetColumn(1, &col));
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(1)));
+  ASSERT_EQ("f1", col->name());
+}
+
+TEST_P(TestTableWriterSlice, SliceBooleanRoundTrip) {
+  auto p = GetParam();
+  auto start = std::get<0>(p);
+  auto size = std::get<1>(p);
+  std::shared_ptr<RecordBatch> batch;
+  ASSERT_OK(MakeBooleanBatchSized(600, &batch));
+  batch = batch->Slice(start, size);
+
+  ASSERT_OK(writer_->Append("f0", *batch->column(0)));
+  ASSERT_OK(writer_->Append("f1", *batch->column(1)));
+  Finish();
+
+  std::shared_ptr<Column> col;
+  ASSERT_OK(reader_->GetColumn(0, &col));
+  SCOPED_TRACE(col->data()->chunk(0)->ToString() + "\n" + batch->column(0)->ToString());
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(0)));
+  ASSERT_EQ("f0", col->name());
+
+  ASSERT_OK(reader_->GetColumn(1, &col));
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(1)));
+  ASSERT_EQ("f1", col->name());
+}
+
+INSTANTIATE_TEST_CASE_P(
+    TestTableWriterSliceOffsets, TestTableWriterSlice,
+    ::testing::Values(std::make_tuple(300, 30), std::make_tuple(301, 30),
 
 Review comment:
   Yes, it does; yesterday I was just too lazy to find out how.




> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  
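
The bug described in this issue can be illustrated with a small standalone sketch. It assumes plain `std::vector` buffers and an LSB-first validity bitmap; `WriteIgnoringOffset`, `WriteWithOffset`, and `CopyBitmapSlice` are hypothetical helpers, not functions from the Feather writer. The first function reproduces the reported behaviour (the first m rows are written), the second honours the slice offset, and the third re-packs the validity bits so that bit 0 of the output corresponds to row n of the parent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reported behaviour: the slice offset is ignored, so rows [0, length) are copied.
std::vector<int32_t> WriteIgnoringOffset(const std::vector<int32_t>& values,
                                         int64_t /*offset*/, int64_t length) {
  return std::vector<int32_t>(values.begin(), values.begin() + length);
}

// Intended behaviour: copy rows [offset, offset + length).
std::vector<int32_t> WriteWithOffset(const std::vector<int32_t>& values,
                                     int64_t offset, int64_t length) {
  assert(offset >= 0 && length >= 0);
  assert(static_cast<std::size_t>(offset + length) <= values.size());
  return std::vector<int32_t>(values.begin() + offset,
                              values.begin() + offset + length);
}

// Re-pack an LSB-first validity bitmap so that output bit i holds the
// validity of parent row (offset + i). A byte-wise copy would misalign the
// bits whenever offset is not a multiple of 8.
std::vector<uint8_t> CopyBitmapSlice(const std::vector<uint8_t>& bits,
                                     int64_t offset, int64_t length) {
  std::vector<uint8_t> out(static_cast<std::size_t>((length + 7) / 8), 0);
  for (int64_t i = 0; i < length; ++i) {
    const int64_t src = offset + i;
    const bool valid = ((bits[static_cast<std::size_t>(src / 8)] >> (src % 8)) & 1) != 0;
    if (valid) {
      out[static_cast<std::size_t>(i / 8)] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
  return out;
}
```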





[jira] [Commented] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429306#comment-16429306
 ] 

ASF GitHub Bot commented on ARROW-2328:
---

Adriandorr commented on a change in pull request #1784: ARROW-2328: [C++] Fixed 
and unit tested feather writing with slice
URL: https://github.com/apache/arrow/pull/1784#discussion_r179914007
 
 

 ##
 File path: cpp/src/arrow/ipc/test-common.h
 ##
 @@ -244,18 +246,22 @@ Status MakeStringTypesRecordBatch(std::shared_ptr<RecordBatch>* out) {
 
   // Quirk with RETURN_NOT_OK macro and templated functions
   {
-    auto s = MakeRandomBinaryArray(length, true, pool, &a0);
+    auto s = MakeRandomBinaryArray(length, with_nulls, pool, &a0);
     RETURN_NOT_OK(s);
   }
 
   {
-    auto s = MakeRandomBinaryArray(length, true, pool, &a1);
+    auto s = MakeRandomBinaryArray(length, with_nulls, pool, &a1);
     RETURN_NOT_OK(s);
   }
   *out = RecordBatch::Make(schema, length, {a0, a1});
   return Status::OK();
 }
 
+Status MakeStringTypesRecordBatchWithoutNulls(std::shared_ptr<RecordBatch>* out) {
 
 Review comment:
   Oops.. I'll rename it.




> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  





[jira] [Commented] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429305#comment-16429305
 ] 

ASF GitHub Bot commented on ARROW-2328:
---

Adriandorr commented on a change in pull request #1784: ARROW-2328: [C++] Fixed 
and unit tested feather writing with slice
URL: https://github.com/apache/arrow/pull/1784#discussion_r179913956
 
 

 ##
 File path: cpp/src/arrow/ipc/feather-test.cc
 ##
 @@ -406,6 +406,89 @@ TEST_F(TestTableWriter, PrimitiveNullRoundTrip) {
   }
 }
 
+class TestTableWriterSlice : public TestTableWriter,
+                             public ::testing::WithParamInterface<std::tuple<int, int>> {
+};
+
+TEST_P(TestTableWriterSlice, SliceRoundTrip) {
+  auto p = GetParam();
+  auto start = std::get<0>(p);
+  auto size = std::get<1>(p);
+
+  std::shared_ptr<RecordBatch> batch;
+  ASSERT_OK(MakeIntBatchSized(start * 2, &batch));
+  batch = batch->Slice(start, size);
+
+  ASSERT_OK(writer_->Append("f0", *batch->column(0)));
 
 Review comment:
   You are right, the duplication can be removed without adding any complexity.




> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  





[jira] [Commented] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429296#comment-16429296
 ] 

ASF GitHub Bot commented on ARROW-2328:
---

pitrou commented on a change in pull request #1784: ARROW-2328: [C++] Fixed and 
unit tested feather writing with slice
URL: https://github.com/apache/arrow/pull/1784#discussion_r179913300
 
 

 ##
 File path: cpp/src/arrow/ipc/test-common.h
 ##
 @@ -244,18 +246,22 @@ Status MakeStringTypesRecordBatch(std::shared_ptr<RecordBatch>* out) {
 
   // Quirk with RETURN_NOT_OK macro and templated functions
   {
-    auto s = MakeRandomBinaryArray(length, true, pool, &a0);
+    auto s = MakeRandomBinaryArray(length, with_nulls, pool, &a0);
     RETURN_NOT_OK(s);
   }
 
   {
-    auto s = MakeRandomBinaryArray(length, true, pool, &a1);
+    auto s = MakeRandomBinaryArray(length, with_nulls, pool, &a1);
     RETURN_NOT_OK(s);
   }
   *out = RecordBatch::Make(schema, length, {a0, a1});
   return Status::OK();
 }
 
+Status MakeStringTypesRecordBatchWithoutNulls(std::shared_ptr<RecordBatch>* out) {
 
 Review comment:
   Why is it called "...WithoutNulls" if it passes `with_nulls = true`?




> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  





[jira] [Commented] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429295#comment-16429295
 ] 

ASF GitHub Bot commented on ARROW-2328:
---

pitrou commented on a change in pull request #1784: ARROW-2328: [C++] Fixed and 
unit tested feather writing with slice
URL: https://github.com/apache/arrow/pull/1784#discussion_r179913262
 
 

 ##
 File path: cpp/src/arrow/ipc/feather-test.cc
 ##
 @@ -406,6 +406,89 @@ TEST_F(TestTableWriter, PrimitiveNullRoundTrip) {
   }
 }
 
+class TestTableWriterSlice : public TestTableWriter,
+                             public ::testing::WithParamInterface<std::tuple<int, int>> {
+};
+
+TEST_P(TestTableWriterSlice, SliceRoundTrip) {
+  auto p = GetParam();
+  auto start = std::get<0>(p);
+  auto size = std::get<1>(p);
+
+  std::shared_ptr<RecordBatch> batch;
+  ASSERT_OK(MakeIntBatchSized(start * 2, &batch));
+  batch = batch->Slice(start, size);
+
+  ASSERT_OK(writer_->Append("f0", *batch->column(0)));
 
 Review comment:
   This part and below is still duplicated in the other two test cases. Do you 
think you can factor that out, for example as a template function?




> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  





[jira] [Commented] (ARROW-2328) Writing a slice with feather ignores the offset

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429294#comment-16429294
 ] 

ASF GitHub Bot commented on ARROW-2328:
---

pitrou commented on a change in pull request #1784: ARROW-2328: [C++] Fixed and 
unit tested feather writing with slice
URL: https://github.com/apache/arrow/pull/1784#discussion_r179913230
 
 

 ##
 File path: cpp/src/arrow/ipc/feather-test.cc
 ##
 @@ -406,6 +406,89 @@ TEST_F(TestTableWriter, PrimitiveNullRoundTrip) {
   }
 }
 
+class TestTableWriterSlice : public TestTableWriter,
+                             public ::testing::WithParamInterface<std::tuple<int, int>> {
+};
+
+TEST_P(TestTableWriterSlice, SliceRoundTrip) {
+  auto p = GetParam();
+  auto start = std::get<0>(p);
+  auto size = std::get<1>(p);
+
+  std::shared_ptr<RecordBatch> batch;
+  ASSERT_OK(MakeIntBatchSized(start * 2, &batch));
+  batch = batch->Slice(start, size);
+
+  ASSERT_OK(writer_->Append("f0", *batch->column(0)));
+  ASSERT_OK(writer_->Append("f1", *batch->column(1)));
+  Finish();
+
+  std::shared_ptr<Column> col;
+  ASSERT_OK(reader_->GetColumn(0, &col));
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(0)));
+  ASSERT_EQ("f0", col->name());
+
+  ASSERT_OK(reader_->GetColumn(1, &col));
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(1)));
+  ASSERT_EQ("f1", col->name());
+}
+
+TEST_P(TestTableWriterSlice, SliceStringsRoundTrip) {
+  auto p = GetParam();
+  auto start = std::get<0>(p);
+  auto size = std::get<1>(p);
+  auto with_nulls = start % 2 == 0;
+  std::shared_ptr<RecordBatch> batch;
+  ASSERT_OK(MakeStringTypesRecordBatch(&batch, with_nulls));
+  batch = batch->Slice(start, size);
+
+  ASSERT_OK(writer_->Append("f0", *batch->column(0)));
+  ASSERT_OK(writer_->Append("f1", *batch->column(1)));
+  Finish();
+
+  std::shared_ptr<Column> col;
+  ASSERT_OK(reader_->GetColumn(0, &col));
+  SCOPED_TRACE(col->data()->chunk(0)->ToString() + "\n" + batch->column(0)->ToString());
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(0)));
+  ASSERT_EQ("f0", col->name());
+
+  ASSERT_OK(reader_->GetColumn(1, &col));
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(1)));
+  ASSERT_EQ("f1", col->name());
+}
+
+TEST_P(TestTableWriterSlice, SliceBooleanRoundTrip) {
+  auto p = GetParam();
+  auto start = std::get<0>(p);
+  auto size = std::get<1>(p);
+  std::shared_ptr<RecordBatch> batch;
+  ASSERT_OK(MakeBooleanBatchSized(600, &batch));
+  batch = batch->Slice(start, size);
+
+  ASSERT_OK(writer_->Append("f0", *batch->column(0)));
+  ASSERT_OK(writer_->Append("f1", *batch->column(1)));
+  Finish();
+
+  std::shared_ptr<Column> col;
+  ASSERT_OK(reader_->GetColumn(0, &col));
+  SCOPED_TRACE(col->data()->chunk(0)->ToString() + "\n" + batch->column(0)->ToString());
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(0)));
+  ASSERT_EQ("f0", col->name());
+
+  ASSERT_OK(reader_->GetColumn(1, &col));
+  ASSERT_TRUE(col->data()->chunk(0)->Equals(batch->column(1)));
+  ASSERT_EQ("f1", col->name());
+}
+
+INSTANTIATE_TEST_CASE_P(
+    TestTableWriterSliceOffsets, TestTableWriterSlice,
+    ::testing::Values(std::make_tuple(300, 30), std::make_tuple(301, 30),
 
 Review comment:
   Hmm... Does gtest allow making a Cartesian product here?
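
For context on the question above: googletest offers `::testing::Combine(...)` for value-parameterized tests, which yields the Cartesian product of its generators as `std::tuple` parameters (availability may depend on the gtest version and compiler support). The standalone sketch below has no gtest dependency and only shows the parameter set such a product would contain; `CartesianProduct` is a hypothetical helper, and the start and size values are illustrative rather than taken from the pull request.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical helper: every (start, size) pair from the two input lists,
// i.e. the kind of parameter set a Cartesian-product generator would supply
// to a test that reads std::get<0>(p) and std::get<1>(p).
std::vector<std::tuple<int, int>> CartesianProduct(const std::vector<int>& starts,
                                                   const std::vector<int>& sizes) {
  std::vector<std::tuple<int, int>> out;
  out.reserve(starts.size() * sizes.size());
  for (int start : starts) {
    for (int size : sizes) {
      out.emplace_back(start, size);
    }
  }
  assert(out.size() == starts.size() * sizes.size());
  return out;
}
```

With gtest itself, the instantiation would presumably pass something like `::testing::Combine(::testing::Values(...), ::testing::Values(...))` in place of the hand-written list of `std::make_tuple` calls.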




> Writing a slice with feather ignores the offset
> ---
>
> Key: ARROW-2328
> URL: https://issues.apache.org/jira/browse/ARROW-2328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Adrian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Writing a slice from row n of length m of an array to feather would write the 
> first m rows, instead of the rows starting at n.
> The null bitmap also ends up misaligned. Also tested and fixed in the pull 
> request below.
>  I've created a pull request with tests and fix here: 
> [Pullrequest#1766|https://github.com/apache/arrow/pull/1766]
>  
>  





[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-07 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429289#comment-16429289
 ] 

Krisztian Szucs commented on ARROW-2391:


Confirmed, it segfaults with the latest master.

> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> --
>
> Key: ARROW-2391
> URL: https://issues.apache.org/jira/browse/ARROW-2391
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6
>Reporter: Dave Challis
>Priority: Major
>
> When calling `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and a 
> `pyarrow.Schema`, the call results in a segmentation fault if a Pandas 
> `datetime64[ns]` column is to be converted to the `pyarrow.date64` 
> type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
> df['created'] = pd.to_datetime(df['created'])
> schema = pa.schema([pa.field('created', pa.date64())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> Executing the above causes the python interpreter to exit with "Segmentation 
> fault: 11".
> Attempting to convert into various other datatypes (by specifying different 
> schemas) either succeeds, or raises an exception if the conversion is invalid.





[jira] [Commented] (ARROW-2406) [Python] Segfault when creating PyArrow table from Pandas for empty string column when schema provided

2018-04-07 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429288#comment-16429288
 ] 

Krisztian Szucs commented on ARROW-2406:


Couldn't reproduce with:

OS: High Sierra

python    3.6.3    4         conda-forge
pyarrow   0.9.0    py36_1    conda-forge

How did you install pyarrow?

> [Python] Segfault when creating PyArrow table from Pandas for empty string 
> column when schema provided
> --
>
> Key: ARROW-2406
> URL: https://issues.apache.org/jira/browse/ARROW-2406
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6.3
>Reporter: Dave Challis
>Priority: Major
> Fix For: 0.10.0
>
>
> Minimal example to recreate:
> {code}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'a': []})
> df['a'] = df['a'].astype(str)
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema){code}
>  
> This causes the python interpreter to exit with "Segmentation fault: 11".
> The following examples all work without any issue:
> {code}
> # column 'a' is no longer empty
> df = pd.DataFrame({'a': ['foo']})
> df['a'] = df['a'].astype(str)
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> {code}
> # column 'a' is empty, but no schema is specified
> df = pd.DataFrame({'a': []})
> df['a'] = df['a'].astype(str)
> pa.Table.from_pandas(df)
> {code}
> {code}
> # column 'a' is empty, but is not cast to 'str' in Pandas
> df = pd.DataFrame({'a': []})
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
>  





[jira] [Updated] (ARROW-2415) [Rust] Fix using references in pattern matching

2018-04-07 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2415:
--
Labels: pull-request-available  (was: )

> [Rust] Fix using references in pattern matching
> ---
>
> Key: ARROW-2415
> URL: https://issues.apache.org/jira/browse/ARROW-2415
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Bruce Mitchener
>Priority: Major
>  Labels: pull-request-available
>
> Clippy reports 
> [https://rust-lang-nursery.github.io/rust-clippy/v0.0.191/index.html#match_ref_pats]
>  warnings.





[jira] [Commented] (ARROW-2415) [Rust] Fix using references in pattern matching

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429272#comment-16429272
 ] 

ASF GitHub Bot commented on ARROW-2415:
---

waywardmonkeys opened a new pull request #1851: ARROW-2415: [Rust] Fix clippy 
ref-match-pats warnings.
URL: https://github.com/apache/arrow/pull/1851
 
 
   It isn't necessary to use a reference in each pattern in a pattern
   match and it is more readable to just dereference the match value.
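
As a rough illustration of the two forms (a sketch only, not code from the pull request; the `DataType` enum and both function names are hypothetical), the pattern flagged by clippy's `match_ref_pats` lint and its dereferenced equivalent might look like this:

```rust
// Hypothetical enum for illustration only; not taken from the Arrow code base.
enum DataType {
    Boolean,
    Int32,
    Utf8,
}

// Form that clippy's `match_ref_pats` lint flags: every arm repeats `&`.
fn name_with_ref_patterns(dt: &DataType) -> &'static str {
    match dt {
        &DataType::Boolean => "boolean",
        &DataType::Int32 => "int32",
        &DataType::Utf8 => "utf8",
    }
}

// Equivalent form: dereference the matched value once instead.
fn name_with_deref(dt: &DataType) -> &'static str {
    match *dt {
        DataType::Boolean => "boolean",
        DataType::Int32 => "int32",
        DataType::Utf8 => "utf8",
    }
}

fn main() {
    for dt in [DataType::Boolean, DataType::Int32, DataType::Utf8].iter() {
        // Both forms select the same arm for every variant.
        assert_eq!(name_with_ref_patterns(dt), name_with_deref(dt));
        println!("{}", name_with_deref(dt));
    }
}
```

Both functions behave identically; the second simply moves the single dereference to the matched expression so the arms stay free of `&`.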


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org







[jira] [Created] (ARROW-2415) [Rust] Fix using references in pattern matching

2018-04-07 Thread Bruce Mitchener (JIRA)
Bruce Mitchener created ARROW-2415:
--

Summary: [Rust] Fix using references in pattern matching
Key: ARROW-2415
URL: https://issues.apache.org/jira/browse/ARROW-2415
Project: Apache Arrow
Issue Type: Improvement
Components: Rust
Reporter: Bruce Mitchener


Clippy reports {{match_ref_pats}} warnings in the Rust code; see
[https://rust-lang-nursery.github.io/rust-clippy/v0.0.191/index.html#match_ref_pats].





[jira] [Commented] (ARROW-2414) A variety of typos can be found

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429270#comment-16429270
 ] 

ASF GitHub Bot commented on ARROW-2414:
---

waywardmonkeys opened a new pull request #1850: ARROW-2414: Fix a variety of 
typos.
URL: https://github.com/apache/arrow/pull/1850
 
 
   









[jira] [Updated] (ARROW-2414) A variety of typos can be found

2018-04-07 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2414:
--
Labels: pull-request-available  (was: )

> A variety of typos can be found
> ---
>
> Key: ARROW-2414
> URL: https://issues.apache.org/jira/browse/ARROW-2414
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Bruce Mitchener
> Priority: Trivial
> Labels: pull-request-available
>
> This is just so that I can submit a PR for a bunch of typo fixes.





[jira] [Created] (ARROW-2414) A variety of typos can be found

2018-04-07 Thread Bruce Mitchener (JIRA)
Bruce Mitchener created ARROW-2414:
--

Summary: A variety of typos can be found
Key: ARROW-2414
URL: https://issues.apache.org/jira/browse/ARROW-2414
Project: Apache Arrow
Issue Type: Improvement
Reporter: Bruce Mitchener


This is just so that I can submit a PR for a bunch of typo fixes.





[jira] [Updated] (ARROW-2413) [Rust] Remove useless use of `format!`

2018-04-07 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2413:
--
Labels: pull-request-available  (was: )

> [Rust] Remove useless use of `format!`
> --
>
> Key: ARROW-2413
> URL: https://issues.apache.org/jira/browse/ARROW-2413
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust
> Reporter: Bruce Mitchener
> Priority: Minor
> Labels: pull-request-available
>
> Running clippy on Arrow's Rust implementation shows a number of places where 
> {{format!}} is being called when it isn't necessary.





[jira] [Commented] (ARROW-2413) [Rust] Remove useless use of `format!`

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429267#comment-16429267
 ] 

ASF GitHub Bot commented on ARROW-2413:
---

waywardmonkeys opened a new pull request #1849: ARROW-2413: Remove useless 
calls to format!().
URL: https://github.com/apache/arrow/pull/1849
 
 
   When there are no arguments to be formatted, we might as well
   just call `to_string` instead.  This fixes a number of warnings
   from clippy.
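
A minimal sketch of the change described above (not code from the pull request; the function names and message text are hypothetical), contrasting a `format!` call with no arguments, which clippy's `useless_format` lint flags, with the `to_string` replacement:

```rust
// Flagged by clippy's `useless_format` lint: nothing is interpolated,
// so the formatting machinery does no useful work.
fn message_with_format() -> String {
    format!("column index out of bounds")
}

// Replacement: build the same String directly from the literal.
fn message_with_to_string() -> String {
    "column index out of bounds".to_string()
}

// `format!` is still the right tool when there are arguments to interpolate.
fn message_with_argument(index: usize) -> String {
    format!("column index {} out of bounds", index)
}

fn main() {
    // The two argument-free variants produce identical strings.
    assert_eq!(message_with_format(), message_with_to_string());
    println!("{}", message_with_argument(3));
}
```

Only the argument-free calls are candidates for replacement; calls that interpolate values keep using `format!`.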









[jira] [Created] (ARROW-2413) [Rust] Remove useless use of `format!`

2018-04-07 Thread Bruce Mitchener (JIRA)
Bruce Mitchener created ARROW-2413:
--

Summary: [Rust] Remove useless use of `format!`
Key: ARROW-2413
URL: https://issues.apache.org/jira/browse/ARROW-2413
Project: Apache Arrow
Issue Type: Improvement
Components: Rust
Reporter: Bruce Mitchener


Running clippy on Arrow's Rust implementation shows a number of places where 
{{format!}} is being called when it isn't necessary.


