[jira] [Commented] (ARROW-360) C++: Add method to shrink PoolBuffer using realloc

2017-01-06 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805217#comment-15805217
 ] 

Uwe L. Korn commented on ARROW-360:
---

PR: https://github.com/apache/arrow/pull/272

> C++: Add method to shrink PoolBuffer using realloc
> --
>
> Key: ARROW-360
> URL: https://issues.apache.org/jira/browse/ARROW-360
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>
> When we have optimistically allocated a large PoolBuffer, we could later 
> shrink it again with a call to {{realloc}}. This should free the excess 
> memory while avoiding an actual {{memcpy}}.
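Below is a minimal C++ sketch of the idea. It assumes a simplified buffer built directly on std::realloc and std::free rather than on Arrow's MemoryPool. The class SketchBuffer and its methods are hypothetical and are not the PoolBuffer API from the PR. Note that the C standard does not guarantee that shrinking with realloc happens in place, so avoiding the copy is the common case rather than a promise.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical, simplified stand-in for a pool-backed buffer; NOT Arrow's
// PoolBuffer API, only an illustration of shrinking via realloc.
class SketchBuffer {
 public:
  SketchBuffer() = default;
  SketchBuffer(const SketchBuffer&) = delete;
  SketchBuffer& operator=(const SketchBuffer&) = delete;
  ~SketchBuffer() { std::free(data_); }

  // Optimistically grow the allocation to at least `capacity` bytes.
  void Reserve(int64_t capacity) {
    if (capacity > capacity_) Realloc(capacity);
  }

  // Record how many bytes are actually in use.
  void Resize(int64_t size) {
    assert(size >= 0);
    Reserve(size);
    size_ = size;
  }

  // Give back the unused tail. realloc usually shrinks the block in place,
  // so the live bytes are typically not copied (not guaranteed by the standard).
  void ShrinkToFit() {
    if (size_ < capacity_) Realloc(size_);
  }

  uint8_t* data() { return data_; }
  int64_t size() const { return size_; }
  int64_t capacity() const { return capacity_; }

 private:
  void Realloc(int64_t new_capacity) {
    if (new_capacity == 0) {  // realloc(p, 0) is implementation-defined; free explicitly
      std::free(data_);
      data_ = nullptr;
      capacity_ = 0;
      return;
    }
    void* p = std::realloc(data_, static_cast<std::size_t>(new_capacity));
    if (p == nullptr) throw std::bad_alloc();  // old block is still valid here
    data_ = static_cast<uint8_t*>(p);
    capacity_ = new_capacity;
  }

  uint8_t* data_ = nullptr;
  int64_t size_ = 0;
  int64_t capacity_ = 0;
};
```

Typical use: Reserve a generous capacity up front, Resize to the number of bytes actually written, then call ShrinkToFit once the final size is known.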



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ARROW-96) C++: API documentation using Doxygen

2017-01-06 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805024#comment-15805024
 ] 

Uwe L. Korn commented on ARROW-96:
--

PR: https://github.com/apache/arrow/pull/271

> C++: API documentation using Doxygen 
> -
>
> Key: ARROW-96
> URL: https://issues.apache.org/jira/browse/ARROW-96
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>
> For developers using Arrow via C++, we should provide automatically 
> generated API documentation via Doxygen.
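Doxygen extracts documentation from specially formatted comments, such as `///` lines, using commands like `\brief`, `\param` and `\return`. The header below is a hypothetical illustration of that comment style; `sketch::CountNulls` is not part of Arrow. Running `doxygen -g` creates a default Doxyfile, which can then be pointed at the source tree.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

/// \file
/// \brief Hypothetical header showing Doxygen comment style; the names below
/// are illustrative and not part of the Arrow C++ API.

namespace sketch {

/// \brief Count the null entries in a validity bitmap.
///
/// Bits are read least-significant-bit first, matching Arrow's validity
/// bitmap convention, where a set bit means the slot is valid.
///
/// \param[in] bitmap bytes of the validity bitmap
/// \param[in] length number of logical entries to inspect
/// \return the number of entries whose validity bit is 0
inline int64_t CountNulls(const std::vector<uint8_t>& bitmap, int64_t length) {
  assert(static_cast<int64_t>(bitmap.size()) * 8 >= length);
  int64_t nulls = 0;
  for (int64_t i = 0; i < length; ++i) {
    const bool valid = (bitmap[i / 8] >> (i % 8)) & 1;
    if (!valid) ++nulls;
  }
  return nulls;
}

}  // namespace sketch
```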





[jira] [Commented] (ARROW-462) [C++] Implement in-memory conversions between non-nested primitive types and DictionaryArray equivalent

2017-01-06 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804932#comment-15804932
 ] 

Uwe L. Korn commented on ARROW-462:
---

Ah, that makes sense. This might be possible with 
{{std::unordered_map}}, but perhaps not in a simple way.
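One straightforward way to do this with std::unordered_map is sketched below. It assumes std::vector inputs and outputs instead of Arrow arrays; DictionaryEncodeStrings and EncodedStrings are hypothetical names, and the code needs C++17 for try_emplace and structured bindings. Each distinct string ends up stored twice, once as a map key and once in the dictionary, which is the kind of key-handling overhead raised in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical output type: plain vectors standing in for Arrow arrays.
struct EncodedStrings {
  std::vector<std::string> dictionary;  // unique values in first-seen order
  std::vector<int32_t> indices;         // one dictionary index per input value
};

inline EncodedStrings DictionaryEncodeStrings(const std::vector<std::string>& values) {
  EncodedStrings out;
  std::unordered_map<std::string, int32_t> memo;
  memo.reserve(values.size());
  out.indices.reserve(values.size());
  for (const std::string& v : values) {
    // try_emplace copies `v` into the map only when it is a new key.
    auto [it, inserted] =
        memo.try_emplace(v, static_cast<int32_t>(out.dictionary.size()));
    if (inserted) out.dictionary.push_back(v);  // second copy of the same bytes
    out.indices.push_back(it->second);
  }
  return out;
}
```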

> [C++] Implement in-memory conversions between non-nested primitive types and 
> DictionaryArray equivalent
> ---
>
> Key: ARROW-462
> URL: https://issues.apache.org/jira/browse/ARROW-462
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>
> We use a hash table to extract unique values and dictionary indices. There 
> may be an opportunity to consolidate common code from the dictionary encoding 
> implementation in parquet-cpp (but the dictionary indices will not be 
> run-length encoded in Arrow):
> https://github.com/apache/parquet-cpp/blob/master/src/parquet/encodings/dictionary-encoding.h





[jira] [Commented] (ARROW-462) [C++] Implement in-memory conversions between non-nested primitive types and DictionaryArray equivalent

2017-01-06 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804922#comment-15804922
 ] 

Wes McKinney commented on ARROW-462:


One issue is the handling of the hash keys (e.g. strings). After performing the 
hash table pass, you want to minimize the time needed to create the final 
dictionary and indices arrays. We can run various performance experiments and, 
for simplicity, choose whatever yields the best performance.
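To illustrate the concern about hash keys and the cost of building the final arrays, here is a hedged C++17 sketch. The table keys are std::string_view objects pointing into the input, so the hash pass copies no strings into the table. The dictionary is written during the same pass in the offsets-plus-data layout that Arrow uses for string arrays. All names are hypothetical, and whether this is actually faster would need the performance experiments mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical dictionary laid out like an Arrow string array:
// int32 offsets plus one contiguous data buffer.
struct StringDictionary {
  std::vector<int32_t> offsets{0};  // entry i spans [offsets[i], offsets[i + 1])
  std::string data;                 // concatenated bytes of the unique values
};

struct EncodedView {
  StringDictionary dictionary;
  std::vector<int32_t> indices;
};

// Keys are string_views into `values` (valid while `values` is alive), so the
// table never owns string copies; each unique value's bytes are appended to
// the final data buffer exactly once.
inline EncodedView DictionaryEncodeViews(const std::vector<std::string>& values) {
  EncodedView out;
  std::unordered_map<std::string_view, int32_t> memo;
  memo.reserve(values.size());
  out.indices.reserve(values.size());
  for (const std::string& v : values) {
    std::string_view key(v);
    auto it = memo.find(key);
    if (it == memo.end()) {
      const auto index = static_cast<int32_t>(out.dictionary.offsets.size() - 1);
      it = memo.emplace(key, index).first;
      out.dictionary.data.append(v);
      out.dictionary.offsets.push_back(
          static_cast<int32_t>(out.dictionary.data.size()));
    }
    out.indices.push_back(it->second);
  }
  return out;
}
```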






[jira] [Commented] (ARROW-462) [C++] Implement in-memory conversions between non-nested primitive types and DictionaryArray equivalent

2017-01-06 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804914#comment-15804914
 ] 

Uwe L. Korn commented on ARROW-462:
---

It might also be worth reconsidering whether a custom hash-table implementation 
is worthwhile, or whether using {{std::unordered_map}} would give us the same 
performance.
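For comparison with std::unordered_map, a custom table can be as small as the open-addressing (linear probing) sketch below for int64 values. It is a hypothetical illustration, not the table used in parquet-cpp or Arrow. The multiplicative hash is a placeholder choice, and which option performs better would have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical open-addressing table mapping int64 values to dictionary
// indices; the capacity is a power of two and the load factor stays <= 0.5.
class Int64Memo {
 public:
  Int64Memo() : keys_(16), slots_(16, kEmpty) {}

  // Returns the dictionary index of `value`, assigning the next index if new.
  int32_t GetOrInsert(int64_t value) {
    if ((dictionary_.size() + 1) * 2 > slots_.size()) Grow();
    std::size_t pos = Probe(value);
    if (slots_[pos] == kEmpty) {
      slots_[pos] = static_cast<int32_t>(dictionary_.size());
      keys_[pos] = value;
      dictionary_.push_back(value);
    }
    return slots_[pos];
  }

  // Unique values in first-seen order.
  const std::vector<int64_t>& dictionary() const { return dictionary_; }

 private:
  static constexpr int32_t kEmpty = -1;

  // Multiplicative (Fibonacci) hashing; a placeholder, not a tuned hash.
  static uint64_t Hash(int64_t v) {
    return static_cast<uint64_t>(v) * 0x9E3779B97F4A7C15ULL;
  }

  // Linear probing: step forward until we hit `value` or an empty slot.
  std::size_t Probe(int64_t value) const {
    const std::size_t mask = slots_.size() - 1;
    std::size_t pos = static_cast<std::size_t>(Hash(value) >> 32) & mask;
    while (slots_[pos] != kEmpty && keys_[pos] != value) pos = (pos + 1) & mask;
    return pos;
  }

  // Double the table and reinsert every known value.
  void Grow() {
    const std::size_t new_cap = slots_.size() * 2;
    keys_.assign(new_cap, 0);
    slots_.assign(new_cap, kEmpty);
    for (std::size_t i = 0; i < dictionary_.size(); ++i) {
      std::size_t pos = Probe(dictionary_[i]);
      keys_[pos] = dictionary_[i];
      slots_[pos] = static_cast<int32_t>(i);
    }
  }

  std::vector<int64_t> keys_;
  std::vector<int32_t> slots_;  // dictionary index per slot, or kEmpty
  std::vector<int64_t> dictionary_;
};
```

A benchmark would feed the same input through Int64Memo and through std::unordered_map<int64_t, int32_t> and compare timings; the sketch itself makes no claim about the outcome.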






[jira] [Created] (ARROW-462) [C++] Implement in-memory conversions between non-nested primitive types and DictionaryArray equivalent

2017-01-06 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-462:
--

 Summary: [C++] Implement in-memory conversions between non-nested 
primitive types and DictionaryArray equivalent
 Key: ARROW-462
 URL: https://issues.apache.org/jira/browse/ARROW-462
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney


We use a hash table to extract unique values and dictionary indices. There may 
be an opportunity to consolidate common code from the dictionary encoding 
implementation in parquet-cpp (but the dictionary indices will not be 
run-length encoded in Arrow):

https://github.com/apache/parquet-cpp/blob/master/src/parquet/encodings/dictionary-encoding.h
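The two conversions named in the summary can be sketched generically for primitive types. This assumes std::vector stand-ins for the Arrow arrays and no null handling; DictionaryEncoded, Encode and Decode are hypothetical names. As the description says, the indices are stored plainly, without run-length encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded column (no nulls).
template <typename T>
struct DictionaryEncoded {
  std::vector<T> dictionary;     // unique values, first-seen order
  std::vector<int32_t> indices;  // plain indices, no run-length encoding
};

// Primitive values -> dictionary form, using a hash table to find unique values.
template <typename T>
DictionaryEncoded<T> Encode(const std::vector<T>& values) {
  DictionaryEncoded<T> out;
  std::unordered_map<T, int32_t> memo;
  out.indices.reserve(values.size());
  for (const T& v : values) {
    auto it = memo.find(v);
    if (it == memo.end()) {
      it = memo.emplace(v, static_cast<int32_t>(out.dictionary.size())).first;
      out.dictionary.push_back(v);
    }
    out.indices.push_back(it->second);
  }
  return out;
}

// Dictionary form -> primitive values, by gathering dictionary[index].
template <typename T>
std::vector<T> Decode(const DictionaryEncoded<T>& encoded) {
  std::vector<T> out;
  out.reserve(encoded.indices.size());
  for (int32_t index : encoded.indices) out.push_back(encoded.dictionary[index]);
  return out;
}
```

For floating-point types, a hash table keyed on the raw value needs extra care, since NaN does not compare equal to itself and every NaN would get its own dictionary entry.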


