This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
     new 6b496f7  ARROW-3997: [Documentation] Clarify dictionary index type
6b496f7 is described below

commit 6b496f7c1929a0a371fe708ae653228a9e722150
Author: Antoine Pitrou <anto...@python.org>
AuthorDate: Wed Jan 9 16:16:40 2019 -0600

    ARROW-3997: [Documentation] Clarify dictionary index type

    Mandate signed integers for dictionary index types, without constraining integer width.

    Author: Antoine Pitrou <anto...@python.org>

    Closes #3355 from pitrou/ARROW-3997-dictionary-encoding-doc and squashes the following commits:

    4e05e2642 <Antoine Pitrou> ARROW-3997: Clarify dictionary index type
---
 docs/source/format/Layout.rst | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/docs/source/format/Layout.rst b/docs/source/format/Layout.rst
index 69cbf06..f3e5290 100644
--- a/docs/source/format/Layout.rst
+++ b/docs/source/format/Layout.rst
@@ -614,13 +614,13 @@ Dictionary encoding
 -------------------

 When a field is dictionary encoded, the values are represented by an array of
-Int32 representing the index of the value in the dictionary. The Dictionary is
-received as one or more DictionaryBatches with the id referenced by a
-dictionary attribute defined in the metadata (Message.fbs) in the Field
-table. The dictionary has the same layout as the type of the field would
-dictate. Each entry in the dictionary can be accessed by its index in the
-DictionaryBatches. When a Schema references a Dictionary id, it must send at
-least one DictionaryBatch for this id.
+signed integers representing the index of the value in the dictionary.
+The Dictionary is received as one or more DictionaryBatches with the id
+referenced by a dictionary attribute defined in the metadata (Message.fbs)
+in the Field table. The dictionary has the same layout as the type of the
+field would dictate. Each entry in the dictionary can be accessed by its
+index in the DictionaryBatches. When a Schema references a Dictionary id,
+it must send at least one DictionaryBatch for this id.

 As an example, you could have the following data: ::

@@ -640,16 +640,17 @@ As an example, you could have the following data: ::
 In dictionary-encoded form, this could appear as: ::

     data List<String> (dictionary-encoded, dictionary id i)
-    indices: [0, 0, 0, 1, 1, 1, 0]
+    type: Int32
+    values:
+    [0, 0, 0, 1, 1, 1, 0]

     dictionary i
-
-    type: List<String>
-
-    [
-     ['a', 'b'],
-     ['c', 'd', 'e'],
-    ]
+    type: List<String>
+    values:
+    [
+     ['a', 'b'],
+     ['c', 'd', 'e'],
+    ]

 References
 ----------
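The patch drops the fixed Int32 index type and requires only that dictionary indices be signed integers, of any width. The plain-Python sketch below illustrates that idea on the List<String> example from the diff. It does not use pyarrow, and every function name in it is hypothetical rather than an Arrow API. Because dictionary keys must be hashable, the sketch uses tuples in place of lists. Choosing the narrowest signed width that can address every dictionary entry is a design choice made only for this illustration; the patch permits any signed width.

```python
# Illustrative sketch only: the helper names are hypothetical, not Arrow APIs.
from typing import Hashable, List, Sequence, Tuple


def dictionary_encode(values: Sequence[Hashable]) -> Tuple[List[int], List[Hashable]]:
    """Replace each value with an index into a dictionary of distinct values.

    Indices are assigned in first-seen order.
    """
    positions = {}
    dictionary: List[Hashable] = []
    indices: List[int] = []
    for v in values:
        if v not in positions:
            positions[v] = len(dictionary)
            dictionary.append(v)
        indices.append(positions[v])
    return indices, dictionary


def dictionary_decode(indices: Sequence[int], dictionary: Sequence[Hashable]) -> List[Hashable]:
    """Look each index up in the dictionary to recover the original values."""
    return [dictionary[i] for i in indices]


def narrowest_signed_index_width(dictionary_size: int) -> int:
    """Smallest signed width (8/16/32/64 bits) whose non-negative range
    covers every dictionary position 0..dictionary_size-1.

    Assumption of this sketch: a writer picks the narrowest width. The patch
    only requires a signed integer type.
    """
    for bits in (8, 16, 32, 64):
        if dictionary_size - 1 <= 2 ** (bits - 1) - 1:
            return bits
    raise OverflowError("dictionary too large for a 64-bit signed index")


# The List<String> example from the diff, with tuples standing in for lists.
data = [("a", "b")] * 3 + [("c", "d", "e")] * 3 + [("a", "b")]
indices, dictionary = dictionary_encode(data)
print(indices)     # [0, 0, 0, 1, 1, 1, 0]
print(dictionary)  # [('a', 'b'), ('c', 'd', 'e')]
print(narrowest_signed_index_width(len(dictionary)))  # 8
assert dictionary_decode(indices, dictionary) == data
```

In this sketch, a two-entry dictionary fits in an 8-bit signed index, whereas the pre-patch text would always have required Int32.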