[jira] [Commented] (ARROW-257) Add a typeids Vector to Union type

2016-09-22 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514435#comment-15514435
 ] 

Julien Le Dem commented on ARROW-257:
-

The current Java implementation uses the ordinal in the MinorType to denote the 
type id in the type vector.
However, the Arrow spec defines it as the index in the children of the Field.
This JIRA is a way to reconcile the two.
When the Vector is not using the child index as a type id, it provides the ids 
in the typeIds field (typeIds has the same length as the children in the Field).
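
To make the mapping concrete, here is a minimal sketch in Python (not the Java implementation) of how a reader might resolve a type id to a child index under this proposal. The function name, the child names, and the ids 5 and 12 are hypothetical and chosen only for illustration; the one behavior taken from the proposal is that a missing typeIds list means "type id equals child index".

```python
from typing import Optional, Sequence


def child_index_for_type_id(
    type_id: int,
    num_children: int,
    type_ids: Optional[Sequence[int]] = None,
) -> int:
    """Resolve a type id from the union's type vector to a child index.

    Hypothetical helper, not part of any Arrow API.
    """
    if type_ids is None:
        # Default described in the spec: the type id is the child index.
        if not 0 <= type_id < num_children:
            raise ValueError(f"type id {type_id} is not a valid child index")
        return type_id
    if len(type_ids) != num_children:
        raise ValueError("typeIds must have one entry per child")
    try:
        # typeIds[i] is the id used in the type vector for child i.
        return list(type_ids).index(type_id)
    except ValueError:
        raise ValueError(f"type id {type_id} is not listed in typeIds") from None


# Assumed example: predefined ids (such as enum ordinals) are 5 and 12,
# but the union only carries the two children that are actually used.
children = ["int", "varchar"]
type_ids = [5, 12]
type_vector = [5, 12, 5, 5]

resolved = [
    children[child_index_for_type_id(t, len(children), type_ids)]
    for t in type_vector
]
print(resolved)
```

With typeIds present, the type vector can keep the predefined ids while the Field lists only the children in use; without it, the lookup falls back to the child index.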

> Add a typeids Vector to Union type
> --
>
> Key: ARROW-257
> URL: https://issues.apache.org/jira/browse/ARROW-257
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>
> {noformat}
> enum UnionMode:int { Sparse, Dense }
> table Union {
>   mode: UnionMode;
>   typeIds: [Int32]; // optional, describes typeid of each child.
> }
> {noformat}
> The idea is to allow providing an id different from the child offset (the 
> default).
> This enables an optimization where we use predefined ids when constructing 
> the type vector of the union but want the children to be only the types 
> actually used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (ARROW-277) Flatbuf serialization fails for Timestamp type

2016-09-22 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved ARROW-277.
-
Resolution: Duplicate
  Assignee: Julien Le Dem

> Flatbuf serialization fails for Timestamp type
> --
>
> Key: ARROW-277
> URL: https://issues.apache.org/jira/browse/ARROW-277
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Julien Le Dem
>
>  Caused By (java.lang.AssertionError) FlatBuffers: object serialization must 
> not be nested.
> com.google.flatbuffers.FlatBufferBuilder.notNested():293
> com.google.flatbuffers.FlatBufferBuilder.startVector():239
> com.google.flatbuffers.FlatBufferBuilder.createString():266
> org.apache.arrow.vector.types.pojo.ArrowType$Timestamp.getType():463
> org.apache.arrow.vector.types.pojo.Field.getField():63
> org.apache.arrow.vector.types.pojo.Schema.getSchema():41





[jira] [Updated] (ARROW-270) [Format] Define more generic Interval logical type

2016-09-22 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated ARROW-270:

Component/s: Format

> [Format] Define more generic Interval logical type
> --
>
> Key: ARROW-270
> URL: https://issues.apache.org/jira/browse/ARROW-270
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>Reporter: Wes McKinney
>Assignee: Julien Le Dem
>
> Per discussion in 
> https://github.com/apache/arrow/commit/e7e399db5fc6913e67426514279f81766a0778d2#commitcomment-18711366,
>  we can create an {{Interval}} type with a unit to be more general.





[jira] [Assigned] (ARROW-270) [Format] Define more generic Interval logical type

2016-09-22 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem reassigned ARROW-270:
---

Assignee: Julien Le Dem

> [Format] Define more generic Interval logical type
> --
>
> Key: ARROW-270
> URL: https://issues.apache.org/jira/browse/ARROW-270
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Wes McKinney
>Assignee: Julien Le Dem
>
> Per discussion in 
> https://github.com/apache/arrow/commit/e7e399db5fc6913e67426514279f81766a0778d2#commitcomment-18711366,
>  we can create an {{Interval}} type with a unit to be more general.





[jira] [Commented] (ARROW-257) Add a typeids Vector to Union type

2016-09-22 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514043#comment-15514043
 ] 

Steven Phillips commented on ARROW-257:
---

I don't understand the purpose or benefit of this change. Could you give a 
concrete example of where this would be useful?

> Add a typeids Vector to Union type
> --
>
> Key: ARROW-257
> URL: https://issues.apache.org/jira/browse/ARROW-257
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>
> {noformat}
> enum UnionMode:int { Sparse, Dense }
> table Union {
>   mode: UnionMode;
>   typeIds: [Int32]; // optional, describes typeid of each child.
> }
> {noformat}
> The idea is to allow providing an id different from the child offset (the 
> default).
> This enables an optimization where we use predefined ids when constructing 
> the type vector of the union but want the children to be only the types 
> actually used.





[jira] [Commented] (ARROW-288) Implement Arrow adapter for Spark Datasets

2016-09-22 Thread Jacek Laskowski (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513039#comment-15513039
 ] 

Jacek Laskowski commented on ARROW-288:
---

I've scheduled a [Spark/Scala 
meetup|http://www.meetup.com/WarsawScala/events/234156519/] for next week and 
found this issue, which we could help with. We have no experience with Arrow 
but are quite comfortable with Spark SQL's Datasets.

Could you, [~wesmckinn] or [~julienledem], describe the very small steps needed 
for the task? They could also just be subtasks of the "umbrella" task. Thanks.

> Implement Arrow adapter for Spark Datasets
> --
>
> Key: ARROW-288
> URL: https://issues.apache.org/jira/browse/ARROW-288
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Java - Vectors
>Reporter: Wes McKinney
>
> It would be valuable for applications that use Arrow to be able to 
> * Convert between Spark DataFrames/Datasets and Java Arrow vectors
> * Send / Receive Arrow record batches / Arrow file format RPCs to / from 
> Spark 
> * Allow PySpark to use Arrow for messaging in UDF evaluation


