Re: welcome a new batch of committers

2018-10-03 Thread Daniel Mateus Pires
That's great news; welcome everyone! :)

Daniel

On Wed, 3 Oct 2018 at 09:59, Reynold Xin  wrote:

> Hi all,
>
> The Apache Spark PMC has recently voted to add several new committers to
> the project, for their contributions:
>
> - Shane Knapp (contributor to infra)
> - Dongjoon Hyun (contributor to ORC support and other parts of Spark)
> - Kazuaki Ishizaki (contributor to Spark SQL)
> - Xingbo Jiang (contributor to Spark Core and SQL)
> - Yinan Li (contributor to Spark on Kubernetes)
> - Takeshi Yamamuro (contributor to Spark SQL)
>
> Please join me in welcoming them!
>
>


Re: [Spark SQL] Future of CalendarInterval

2018-07-29 Thread Daniel Mateus Pires
Sounds good! @Xiao

@Reynold AFAIK the only data type that can validly be cast to a Calendar
Interval is VARCHAR.

Here is the behaviour in Postgres:

postgres=# select CAST(CAST(interval '1 hour' AS varchar) AS interval);
 interval
----------
 01:00:00
(1 row)

(snippet comes from the JIRA)
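
For comparison, the Spark-side analogue of that cast (string to interval
through the DataFrame API) is the one discussed elsewhere in this thread;
a quick spark-shell sketch, assuming a current Spark build:

scala> import org.apache.spark.sql.functions.col
scala> spark.sql("SELECT 'interval 1 hour' AS a").select(col("a").cast("calendarinterval")).show()
+----------------+
|               a|
+----------------+
|interval 1 hours|
+----------------+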

Thanks,

Daniel


On 27 July 2018 at 20:38, Xiao Li  wrote:

> The code freeze of the upcoming release Spark 2.4 is very close. How about
> revisiting this and explicitly defining the support scope
> of CalendarIntervalType in the next release (Spark 3.0)?
>
> Thanks,
>
> Xiao
>
>
> 2018-07-27 10:45 GMT-07:00 Reynold Xin :
>
>> CalendarInterval is definitely externally visible.
>>
>> E.g. sql("select interval 1 day").dtypes would return "Array[(String,
>> String)] = Array((interval 1 days,CalendarIntervalType))"
>>
>> However, I'm not sure what it means to support casting. What are the
>> semantics for casting from any other data type to calendar interval? I can
>> see string casting and casting from itself, but not any other data types.
>>
>>
>>
>>
>> On Fri, Jul 27, 2018 at 10:34 AM Daniel Mateus Pires 
>> wrote:
>>
>>> Hi Sparkers! (maybe Sparkles?)
>>>
>>> I just wanted to bring up the apparently "controversial" Calendar
>>> Interval topic.
>>>
>>> I worked on: https://issues.apache.org/jira/browse/SPARK-24702 and
>>> https://github.com/apache/spark/pull/21706
>>>
>>> The user was reporting an unexpected behaviour where he/she wasn’t able
>>> to cast to a Calendar Interval type.
>>>
>>> In the current version of Spark the following code works:
>>>
>>> scala> spark.sql("SELECT 'interval 1 hour' as a").select(col("a").cast("calendarinterval")).show()
>>> +----------------+
>>> |               a|
>>> +----------------+
>>> |interval 1 hours|
>>> +----------------+
>>>
>>>
>>> While the following doesn’t:
>>> spark.sql("SELECT CALENDARINTERVAL('interval 1 hour') as a").show()
>>>
>>>
>>> Since the DataFrame API equivalent of the SQL worked, I thought adding
>>> the SQL form would be an easy decision (for consistency).
>>>
>>> However, I got push-back on the PR on the basis that "we do not plan
>>> to expose Calendar Interval as a public type".
>>> Should there be a consensus on either removing CalendarIntervalType
>>> from the public DataFrame API OR making the SQL side consistent with
>>> the DataFrame API?
>>>
>>> --
>>> Best regards,
>>> Daniel Mateus Pires
>>> Data Engineer @ Hudson's Bay Company
>>>
>>
>


[Spark SQL] Future of CalendarInterval

2018-07-27 Thread Daniel Mateus Pires
Hi Sparkers! (maybe Sparkles?)

I just wanted to bring up the apparently "controversial" Calendar Interval
topic.

I worked on: https://issues.apache.org/jira/browse/SPARK-24702 and
https://github.com/apache/spark/pull/21706

The user was reporting an unexpected behaviour where he/she wasn’t able to cast 
to a Calendar Interval type.

In the current version of Spark the following code works:
scala> spark.sql("SELECT 'interval 1 hour' as a").select(col("a").cast("calendarinterval")).show()
+----------------+
|               a|
+----------------+
|interval 1 hours|
+----------------+

While the following doesn’t:
spark.sql("SELECT CALENDARINTERVAL('interval 1 hour') as a").show()


Since the DataFrame API equivalent of the SQL worked, I thought adding the
SQL form would be an easy decision (for consistency).

However, I got push-back on the PR on the basis that "we do not plan to expose
Calendar Interval as a public type".
Should there be a consensus on either removing CalendarIntervalType from the
public DataFrame API OR making the SQL side consistent with the DataFrame API?

--
Best regards,
Daniel Mateus Pires
Data Engineer @ Hudson's Bay Company