Re: spark-avro aliases incompatible

2017-11-07 Thread Gaspar Muñoz
In the doc you refer to:

import com.databricks.spark.avro._

// The Avro records get converted to Spark types, filtered, and
// then written back out as Avro records
val df = spark.read.avro("/tmp/episodes.avro")
df.filter("doctor > 5").write.avro("/tmp/output")

Alternatively, you can specify the format explicitly:

val df = spark.read
.format("com.databricks.spark.avro")
.load("/tmp/episodes.avro")

As far as I know, spark-avro is not built into Spark 2.x. In any case, that is
not the problem, because the same Databricks doc also says: *"At the moment, it
ignores docs, aliases and other properties present in the Avro file."*
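
Since aliases are ignored, one possible workaround (an assumption, not something the spark-avro docs propose) is to do the renaming yourself: keep a hypothetical map from old field names to current names, for example built from the aliases in your external schema, and rename matching columns of the DataFrame read from old files before combining it with new data. The sketch keeps the planning logic in plain Scala so it can be tested without Spark; `planRenames` and `applyRenames` are hypothetical helpers, and Spark's `withColumnRenamed` appears only in a comment showing how it could be plugged in.

```scala
object AliasRenames {
  // Hypothetical mapping from old (aliased) column names to current names,
  // e.g. collected from the "aliases" of each field in an external Avro schema.
  type AliasMap = Map[String, String]

  /** Returns (oldName, newName) pairs for columns that still use an old name
    * and whose current name is not already present. */
  def planRenames(columns: Seq[String], aliases: AliasMap): Seq[(String, String)] =
    columns.collect {
      case old if aliases.contains(old) && !columns.contains(aliases(old)) =>
        old -> aliases(old)
    }

  /** Applies the planned renames using any rename function. With Spark this
    * could be: applyRenames(df, plan)((d, o, n) => d.withColumnRenamed(o, n)) */
  def applyRenames[T](target: T, plan: Seq[(String, String)])(rename: (T, String, String) => T): T =
    plan.foldLeft(target) { case (acc, (oldName, newName)) => rename(acc, oldName, newName) }
}
```

Keeping the rename as a function parameter means the alias logic does not depend on any particular spark-avro version.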

Regards.


-- 
Gaspar Muñoz Soria

Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473


Re: spark-avro aliases incompatible

2017-11-06 Thread Gourav Sengupta
Hi,

I may be wrong about this, but when you are using format("") you are
basically using old Spark classes, which still exist for backward
compatibility.

Please refer to the following documentation to take advantage of the recent
changes in SPARK:
https://docs.databricks.com/spark/latest/data-sources/read-avro.html

Kindly let us know how things go.

Regards,
Gourav Sengupta


Re: spark-avro aliases incompatible

2017-11-06 Thread Gaspar Muñoz
Of course,

Right now I'm testing locally with Spark 2.2.0 and spark-avro 4.0.0. I've
just uploaded a snippet:
https://gist.github.com/gasparms/5d0740bd61a500357e0230756be963e1

Basically, my Avro schema has a field with an alias, and in the last part of
the code spark-avro is not able to read old data stored under the old field
name through the alias.
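
For context, the Avro specification says an implementation may optionally use the aliases in the reader's schema to map the writer's field names onto the reader's; that is the step spark-avro skips. A simplified plain-Scala model of that lookup follows. `ReaderField` and `resolve` are hypothetical names, not part of the Avro Java API or spark-avro.

```scala
object AvroAliasModel {
  // Hypothetical, simplified view of a field in the reader (current) schema.
  final case class ReaderField(name: String, aliases: Set[String] = Set.empty)

  /** For each reader field, finds the writer field that feeds it: an exact
    * name match first, otherwise a writer field named by one of its aliases.
    * None means the data has no matching field. */
  def resolve(reader: Seq[ReaderField], writerFields: Seq[String]): Map[String, Option[String]] =
    reader.map { f =>
      val source =
        if (writerFields.contains(f.name)) Some(f.name)
        else writerFields.find(w => f.aliases.contains(w))
      f.name -> source
    }.toMap
}
```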

The spark-avro README says this is not supported. Does any of you have a
workaround, or how do you manage schema evolution?

Regards.


Re: spark-avro aliases incompatible

2017-11-05 Thread Gourav Sengupta
Hi Gaspar,

Can you please provide details about the environment, versions, libraries
and code snippets?

For example: Spark version, OS, distribution, whether you run on YARN, etc.,
and any other details.


Regards,
Gourav Sengupta


spark-avro aliases incompatible

2017-11-05 Thread Gaspar Muñoz
Hi there,

I use the Avro format to store historical data because of Avro schema
evolution. I manage external schemas and read them using the avroSchema
option, so we have been able to add and delete columns.
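
Adding and deleting columns works because of Avro's schema resolution rules: a reader field missing from the data takes the reader's default, a field in the data but absent from the reader schema is ignored, and a missing field without a default is an error. A minimal plain-Scala model of those rules, assuming records represented as `Map[String, Any]`; `Field` and `readRecord` are hypothetical names, not spark-avro or Avro API.

```scala
object AvroEvolutionModel {
  // Hypothetical reader-schema field: a name plus an optional default value.
  final case class Field(name: String, default: Option[Any] = None)

  /** Projects a record written with an older schema onto the reader schema.
    * Missing fields take their default; extra writer fields are dropped;
    * a missing field without a default yields Left, mirroring Avro's error. */
  def readRecord(reader: Seq[Field], written: Map[String, Any]): Either[String, Map[String, Any]] =
    reader.foldLeft[Either[String, Map[String, Any]]](Right(Map.empty)) { (acc, f) =>
      acc.flatMap { out =>
        written.get(f.name).orElse(f.default) match {
          case Some(v) => Right(out + (f.name -> v))
          case None    => Left(s"no value or default for field '${f.name}'")
        }
      }
    }
}
```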

The problem appeared when I introduced aliases: the Spark process didn't work
as expected, and then I read in the spark-avro documentation: "At the moment,
it ignores docs, aliases and other properties present in the Avro file".

How do you manage aliases and column renaming? Is there any workaround?

Thanks in advance.

Regards
