Re: Announcing Delta Lake 0.2.0

2019-06-21 Thread Michael Armbrust
>
> Thanks for confirmation. We are using the workaround to create a separate
> Hive external table STORED AS PARQUET with the exact location of Delta
> table. Our use case is batch-driven and we are running VACUUM with 0
> retention after every batch is completed. Do you see any potential problem
> with this workaround, other than that, while the batch is running,
> the table can provide some wrong information?
>

This is a reasonable workaround to allow other systems to read Delta
tables. Another consideration is that if you are running on S3, eventual
consistency may increase the amount of time before external readers see a
consistent view. Also note that this prevents you from using time travel.
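As a rough sketch of the workaround described above (the database, table, columns, and path below are placeholders, and a Hive-enabled SparkSession is assumed; the comment about VACUUM reflects the retention strategy described in the thread rather than a specific API):

import org.apache.spark.sql.SparkSession;

public class DeltaAsParquetWorkaround {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("delta-as-parquet-workaround")
        .enableHiveSupport()   // so the DDL is recorded in the Hive metastore
        .getOrCreate();

    // Register a plain Parquet external table over the Delta table's directory.
    // External engines (Hive, Presto) then read whatever Parquet files are present,
    // which is why the thread pairs this with VACUUM at 0 retention after each batch,
    // leaving only the files that belong to the latest version.
    spark.sql("CREATE EXTERNAL TABLE IF NOT EXISTS mydb.events_as_parquet "
        + "(id BIGINT, eventType STRING) "
        + "STORED AS PARQUET "
        + "LOCATION '/delta/events'");

    spark.stop();
  }
}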

In the near future, I think we should also support generating manifest
files that list the data files in the most recent version of the Delta
table (see #76 for details).
This will give support for Presto, though Hive would require some
additional modifications on the Hive side (if there are any Hive
contributors / committers on this list let me know!).

In the longer term, we are talking with authors of other engines to build
native support for reading the Delta transaction log (e.g. this
announcement from Starburst).
Please contact me if you are interested in contributing here!


Dataframe Publish to RabbitMQ

2019-06-21 Thread Spico Florin
Hello!
Can you please share some code/thoughts on how to publish data from a
dataframe to RabbitMQ?

Thanks.
Regards,
Florin
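No reply appears in this digest; one common pattern, sketched below under assumptions rather than as an official connector, is to serialize rows to JSON and publish them from each partition with the RabbitMQ Java client. The input path, broker host, and queue name are hypothetical.

import java.nio.charset.StandardCharsets;

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import org.apache.spark.api.java.function.ForeachPartitionFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;

public class DataFrameToRabbitMQ {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("df-to-rabbitmq").getOrCreate();

    // Hypothetical source; any DataFrame works once converted to strings.
    Dataset<String> json = spark.read().json("/data/events").toJSON();

    json.foreachPartition((ForeachPartitionFunction<String>) rows -> {
      // One connection and channel per partition, created on the executor.
      ConnectionFactory factory = new ConnectionFactory();
      factory.setHost("rabbitmq-host");   // assumption: broker reachable from the executors
      try (Connection conn = factory.newConnection();
           Channel channel = conn.createChannel()) {
        channel.queueDeclare("events", true, false, false, null);
        while (rows.hasNext()) {
          channel.basicPublish("", "events", null,
              rows.next().getBytes(StandardCharsets.UTF_8));
        }
      }
    });

    spark.stop();
  }
}

Opening one connection per partition (rather than per record) keeps the broker overhead bounded.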


Re: Unable to run simple spark-sql

2019-06-21 Thread Raymond Honderdors
Good to hear
It was what I thought
Hard to validate without the actual configuration
(Did not have time to set up Ambari)


On Fri, Jun 21, 2019, 15:44 Nirmal Kumar  wrote:

> Hey Raymond,
>
> The root cause of the problem was that the Hive database location was
> 'file:/home/hive/spark-warehouse/testdb.db/employee_orc'
>
> I checked that using desc extended testdb.employee
>
> It might be some config issue in the cluster at that time that made the
> location point to the local filesystem.
>
> I created a new database and confirmed that the location was in HDFS,
> i.e. hdfs://xxx:8020/apps/hive/warehouse/
> With this, the code ran fine.
>
> Thanks for the help,
> -Nirmal
>
> From: Nirmal Kumar
> Sent: 19 June 2019 11:51
> To: Raymond Honderdors 
> Cc: user 
> Subject: RE: Unable to run simple spark-sql
>
> Hi Raymond,
>
> I cross checked hive/conf/hive-site.xml and spark2/conf/hive-site.xml
> Same value is being shown by Ambari Hive config.
> Seems correct value here:
>
>   <property>
>     <name>hive.metastore.warehouse.dir</name>
>     <value>/apps/hive/warehouse</value>
>   </property>
>
> Problem:
> Spark is trying to create a local directory under the home directory of the hive
> user (/home/hive/).
> Why is it referring to the local file system, and from where?
>
> Thanks,
> Nirmal
>
> From: Raymond Honderdors <raymond.honderd...@sizmek.com>
> Sent: 19 June 2019 11:18
> To: Nirmal Kumar <nirmal.ku...@impetus.co.in>
> Cc: user <user@spark.apache.org>
> Subject: Re: Unable to run simple spark-sql
>
> Hi Nirmal,
> I came across the following article
> "https://stackoverflow.com/questions/47497003/why-is-hive-creating-tables-in-the-local-file-system"
> (and an updated ref link:
> https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration)
> you should check "hive.metastore.warehouse.dir" in hive config files
>
>
> On Tue, Jun 18, 2019 at 8:09 PM Nirmal Kumar <nirmal.ku...@impetus.co.in> wrote:
> Just an update on the thread that it's kerberized.
>
> I'm trying to execute the query with a different user, xyz, not hive.
> Because of what seems like a permission issue, the user xyz is trying to create a
> directory under /home/hive.
>
> Do I need some impersonation setting?
>
> Thanks,
> Nirmal
>
> Get Outlook for Android
>
> 
> From: Nirmal Kumar
> Sent: Tuesday, June 18, 2019 5:56:06 PM
> To: Raymond Honderdors; Nirmal Kumar
> Cc: user
> Subject: RE: Unable to run simple spark-sql
>
> Hi Raymond,
>
> Permission on hdfs is 777
> drwxrwxrwx   - impadmin hdfs  0 2019-06-13 16:09
> /home/hive/spark-warehouse
>
>
> But it’s pointing to a local file system:
> Exception in thread "main" java.lang.IllegalStateException: Cannot create
> staging directory
> 'file:/home/hive/spark-warehouse/testdb.db/employee_orc/.hive-staging_hive_2019-06-18_16-08-21_448_1691186175028734135-1'
>
> Thanks,
> -Nirmal
>
>
> From: Raymond Honderdors <raymond.honderd...@sizmek.com>
> Sent: 18 June 2019 17:52
> To: Nirmal Kumar <nirmal.ku...@impetus.co.in.invalid>
> Cc: user <user@spark.apache.org>
> Subject: Re: Unable to run simple spark-sql
>
> Hi
> Can you check the permissions of the user running Spark
> on the HDFS folder where it tries to create the table?
>
> On Tue, Jun 18, 2019, 15:05 Nirmal Kumar <nirmal.ku...@impetus.co.in.invalid> wrote:
> Hi List,
>
> I tried running the following sample Java code using Spark2 version 2.0.0
> on YARN (HDP-2.5.0.0)
>
> public class SparkSQLTest {
>   public static void main(String[] 

RE: Unable to run simple spark-sql

2019-06-21 Thread Nirmal Kumar
Hey Raymond,

The root cause of the problem was that the Hive database location was
'file:/home/hive/spark-warehouse/testdb.db/employee_orc'

I checked that using desc extended testdb.employee

It might be some config issue in the cluster at that time that made the
location point to the local filesystem.

I created a new database and confirmed that the location was in HDFS,
i.e. hdfs://xxx:8020/apps/hive/warehouse/
With this, the code ran fine.

Thanks for the help,
-Nirmal
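As a hedged illustration of the checks mentioned above (the table name, database name, and HDFS URI are placeholders), the location can be inspected and pinned explicitly from a Hive-enabled session:

import org.apache.spark.sql.SparkSession;

public class WarehouseLocationCheck {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("warehouse-location-check")
        .enableHiveSupport()
        .getOrCreate();

    // Look at the Location field the metastore has recorded for the table.
    spark.sql("DESCRIBE EXTENDED testdb.employee").show(100, false);

    // Create a database whose location is pinned to HDFS instead of whatever
    // default the session happens to resolve, then create tables inside it.
    spark.sql("CREATE DATABASE IF NOT EXISTS testdb_hdfs "
        + "LOCATION 'hdfs://xxx:8020/apps/hive/warehouse/testdb_hdfs.db'");

    spark.stop();
  }
}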

From: Nirmal Kumar
Sent: 19 June 2019 11:51
To: Raymond Honderdors 
Cc: user 
Subject: RE: Unable to run simple spark-sql

Hi Raymond,

I cross checked hive/conf/hive-site.xml and spark2/conf/hive-site.xml
Same value is being shown by Ambari Hive config.
Seems correct value here:

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/apps/hive/warehouse</value>
  </property>

Problem:
Spark is trying to create a local directory under the home directory of the hive user
(/home/hive/).
Why is it referring to the local file system, and from where?

Thanks,
Nirmal

From: Raymond Honderdors <raymond.honderd...@sizmek.com>
Sent: 19 June 2019 11:18
To: Nirmal Kumar <nirmal.ku...@impetus.co.in>
Cc: user <user@spark.apache.org>
Subject: Re: Unable to run simple spark-sql

Hi Nirmal,
I came across the following article
"https://stackoverflow.com/questions/47497003/why-is-hive-creating-tables-in-the-local-file-system"
(and an updated ref link:
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration)
you should check "hive.metastore.warehouse.dir" in hive config files


On Tue, Jun 18, 2019 at 8:09 PM Nirmal Kumar <nirmal.ku...@impetus.co.in> wrote:
Just an update on the thread that it's kerberized.

I'm trying to execute the query with a different user, xyz, not hive.
Because of what seems like a permission issue, the user xyz is trying to create a
directory under /home/hive.

Do I need some impersonation setting?

Thanks,
Nirmal

Get Outlook for Android


From: Nirmal Kumar
Sent: Tuesday, June 18, 2019 5:56:06 PM
To: Raymond Honderdors; Nirmal Kumar
Cc: user
Subject: RE: Unable to run simple spark-sql

Hi Raymond,

Permission on hdfs is 777
drwxrwxrwx   - impadmin hdfs  0 2019-06-13 16:09 
/home/hive/spark-warehouse


But it’s pointing to a local file system:
Exception in thread "main" java.lang.IllegalStateException: Cannot create 
staging directory  
'file:/home/hive/spark-warehouse/testdb.db/employee_orc/.hive-staging_hive_2019-06-18_16-08-21_448_1691186175028734135-1'

Thanks,
-Nirmal


From: Raymond Honderdors <raymond.honderd...@sizmek.com>
Sent: 18 June 2019 17:52
To: Nirmal Kumar <nirmal.ku...@impetus.co.in.invalid>
Cc: user <user@spark.apache.org>
Subject: Re: Unable to run simple spark-sql

Hi
Can you check the permissions of the user running Spark
on the HDFS folder where it tries to create the table?

On Tue, Jun 18, 2019, 15:05 Nirmal Kumar <nirmal.ku...@impetus.co.in.invalid> wrote:
Hi List,

I tried running the following sample Java code using Spark2 version 2.0.0 on 
YARN (HDP-2.5.0.0)

public class SparkSQLTest {
  public static void main(String[] args) {
    SparkSession sparkSession = SparkSession.builder().master("yarn")
        .config("spark.sql.warehouse.dir", "/apps/hive/warehouse")
        .config("hive.metastore.uris", "thrift://x:9083")
        .config("spark.driver.extraJavaOptions", "-Dhdp.version=2.5.0.0-1245")
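The snippet is cut off by the archive; a self-contained sketch along the same lines (the statement being run is a hypothetical one consistent with the staging-directory error above, and enableHiveSupport() is added on the assumption that the Hive metastore is the target) could look like:

import org.apache.spark.sql.SparkSession;

public class SparkSQLTest {
  public static void main(String[] args) {
    SparkSession sparkSession = SparkSession.builder().master("yarn")
        .config("spark.sql.warehouse.dir", "/apps/hive/warehouse")
        .config("hive.metastore.uris", "thrift://x:9083")
        .config("spark.driver.extraJavaOptions", "-Dhdp.version=2.5.0.0-1245")
        .appName("SparkSQLTest")
        .enableHiveSupport()
        .getOrCreate();

    // Hypothetical statement; writing into the ORC table is what produces the
    // .hive-staging directory whose location is at issue in this thread.
    sparkSession.sql("INSERT INTO testdb.employee_orc SELECT * FROM testdb.employee");

    sparkSession.stop();
  }
}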


Re: Timeout between driver and application master (Thrift Server)

2019-06-21 Thread tibi.bronto
Hi Jürgen,

Did you ever find a way to resolve this issue?

Looking at the implementation of the application master, it seems that there
is no heartbeat/keepalive mechanism for the communication between the driver
and AM, so when something closes the connection for inactivity, the AM shuts
down:
https://github.com/apache/spark/blob/branch-2.3/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L807


Jürgen Thomann wrote
> Hi,
> 
> I'm using the Spark Thrift Server and after some time the driver and 
> application master are shutting down because of timeouts. There is a
> firewall 
> in between and there is no traffic between them as it seems. Is there a
> way to 
> configure TCP keep alive for the connection or some other way to make the 
> firewall happy?
> 
> Environment:
> CentOS 7, HDP 2.6.5 with Spark 2.3.0
> 
> The Error on the driver is "ERROR YarnClientSchedulerBackend: Yarn
> application 
> has already exited with state finished" and a bit later there are some 
> Exceptions with ClosedChannelException.
> 
> The application master has the following message:
> WARN TransportChannelHandler: Exception in connection from 
> 
> java.io.IOException: Connection timed out
> ... Stacktrace omitted
> The messages are at the same time (same second, sadly no milliseconds in
> the 
> logs).
> 
> Thanks,
> Jürgen
> 
> 
> 



Re: Announcing Delta Lake 0.2.0

2019-06-21 Thread ayan guha
Hi

Thanks for confirmation. We are using the workaround to create a separate
Hive external table STORED AS PARQUET with the exact location of Delta
table. Our use case is batch-driven and we are running VACUUM with 0
retention after every batch is completed. Do you see any potential problem
with this workaround, other than that, while the batch is running,
the table can provide some wrong information?

Best
Ayan

On Fri, Jun 21, 2019 at 8:03 PM Tathagata Das 
wrote:

> @ayan guha  @Gourav Sengupta
> 
> Delta Lake OSS currently does not support defining tables in the Hive
> metastore using DDL commands. We are hoping to add the necessary
> compatibility fixes in Apache Spark to make Delta Lake work with tables and
> DDL commands. So we will support them in a future release. In the meantime,
> please read/write Delta tables using paths.
>
> TD
>
> On Fri, Jun 21, 2019 at 12:49 AM Gourav Sengupta <
> gourav.sengu...@gmail.com> wrote:
>
>> Hi Ayan,
>>
>> I may be wrong about this, but I think that Delta files are in Parquet
>> format. But I am sure that you have already checked this. Am I missing
>> something?
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Fri, Jun 21, 2019 at 6:39 AM ayan guha  wrote:
>>
>>> Hi
>>> We used spark.sql to create a table using DELTA. We also have a hive
>>> metastore attached to the spark session. Hence, a table gets created in
>>> Hive metastore. We then tried to query the table from Hive. We faced
>>> the following issues:
>>>
>>>1. SERDE is SequenceFile, should have been Parquet
>>>2. Schema fields are not passed.
>>>
>>> Essentially the hive DDL looks like:
>>>
>>> CREATE TABLE `TABLE NAME`(
>>>   `col` array COMMENT 'from deserializer')
>>> ROW FORMAT SERDE
>>>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
>>> WITH SERDEPROPERTIES (
>>>   'path'='WASB PATH')
>>> STORED AS INPUTFORMAT
>>>   'org.apache.hadoop.mapred.SequenceFileInputFormat'
>>> OUTPUTFORMAT
>>>   'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
>>> LOCATION
>>>   'WASB PATH'
>>> TBLPROPERTIES (
>>>   'spark.sql.create.version'='2.4.0',
>>>   'spark.sql.sources.provider'='DELTA',
>>>   'spark.sql.sources.schema.numParts'='1',
>>>   'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[]}',
>>>   'transient_lastDdlTime'='1556544657')
>>>
>>> Is this expected? And will the use case be supported in future releases?
>>>
>>>
>>> We are now experimenting
>>>
>>> Best
>>>
>>> Ayan
>>>
>>> On Fri, Jun 21, 2019 at 11:06 AM Liwen Sun 
>>> wrote:
>>>
 Hi James,

 Right now we don't have plans for having a catalog component as part of
 Delta Lake, but we are looking to support Hive metastore and also DDL
 commands in the near future.

 Thanks,
 Liwen

 On Thu, Jun 20, 2019 at 4:46 AM James Cotrotsios <
 jamescotrots...@gmail.com> wrote:

> Is there a plan to have a business catalog component for the Data
> Lake? If not how would someone make a proposal to create an open source
> project related to that. I would be interested in building out an open
> source data catalog that would use the Hive metadata store as a baseline
> for technical metadata.
>
>
> On Wed, Jun 19, 2019 at 3:04 PM Liwen Sun 
> wrote:
>
>> We are delighted to announce the availability of Delta Lake 0.2.0!
>>
>> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
>> https://docs.delta.io/0.2.0/quick-start.html
>>
>> To view the release notes:
>> https://github.com/delta-io/delta/releases/tag/v0.2.0
>>
>> This release introduces two main features:
>>
>> *Cloud storage support*
>> In addition to HDFS, you can now configure Delta Lake to read and
>> write data on cloud storage services such as Amazon S3 and Azure Blob
>> Storage. For configuration instructions, please see:
>> https://docs.delta.io/0.2.0/delta-storage.html
>>
>> *Improved concurrency*
>> Delta Lake now allows concurrent append-only writes while still
>> ensuring serializability. For concurrency control in Delta Lake, please
>> see: https://docs.delta.io/0.2.0/delta-concurrency.html
>>
>> We have also greatly expanded the test coverage as part of this
>> release.
>>
>> We would like to acknowledge all community members for contributing
>> to this release.
>>
>> Best regards,
>> Liwen Sun
>>
>>
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
>>>
>>

-- 
Best Regards,
Ayan Guha


Re: Announcing Delta Lake 0.2.0

2019-06-21 Thread Tathagata Das
@ayan guha  @Gourav Sengupta

Delta Lake OSS currently does not support defining tables in the Hive
metastore using DDL commands. We are hoping to add the necessary
compatibility fixes in Apache Spark to make Delta Lake work with tables and
DDL commands. So we will support them in a future release. In the meantime,
please read/write Delta tables using paths.

TD
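For reference, path-based access as suggested above looks roughly like the following (the paths are placeholders, and the delta-core package is assumed to be on the classpath):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DeltaByPath {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("delta-by-path").getOrCreate();

    // Write a DataFrame as a Delta table identified only by its storage path.
    Dataset<Row> updates = spark.read().json("/data/incoming");
    updates.write().format("delta").mode("append").save("/delta/events");

    // Read it back the same way; no metastore table definition involved.
    Dataset<Row> events = spark.read().format("delta").load("/delta/events");
    events.show();

    spark.stop();
  }
}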

On Fri, Jun 21, 2019 at 12:49 AM Gourav Sengupta 
wrote:

> Hi Ayan,
>
> I may be wrong about this, but I think that Delta files are in Parquet
> format. But I am sure that you have already checked this. Am I missing
> something?
>
> Regards,
> Gourav Sengupta
>
> On Fri, Jun 21, 2019 at 6:39 AM ayan guha  wrote:
>
>> Hi
>> We used spark.sql to create a table using DELTA. We also have a hive
>> metastore attached to the spark session. Hence, a table gets created in
>> Hive metastore. We then tried to query the table from Hive. We faced
>> the following issues:
>>
>>1. SERDE is SequenceFile, should have been Parquet
>>2. Schema fields are not passed.
>>
>> Essentially the hive DDL looks like:
>>
>> CREATE TABLE `TABLE NAME`(
>>   `col` array COMMENT 'from deserializer')
>> ROW FORMAT SERDE
>>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
>> WITH SERDEPROPERTIES (
>>   'path'='WASB PATH')
>> STORED AS INPUTFORMAT
>>   'org.apache.hadoop.mapred.SequenceFileInputFormat'
>> OUTPUTFORMAT
>>   'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
>> LOCATION
>>   'WASB PATH'
>> TBLPROPERTIES (
>>   'spark.sql.create.version'='2.4.0',
>>   'spark.sql.sources.provider'='DELTA',
>>   'spark.sql.sources.schema.numParts'='1',
>>   'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[]}',
>>   'transient_lastDdlTime'='1556544657')
>>
>> Is this expected? And will the use case be supported in future releases?
>>
>>
>> We are now experimenting
>>
>> Best
>>
>> Ayan
>>
>> On Fri, Jun 21, 2019 at 11:06 AM Liwen Sun 
>> wrote:
>>
>>> Hi James,
>>>
>>> Right now we don't have plans for having a catalog component as part of
>>> Delta Lake, but we are looking to support Hive metastore and also DDL
>>> commands in the near future.
>>>
>>> Thanks,
>>> Liwen
>>>
>>> On Thu, Jun 20, 2019 at 4:46 AM James Cotrotsios <
>>> jamescotrots...@gmail.com> wrote:
>>>
 Is there a plan to have a business catalog component for the Data Lake?
 If not how would someone make a proposal to create an open source project
 related to that. I would be interested in building out an open source data
 catalog that would use the Hive metadata store as a baseline for technical
 metadata.


 On Wed, Jun 19, 2019 at 3:04 PM Liwen Sun 
 wrote:

> We are delighted to announce the availability of Delta Lake 0.2.0!
>
> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
> https://docs.delta.io/0.2.0/quick-start.html
>
> To view the release notes:
> https://github.com/delta-io/delta/releases/tag/v0.2.0
>
> This release introduces two main features:
>
> *Cloud storage support*
> In addition to HDFS, you can now configure Delta Lake to read and
> write data on cloud storage services such as Amazon S3 and Azure Blob
> Storage. For configuration instructions, please see:
> https://docs.delta.io/0.2.0/delta-storage.html
>
> *Improved concurrency*
> Delta Lake now allows concurrent append-only writes while still
> ensuring serializability. For concurrency control in Delta Lake, please
> see: https://docs.delta.io/0.2.0/delta-concurrency.html
>
> We have also greatly expanded the test coverage as part of this
> release.
>
> We would like to acknowledge all community members for contributing to
> this release.
>
> Best regards,
> Liwen Sun
>
>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>


Re: Announcing Delta Lake 0.2.0

2019-06-21 Thread Gourav Sengupta
Hi Ayan,

I may be wrong about this, but I think that Delta files are in Parquet
format. But I am sure that you have already checked this. Am I missing
something?

Regards,
Gourav Sengupta

On Fri, Jun 21, 2019 at 6:39 AM ayan guha  wrote:

> Hi
> We used spark.sql to create a table using DELTA. We also have a hive
> metastore attached to the spark session. Hence, a table gets created in
> Hive metastore. We then tried to query the table from Hive. We faced
> the following issues:
>
>1. SERDE is SequenceFile, should have been Parquet
>2. Schema fields are not passed.
>
> Essentially the hive DDL looks like:
>
> CREATE TABLE `TABLE NAME`(
>   `col` array COMMENT 'from deserializer')
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES (
>   'path'='WASB PATH')
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.SequenceFileInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
> LOCATION
>   'WASB PATH'
> TBLPROPERTIES (
>   'spark.sql.create.version'='2.4.0',
>   'spark.sql.sources.provider'='DELTA',
>   'spark.sql.sources.schema.numParts'='1',
>   'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[]}',
>   'transient_lastDdlTime'='1556544657')
>
> Is this expected? And will the use case be supported in future releases?
>
>
> We are now experimenting
>
> Best
>
> Ayan
>
> On Fri, Jun 21, 2019 at 11:06 AM Liwen Sun 
> wrote:
>
>> Hi James,
>>
>> Right now we don't have plans for having a catalog component as part of
>> Delta Lake, but we are looking to support Hive metastore and also DDL
>> commands in the near future.
>>
>> Thanks,
>> Liwen
>>
>> On Thu, Jun 20, 2019 at 4:46 AM James Cotrotsios <
>> jamescotrots...@gmail.com> wrote:
>>
>>> Is there a plan to have a business catalog component for the Data Lake?
>>> If not how would someone make a proposal to create an open source project
>>> related to that. I would be interested in building out an open source data
>>> catalog that would use the Hive metadata store as a baseline for technical
>>> metadata.
>>>
>>>
>>> On Wed, Jun 19, 2019 at 3:04 PM Liwen Sun 
>>> wrote:
>>>
 We are delighted to announce the availability of Delta Lake 0.2.0!

 To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
 https://docs.delta.io/0.2.0/quick-start.html

 To view the release notes:
 https://github.com/delta-io/delta/releases/tag/v0.2.0

 This release introduces two main features:

 *Cloud storage support*
 In addition to HDFS, you can now configure Delta Lake to read and write
 data on cloud storage services such as Amazon S3 and Azure Blob Storage.
 For configuration instructions, please see:
 https://docs.delta.io/0.2.0/delta-storage.html

 *Improved concurrency*
 Delta Lake now allows concurrent append-only writes while still
 ensuring serializability. For concurrency control in Delta Lake, please
 see: https://docs.delta.io/0.2.0/delta-concurrency.html

 We have also greatly expanded the test coverage as part of this release.

 We would like to acknowledge all community members for contributing to
 this release.

 Best regards,
 Liwen Sun


>
> --
> Best Regards,
> Ayan Guha
>