Re: New Spark Datasource for Hive ACID tables

2019-07-27 Thread Abhishek Somani
I realised that the build instructions in the README.md were no longer clear
after some recent changes. I have updated them now.

Thanks,
Abhishek Somani

On Sun, Jul 28, 2019 at 7:53 AM naresh Goud 
wrote:

> Thanks Abhishek.
> I will check it out.
>
> Thank you,
> Naresh
Re: New Spark Datasource for Hive ACID tables

2019-07-27 Thread naresh Goud
Thanks Abhishek.
I will check it out.

Thank you,
Naresh

Re: New Spark Datasource for Hive ACID tables

2019-07-27 Thread Abhishek Somani
Hey Naresh,

There is a `shaded-dependencies` project inside the root directory. You need
to go into it first, build it, and publish it locally:

cd shaded-dependencies
sbt clean publishLocal


After that, come back out to the root directory and build that project. The
spark-acid-shaded-dependencies jar will now be found:

cd ..
sbt assembly


This will create the jar which you can use.

On another note, unless you are making changes to the code, you don't need
to build it yourself, as the jar is published at
https://spark-packages.org/package/qubole/spark-acid. So you can just use
it as:

spark-shell --packages qubole:spark-acid:0.4.0-s_2.11


...and it will be automatically fetched and used.
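Once the package is on the classpath, reading an ACID table is a short snippet. A minimal sketch — the `HiveAcid` format name and the table name `default.acidtbl` below are assumptions for illustration; the exact options are documented on the GitHub page:

```scala
import org.apache.spark.sql.SparkSession

object ReadAcidTable {
  def main(args: Array[String]): Unit = {
    // Assumes a Spark deployment with Hive support and a Hive metastore
    // that contains an ACID (transactional) table, e.g. default.acidtbl.
    val spark = SparkSession.builder()
      .appName("spark-acid-read")
      .enableHiveSupport()
      .getOrCreate()

    // "HiveAcid" is assumed to be the datasource's short name; the
    // "table" option names the fully qualified Hive table to read.
    val df = spark.read
      .format("HiveAcid")
      .option("table", "default.acidtbl")
      .load()

    df.show()
    spark.stop()
  }
}
```

This reads through the datasource rather than Spark's built-in Hive reader, which is what allows it to interpret the ACID base/delta file layout.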

Thanks,
Abhishek


On Sun, Jul 28, 2019 at 4:42 AM naresh Goud 
wrote:

> It looks there is some internal dependency missing.
>
> libraryDependencies ++= Seq(
> "com.qubole" %% "spark-acid-shaded-dependencies" % "0.1"
> )
>
> How do we get it?

Re: New Spark Datasource for Hive ACID tables

2019-07-27 Thread naresh Goud
It looks like some internal dependency is missing.

libraryDependencies ++= Seq(
  "com.qubole" %% "spark-acid-shaded-dependencies" % "0.1"
)

How do we get it?


Thank you,
Naresh




Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/





Re: New Spark Datasource for Hive ACID tables

2019-07-27 Thread naresh Goud
Hi Abhishek,


We are not able to build the jar from the GitHub code; we get the error below.

Is anyone else able to build the jars? Is there anything else missing?



Note: Unresolved dependencies path:
[warn]  com.qubole:spark-acid-shaded-dependencies_2.11:0.1 (C:\Data\Hadoop\spark-acid-master\build.sbt#L51-54)
[warn]    +- com.qubole:spark-acid_2.11:0.4.0
sbt.ResolveException: unresolved dependency: com.qubole#spark-acid-shaded-dependencies_2.11;0.1: not found
    at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:313)
    at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:191)
    at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:168)
    at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
    at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
    at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:133)
    at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:57)
    at sbt.IvySbt$$anon$4.call(Ivy.scala:65)
    at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93)
    at xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78)
    at xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97)
    at xsbt.boot.Using$.withResource(Using.scala:10)
    at xsbt.boot.Using$.apply(Using.scala:9)
    at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58)
    at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48)
    at xsbt.boot.Locks$.apply0(Locks.scala:31)
    at xsbt.boot.Locks$.apply(Locks.scala:28)
    at sbt.IvySbt.withDefaultLogger(Ivy.scala:65)
    at sbt.IvySbt.withIvy(Ivy.scala:128)
    at sbt.IvySbt.withIvy(Ivy.scala:125)
    at sbt.IvySbt$Module.withModule(Ivy.scala:156)
    at sbt.IvyActions$.updateEither(IvyActions.scala:168)
    at sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1541)
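The unresolved artifact above is the shaded-dependencies jar that the main build expects to find in the local Ivy cache. Per Abhishek's build instructions elsewhere in this thread, publishing it locally first makes the resolution succeed — a sketch, assuming you are in the repository root:

```shell
# Build the shaded dependencies and publish them to the local Ivy cache,
# so that com.qubole:spark-acid-shaded-dependencies_2.11:0.1 resolves.
cd shaded-dependencies
sbt clean publishLocal

# Then build the main spark-acid project from the repository root.
cd ..
sbt assembly
```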

Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/





Re: New Spark Datasource for Hive ACID tables

2019-07-27 Thread Nicolas Paris
Congrats!

The read/write feature with Hive 3 is highly interesting.

On Fri, Jul 26, 2019 at 06:07:55PM +0530, Abhishek Somani wrote:
> Hi All,
> 
> We at Qubole have open sourced a datasource that will enable users to work on
> their Hive ACID Transactional Tables using Spark. 
> 
> Github: https://github.com/qubole/spark-acid
> 
> Hive ACID tables allow users to work on their data transactionally, and also
> gives them the ability to Delete, Update and Merge data efficiently without
> having to rewrite all of their data in a table, partition or file. We believe
> that being able to work on these tables from Spark is a much desired value 
> add,
> as is also apparent in https://issues.apache.org/jira/browse/SPARK-15348 and 
> https://issues.apache.org/jira/browse/SPARK-16996 with multiple people looking
> for it. Currently the datasource supports reading from these ACID tables only,
> and we are working on adding the ability to write into these tables via Spark
> as well.
> 
> The datasource is also available as a spark package, and instructions on how 
> to
> use it are available on the Github page.
> 
> We welcome your feedback and suggestions.
> 
> Thanks,
> Abhishek Somani 

-- 
nicolas

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: New Spark Datasource for Hive ACID tables

2019-07-26 Thread Abhishek Somani
Hey Naresh,

Thanks for your question. Yes it will work!

Thanks,
Abhishek Somani

On Fri, Jul 26, 2019 at 7:08 PM naresh Goud 
wrote:

> Thanks Abhishek.
>
> Will it work on a Hive ACID table that is not compacted, i.e. a table
> having base and delta files?
>
> Let's say we have a Hive ACID table `customer`:
>
> CREATE TABLE customer (customer_id INT, customer_name STRING,
> customer_email STRING)
> CLUSTERED BY (customer_id) INTO 10 BUCKETS
> LOCATION '/test/customer'
> TBLPROPERTIES ('transactional'='true')
>
> And the table's HDFS path has the directories below:
>
> /test/customer/base_15234/
> /test/customer/delta_1234_456
>
> That means the table has updates and major compaction has not run.
>
> Will the Spark reader work?
>
>
> Thank you,
> Naresh
>
> --
> Thanks,
> Naresh
> www.linkedin.com/in/naresh-dulam
> http://hadoopandspark.blogspot.com/
>
>


Re: New Spark Datasource for Hive ACID tables

2019-07-26 Thread naresh Goud
Thanks Abhishek.

Will it work on a Hive ACID table that is not compacted, i.e. a table
having base and delta files?

Let's say we have a Hive ACID table `customer`:

CREATE TABLE customer (customer_id INT, customer_name STRING, customer_email STRING)
CLUSTERED BY (customer_id) INTO 10 BUCKETS
LOCATION '/test/customer'
TBLPROPERTIES ('transactional'='true')


And the table's HDFS path has the directories below:

/test/customer/base_15234/
/test/customer/delta_1234_456


That means the table has updates and major compaction has not run.

Will the Spark reader work?


Thank you,
Naresh







-- 
Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/