Hive on Spark Job Monitoring

2017-03-16 Thread Ninad Shringarpure
Hi Team,

I want to understand how Hive on Spark queries actually map to the
underlying Spark jobs that Hive triggers.

AFAIK, each Hive query triggers a new Spark job. Someone contradicted
this, so I want to confirm what the actual design is.
Please let me know if there is a reference or design doc that explains
this, or reply here if you know how it works.

Thanks,
Ninad


Making withColumn nullable

2017-01-27 Thread Ninad Shringarpure
Hi Team,

When I add a column to my data frame using withColumn and assign some
value, the resulting schema automatically marks this column as not
nullable.
The final Hive table I want to insert into declares this column as
nullable, so the save fails with an error.

Is there a way to make a column added with the withColumn method
nullable?

Thanks,
Ninad


Creating UUID using Spark SQL

2017-01-18 Thread Ninad Shringarpure
Hi Team,

Is there a standard way of generating a unique id for each row in
Spark SQL? I am looking for functionality similar to UUID generation in
Hive.

Let me know if you need any additional information.

Thanks,
Ninad


[DataFrames] map function - 2.0

2016-12-15 Thread Ninad Shringarpure
Hi Team,

Going through the Dataset class for Spark 2.0, I see that both
overloaded map functions, with and without an encoder, are marked as
experimental.

Is there a reason for this, and are there issues that developers should
be aware of when using them in production applications? Also, is there a
"non-experimental" way of using the map function on a DataFrame in
Spark 2.0?

Thanks,
Ninad


Re: [Spark-SQL] collect_list() support for nested collection

2016-12-13 Thread Ninad Shringarpure
Exactly what I was looking for. Thank you so much!!


On Tue, Dec 13, 2016 at 6:15 PM Michael Armbrust 
wrote:

> Yes
>
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/4464261896877850/2840265927289860/latest.html
>
> On Tue, Dec 13, 2016 at 10:43 AM, Ninad Shringarpure 
> wrote:
>
> Hi Team,
>
> Does Spark 2.0 support non-primitive types in collect_list for inserting
> nested collections?
> Would appreciate any references or samples.
>
> Thanks,
> Ninad


Fwd: [Spark-SQL] collect_list() support for nested collection

2016-12-13 Thread Ninad Shringarpure
Hi Team,

Does Spark 2.0 support non-primitive types in collect_list for inserting
nested collections?
Would appreciate any references or samples.

Thanks,
Ninad


Unsubscribe

2016-11-28 Thread Ninad Shringarpure
Unsubscribe


Fwd: jdbcRDD for data ingestion from RDBMS

2016-10-17 Thread Ninad Shringarpure
Hi Team,

One of my client teams is trying to see if they can use Spark instead of
Sqoop to source data from an RDBMS. The data volume is large, on the
order of billions of records.

From reading the documentation, I am not sure whether jdbcRDD is
designed to scale well for this amount of data. In addition, some
built-in Sqoop features such as --direct might give better performance
than plain JDBC.

My primary question to this group is whether it is advisable to use
jdbcRDD for data sourcing, and whether we can expect it to scale. Also,
how would its performance compare to Sqoop?

Please let me know your thoughts, and share any pointers if anyone in
the group has already implemented this.

Thanks,
Ninad