Re: can we insert and update with spark sql

2015-02-12 Thread Debasish Das
I thought more about it...can we provide access to the IndexedRDD through
the Thrift server API and let mapPartitions query that API? I am not sure
the Thrift server is as performant as opening up an API with other
Akka-based frameworks (like Play or Spray)...

Any pointers will be really helpful...

Neither Play nor Spray is being used in Spark right now, so either one
brings in new dependencies, and we already know about the Akka
conflicts...the Thrift server, on the other hand, is already integrated
for JDBC access.
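[Editor's note: a rough sketch of the worker-side pattern being described
here, i.e. one client per partition querying an external key-value service
from mapPartitions. CacheClient and its get()/set()/close() methods are
hypothetical stand-ins for whatever fronts the cache (Thrift, Play/Spray,
the memcached protocol, ...); the one firm constraint is that a
SparkContext lives only on the driver, so the workers have to go through
some such client rather than through a second SparkContext.]

    import org.apache.spark.rdd.RDD

    // Hypothetical client for the external cache service. Backed by a
    // local map here purely so the sketch compiles; a real client would
    // open a network connection to the service instead.
    class CacheClient(host: String, port: Int) {
      private val store = scala.collection.concurrent.TrieMap.empty[Long, Int]
      def get(key: Long): Option[Int] = store.get(key)
      def set(key: Long, value: Int): Unit = store.put(key, value)
      def close(): Unit = ()  // a real client would close the connection
    }

    // get() runs on the workers: one client per partition, not per record.
    def lookupAll(keys: RDD[Long], host: String, port: Int): RDD[(Long, Option[Int])] =
      keys.mapPartitions { iter =>
        val client = new CacheClient(host, port)
        // Materialize before closing, so the lazy iterator is not
        // consumed after the connection is gone.
        val results = iter.map(k => (k, client.get(k))).toList
        client.close()
        results.iterator
      }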


On Tue, Feb 10, 2015 at 3:43 PM, Debasish Das debasish.da...@gmail.com
wrote:

 Also, I wanted to run get() and set() from mapPartitions (from the Spark
 workers and not the master)...

 To be able to do that, I think I have to create a separate SparkContext
 for the cache...

 But I am not sure how the SparkContext from job1 can access the
 SparkContext from job2!


 On Tue, Feb 10, 2015 at 3:25 PM, Debasish Das debasish.da...@gmail.com
 wrote:

 Thanks...this is what I was looking for...

 It would be great if Ankur could give brief details about it...basically,
 how does it contrast with memcached, for example?
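[Editor's note: for context, a rough sketch of IndexedRDD's point
operations, going by the amplab/spark-indexedrdd README of that era and
assuming sc is an existing SparkContext. The main contrast with memcached
is that updates are functional rather than in-place: put() returns a new
IndexedRDD that shares most of its structure with the old one, instead of
mutating a shared store.]

    import edu.berkeley.cs.amplab.spark.indexedrdd.IndexedRDD
    import edu.berkeley.cs.amplab.spark.indexedrdd.IndexedRDD._

    // Hash-partition and index an RDD of key-value pairs.
    val rdd = sc.parallelize((1L to 1000000L).map(x => (x, 0)))
    val indexed = IndexedRDD(rdd).cache()

    // Point update: returns a NEW IndexedRDD; the original is untouched.
    val updated = indexed.put(1234L, 10873).cache()

    // Point lookups without scanning the whole dataset.
    updated.get(1234L)  // => Some(10873)
    indexed.get(1234L)  // => Some(0)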

 On Tue, Feb 10, 2015 at 2:32 PM, Michael Armbrust mich...@databricks.com
  wrote:

 You should look at https://github.com/amplab/spark-indexedrdd

 On Tue, Feb 10, 2015 at 2:27 PM, Debasish Das debasish.da...@gmail.com
 wrote:

 Hi Michael,

 I want to cache an RDD and define get() and set() operators on it,
 basically like memcached. Is it possible to build a memcached-like
 distributed cache using Spark SQL? If not, what do you suggest we use for
 such operations...

 Thanks.
 Deb

 On Fri, Jul 18, 2014 at 1:00 PM, Michael Armbrust 
 mich...@databricks.com wrote:

 You can do INSERT INTO. As with other SQL-on-HDFS systems, there is no
 updating of data.
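[Editor's note: a minimal sketch of what the append looks like against the
Spark 1.0-era API. The Record case class and file path are made up for
illustration; registerAsTable and insertInto are the SchemaRDD method
names of that generation and were renamed in later releases.]

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SQLContext

    case class Record(key: Int, value: String)

    val sc = new SparkContext("local", "insert-demo")
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicitly converts RDD[Record]

    // Write an initial Parquet file and register it as a table.
    sc.parallelize(Seq(Record(1, "a"))).saveAsParquetFile("records.parquet")
    sqlContext.parquetFile("records.parquet").registerAsTable("records")

    // Append more rows: INSERT is supported; UPDATE of existing rows is not.
    sc.parallelize(Seq(Record(2, "b"))).insertInto("records")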
 On Jul 17, 2014 1:26 AM, Akhil Das ak...@sigmoidanalytics.com
 wrote:

 Is this what you are looking for?


 https://spark.apache.org/docs/1.0.0/api/java/org/apache/spark/sql/parquet/InsertIntoParquetTable.html

 According to the doc: "Operator that acts as a sink for queries on RDDs
 and can be used to store the output inside a directory of Parquet files.
 This operator is similar to Hive's INSERT INTO TABLE operation in the
 sense that one can choose to either overwrite or append to a directory.
 Note that consecutive insertions to the same table must have compatible
 (source) schemas."

 Thanks
 Best Regards


 On Thu, Jul 17, 2014 at 11:42 AM, Hu, Leo leo.h...@sap.com wrote:

  Hi

   As for Spark 1.0, can we insert and update a table with Spark SQL, and
 how?



 Thanks

 Best Regards








