Re: can we insert and update with spark sql
I thought more on it... can we provide access to the IndexedRDD through the Thrift server API and let mapPartitions query that API? I am not sure the Thrift server is as performant as opening up an API using other Akka-based frameworks (like Play or Spray)... any pointers would be really helpful. Neither Play nor Spray is used in Spark right now, so either would bring in new dependencies, and we already know about the Akka conflicts... The Thrift server, on the other hand, is already integrated for JDBC access.

On Tue, Feb 10, 2015 at 3:43 PM, Debasish Das debasish.da...@gmail.com wrote:

Also, I wanted to run get() and set() from mapPartitions (from the Spark workers, not the master). To be able to do that, I think I have to create a separate SparkContext for the cache... but I am not sure how the SparkContext from job1 can access the SparkContext from job2!

On Tue, Feb 10, 2015 at 3:25 PM, Debasish Das debasish.da...@gmail.com wrote:

Thanks... this is what I was looking for. It would be great if Ankur could give brief details about it... basically, how does it contrast with memcached, for example?

On Tue, Feb 10, 2015 at 2:32 PM, Michael Armbrust mich...@databricks.com wrote:

You should look at https://github.com/amplab/spark-indexedrdd

On Tue, Feb 10, 2015 at 2:27 PM, Debasish Das debasish.da...@gmail.com wrote:

Hi Michael,

I want to cache an RDD and define get() and set() operators on it, basically like memcached. Is it possible to build a memcached-like distributed cache using Spark SQL? If not, what do you suggest we should use for such operations?

Thanks.
Deb

On Fri, Jul 18, 2014 at 1:00 PM, Michael Armbrust mich...@databricks.com wrote:

You can do INSERT INTO. As with other SQL-on-HDFS systems, there is no updating of data.

On Jul 17, 2014 1:26 AM, Akhil Das ak...@sigmoidanalytics.com wrote:

Is this what you are looking for?
https://spark.apache.org/docs/1.0.0/api/java/org/apache/spark/sql/parquet/InsertIntoParquetTable.html

According to the doc, it says: "Operator that acts as a sink for queries on RDDs and can be used to store the output inside a directory of Parquet files. This operator is similar to Hive's INSERT INTO TABLE operation in the sense that one can choose to either overwrite or append to a directory. Note that consecutive insertions to the same table must have compatible (source) schemas."

Thanks
Best Regards

On Thu, Jul 17, 2014 at 11:42 AM, Hu, Leo leo.h...@sap.com wrote:

Hi,

As for Spark 1.0, can we insert into and update a table with Spark SQL, and how?

Thanks
Best Regards
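The append-only semantics described in the thread (INSERT INTO may overwrite or append, consecutive inserts need compatible schemas, and there is no UPDATE) can be sketched in plain Python. This is a hypothetical stand-in for the Parquet sink's behavior, not Spark's actual API:

```python
# Conceptual sketch of an append-only table, mirroring the semantics
# quoted above: inserts append to (or overwrite) a directory of files,
# consecutive inserts must have compatible schemas, and there is no
# per-row UPDATE operation at all.

class AppendOnlyTable:
    def __init__(self):
        self.schema = None  # column names fixed by the first insert
        self.rows = []      # stands in for the directory of Parquet files

    def insert_into(self, schema, rows, overwrite=False):
        if self.schema is not None and schema != self.schema:
            raise ValueError("incompatible schema for consecutive inserts")
        if overwrite:
            self.rows = []  # "overwrite" replaces the directory contents
        self.schema = schema
        self.rows.extend(rows)

t = AppendOnlyTable()
t.insert_into(("id", "name"), [(1, "a"), (2, "b")])
t.insert_into(("id", "name"), [(3, "c")])             # append
assert len(t.rows) == 3
t.insert_into(("id", "name"), [(9, "z")], overwrite=True)
assert t.rows == [(9, "z")]
```

Note there is deliberately no `update` method: to change a row under this model you rewrite the whole table (overwrite), which is why SQL-on-HDFS systems of that era supported INSERT but not UPDATE.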
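The core idea behind spark-indexedrdd is to hash-partition keys and keep a per-partition index, so a point lookup or update touches a single partition instead of scanning the whole dataset. A minimal plain-Python sketch of that idea follows; it is a conceptual stand-in, not the actual IndexedRDD API:

```python
# Sketch of a hash-partitioned key-value store with a per-key index in
# each partition -- the idea that makes point get()/put() cheap in
# spark-indexedrdd. Conceptual stand-in only, not the IndexedRDD API.

class PartitionedKV:
    def __init__(self, num_partitions=4):
        self.partitions = [{} for _ in range(num_partitions)]

    def _part(self, key):
        # Route every key to exactly one partition by hash.
        return hash(key) % len(self.partitions)

    def put(self, key, value):
        # Only the owning partition's index is updated.
        self.partitions[self._part(key)][key] = value

    def get(self, key):
        # Only the owning partition is consulted, not the full dataset.
        return self.partitions[self._part(key)].get(key)

kv = PartitionedKV()
kv.put("user:1", 42)
assert kv.get("user:1") == 42
assert kv.get("missing") is None
```

The contrast with memcached, as I understand it: memcached is an external, mutable cache with no fault tolerance or lineage, while IndexedRDD keeps the data inside Spark as an RDD, so it participates in Spark's fault-tolerance model and supports efficient functional updates.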
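On the question of calling get()/set() from inside mapPartitions: a SparkContext lives only on the driver, so one job's context cannot be reached from another job's workers. The usual pattern instead is to open a client to an external store once per partition inside mapPartitions. A plain-Python sketch, where `FakeCacheClient` is a hypothetical stand-in for a memcached or Thrift client:

```python
# Sketch of the per-partition client pattern for querying an external
# cache from Spark workers. A SparkContext cannot be shared across jobs
# (it exists only on the driver), so workers talk to the cache through
# an ordinary network client instead. FakeCacheClient is a hypothetical
# stand-in for a real memcached/Thrift client.

FAKE_SERVER = {"a": 1, "b": 2}  # stands in for the remote cache service

class FakeCacheClient:
    def get(self, key):
        return FAKE_SERVER.get(key)

def lookup_partition(iterator):
    # One client (one connection) per partition, reused for every record,
    # rather than one connection per record.
    client = FakeCacheClient()
    for key in iterator:
        yield (key, client.get(key))

# Simulate two partitions of keys, as mapPartitions would present them.
partitions = [["a", "b"], ["c"]]
out = [kv for part in partitions for kv in lookup_partition(iter(part))]
assert out == [("a", 1), ("b", 2), ("c", None)]
```

With a real cluster this would be `rdd.mapPartitions(lookup_partition)`; the key point is that the client is constructed inside the function, so it is created on the worker, once per partition.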