They're effectively the same code paths. However, I'd recommend using the Data Frame API unless you have a specific need to pass in a custom Configuration object.

The Data Frame API has bindings in Scala, Java and Python, so that's another advantage. The phoenix-spark docs have a PySpark example, but it's applicable to Java (and Scala) as well.
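In Java it would look roughly like this. This is only an untested sketch: it assumes the phoenix-spark and Phoenix client jars are on the classpath, it reuses the TABLE1 / zkUrl / filter from the Scala example in the quoted thread below, and OUTPUT_TABLE is just a placeholder target table for the write.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SaveMode;

public class PhoenixSparkJavaExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("phoenix-test").setMaster("local");
    JavaSparkContext jsc = new JavaSparkContext(conf);
    SQLContext sqlContext = new SQLContext(jsc);

    // Read TABLE1 through the Phoenix data source, same as the Scala
    // sqlContext.load(...) example quoted below
    DataFrame df = sqlContext.read()
        .format("org.apache.phoenix.spark")
        .option("table", "TABLE1")
        .option("zkUrl", "phoenix-server:2181")
        .load();

    df.filter(df.col("COL1").equalTo("test_row_1").and(df.col("ID").equalTo(1L)))
      .select(df.col("ID"))
      .show();

    // Write back out through the same data source; this covers the
    // saveToPhoenix use case. OUTPUT_TABLE is a placeholder and must
    // already exist in Phoenix with columns matching the DataFrame.
    // Note that phoenix-spark expects SaveMode.Overwrite (it performs upserts).
    df.write()
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)
      .option("table", "OUTPUT_TABLE")
      .option("zkUrl", "phoenix-server:2181")
      .save();

    jsc.stop();
  }
}

For comparison, a rough sketch of the JDBC partitionColumn approach from earlier in the thread is appended at the bottom, below the quoted messages.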
Josh

On Thu, Jun 9, 2016 at 1:02 PM, Long, Xindian <xindian.l...@sensus.com> wrote:

> Hi, Josh:
>
> Thanks for the answer. Do you know the underlying difference between the
> following two ways of loading a DataFrame? (using the Data Source API, or
> loading as a DataFrame directly using a Configuration object)
>
> Is there a Java interface to use the functionality of
> phoenixTableAsDataFrame and saveToPhoenix?
>
> Thanks
>
> Xindian
>
> Load as a DataFrame using the Data Source API:
>
> import org.apache.spark.SparkContext
> import org.apache.spark.sql.SQLContext
> import org.apache.phoenix.spark._
>
> val sc = new SparkContext("local", "phoenix-test")
> val sqlContext = new SQLContext(sc)
>
> val df = sqlContext.load(
>   "org.apache.phoenix.spark",
>   Map("table" -> "TABLE1", "zkUrl" -> "phoenix-server:2181")
> )
>
> df
>   .filter(df("COL1") === "test_row_1" && df("ID") === 1L)
>   .select(df("ID"))
>   .show
>
> Or load as a DataFrame directly using a Configuration object:
>
> import org.apache.hadoop.conf.Configuration
> import org.apache.spark.SparkContext
> import org.apache.spark.sql.SQLContext
> import org.apache.phoenix.spark._
>
> val configuration = new Configuration()
> // Can set Phoenix-specific settings, requires 'hbase.zookeeper.quorum'
>
> val sc = new SparkContext("local", "phoenix-test")
> val sqlContext = new SQLContext(sc)
>
> // Load the columns 'ID' and 'COL1' from TABLE1 as a DataFrame
> val df = sqlContext.phoenixTableAsDataFrame(
>   "TABLE1", Array("ID", "COL1"), conf = configuration
> )
>
> df.show
>
> From: Josh Mahonin [mailto:jmaho...@gmail.com]
> Sent: June 9, 2016 9:44
> To: user@phoenix.apache.org
> Subject: Re: phoenix spark options not supporting query in dbtable
>
> Hi Xindian,
>
> The phoenix-spark integration is based on the Phoenix MapReduce layer,
> which doesn't support aggregate functions. However, as you mentioned, both
> filtering and pruning predicates are pushed down to Phoenix. With an RDD or
> DataFrame loaded, all of Spark's various aggregation methods are available
> to you.
>
> Although the Spark JDBC data source supports the full complement of
> Phoenix's supported queries, the way it achieves parallelism is to split
> the query across a number of workers and connections based on a
> 'partitionColumn' with a 'lowerBound' and 'upperBound', which must be
> numeric. If your use case has numeric primary keys, then that is
> potentially a good solution for you. [1]
>
> The phoenix-spark parallelism is based on the splits provided by the
> Phoenix query planner, and has no requirements on specifying partition
> columns or upper/lower bounds. It's up to you to evaluate which technique
> is the right method for your use case. [2]
>
> Good luck,
>
> Josh
>
> [1] http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
> [2] https://phoenix.apache.org/phoenix_spark.html
>
> On Wed, Jun 8, 2016 at 6:01 PM, Long, Xindian <xindian.l...@sensus.com>
> wrote:
>
> The Spark JDBC data source supports specifying a query as the “dbtable”
> option.
>
> I assume everything in such a query is pushed down to the database
> instead of being done in Spark.
> The phoenix-spark plugin does not seem to support that. Why is that? Is
> there any plan to support it in the future?
>
> I know phoenix-spark does support an optional select clause and predicate
> push-down in some cases, but it is limited.
>
> Thanks
>
> Xindian
>
> -------------------------------------------
> Xindian “Shindian” Long
> Mobile: 919-9168651
> Email: xindian.l...@gmail.com
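For comparison, here is roughly what the Spark JDBC route described above ([1]) looks like, again from Java. This is only a minimal, untested sketch: the Phoenix JDBC URL, table, partition column and bounds are placeholders that have to match your own cluster and schema, and the Phoenix client jar must be on the classpath.

import java.util.Properties;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class PhoenixJdbcExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("phoenix-jdbc-test").setMaster("local");
    JavaSparkContext jsc = new JavaSparkContext(conf);
    SQLContext sqlContext = new SQLContext(jsc);

    Properties props = new Properties();
    props.setProperty("driver", "org.apache.phoenix.jdbc.PhoenixDriver");

    // Spark splits the read into numPartitions ranges over the numeric
    // partition column; here: 10 partitions on ID covering [0, 1000000).
    // The table argument can also be a parenthesized query with an alias,
    // e.g. "(SELECT ID, COL1 FROM TABLE1 WHERE ...) AS T", subject to what
    // Phoenix accepts.
    DataFrame df = sqlContext.read().jdbc(
        "jdbc:phoenix:phoenix-server:2181",  // placeholder ZK quorum
        "TABLE1",
        "ID",        // partitionColumn, must be numeric
        0L,          // lowerBound
        1000000L,    // upperBound
        10,          // numPartitions
        props);

    df.show();

    jsc.stop();
  }
}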