Re: Phoenix as a source for Spark processing

2018-03-07 Thread Stepan Migunov
Some more details... We have run some simple tests to compare the read/write performance of Spark+Hive and Spark+Phoenix, with the following results. Copying a table (no transformations, about 800 million records): Hive (TEZ) - 752 sec; Spark, from Hive to Hive: 2463 sec; from Phoenix to H
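
For reference, a minimal sketch of such a copy via Spark, assuming a Spark 2.x session with Hive support; the table names and ZooKeeper quorum address are placeholders, and the phoenix-spark writer requires SaveMode.Overwrite (which performs upserts into a pre-created Phoenix table):

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder()
      .appName("hive-to-phoenix-copy")
      .enableHiveSupport()            // required to read Hive metastore tables
      .getOrCreate()

    // Read the source table from Hive
    val src = spark.table("default.src_table")

    // Upsert it into the Phoenix table; column names must match
    src.write
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)       // phoenix-spark only supports Overwrite
      .option("table", "TARGET_TABLE")
      .option("zkUrl", "zk-host:2181")
      .save()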

LIMIT statement when loading data in Phoenix Spark module.

2018-03-07 Thread alexander.scherbatiy
Hello, I use the Phoenix Spark plugin to load data from HBase. There is the SparkSqlContextFunctions.phoenixTableAsDataFrame() method, which returns a Dataset for the given table name, columns, and a predicate. Is it possible to also provide a LIMIT statement so that the number of the retrieved r
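
For context, this is roughly how that method is used, following the phoenix-spark documentation; the table, columns, and quorum address below are placeholders:

    import org.apache.hadoop.conf.Configuration
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.phoenix.spark._   // adds phoenixTableAsDataFrame to SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("phoenix-read"))
    val sqlContext = new SQLContext(sc)

    val conf = new Configuration()
    conf.set("hbase.zookeeper.quorum", "zk-host:2181")

    // Load two columns of MY_TABLE; the predicate is pushed down to Phoenix
    val df = sqlContext.phoenixTableAsDataFrame(
      "MY_TABLE",
      Seq("ID", "COL1"),
      predicate = Some("\"ID\" > 1000"),
      conf = conf
    )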

Re: LIMIT statement when loading data in Phoenix Spark module.

2018-03-07 Thread Xavier Jodoin
You can do it directly with Spark SQL. Xavier On 2018-03-07 06:38 AM, alexander.scherba...@yandex.com wrote: Hello, I use the Phoenix Spark plugin to load data from HBase. There is the SparkSqlContextFunctions.phoenixTableAsDataFrame() method, which returns a Dataset for the given table
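
A sketch of what that looks like, continuing from a DataFrame df loaded as in the earlier example (Spark 2.x API; names are placeholders):

    // Expose the DataFrame to Spark SQL and limit there...
    df.createOrReplaceTempView("my_table")
    val top = sqlContext.sql("SELECT ID, COL1 FROM my_table LIMIT 100")

    // ...or equivalently through the DataFrame API
    val top2 = df.limit(100)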

Re: LIMIT statement when loading data in Phoenix Spark module.

2018-03-07 Thread alexander.scherbatiy
Does it work so that only the limited number of rows will be sent from each HBase Region Server to the client? I ask because I could use a WHERE statement in Spark SQL in the same way instead of passing the predicate. Thanks, Alexandr. 07.03.2018, 15:35, "Xavier Jodoin": > You can d

Re: LIMIT statement when loading data in Phoenix Spark module.

2018-03-07 Thread Xavier Jodoin
It will limit the number of rows fetched by the client. On 2018-03-07 07:54 AM, alexander.scherba...@yandex.com wrote: Does it work so that only the limited number of rows will be sent from each HBase Region Server to the client? I ask because I could use a WHERE statement in the same wa
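
The distinction the thread is circling, sketched under the assumption that the filter is one the phoenix-spark relation can push down:

    // A filter/WHERE can be pushed down to the Phoenix scan, so region
    // servers only return matching rows:
    val filtered = df.filter("ID > 1000")

    // A LIMIT is applied by Spark on the client side: it caps the rows
    // kept, not necessarily the rows scanned on the servers.
    val limited = df.limit(100)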

Re: LIMIT statement when loading data in Phoenix Spark module.

2018-03-07 Thread alexander . scherbatiy
Is there documentation that describes which queries will be propagated to the server, and how, during data fetching with Phoenix Spark? Thanks, Alexandr. 07.03.2018, 16:24, "Xavier Jodoin": > It will limit the number of rows fetched by the client > > On 2018-03-07 07:54 AM, alexander.sche
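
Short of such documentation, Spark itself can show what was pushed down: for DataSource relations like phoenix-spark, the physical plan lists the pushed predicates. A quick check, assuming the df from the earlier sketch:

    // Pushed-down predicates show up as PushedFilters in the scan node
    df.filter("ID > 1000").explain(true)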

Re: Runtime DDL supported?

2018-03-07 Thread Miles Spielberg
We found https://issues.apache.org/jira/browse/PHOENIX-3547, which seems to be precisely our problem. We would want at least the option of using a bigint rather than the int proposed in the JIRA, to accommodate massive growth. While we intend to have many tenants, we don't intend to use the Phoenix "tenant_id"