Hi Cheyenne,

That's a very interesting question. If secondary indexes are created well on the Phoenix table, HBase will use a coprocessor to do the join server-side (still a Java-based MapReduce-style job, if I understand correctly) and then return the result. Spark, on the other hand, is famous for its big improvement over the traditional M/R model: once the two tables are in Spark DataFrames, I believe Spark wins every time. However, it might take a long time to load two big tables into Spark.
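To make that concrete, here is a toy pure-Python sketch (not Spark or Phoenix code; the tables and columns are invented for illustration) of the in-memory hash join Spark effectively performs once both tables are loaded: one pass over the smaller table to build a hash map on the join key, then one pass over the larger table to probe it.

```python
# Toy illustration of an in-memory hash join -- the same basic strategy
# Spark uses once both sides of the join are resident as DataFrames.
# Table names and fields below are made up for the example.

def hash_join(left, right, key):
    """Join two lists of dicts on `key`: build a hash map over `right`,
    then stream `left` through it in a single pass."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined

users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
posts = [{"id": 1, "text": "hello"}, {"id": 1, "text": "again"}]

# Each post row is matched with its user in O(len(posts) + len(users)).
result = hash_join(posts, users, "id")
```

The cost that dominates in practice is not this join itself but getting both tables into memory in the first place, which is exactly the loading overhead mentioned above.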
I'll do this test in the future; right now our system is quite busy with ALS model tasks.

Cheers,
Dalin

On Tue, Sep 13, 2016 at 3:58 PM, Cheyenne Forbes <[email protected]> wrote:

> I've been thinking: is Spark SQL faster than Phoenix (or phoenix-spark)
> for selects with joins on large data (for example, Instagram's size)?
>
> Regards,
>
> Cheyenne Forbes
>
> Chief Executive Officer
> Avapno Omnitech
>
> Chief Operating Officer
> Avapno Solutions, Co.
>
> Chairman
> Avapno Assets, LLC
>
> Bethel Town P.O
> Westmoreland
> Jamaica
>
> Email: [email protected]
> Mobile: 876-881-7889
> Skype: cheyenne.forbes1
>
> On Tue, Sep 13, 2016 at 8:41 AM, Josh Mahonin <[email protected]> wrote:
>
>> Hi Dalin,
>>
>> Thanks for the information, I'm glad to hear that the Spark integration
>> is working well for your use case.
>>
>> Josh
>>
>> On Mon, Sep 12, 2016 at 8:15 PM, dalin.qin <[email protected]> wrote:
>>
>>> Hi Josh,
>>>
>>> Before the project kicked off, we got the idea that HBase is more
>>> suitable for massive writing than for batch full-table reading (I forget
>>> where the idea came from; maybe some benchmark testing posted on a
>>> website). So we decided to read HBase only by primary key, for
>>> small-amount-of-data query requests. We store the HBase results in JSON
>>> files as each day's incremental changes (another benefit of JSON is that
>>> you can put the files in time-based directories so that you only have to
>>> query part of them), then use Spark to read those JSON files and do the
>>> ML model or report calculation.
>>>
>>> Hope this helps :)
>>>
>>> Dalin
>>>
>>> On Mon, Sep 12, 2016 at 5:36 PM, Josh Mahonin <[email protected]> wrote:
>>>
>>>> Hi Dalin,
>>>>
>>>> That's great to hear. Have you also tried reading back those rows
>>>> through Spark for a larger "batch processing" job? I'm curious if you
>>>> have any experiences or insight there from operating on a large dataset.
>>>>
>>>> Thanks!
>>>>
>>>> Josh
>>>>
>>>> On Mon, Sep 12, 2016 at 10:29 AM, dalin.qin <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've used a Phoenix table to store billions of rows. Rows are
>>>>> incrementally inserted into Phoenix by Spark every day, and the table
>>>>> serves instant queries from a web page by primary key. So far so good.
>>>>>
>>>>> Thanks,
>>>>> Dalin
>>>>>
>>>>> On Mon, Sep 12, 2016 at 10:07 AM, Cheyenne Forbes <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Thanks everyone. I will be using Phoenix for simple input/output and
>>>>>> the phoenix-spark plugin (https://phoenix.apache.org/phoenix_spark.html)
>>>>>> for more complex queries. Is that the smart thing to do?
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Cheyenne Forbes
>>>>>>
>>>>>> On Sun, Sep 11, 2016 at 11:07 AM, Ted Yu <[email protected]> wrote:
>>>>>>
>>>>>>> W.r.t. resource management, Spark also relies on other frameworks
>>>>>>> such as YARN or Mesos.
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> On Sun, Sep 11, 2016 at 6:31 AM, John Leach <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Spark has a robust execution model with the following features
>>>>>>>> that are not part of Phoenix:
>>>>>>>> * Scalable
>>>>>>>> * Fault tolerance with lineage (handles large intermediate results)
>>>>>>>> * Memory management for tasks
>>>>>>>> * Resource management (fair scheduling)
>>>>>>>> * Additional SQL features (windowing, etc.)
>>>>>>>> * Machine learning libraries
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> John
>>>>>>>>
>>>>>>>> > On Sep 11, 2016, at 2:45 AM, Cheyenne Forbes <
>>>>>>>> [email protected]> wrote:
>>>>>>>> >
>>>>>>>> > I realized there is a Spark plugin for Phoenix. Any use cases?
>>>>>>>> > Why would I use Spark with Phoenix instead of Phoenix by itself?
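Dalin's time-partitioned JSON layout described earlier in the thread can be sketched in plain Python (directory names and record fields are invented here; in a real pipeline Spark would read such a directory with its JSON data source rather than this hand-rolled reader):

```python
import json
import os
import tempfile

# Write each day's incremental changes into its own dt=YYYY-MM-DD
# directory, so a later job can read only the days it needs instead
# of scanning every file ever written.
base = tempfile.mkdtemp()

increments = {
    "2016-09-11": [{"id": 1, "score": 0.5}],
    "2016-09-12": [{"id": 2, "score": 0.9}, {"id": 3, "score": 0.1}],
}
for day, rows in increments.items():
    day_dir = os.path.join(base, f"dt={day}")
    os.makedirs(day_dir, exist_ok=True)
    with open(os.path.join(day_dir, "part-0000.json"), "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")  # one JSON record per line

def read_days(base, days):
    """Load only the requested day partitions, skipping all others."""
    rows = []
    for day in days:
        path = os.path.join(base, f"dt={day}", "part-0000.json")
        with open(path) as f:
            rows.extend(json.loads(line) for line in f)
    return rows

subset = read_days(base, ["2016-09-12"])  # touches only one day's files
```

The same idea is what makes the "query only part of those files" remark above work: partition pruning happens simply by never opening directories outside the requested date range.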
