If I were to use Spark (via the Python API, for example), would the query be processed on my web servers or on a separate server, as in Phoenix?
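(For reference, with the phoenix-spark plugin the table scan runs on the HBase region servers and the resulting rows are distributed across the Spark executors, so little of the work lands on the machine that merely submits the job. A minimal sketch of loading a Phoenix table this way — the table name and ZooKeeper quorum below are placeholder assumptions, substitute your own:)

```python
# Connection options for the phoenix-spark data source.
# "WEB_STAT" and "localhost:2181" are hypothetical placeholder values.
phoenix_options = {
    "table": "WEB_STAT",        # hypothetical Phoenix table name
    "zkUrl": "localhost:2181",  # hypothetical ZooKeeper quorum
}

def load_phoenix_table(sql_context, options):
    """Load a Phoenix table as a Spark DataFrame via phoenix-spark.

    The scan itself executes on the HBase region servers; the rows are
    partitioned across the Spark executors, not the submitting host.
    """
    return (sql_context.read
            .format("org.apache.phoenix.spark")
            .options(**options)
            .load())

# Usage (inside a Spark application):
#   df = load_phoenix_table(sqlContext, phoenix_options)
#   df.filter(df.HOST == "EU").count()
```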
Regards,

Cheyenne Forbes
Chief Executive Officer, Avapno Omnitech
Chief Operating Officer, Avapno Solutions, Co.
Chairman, Avapno Assets, LLC
Bethel Town P.O., Westmoreland, Jamaica
Email: [email protected]
Mobile: 876-881-7889
Skype: cheyenne.forbes1

On Tue, Sep 13, 2016 at 3:07 PM, dalin.qin <[email protected]> wrote:

> Hi Cheyenne,
>
> That's a very interesting question. If secondary indexes are created well
> on the Phoenix table, HBase will use a coprocessor to do the join
> operation (still a Java-based MapReduce job, if I understand correctly)
> and then return the result. Spark, on the other hand, is famous for its
> great improvement over the traditional M/R model, so once the two tables
> are in Spark DataFrames, I believe Spark wins every time. However, it
> might take a long time to load two big tables into Spark.
>
> I'll do this test in the future; right now our system is quite busy with
> ALS model tasks.
>
> Cheers,
> Dalin
>
> On Tue, Sep 13, 2016 at 3:58 PM, Cheyenne Forbes <
> [email protected]> wrote:
>
>> I've been thinking: is Spark SQL faster than Phoenix (or phoenix-spark)
>> for SELECTs with joins on large data (for example, Instagram's size)?
>>
>> Regards,
>> Cheyenne Forbes
>>
>> On Tue, Sep 13, 2016 at 8:41 AM, Josh Mahonin <[email protected]> wrote:
>>
>>> Hi Dalin,
>>>
>>> Thanks for the information, I'm glad to hear that the Spark integration
>>> is working well for your use case.
>>>
>>> Josh
>>>
>>> On Mon, Sep 12, 2016 at 8:15 PM, dalin.qin <[email protected]> wrote:
>>>
>>>> Hi Josh,
>>>>
>>>> Before the project kicked off, we got the idea that HBase is more
>>>> suitable for massive writing than for batch full-table reading (I
>>>> forget where the idea came from; probably some benchmark testing
>>>> posted on a website). So we decided to read HBase only by primary key,
>>>> for small-volume query requests. We store the HBase results in JSON
>>>> files as each day's incremental changes (another benefit of JSON is
>>>> that you can put the files in a time-based directory, so that you can
>>>> query only a subset of them), then use Spark to read those JSON files
>>>> and do the ML model or report calculation.
>>>>
>>>> Hope this could help :)
>>>>
>>>> Dalin
>>>>
>>>> On Mon, Sep 12, 2016 at 5:36 PM, Josh Mahonin <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Dalin,
>>>>>
>>>>> That's great to hear. Have you also tried reading those rows back
>>>>> through Spark for a larger "batch processing" job? I'm curious if
>>>>> you have any experience or insight from operating on a large dataset.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Josh
>>>>>
>>>>> On Mon, Sep 12, 2016 at 10:29 AM, dalin.qin <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've used a Phoenix table to store billions of rows; rows are
>>>>>> incrementally inserted into Phoenix by Spark every day, and the
>>>>>> table serves instant queries from a web page by primary key. So far
>>>>>> so good.
>>>>>>
>>>>>> Thanks,
>>>>>> Dalin
>>>>>>
>>>>>> On Mon, Sep 12, 2016 at 10:07 AM, Cheyenne Forbes <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Thanks everyone. I will be using Phoenix for simple input/output
>>>>>>> and the phoenix-spark plugin
>>>>>>> (https://phoenix.apache.org/phoenix_spark.html) for more complex
>>>>>>> queries; is that the smart thing to do?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Cheyenne Forbes
>>>>>>>
>>>>>>> On Sun, Sep 11, 2016 at 11:07 AM, Ted Yu <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> W.r.t. resource management, Spark also relies on other frameworks
>>>>>>>> such as YARN or Mesos.
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>> On Sun, Sep 11, 2016 at 6:31 AM, John Leach <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Spark has a robust execution model with the following features
>>>>>>>>> that are not part of Phoenix:
>>>>>>>>> * Scalability
>>>>>>>>> * Fault tolerance with lineage (handles large intermediate
>>>>>>>>>   results)
>>>>>>>>> * Memory management for tasks
>>>>>>>>> * Resource management (fair scheduling)
>>>>>>>>> * Additional SQL features (windowing, etc.)
>>>>>>>>> * Machine learning libraries
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>> > On Sep 11, 2016, at 2:45 AM, Cheyenne Forbes <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>> >
>>>>>>>>> > I realized there is a Spark plugin for Phoenix. Any use cases?
>>>>>>>>> > Why would I use Spark with Phoenix instead of Phoenix by
>>>>>>>>> > itself?
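(Dalin's approach above — dumping each day's incremental changes as JSON into a time-based directory so that Spark only reads the date range it needs — can be sketched as follows. The `base_dir/YYYY/MM/DD` layout is a placeholder assumption; Dalin's actual directory scheme isn't specified in the thread.)

```python
from datetime import date, timedelta

def daily_json_paths(base_dir, start, end):
    """Build one time-based directory path per day in [start, end].

    A Spark job can then pass just these paths to read only the
    partitions it needs instead of the whole dataset.
    """
    paths = []
    d = start
    while d <= end:
        paths.append("%s/%04d/%02d/%02d" % (base_dir, d.year, d.month, d.day))
        d += timedelta(days=1)
    return paths

# A Spark job would then read only those partitions, e.g.:
#   df = sqlContext.read.json(daily_json_paths("/data/phoenix_dumps",
#                                              date(2016, 9, 11),
#                                              date(2016, 9, 13)))

paths = daily_json_paths("/data/phoenix_dumps",
                         date(2016, 9, 11), date(2016, 9, 13))
# paths -> ['/data/phoenix_dumps/2016/09/11',
#           '/data/phoenix_dumps/2016/09/12',
#           '/data/phoenix_dumps/2016/09/13']
```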
