Hi Cheyenne,

That's a very interesting question. If secondary indexes are created well on the Phoenix table, HBase will use a coprocessor to do the join server-side (still a Java-based MapReduce-style job, if I understand correctly) and then return the result. Spark, on the other hand, is famous for its big improvement over the traditional M/R model: once the two tables are in Spark DataFrames, I believe Spark wins every time. However, it might take a long time to load two big tables into Spark.
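To make that concrete, here is a toy pure-Python sketch (not Spark or Phoenix code; the tables and columns are invented for illustration) of the in-memory hash join Spark effectively performs once both tables are loaded: one pass over the smaller table to build a hash map on the join key, then one pass over the larger table to probe it.

```python
# Toy illustration of an in-memory hash join -- the same basic strategy
# Spark uses once both sides of the join are resident as DataFrames.
# Table names and fields below are made up for the example.

def hash_join(left, right, key):
    """Join two lists of dicts on `key`: build a hash map over `right`,
    then stream `left` through it in a single pass."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined

users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
posts = [{"id": 1, "text": "hello"}, {"id": 1, "text": "again"}]

# Each post row is matched with its user in O(len(posts) + len(users)).
result = hash_join(posts, users, "id")
```

The cost that dominates in practice is not this join itself but getting both tables into memory in the first place, which is exactly the loading overhead mentioned above.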
I'll do this test in the future; right now our system is quite busy with ALS model tasks.

Cheers,
Dalin

On Tue, Sep 13, 2016 at 3:58 PM, Cheyenne Forbes <[email protected]> wrote:

> I've been thinking: is Spark SQL faster than Phoenix (or phoenix-spark)
> for selects with joins on large data (for example, Instagram's size)?
>
> Regards,
>
> Cheyenne Forbes
>
> Chief Executive Officer
> Avapno Omnitech
>
> Chief Operating Officer
> Avapno Solutions, Co.
>
> Chairman
> Avapno Assets, LLC
>
> Bethel Town P.O
> Westmoreland
> Jamaica
>
> Email: [email protected]
> Mobile: 876-881-7889
> Skype: cheyenne.forbes1
>
> On Tue, Sep 13, 2016 at 8:41 AM, Josh Mahonin <[email protected]> wrote:
>
>> Hi Dalin,
>>
>> Thanks for the information, I'm glad to hear that the Spark integration
>> is working well for your use case.
>>
>> Josh
>>
>> On Mon, Sep 12, 2016 at 8:15 PM, dalin.qin <[email protected]> wrote:
>>
>>> Hi Josh,
>>>
>>> Before the project kicked off, we got the idea that HBase is more
>>> suitable for massive writing than for batch full-table reading (I forget
>>> where the idea came from; maybe some benchmark testing posted on a
>>> website). So we decided to read HBase only by primary key, for
>>> small-amount-of-data query requests. We store the HBase results in JSON
>>> files as each day's incremental changes (another benefit of JSON is that
>>> you can put the files in time-based directories so that you only have to
>>> query part of them), then use Spark to read those JSON files and do the
>>> ML model or report calculation.
>>>
>>> Hope this helps :)
>>>
>>> Dalin
>>>
>>> On Mon, Sep 12, 2016 at 5:36 PM, Josh Mahonin <[email protected]> wrote:
>>>
>>>> Hi Dalin,
>>>>
>>>> That's great to hear. Have you also tried reading back those rows
>>>> through Spark for a larger "batch processing" job? I'm curious if you
>>>> have any experiences or insight there from operating on a large dataset.
>>>>
>>>> Thanks!
>>>>
>>>> Josh
>>>>
>>>> On Mon, Sep 12, 2016 at 10:29 AM, dalin.qin <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've used a Phoenix table to store billions of rows. Rows are
>>>>> incrementally inserted into Phoenix by Spark every day, and the table
>>>>> serves instant queries from a web page by primary key. So far so good.
>>>>>
>>>>> Thanks,
>>>>> Dalin
>>>>>
>>>>> On Mon, Sep 12, 2016 at 10:07 AM, Cheyenne Forbes <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Thanks everyone. I will be using Phoenix for simple input/output and
>>>>>> the phoenix-spark plugin (https://phoenix.apache.org/phoenix_spark.html)
>>>>>> for more complex queries. Is that the smart thing to do?
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Cheyenne Forbes
>>>>>>
>>>>>> On Sun, Sep 11, 2016 at 11:07 AM, Ted Yu <[email protected]> wrote:
>>>>>>
>>>>>>> W.r.t. resource management, Spark also relies on other frameworks
>>>>>>> such as YARN or Mesos.
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> On Sun, Sep 11, 2016 at 6:31 AM, John Leach <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Spark has a robust execution model with the following features
>>>>>>>> that are not part of Phoenix:
>>>>>>>> * Scalable
>>>>>>>> * Fault tolerance with lineage (handles large intermediate results)
>>>>>>>> * Memory management for tasks
>>>>>>>> * Resource management (fair scheduling)
>>>>>>>> * Additional SQL features (windowing, etc.)
>>>>>>>> * Machine learning libraries
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> John
>>>>>>>>
>>>>>>>> > On Sep 11, 2016, at 2:45 AM, Cheyenne Forbes <
>>>>>>>> [email protected]> wrote:
>>>>>>>> >
>>>>>>>> > I realized there is a Spark plugin for Phoenix. Any use cases?
>>>>>>>> > Why would I use Spark with Phoenix instead of Phoenix by itself?
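Dalin's time-partitioned JSON layout described earlier in the thread can be sketched in plain Python (directory names and record fields are invented here; in a real pipeline Spark would read such a directory with its JSON data source rather than this hand-rolled reader):

```python
import json
import os
import tempfile

# Write each day's incremental changes into its own dt=YYYY-MM-DD
# directory, so a later job can read only the days it needs instead
# of scanning every file ever written.
base = tempfile.mkdtemp()

increments = {
    "2016-09-11": [{"id": 1, "score": 0.5}],
    "2016-09-12": [{"id": 2, "score": 0.9}, {"id": 3, "score": 0.1}],
}
for day, rows in increments.items():
    day_dir = os.path.join(base, f"dt={day}")
    os.makedirs(day_dir, exist_ok=True)
    with open(os.path.join(day_dir, "part-0000.json"), "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")  # one JSON record per line

def read_days(base, days):
    """Load only the requested day partitions, skipping all others."""
    rows = []
    for day in days:
        path = os.path.join(base, f"dt={day}", "part-0000.json")
        with open(path) as f:
            rows.extend(json.loads(line) for line in f)
    return rows

subset = read_days(base, ["2016-09-12"])  # touches only one day's files
```

The same idea is what makes the "query only part of those files" remark above work: partition pruning happens simply by never opening directories outside the requested date range.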
