If I were to use Spark (via the Python API, for example), would the query be processed on my web servers or on a separate server, as in Phoenix?
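(For reference, with the phoenix-spark plugin the table scan runs on the HBase region servers and the resulting rows are distributed across the Spark executors, so little of the work lands on the machine that merely submits the job. A minimal sketch of loading a Phoenix table this way — the table name and ZooKeeper quorum below are placeholder assumptions, substitute your own:)

```python
# Connection options for the phoenix-spark data source.
# "WEB_STAT" and "localhost:2181" are hypothetical placeholder values.
phoenix_options = {
    "table": "WEB_STAT",        # hypothetical Phoenix table name
    "zkUrl": "localhost:2181",  # hypothetical ZooKeeper quorum
}

def load_phoenix_table(sql_context, options):
    """Load a Phoenix table as a Spark DataFrame via phoenix-spark.

    The scan itself executes on the HBase region servers; the rows are
    partitioned across the Spark executors, not the submitting host.
    """
    return (sql_context.read
            .format("org.apache.phoenix.spark")
            .options(**options)
            .load())

# Usage (inside a Spark application):
#   df = load_phoenix_table(sqlContext, phoenix_options)
#   df.filter(df.HOST == "EU").count()
```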
Regards,

Cheyenne Forbes
Chief Executive Officer, Avapno Omnitech
Chief Operating Officer, Avapno Solutions, Co.
Chairman, Avapno Assets, LLC
Bethel Town P.O., Westmoreland, Jamaica
Email: [email protected]
Mobile: 876-881-7889
Skype: cheyenne.forbes1

On Tue, Sep 13, 2016 at 3:07 PM, dalin.qin <[email protected]> wrote:

> Hi Cheyenne,
>
> That's a very interesting question. If secondary indexes are created well
> on the Phoenix table, HBase will use a coprocessor to do the join
> operation (still a Java-based MapReduce job, if I understand correctly)
> and then return the result. Spark, on the other hand, is famous for its
> great improvement over the traditional M/R model, so once the two tables
> are in Spark DataFrames, I believe Spark wins every time. However, it
> might take a long time to load two big tables into Spark.
>
> I'll do this test in the future; right now our system is quite busy with
> ALS model tasks.
>
> Cheers,
> Dalin
>
> On Tue, Sep 13, 2016 at 3:58 PM, Cheyenne Forbes <
> [email protected]> wrote:
>
>> I've been thinking: is Spark SQL faster than Phoenix (or phoenix-spark)
>> for SELECTs with joins on large data (for example, Instagram's size)?
>>
>> Regards,
>> Cheyenne Forbes
>>
>> On Tue, Sep 13, 2016 at 8:41 AM, Josh Mahonin <[email protected]> wrote:
>>
>>> Hi Dalin,
>>>
>>> Thanks for the information, I'm glad to hear that the Spark integration
>>> is working well for your use case.
>>>
>>> Josh
>>>
>>> On Mon, Sep 12, 2016 at 8:15 PM, dalin.qin <[email protected]> wrote:
>>>
>>>> Hi Josh,
>>>>
>>>> Before the project kicked off, we got the idea that HBase is more
>>>> suitable for massive writing than for batch full-table reading (I
>>>> forget where the idea came from; probably some benchmark testing
>>>> posted on a website). So we decided to read HBase only by primary key,
>>>> for small-volume query requests. We store the HBase results in JSON
>>>> files as each day's incremental changes (another benefit of JSON is
>>>> that you can put the files in a time-based directory, so that you can
>>>> query only a subset of them), then use Spark to read those JSON files
>>>> and do the ML model or report calculation.
>>>>
>>>> Hope this could help :)
>>>>
>>>> Dalin
>>>>
>>>> On Mon, Sep 12, 2016 at 5:36 PM, Josh Mahonin <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Dalin,
>>>>>
>>>>> That's great to hear. Have you also tried reading those rows back
>>>>> through Spark for a larger "batch processing" job? I'm curious if
>>>>> you have any experience or insight from operating on a large dataset.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Josh
>>>>>
>>>>> On Mon, Sep 12, 2016 at 10:29 AM, dalin.qin <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've used a Phoenix table to store billions of rows; rows are
>>>>>> incrementally inserted into Phoenix by Spark every day, and the
>>>>>> table serves instant queries from a web page by primary key. So far
>>>>>> so good.
>>>>>>
>>>>>> Thanks,
>>>>>> Dalin
>>>>>>
>>>>>> On Mon, Sep 12, 2016 at 10:07 AM, Cheyenne Forbes <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Thanks everyone. I will be using Phoenix for simple input/output
>>>>>>> and the phoenix-spark plugin
>>>>>>> (https://phoenix.apache.org/phoenix_spark.html) for more complex
>>>>>>> queries; is that the smart thing to do?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Cheyenne Forbes
>>>>>>>
>>>>>>> On Sun, Sep 11, 2016 at 11:07 AM, Ted Yu <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> W.r.t. resource management, Spark also relies on other frameworks
>>>>>>>> such as YARN or Mesos.
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>> On Sun, Sep 11, 2016 at 6:31 AM, John Leach <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Spark has a robust execution model with the following features
>>>>>>>>> that are not part of Phoenix:
>>>>>>>>> * Scalability
>>>>>>>>> * Fault tolerance with lineage (handles large intermediate
>>>>>>>>>   results)
>>>>>>>>> * Memory management for tasks
>>>>>>>>> * Resource management (fair scheduling)
>>>>>>>>> * Additional SQL features (windowing, etc.)
>>>>>>>>> * Machine learning libraries
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>> > On Sep 11, 2016, at 2:45 AM, Cheyenne Forbes <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>> >
>>>>>>>>> > I realized there is a Spark plugin for Phoenix. Any use cases?
>>>>>>>>> > Why would I use Spark with Phoenix instead of Phoenix by
>>>>>>>>> > itself?
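(Dalin's approach above — dumping each day's incremental changes as JSON into a time-based directory so that Spark only reads the date range it needs — can be sketched as follows. The `base_dir/YYYY/MM/DD` layout is a placeholder assumption; Dalin's actual directory scheme isn't specified in the thread.)

```python
from datetime import date, timedelta

def daily_json_paths(base_dir, start, end):
    """Build one time-based directory path per day in [start, end].

    A Spark job can then pass just these paths to read only the
    partitions it needs instead of the whole dataset.
    """
    paths = []
    d = start
    while d <= end:
        paths.append("%s/%04d/%02d/%02d" % (base_dir, d.year, d.month, d.day))
        d += timedelta(days=1)
    return paths

# A Spark job would then read only those partitions, e.g.:
#   df = sqlContext.read.json(daily_json_paths("/data/phoenix_dumps",
#                                              date(2016, 9, 11),
#                                              date(2016, 9, 13)))

paths = daily_json_paths("/data/phoenix_dumps",
                         date(2016, 9, 11), date(2016, 9, 13))
# paths -> ['/data/phoenix_dumps/2016/09/11',
#           '/data/phoenix_dumps/2016/09/12',
#           '/data/phoenix_dumps/2016/09/13']
```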
