Re: [Architecture] Tenant-specific MQTT receivers in DAS not listening to topics once tenants get unloaded
Hi, On Mon, Mar 20, 2017 at 11:02 AM, Sinthuja Ragendran <sinth...@wso2.com> wrote: > Hi, > > As the receiver configurations are deployable artefacts, those will be > active only when the tenant is loaded. One approach is to have all tenants > loaded indefinitely, but I think this will have a high memory footprint. We therefore > internally discussed the approach below to handle this problem. > > Instead of having multiple MQTT receiver configurations per tenant, > implement a specialised/privileged MQTT event receiver which > can handle multiple subscriptions on behalf of tenants, and which is only > deployable in super tenant mode. In that case, this event receiver will > have a topic URI with a {tenantDomain} placeholder, which is used to > subscribe to the specific tenanted topic. Then, based on the topic on which > the event arrived, the tenant flow will be started and the event will be > inserted into the specific tenant's space. This way, only the tenants which > are actively sending events will be loaded, rather than all tenants being > loaded up front. > > Please share your thoughts on this. Also, AFAIR we had a similar > requirement for task execution. @Anjana, how are we handling that? > Yes, the tasks and their definitions are stored in the super tenant space. So they get triggered appropriately, and, as required, any tenant-specific resources are loaded by the task implementation. Cheers, Anjana. > > Thanks, > Sinthuja. > > On Mon, Mar 20, 2017 at 10:50 AM, Jasintha Dasanayake <jasin...@wso2.com> > wrote: > >> Hi All, >> >> When DAS is working in tenant mode and a particular tenant has MQTT >> receivers, those cannot be activated once the tenant gets unloaded. For >> example, if I restart DAS, those tenant-specific MQTT receivers >> are not loaded unless we explicitly load that particular tenant. IMO, the >> expected behavior would be that those receivers are loaded and subscribed >> to the particular topic without loading the tenants explicitly. 
>> >> Are there any known mechanisms to address this particular problem? >> >> Thanks and Regards >> /jasintha >> >> -- >> >> *Jasintha Dasanayake* *Associate Technical Lead* >> >> *WSO2 Inc. | http://wso2.com <http://wso2.com/> lean . enterprise . >> middleware* >> >> >> *mobile :- 0711-368-118* >> > > > > -- > *Sinthuja Rajendran* > Technical Lead > WSO2, Inc.: http://wso2.com > > Blog: http://sinthu-rajan.blogspot.com/ > Mobile: +94774273955 > > > -- *Anjana Fernando* Associate Director / Architect WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
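The placeholder-based routing proposed above can be sketched as follows. This is a minimal, hypothetical model, not the actual DAS receiver API: the topic template, function names, and the in-memory store are all illustrative. A single super-tenant subscription is derived from the {tenantDomain} template, and each arriving message is routed to its tenant's space based on the topic it came in on.

```python
# Hypothetical topic template using the {tenantDomain} placeholder from the
# proposal, e.g. a per-tenant topic like "carbon/acme.com/sensor-data".
TOPIC_TEMPLATE = "carbon/{tenantDomain}/sensor-data"


def wildcard_subscription(template):
    """Turn the placeholder template into a single MQTT wildcard subscription,
    so one super-tenant receiver covers every tenant's topic."""
    return template.replace("{tenantDomain}", "+")


def tenant_of(topic, template):
    """Recover the tenant domain from a concrete topic, or raise ValueError."""
    t_parts = template.split("/")
    parts = topic.split("/")
    if len(parts) != len(t_parts):
        raise ValueError("topic does not match template: " + topic)
    tenant = None
    for expected, actual in zip(t_parts, parts):
        if expected == "{tenantDomain}":
            tenant = actual
        elif expected != actual:
            raise ValueError("topic does not match template: " + topic)
    if tenant is None:
        raise ValueError("template has no {tenantDomain} placeholder")
    return tenant


def dispatch(topic, payload, template, store):
    """Route one arriving event into its tenant's space (modeled as a dict);
    in DAS this is where the tenant flow would be started."""
    tenant = tenant_of(topic, template)
    store.setdefault(tenant, []).append(payload)
    return tenant
```

Note how only tenants that actually send events ever get an entry in the store, which mirrors the lazy-loading behaviour the proposal is after.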
Re: [Architecture] Opentracing
Hi Srinath, Looks interesting. @Gokul, can you please have a look and give a summary? Maybe we can submit a GSoC project for this, if it's actually worth doing. Cheers, Anjana. On Wed, Feb 1, 2017 at 10:36 AM, Srinath Perera <srin...@wso2.com> wrote: > They are trying to build an open standard (or so they say). > It seems to come from Zipkin. > Having one would solve a lot of problems. > >- http://opentracing.io/ >- https://medium.com/opentracing/distributed-tracing-in-10-minutes-51b378ee40f1#.5rfk4tfwa >- https://medium.com/opentracing/towards-turnkey-distributed-tracing-5f4297d1736#.xiy7fet0j > > Anjana, could you have a look? If it makes sense, maybe we can support it. > > -- > > Srinath Perera, Ph.D. > http://people.apache.org/~hemapani/ > http://srinathsview.blogspot.com/ > -- *Anjana Fernando* Associate Director / Architect WSO2 Inc. | http://wso2.com lean . enterprise . middleware
Re: [Architecture] RDBMS based coordinator election algorithm for MB
Hi Ramith, Sure. Actually, I was talking with SameeraR to take over this and create a common component which has the required coordination functionality. The idea is to create a component, where the providers can be plugged in, such as the RDBMS based one, ZK, or any other container specific provider that maybe out there. Cheers, Anjana. On Mon, Nov 7, 2016 at 12:38 PM, Ramith Jayasinghe <ram...@wso2.com> wrote: > this might require some work.. shall we have a chat? > > On Thu, Nov 3, 2016 at 3:52 PM, Anjana Fernando <anj...@wso2.com> wrote: > >> Ping! .. >> >> On Wed, Nov 2, 2016 at 5:03 PM, Anjana Fernando <anj...@wso2.com> wrote: >> >>> Hi, >>> >>> On Wed, Nov 2, 2016 at 3:14 PM, Asanka Abeyweera <asank...@wso2.com> >>> wrote: >>> >>>> Hi Anjana, >>>> >>>> Currently, the implementation is part of the MB code (not a common >>>> component). >>>> >>> >>> Okay, can we please get it as a common component. >>> >>> Cheers, >>> Anjana. >>> >>> >>>> >>>> On Wed, Nov 2, 2016 at 3:00 PM, Anjana Fernando <anj...@wso2.com> >>>> wrote: >>>> >>>>> Hi Asanka/Ramith, >>>>> >>>>> So for C5 based Streaming Analytics solution, we need coordination >>>>> functionality there as well. Is the functionality mentioned here created >>>>> as >>>>> a common component or baked in to the MB code? .. if so, can we please get >>>>> it implemented it as a generic component, so other products can also use >>>>> it. >>>>> >>>>> Cheers, >>>>> Anjana. >>>>> >>>>> On Tue, Aug 9, 2016 at 3:10 PM, Anjana Fernando <anj...@wso2.com> >>>>> wrote: >>>>> >>>>>> Great! .. >>>>>> >>>>>> Cheers, >>>>>> Anjana. >>>>>> >>>>>> On Tue, Aug 9, 2016 at 1:49 PM, Asanka Abeyweera <asank...@wso2.com> >>>>>> wrote: >>>>>> >>>>>>> Hi Anjana, >>>>>>> >>>>>>> Thank you for the suggestion. We have already done a similar thing. >>>>>>> We have added a backoff time after creating the leader entry and check >>>>>>> if >>>>>>> the leader entry is the entry created by self before informing the >>>>>>> leader >>>>>>> change. 
>>>>>>> >>>>>>> On Tue, Aug 9, 2016 at 12:27 PM, Anjana Fernando <anj...@wso2.com> >>>>>>> wrote: >>>>>>> >>>>>>>> I see, thanks for the clarification, looks good! .. I think small >>>>>>>> thing to consider is, to avoid the situation where, the current leader >>>>>>>> goes >>>>>>>> away, and two other competes to become the leader, and the first one >>>>>>>> and >>>>>>>> the second one checks (reads) the table to check the last heartbeat and >>>>>>>> figures out that the leader is outdated at the same time, and then >>>>>>>> first >>>>>>>> one delete the entry and puts his one, and after that, second one will >>>>>>>> also >>>>>>>> delete the existing one and put his one, so both will think they >>>>>>>> became the >>>>>>>> leader, due to the condition that both succeeded in adding the entry >>>>>>>> without an error. So this can probably be fixed by checking back after >>>>>>>> a >>>>>>>> bit of time if the current node is actually me, which probabilistically >>>>>>>> will work well, if that time period is sufficient big enough than a >>>>>>>> typical >>>>>>>> database transaction required by a node to do the earlier operations. >>>>>>>> Or >>>>>>>> else, we should make sure the database transaction level used in this >>>>>>>> scenario is at least REPEATABLE_READ, where when we read the record, it >>>>>>>> will lock it throughout the transaction. So some DBMSs does not support >>>>>>>> REPEATABLE_READ, where in that case, we should be able to use >>>>>>>> SERIALIZABLE, which most of them support.
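The election scheme discussed in this thread (a leader entry with a heartbeat timestamp, stale-leader takeover, and the verify-after-claim backoff Asanka describes) can be modeled roughly as below. This is only a sketch under stated assumptions: an in-memory SQLite database stands in for the shared RDBMS, the table layout and function names are invented for illustration, and a single connection is used, so the concurrent race itself is not reproduced here.

```python
import sqlite3

HEARTBEAT_TIMEOUT = 5.0  # seconds without a heartbeat before the leader is stale


def setup_leader_table(conn):
    # Single-row table: the CHECK-constrained anchor column guarantees that
    # at most one leader entry can exist at a time.
    conn.execute("CREATE TABLE IF NOT EXISTS leader ("
                 "anchor INTEGER PRIMARY KEY CHECK (anchor = 0), "
                 "node_id TEXT, last_heartbeat REAL)")
    conn.commit()


def heartbeat(conn, node_id, now):
    # The current leader periodically refreshes its timestamp.
    conn.execute("UPDATE leader SET last_heartbeat = ? "
                 "WHERE anchor = 0 AND node_id = ?", (now, node_id))
    conn.commit()


def try_become_leader(conn, node_id, now):
    row = conn.execute(
        "SELECT node_id, last_heartbeat FROM leader WHERE anchor = 0").fetchone()
    if row is not None and now - row[1] < HEARTBEAT_TIMEOUT:
        return row[0] == node_id  # a live leader exists; are we it?
    # Leader entry is absent or stale: replace it with our own entry.
    conn.execute("DELETE FROM leader WHERE anchor = 0")
    conn.execute("INSERT INTO leader (anchor, node_id, last_heartbeat) "
                 "VALUES (0, ?, ?)", (node_id, now))
    conn.commit()
    # Back off, then confirm the surviving entry is ours before announcing
    # leadership. A real implementation would sleep here for longer than a
    # typical claim transaction takes, as discussed in the thread.
    row = conn.execute("SELECT node_id FROM leader WHERE anchor = 0").fetchone()
    return row is not None and row[0] == node_id
```

The final re-read is the "check back after a bit of time" step: if two nodes raced through the delete-and-insert, only the one whose row survived considers itself leader.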
Re: [Architecture] RDBMS based coordinator election algorithm for MB
Ping! .. On Wed, Nov 2, 2016 at 5:03 PM, Anjana Fernando <anj...@wso2.com> wrote: > Hi, > > On Wed, Nov 2, 2016 at 3:14 PM, Asanka Abeyweera <asank...@wso2.com> > wrote: > >> Hi Anjana, >> >> Currently, the implementation is part of the MB code (not a common >> component). >> > > Okay, can we please get it as a common component. > > Cheers, > Anjana. > > >> >> On Wed, Nov 2, 2016 at 3:00 PM, Anjana Fernando <anj...@wso2.com> wrote: >> >>> Hi Asanka/Ramith, >>> >>> So for C5 based Streaming Analytics solution, we need coordination >>> functionality there as well. Is the functionality mentioned here created as >>> a common component or baked in to the MB code? .. if so, can we please get >>> it implemented it as a generic component, so other products can also use >>> it. >>> >>> Cheers, >>> Anjana. >>> >>> On Tue, Aug 9, 2016 at 3:10 PM, Anjana Fernando <anj...@wso2.com> wrote: >>> >>>> Great! .. >>>> >>>> Cheers, >>>> Anjana. >>>> >>>> On Tue, Aug 9, 2016 at 1:49 PM, Asanka Abeyweera <asank...@wso2.com> >>>> wrote: >>>> >>>>> Hi Anjana, >>>>> >>>>> Thank you for the suggestion. We have already done a similar thing. We >>>>> have added a backoff time after creating the leader entry and check if the >>>>> leader entry is the entry created by self before informing the leader >>>>> change. >>>>> >>>>> On Tue, Aug 9, 2016 at 12:27 PM, Anjana Fernando <anj...@wso2.com> >>>>> wrote: >>>>> >>>>>> I see, thanks for the clarification, looks good! .. 
I think small >>>>>> thing to consider is, to avoid the situation where, the current leader >>>>>> goes >>>>>> away, and two other competes to become the leader, and the first one and >>>>>> the second one checks (reads) the table to check the last heartbeat and >>>>>> figures out that the leader is outdated at the same time, and then first >>>>>> one delete the entry and puts his one, and after that, second one will >>>>>> also >>>>>> delete the existing one and put his one, so both will think they became >>>>>> the >>>>>> leader, due to the condition that both succeeded in adding the entry >>>>>> without an error. So this can probably be fixed by checking back after a >>>>>> bit of time if the current node is actually me, which probabilistically >>>>>> will work well, if that time period is sufficient big enough than a >>>>>> typical >>>>>> database transaction required by a node to do the earlier operations. Or >>>>>> else, we should make sure the database transaction level used in this >>>>>> scenario is at least REPEATABLE_READ, where when we read the record, it >>>>>> will lock it throughout the transaction. So some DBMSs does not support >>>>>> REPEATABLE_READ, where in that case, we should be able to use >>>>>> SERIALIZABLE, >>>>>> which most of them support. >>>>>> >>>>>> Cheers, >>>>>> Anjana. >>>>>> >>>>>> On Tue, Aug 9, 2016 at 11:11 AM, Maninda Edirisooriya < >>>>>> mani...@wso2.com> wrote: >>>>>> >>>>>>> Hi Anjana, >>>>>>> >>>>>>> After having an offline chat with Asanka what I understood was that >>>>>>> the leader election was done completely via the database but with no >>>>>>> network communication. The leader is mentioned in the database first. >>>>>>> Then >>>>>>> the leader updates the node data periodically in the database. If some >>>>>>> node >>>>>>> realizes the data in the DB are outdated that means the leader was >>>>>>> disconnected. Then that node will look at the created timestamp of the >>>>>>> leader entry. 
If that is not very recent that means there was no new >>>>>> leader >>>>>> elected recently. So he will try to update the leader entry with his >>>>>> ID. As >>>>>> I understand there the leader entry is using the leader ID and the >>>>>> timestamp as the primary key.
Re: [Architecture] RDBMS based coordinator election algorithm for MB
Hi, On Wed, Nov 2, 2016 at 3:14 PM, Asanka Abeyweera <asank...@wso2.com> wrote: > Hi Anjana, > > Currently, the implementation is part of the MB code (not a common > component). > Okay, can we please get it as a common component. Cheers, Anjana. > > On Wed, Nov 2, 2016 at 3:00 PM, Anjana Fernando <anj...@wso2.com> wrote: > >> Hi Asanka/Ramith, >> >> So for C5 based Streaming Analytics solution, we need coordination >> functionality there as well. Is the functionality mentioned here created as >> a common component or baked in to the MB code? .. if so, can we please get >> it implemented it as a generic component, so other products can also use >> it. >> >> Cheers, >> Anjana. >> >> On Tue, Aug 9, 2016 at 3:10 PM, Anjana Fernando <anj...@wso2.com> wrote: >> >>> Great! .. >>> >>> Cheers, >>> Anjana. >>> >>> On Tue, Aug 9, 2016 at 1:49 PM, Asanka Abeyweera <asank...@wso2.com> >>> wrote: >>> >>>> Hi Anjana, >>>> >>>> Thank you for the suggestion. We have already done a similar thing. We >>>> have added a backoff time after creating the leader entry and check if the >>>> leader entry is the entry created by self before informing the leader >>>> change. >>>> >>>> On Tue, Aug 9, 2016 at 12:27 PM, Anjana Fernando <anj...@wso2.com> >>>> wrote: >>>> >>>>> I see, thanks for the clarification, looks good! .. I think small >>>>> thing to consider is, to avoid the situation where, the current leader >>>>> goes >>>>> away, and two other competes to become the leader, and the first one and >>>>> the second one checks (reads) the table to check the last heartbeat and >>>>> figures out that the leader is outdated at the same time, and then first >>>>> one delete the entry and puts his one, and after that, second one will >>>>> also >>>>> delete the existing one and put his one, so both will think they became >>>>> the >>>>> leader, due to the condition that both succeeded in adding the entry >>>>> without an error. 
So this can probably be fixed by checking back after a >>>>> bit of time if the current node is actually me, which probabilistically >>>>> will work well, if that time period is sufficient big enough than a >>>>> typical >>>>> database transaction required by a node to do the earlier operations. Or >>>>> else, we should make sure the database transaction level used in this >>>>> scenario is at least REPEATABLE_READ, where when we read the record, it >>>>> will lock it throughout the transaction. So some DBMSs does not support >>>>> REPEATABLE_READ, where in that case, we should be able to use >>>>> SERIALIZABLE, >>>>> which most of them support. >>>>> >>>>> Cheers, >>>>> Anjana. >>>>> >>>>> On Tue, Aug 9, 2016 at 11:11 AM, Maninda Edirisooriya < >>>>> mani...@wso2.com> wrote: >>>>> >>>>>> Hi Anjana, >>>>>> >>>>>> After having an offline chat with Asanka what I understood was that >>>>>> the leader election was done completely via the database but with no >>>>>> network communication. The leader is mentioned in the database first. >>>>>> Then >>>>>> the leader updates the node data periodically in the database. If some >>>>>> node >>>>>> realizes the data in the DB are outdated that means the leader was >>>>>> disconnected. Then that node will look at the created timestamp of the >>>>>> leader entry. If that is not very recent that means there was no new >>>>>> leader >>>>>> elected recently. So he will try to update the leader entry with his ID. >>>>>> As >>>>>> I understand there the leader entry is using the leader ID and the >>>>>> timestamp as the primary key. Even several nodes try to do it >>>>>> simultaneously only one node will successfully be able to update the >>>>>> entry >>>>>> with the help of atomicity provided by the DB. Others members will note >>>>>> the >>>>>> timestamp of the leader was updated so will accept the first one who >>>>>> updates as the leader.
Re: [Architecture] RDBMS based coordinator election algorithm for MB
Hi Asanka/Ramith, So for C5 based Streaming Analytics solution, we need coordination functionality there as well. Is the functionality mentioned here created as a common component or baked in to the MB code? .. if so, can we please get it implemented it as a generic component, so other products can also use it. Cheers, Anjana. On Tue, Aug 9, 2016 at 3:10 PM, Anjana Fernando <anj...@wso2.com> wrote: > Great! .. > > Cheers, > Anjana. > > On Tue, Aug 9, 2016 at 1:49 PM, Asanka Abeyweera <asank...@wso2.com> > wrote: > >> Hi Anjana, >> >> Thank you for the suggestion. We have already done a similar thing. We >> have added a backoff time after creating the leader entry and check if the >> leader entry is the entry created by self before informing the leader >> change. >> >> On Tue, Aug 9, 2016 at 12:27 PM, Anjana Fernando <anj...@wso2.com> wrote: >> >>> I see, thanks for the clarification, looks good! .. I think small thing >>> to consider is, to avoid the situation where, the current leader goes away, >>> and two other competes to become the leader, and the first one and the >>> second one checks (reads) the table to check the last heartbeat and figures >>> out that the leader is outdated at the same time, and then first one delete >>> the entry and puts his one, and after that, second one will also delete the >>> existing one and put his one, so both will think they became the leader, >>> due to the condition that both succeeded in adding the entry without an >>> error. So this can probably be fixed by checking back after a bit of time >>> if the current node is actually me, which probabilistically will work well, >>> if that time period is sufficient big enough than a typical database >>> transaction required by a node to do the earlier operations. Or else, we >>> should make sure the database transaction level used in this scenario is at >>> least REPEATABLE_READ, where when we read the record, it will lock it >>> throughout the transaction. 
So some DBMSs does not support REPEATABLE_READ, >>> where in that case, we should be able to use SERIALIZABLE, which most of >>> them support. >>> >>> Cheers, >>> Anjana. >>> >>> On Tue, Aug 9, 2016 at 11:11 AM, Maninda Edirisooriya <mani...@wso2.com> >>> wrote: >>> >>>> Hi Anjana, >>>> >>>> After having an offline chat with Asanka what I understood was that the >>>> leader election was done completely via the database but with no network >>>> communication. The leader is mentioned in the database first. Then the >>>> leader updates the node data periodically in the database. If some node >>>> realizes the data in the DB are outdated that means the leader was >>>> disconnected. Then that node will look at the created timestamp of the >>>> leader entry. If that is not very recent that means there was no new leader >>>> elected recently. So he will try to update the leader entry with his ID. As >>>> I understand there the leader entry is using the leader ID and the >>>> timestamp as the primary key. Even several nodes try to do it >>>> simultaneously only one node will successfully be able to update the entry >>>> with the help of atomicity provided by the DB. Others members will note the >>>> timestamp of the leader was updated so will accept the first one who >>>> updates as the leader. Even after the leader is elected, the leader will >>>> only notify node data via updating DB instead of network calls. Other nodes >>>> will just observe it and check the latest timestmps of the entry. >>>> >>>> >>>> *Maninda Edirisooriya* >>>> Senior Software Engineer >>>> >>>> *WSO2, Inc.*lean.enterprise.middleware. >>>> >>>> *Blog* : http://maninda.blogspot.com/ >>>> *E-mail* : mani...@wso2.com >>>> *Skype* : @manindae >>>> *Twitter* : @maninda >>>> >>>> On Tue, Aug 9, 2016 at 10:13 AM, Anjana Fernando <anj...@wso2.com> >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> I just noticed this thread. I've some concerns on this >>>>> implementations. 
First of all, I don't think the statement mentioned here >>>>> saying an external service such as ZooKeeper doesn't work, is correct. >>>>> Because, if you have a ZK cluster (it is supposed to be used as a cluster), >>>>> you will not have a single point of failure.
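Maninda's description above leans on the atomicity of a single database update: even if several nodes try to claim leadership simultaneously, only one succeeds. One way to make that concrete (a hedged sketch, not the actual MB implementation; the table layout and function names are assumptions) is a compare-and-swap style conditional UPDATE, where the claim only takes effect if the leader row is still exactly the stale entry the claimer observed.

```python
import sqlite3


def create_leader_table(conn):
    # Layout is an assumption for illustration only.
    conn.execute("CREATE TABLE IF NOT EXISTS leader "
                 "(node_id TEXT, last_heartbeat REAL)")
    conn.commit()


def claim_leadership(conn, node_id, observed_leader, observed_ts, now):
    """Atomically replace the stale leader entry we previously read.

    The WHERE clause is the compare-and-swap: if a concurrent claimer got
    there first, the row no longer matches what we observed, our rowcount
    is 0, and we lose cleanly instead of both nodes believing they won."""
    cur = conn.execute(
        "UPDATE leader SET node_id = ?, last_heartbeat = ? "
        "WHERE node_id = ? AND last_heartbeat = ?",
        (node_id, now, observed_leader, observed_ts))
    conn.commit()
    return cur.rowcount == 1
```

Because a single UPDATE statement is atomic in any reasonable DBMS, this avoids the delete-then-insert window discussed earlier without requiring REPEATABLE_READ or SERIALIZABLE isolation for the claim itself.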
Re: [Architecture] Multidimensional Space Search with Lucene 6 - Possible Scenarios and the API
e specified >as polygons. >- Get the number of points in each bucket where buckets are specified >by the distance from a given location. > > * Composite polygons are possible. > *Scenarios* > > *Airport Scenario* > If we index the set of airports in the world as GeoPoints, the following > queries are possible. (Here is the test code I implemented as an > example.) > <https://github.com/janakact/test_lucene/blob/master/src/test/java/TestMultiDimensionalQueries.java> > >- Find the closest set of airports to a given town. >- Find the set of airports within a given radius from a particular >town. >- Find the set of airports inside a country. (The country can be given as >a polygon.) >- Find the set of airports within a given range of latitudes and >longitudes. It is a latitude/longitude box query. (For example: >airports close to the equator.) >- Find the set of airports close to a given path. (A path can be >something like a road, e.g. find the airports which are less than 50km away from >a given highway.) >- Count the airports in each country by giving country maps as >polygons. > > *Indexing airplane paths* > >- It is possible to query for paths which go through an interesting >area. > > The above examples cover most of the functionality that Lucene spatial search > provides. > Here are some other examples: > >- The number of television users a satellite can cover (by indexing >receivers' locations). >- Finding the number of stationary telescopes that can be used to >observe a solar eclipse (by indexing telescope locations; the area where the solar >eclipse is visible can be represented as a polygon: >http://eclipse.gsfc.nasa.gov/SEplot/SEplot2001/SE2016Sep01A.GIF) > > So, that's it. > Thank you. 
> > Regards, > Janaka Chathuranga > > -- > Janaka Chathuranga > *Software Engineering Intern* > Mobile : *+94 (071) 3315 725* > jana...@wso2.com > > -- *Anjana Fernando* Associate Director / Architect WSO2 Inc. | http://wso2.com lean . enterprise . middleware
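The radius queries in the airport scenario above come down to great-circle distance filtering. Below is a small, self-contained sketch of that idea: a naive linear scan with the haversine formula. Lucene's point index answers the same question far more efficiently, and the airport names and coordinates here are purely illustrative.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    p1, p2 = radians(lat1), radians(lat2)
    dp = radians(lat2 - lat1)
    dl = radians(lon2 - lon1)
    a = sin(dp / 2) ** 2 + cos(p1) * cos(p2) * sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within_radius(points, lat, lon, radius_km):
    """Naive linear scan standing in for the indexed distance query:
    return the names of all points within radius_km of (lat, lon)."""
    return [name for name, (plat, plon) in points.items()
            if haversine_km(lat, lon, plat, plon) <= radius_km]
```

Box queries and polygon containment follow the same shape: only the per-point predicate changes.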
Re: [Architecture] DB event listener for ESB
-- *Anjana Fernando* Associate Director / Architect WSO2 Inc. | http://wso2.com lean . enterprise . middleware
Re: [Architecture] How do we get DAS server location?
chat earlier, the initial plan is to locate >>>> the Thrift endpoint through mDNS service discovery, considering the host >>>> and port first. >>>> >>>> I have used the JmDNS library pointed by Nirmal to do a PoC on this >>>> scenario, and I've also already incorporated the logic into the databridge >>>> Thrift server to enable service registration through a system property the >>>> users could set (-DenableDiscovery). The corresponding client code goes >>>> into the publisher OSGi service initialisation. This too is controllable by >>>> the same system property the user could set on the Thrift client (which >>>> will be the product talking to DAS/CEP). >>>> >>>> I'm doing some testing on the entire scenario, and once completed, I'll >>>> commit the changes into the relevant repos and send an update to this >>>> thread. >>>> >>>> Thanks, >>>> >>>> >>>> On Thursday, 30 June 2016, Srinath Perera <srin...@wso2.com> wrote: >>>> >>>>> Resending as it hits a filter rule. >>>>> >>>>> Gokul, please give an update on this? >>>>> >>>>> --Srinath >>>>> >>>> >>>> >>>> -- >>>> Gokul Balakrishnan >>>> Senior Software Engineer, >>>> WSO2, Inc. http://wso2.com >>>> M +94 77 5935 789 | +44 7563 570502 >>>> >>>> >>>> >>> >>> >>> -- >>> >>> Srinath Perera, Ph.D. >>>http://people.apache.org/~hemapani/ >>>http://srinathsview.blogspot.com/ >>> >> >> >> >> -- >> Gokul Balakrishnan >> Senior Software Engineer, >> WSO2, Inc. http://wso2.com >> M +94 77 5935 789 | +44 7563 570502 >> >> > > > -- > Gokul Balakrishnan > Senior Software Engineer, > WSO2, Inc. http://wso2.com > M +94 77 5935 789 | +44 7563 570502 > > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] In-Tenant Data restriction in the DAS platform
Anyway, as per the other discussions we have had, we cannot promise these features; we will have to carefully check their feasibility and then decide. Cheers, Anjana. On Thu, Jun 30, 2016 at 2:24 PM, Anjana Fernando <anj...@wso2.com> wrote: > Hi Dulitha, > > Your points are valid, we will check on these for an upcoming release, > most probably DAS v3.2.0, we just have to carefully check for all the > scenarios on how this will work out, there can be some scenarios that can > be tricky, but we should be able to figure them out. > > Cheers, > Anjana. > > On Thu, Jun 30, 2016 at 12:40 PM, Sinthuja Ragendran <sinth...@wso2.com> > wrote: > >> Hi Dulitha, >> >> On Wed, Jun 29, 2016 at 10:24 PM, Dulitha Wijewantha <duli...@wso2.com> >> wrote: >> >>> Hi guys, >>> Below are somethings I noted when I was writing dashboards for an >>> analytics solution. >>> >>> 1) oAuth protected APIs should be used to retrieve data for gadgets >>> >>> 2) There should be a way to restrict data for users inside a tenant >>> >> >> +1 for above two. And I too think we should bring more fine grained >> authorization model for DAS layer, at least in the table/stream level such >> that only role-A should be able to access it not all. And again there could >> be different level of access per stream/table, some users can only fetch >> the data, some can only send, and only some can delete it. >> >> We had similar requirement on dashboard server to protect a dashboard, >> and then we came up with a model to create some internal roles per >> dashboard during the dashboard creation time, and assign the user who is >> creating the dashboard for those internal role by default. Hence only >> he/she can perform any actions on the dashboard and it's private for >> him/her. If the user would like to share the dashboard, then he/she assign >> users independently for the internal roles created or assign a new role for >> the particular action. 
>> >> I think similarly we can handle for the tables as well. >> >>> >>> 3) If the user doesn't have authorization to view the data - he >>> shouldn't be able to view the corresponding visualization on the dashboard >>> server and vice versa. >>> >> >> This is bit tricky, as the authorization from dashboard page is something >> only required if there are any analytics related gadgets have been included >> in the dashboard page, and for others this will not be an issue. We need to >> properly handle this case if we include such feature. >> >> Thanks, >> Sinthuja. >> >> >>> >>> Cheers~ >>> -- >>> Dulitha Wijewantha (Chan) >>> Software Engineer - Mobile Development >>> WSO2 Inc >>> Lean.Enterprise.Middleware >>> * ~Email duli...@wso2.com <duli...@wso2mobile.com>* >>> * ~Mobile +94712112165 <%2B94712112165>* >>> * ~Website dulitha.me <http://dulitha.me>* >>> * ~Twitter @dulitharw <https://twitter.com/dulitharw>* >>> *~Github @dulichan <https://github.com/dulichan>* >>> *~SO @chan <http://stackoverflow.com/users/813471/chan>* >>> >> >> >> >> -- >> *Sinthuja Rajendran* >> Technical Lead >> WSO2, Inc.:http://wso2.com >> >> Blog: http://sinthu-rajan.blogspot.com/ >> Mobile: +94774273955 >> >> >> > > > -- > *Anjana Fernando* > Senior Technical Lead > WSO2 Inc. | http://wso2.com > lean . enterprise . middleware > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
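The per-dashboard internal-role scheme Sinthuja describes above can be sketched as a tiny in-memory model. The role-name format, action list, and function names are invented for illustration; the real dashboard server would back this with the Carbon user store rather than a dict.

```python
# Actions we model per dashboard; an assumption for the sketch.
ACTIONS = ("view", "edit")


def create_dashboard(acl, dashboard_id, creator):
    """Create the internal roles for a new dashboard and grant all of them
    to the creator, so the dashboard starts out private to him/her."""
    for action in ACTIONS:
        role = "internal/{}/{}".format(dashboard_id, action)
        acl.setdefault(role, set()).add(creator)


def share(acl, dashboard_id, user, action):
    """Sharing = assigning another user to one of the internal roles."""
    acl["internal/{}/{}".format(dashboard_id, action)].add(user)


def can(acl, dashboard_id, user, action):
    """Authorization check: is the user in the internal role for the action?"""
    return user in acl.get("internal/{}/{}".format(dashboard_id, action), set())
```

The same pattern would carry over to analytics tables: an internal role per table per action (fetch, send, delete), checked on every API call.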
Re: [Architecture] In-Tenant Data restriction in the DAS platform
Hi Dulitha, Your points are valid, we will check on these for an upcoming release, most probably DAS v3.2.0, we just have to carefully check for all the scenarios on how this will work out, there can be some scenarios that can be tricky, but we should be able to figure them out. Cheers, Anjana. On Thu, Jun 30, 2016 at 12:40 PM, Sinthuja Ragendran <sinth...@wso2.com> wrote: > Hi Dulitha, > > On Wed, Jun 29, 2016 at 10:24 PM, Dulitha Wijewantha <duli...@wso2.com> > wrote: > >> Hi guys, >> Below are somethings I noted when I was writing dashboards for an >> analytics solution. >> >> 1) oAuth protected APIs should be used to retrieve data for gadgets >> >> 2) There should be a way to restrict data for users inside a tenant >> > > +1 for above two. And I too think we should bring more fine grained > authorization model for DAS layer, at least in the table/stream level such > that only role-A should be able to access it not all. And again there could > be different level of access per stream/table, some users can only fetch > the data, some can only send, and only some can delete it. > > We had similar requirement on dashboard server to protect a dashboard, and > then we came up with a model to create some internal roles per dashboard > during the dashboard creation time, and assign the user who is creating the > dashboard for those internal role by default. Hence only he/she can perform > any actions on the dashboard and it's private for him/her. If the user > would like to share the dashboard, then he/she assign users independently > for the internal roles created or assign a new role for the particular > action. > > I think similarly we can handle for the tables as well. > >> >> 3) If the user doesn't have authorization to view the data - he shouldn't >> be able to view the corresponding visualization on the dashboard server and >> vice versa. 
>> > > This is bit tricky, as the authorization from dashboard page is something > only required if there are any analytics related gadgets have been included > in the dashboard page, and for others this will not be an issue. We need to > properly handle this case if we include such feature. > > Thanks, > Sinthuja. > > >> >> Cheers~ >> -- >> Dulitha Wijewantha (Chan) >> Software Engineer - Mobile Development >> WSO2 Inc >> Lean.Enterprise.Middleware >> * ~Email duli...@wso2.com <duli...@wso2mobile.com>* >> * ~Mobile +94712112165 <%2B94712112165>* >> * ~Website dulitha.me <http://dulitha.me>* >> * ~Twitter @dulitharw <https://twitter.com/dulitharw>* >> *~Github @dulichan <https://github.com/dulichan>* >> *~SO @chan <http://stackoverflow.com/users/813471/chan>* >> > > > > -- > *Sinthuja Rajendran* > Technical Lead > WSO2, Inc.:http://wso2.com > > Blog: http://sinthu-rajan.blogspot.com/ > Mobile: +94774273955 > > > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] [Analytics] Allowing analytics data to be published to super tenant space
Hi Amila, On Thu, Jun 23, 2016 at 11:52 AM, Amila Maha Arachchi <ami...@wso2.com> wrote: > All, > > 1. We should allow to decide whether to publish data in super tenant mode > or tenant mode > This is possible, but the problem is that it complicates the ESB analytics solution, where we would have to maintain two different versions implementing the two scenarios. So IMO, it would be better to follow a single approach that gives the best overall flexibility, which we discussed earlier: we publish and manage data in tenants, but execute the Spark scripts in the super tenant. > 2. If its the ST mode, we deploy the car in ST space. > 3. Data gets published and stored in one table. i.e. no table per tenant > The current connectors, e.g. RDBMS / HBase etc., create a physical table per analytics table per tenant. In those connectors, this behavior cannot be changed; there are technical difficulties in doing so, mainly around filtering out different tenants' data. Anyway, database systems do not usually limit the number of physical tables that can be created. And also, you will not access these tables directly, but will communicate via the REST APIs if required. > 4. Spark runs against that table > With the new improvements, we anyway get a similar type of interface, where the Spark script will automatically read data from all the tenants and process the data in one go. > 5. Dashboard should be a SaaS app which filters data from the analyzed > table. > I guess that may be possible, but it will need some changes in the current dashboards. @Dunith, can you comment on this please. Also, is there any issue in deploying a dashboard per tenant, which is the current situation? Cheers, Anjana. > > Can above be facilitated? > > Regards, > Amila. 
> > -- > *Amila Maharachchi* > Senior Technical Lead > WSO2, Inc.; http://wso2.com > > Blog: http://maharachchi.blogspot.com > Mobile: +94719371446 > > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] Cross Tenant Data Reading from Spark Queries in DAS
Hi, I've improved the functionality here, in order to also write to tenant tables using the super tenant space. With this, I've changed the earlier flag's name to "globalTenantAccess". So, if you define a table with this flag enabled, when you write records to it, it will look at the incoming record's "_tenantId" field value and route each record to its respective tenant's table. Now, with both read and write functionality, we can seamlessly read the data from each tenant and write the results back to that same tenant, using a script residing in the super tenant. An example of how this works is shown below:-

* Create several tenants, create a stream called "S1" in each, and add some records to each
* Execute the following in the super tenant:-

create temporary table S1 using CarbonAnalytics options (tableName "S1", schema "name STRING, count INTEGER, _tenantId INTEGER", globalTenantAccess "true");

Reading from this "S1" table, e.g. "select * from S1", will show all the records from all the tenants. Now, create another table "S2" in the super tenant space with the globalTenantAccess flag:-

create temporary table S2 using CarbonAnalytics options (tableName "S2", schema "name STRING, count INTEGER, _tenantId INTEGER", globalTenantAccess "true");

Now we run the command "insert into table S2 select * from S1". This command will make the system create table S2 in all the available tenants (basically, the tenants mentioned in the _tenantId field of S1's data), and write the data to it. At the end, each tenant will have two tables, "S1" and "S2", with identical data. This basically explains how the full data set is collected together, and how the same data can be written back tenant-wise. Cheers, Anjana. 
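Building on those semantics, results can also be aggregated per tenant before being written back through a globalTenantAccess table. The sketch below reuses the "S1"/"S2" tables from the mail; the aggregate query itself is illustrative, not from the original thread:

```sql
-- Both tables carry the special _tenantId column, so grouping by it keeps
-- each tenant's results separate; on insert, the writer routes each row to
-- the matching tenant's own "S2" table.
create temporary table S1 using CarbonAnalytics
  options (tableName "S1",
           schema "name STRING, count INTEGER, _tenantId INTEGER",
           globalTenantAccess "true");

create temporary table S2 using CarbonAnalytics
  options (tableName "S2",
           schema "name STRING, count INTEGER, _tenantId INTEGER",
           globalTenantAccess "true");

insert into table S2
  select name, sum(count) as count, _tenantId
  from S1 group by name, _tenantId;
```

Each tenant then ends up with only its own per-name totals in "S2", even though the whole computation ran once, in the super tenant.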
On Mon, Apr 18, 2016 at 4:55 AM, Anjana Fernando <anj...@wso2.com> wrote: > Hi Chan, > > On Mon, Apr 18, 2016 at 4:47 AM, Dulitha Wijewantha <duli...@wso2.com> > wrote: > >> >> There is a new analytics provider property introduced, which is >>> "globalTenantRead", where when this is set to "true", it will go through >>> all the tenants in aggregating records of a table named "T1" in that >>> tenant. Also a new special table schema attribute "_tenantId" is >>> introduced, which is an automatically populated value for a record based on >>> the actual origin tenant of the record. So this "_tenantId" field can be >>> used for further filtering/grouping in the Spark queries. >>> >> >> So the tenant id of the event is persisted on the record store when >> events are recieved to DAS. Does this happen through the authorization? In >> case of thrift the login username has to be prefixed with the tenant >> domain? >> > > So DAS anyway had proper tenant isolation already. And yes, it is handled > with the authorization (so yeah, the username has to have the domain also > for tenants). Where, when we are storing events, a tenant has its own space > in our data layer, and now it is just retrieving data, we just specially > expose whose tenant the record belongs to in the result, since the results > can have data from multiple tenants. > > >> >> >> Does this impact indexes that have been setup? >> > > No it does not, it does not affect the existing indexes nor the raw data > stored. > > Cheers, > Anjana. > > >> >> >>> >>> [1] https://docs.wso2.com/pages/viewpage.action?pageId=50505847 >>> [2] https://docs.wso2.com/pages/viewpage.action?pageId=50505762 >>> [3] >>> https://docs.wso2.com/display/DAS310/Spark+Query+Language#SparkQueryLanguage-WSO2DASSQLguide >>> >>> Cheers, >>> Anjana. >>> -- >>> *Anjana Fernando* >>> Senior Technical Lead >>> WSO2 Inc. | http://wso2.com >>> lean . enterprise . 
middleware >>> >>> ___ >>> Architecture mailing list >>> Architecture@wso2.org >>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>> >>> >> >> >> -- >> Dulitha Wijewantha (Chan) >> Software Engineer - Mobile Development >> WSO2 Inc >> Lean.Enterprise.Middleware >> * ~Email duli...@wso2.com <duli...@wso2mobile.com>* >> * ~Mobile +94712112165 <%2B94712112165>* >> * ~Website dulitha.me <http://dulitha.me>* >> * ~Twitter @dulitharw <https://twitter.com/dulitharw>* >> *~Github @dulichan <https://github.com/dulichan>* >> *~SO @chan <http://stackoverflow.com/users/813471/chan>* >> >> ___ >> Architecture mailing list >> Architecture@wso2.org >> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >> >> > > > -- > *Anjana Fernando* > Senior Technical Lead > WSO2 Inc. | http://wso2.com > lean . enterprise . middleware > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] Caching Support for Analytics Event Tables
Hi, Not sure, if it'll be easy to merge the code like that, specially considering the two implementations are very different from the code level, as in, the points used to cache the data would be a bit different. In future, a better option would be to cache the data from CEP itself, rather than from individual event table implementation. Anyways, let's check the two implementations and see, at least from the configuration level. Cheers, Anjana. On Mon, Jun 20, 2016 at 8:36 PM, Mohanadarshan Vivekanandalingam < mo...@wso2.com> wrote: > > > On Mon, Jun 20, 2016 at 8:17 PM, Sriskandarajah Suhothayan <s...@wso2.com> > wrote: > >> Is this in line with the RDBMS implementation? Else it will be confusing >> to the users. >> Shall we have a chat and merge the caching code? >> > > Yes, let's have a chat.. > >> >> @Mohan can you work with Anjana >> > > sure... > > >> >> Regards >> Suho >> >> On Mon, Jun 20, 2016 at 12:49 PM, Anjana Fernando <anj...@wso2.com> >> wrote: >> >>> Hi, >>> >>> With a chat we had with Srinath, we've decided to set the default cache >>> timeout to 10 seconds, so from this moment, it is set to 10 seconds by >>> default in the code. >>> >>> Cheers, >>> Anjana. >>> >>> On Wed, Jun 15, 2016 at 1:57 PM, Nirmal Fernando <nir...@wso2.com> >>> wrote: >>> >>>> Great! Thanks Anjana! >>>> >>>> On Wed, Jun 15, 2016 at 11:26 AM, Anjana Fernando <anj...@wso2.com> >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> We've added the $subject. Basically, a local cache is now maintained >>>>> in each event table, where it will store the most recently used data items >>>>> in the cache, up to a certain given cache size, for a maximum given >>>>> lifetime. 
The format is as follows:- >>>>> >>>>> @from(eventtable = 'analytics.table' , table.name = 'name', *caching* >>>>> = 'true', *cache.timeout.seconds* = '10', *cache.size.bytes* = >>>>> '10') >>>>> >>>>> The cache.timeout.seconds and cache.size.bytes values are optional, >>>>> with default values of 60 (1 minute) and 1024 * 1024 * 10 (10 MB) >>>>> respectively. >>>>> >>>>> Also, there are some debug logs available in the component, if you >>>>> want to check for explicit cache hit/miss situations and record lookup >>>>> timing, basically enable debug logs for the class >>>>> "org.wso2.carbon.analytics.eventtable.AnalyticsEventTable". >>>>> >>>>> So basically, if you use analytics event tables in performance >>>>> sensitive areas in your CEP execution plans, do consider using caching if >>>>> it is possible to do so. >>>>> >>>>> The unit tests are updated with caching, and the updated docs can be >>>>> found here [1]. >>>>> >>>>> [1] >>>>> https://docs.wso2.com/display/DAS310/Understanding+Event+Streams+and+Event+Tables#UnderstandingEventStreamsandEventTables-AnalyticseventtableAnalyticseventtable >>>>> >>>>> Cheers, >>>>> Anjana. >>>>> -- >>>>> *Anjana Fernando* >>>>> Senior Technical Lead >>>>> WSO2 Inc. | http://wso2.com >>>>> lean . enterprise . middleware >>>>> >>>> >>>> >>>> >>>> -- >>>> >>>> Thanks & regards, >>>> Nirmal >>>> >>>> Team Lead - WSO2 Machine Learner >>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc. >>>> Mobile: +94715779733 >>>> Blog: http://nirmalfdo.blogspot.com/ >>>> >>>> >>>> >>> >>> >>> -- >>> *Anjana Fernando* >>> Senior Technical Lead >>> WSO2 Inc. | http://wso2.com >>> lean . enterprise . middleware >>> >> >> >> >> -- >> >> *S. Suhothayan* >> Technical Lead & Team Lead of WSO2 Complex Event Processor >> *WSO2 Inc. *http://wso2.com >> * <http://wso2.com/>* >> lean . enterprise . 
middleware >> >> >> *cell: (+94) 779 756 757 <%28%2B94%29%20779%20756%20757> | blog: >> http://suhothayan.blogspot.com/ <http://suhothayan.blogspot.com/>twitter: >> http://twitter.com/suhothayan <http://twitter.com/suhothayan> | linked-in: >> http://lk.linkedin.com/in/suhothayan <http://lk.linkedin.com/in/suhothayan>* >> > > > > -- > *V. Mohanadarshan* > *Associate Tech Lead,* > *Data Technologies Team,* > *WSO2, Inc. http://wso2.com <http://wso2.com> * > *lean.enterprise.middleware.* > > email: mo...@wso2.com > phone:(+94) 771117673 > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] Caching Support for Analytics Event Tables
Hi, With a chat we had with Srinath, we've decided to set the default cache timeout to 10 seconds, so from this moment, it is set to 10 seconds by default in the code. Cheers, Anjana. On Wed, Jun 15, 2016 at 1:57 PM, Nirmal Fernando <nir...@wso2.com> wrote: > Great! Thanks Anjana! > > On Wed, Jun 15, 2016 at 11:26 AM, Anjana Fernando <anj...@wso2.com> wrote: > >> Hi, >> >> We've added the $subject. Basically, a local cache is now maintained in >> each event table, where it will store the most recently used data items in >> the cache, up to a certain given cache size, for a maximum given lifetime. >> The format is as follows:- >> >> @from(eventtable = 'analytics.table' , table.name = 'name', *caching* = >> 'true', *cache.timeout.seconds* = '10', *cache.size.bytes* = '10') >> >> The cache.timeout.seconds and cache.size.bytes values are optional, with >> default values of 60 (1 minute) and 1024 * 1024 * 10 (10 MB) respectively. >> >> Also, there are some debug logs available in the component, if you want >> to check for explicit cache hit/miss situations and record lookup timing, >> basically enable debug logs for the class >> "org.wso2.carbon.analytics.eventtable.AnalyticsEventTable". >> >> So basically, if you use analytics event tables in performance sensitive >> areas in your CEP execution plans, do consider using caching if it is >> possible to do so. >> >> The unit tests are updated with caching, and the updated docs can be >> found here [1]. >> >> [1] >> https://docs.wso2.com/display/DAS310/Understanding+Event+Streams+and+Event+Tables#UnderstandingEventStreamsandEventTables-AnalyticseventtableAnalyticseventtable >> >> Cheers, >> Anjana. >> -- >> *Anjana Fernando* >> Senior Technical Lead >> WSO2 Inc. | http://wso2.com >> lean . enterprise . middleware >> > > > > -- > > Thanks & regards, > Nirmal > > Team Lead - WSO2 Machine Learner > Associate Technical Lead - Data Technologies Team, WSO2 Inc. 
> Mobile: +94715779733 > Blog: http://nirmalfdo.blogspot.com/ > > > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
[Architecture] Caching Support for Analytics Event Tables
Hi, We've added the $subject. Basically, a local cache is now maintained in each event table, where it will store the most recently used data items, up to a given maximum cache size, for a given maximum lifetime. The format is as follows:-

@from(eventtable = 'analytics.table', table.name = 'name', *caching* = 'true', *cache.timeout.seconds* = '10', *cache.size.bytes* = '10')

The cache.timeout.seconds and cache.size.bytes values are optional, with default values of 60 (1 minute) and 1024 * 1024 * 10 (10 MB) respectively. Also, there are some debug logs available in the component; if you want to check for explicit cache hit/miss situations and record lookup timing, enable debug logs for the class "org.wso2.carbon.analytics.eventtable.AnalyticsEventTable". So basically, if you use analytics event tables in performance-sensitive areas of your CEP execution plans, do consider using caching if it is possible to do so. The unit tests are updated with caching, and the updated docs can be found here [1]. [1] https://docs.wso2.com/display/DAS310/Understanding+Event+Streams+and+Event+Tables#UnderstandingEventStreamsandEventTables-AnalyticseventtableAnalyticseventtable Cheers, Anjana. -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
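For context, here is a minimal sketch of how such a cached event table could be used from a Siddhi execution plan. The stream and table names below are made up for illustration; only the @from annotation follows the format described in the mail:

```sql
/* Hypothetical incoming stream, and a DAS-backed event table with caching
   enabled: entries expire after 30 seconds, and the cache is capped at 1 MB. */
define stream RequestStream (userId string, url string);

@from(eventtable = 'analytics.table', table.name = 'USER_DATA',
      caching = 'true', cache.timeout.seconds = '30',
      cache.size.bytes = '1048576')
define table UserTable (userId string, name string);

/* Join each request against the (cached) event table to enrich it. */
from RequestStream join UserTable
  on RequestStream.userId == UserTable.userId
select RequestStream.url, UserTable.name
insert into EnrichedRequestStream;
```

Repeated lookups for the same keys within the timeout are then served from the local cache instead of hitting the underlying data layer on every event.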
Re: [Architecture] What should be the default MySQL engine to be used in DAS?
Re: [Architecture] How do we get DAS server location?
Hi Srinath, Yeah, we were doing some work on this, first Malith, and then Gokul. But due to other priorities in our work, we couldn't get very far with it. We also ran into some issues with how the configuration was done, and we were thinking of having some further discussions on how practical it would be. For example, if we auto-discover servers, we would only get the server locations, and obviously not the user credentials to talk to those servers. We could maybe only provide the default admin credentials the server ships with out of the box, and make the user edit a configuration file, which would diminish somewhat the goal of making the setup easier. Anyway, @Gokul, can you please give an update on how far we got with this work. Cheers, Anjana. On Mon, May 30, 2016 at 10:48 AM, Srinath Perera <srin...@wso2.com> wrote: > Anjana, Have we done this? > > I think Gokul started working on this. > > --Srinath > > On Sat, Feb 20, 2016 at 6:03 PM, Nirmal Fernando <nir...@wso2.com> wrote: > >> There's a library called JmDNS http://jmdns.sourceforge.net/index.html >> which would probably help us here. >> >> JmDNS is a Java implementation of multi-cast DNS and can be used for >> service registration and discovery in local area networks. JmDNS is fully >> compatible with Apple's Bonjour >> <http://developer.apple.com/macosx/rendezvous/>. >> >> The Zeroconf <http://www.zeroconf.org/> working group is working towards >> zero-configuration IP networking. Multi-cast DNS >> <http://www.multicastdns.org/> and DNS service discovery >> <http://www.dns-sd.org/> provide convenient ways for devices and >> services to register themselves, and to discover other network-based >> services without relying on centrally administered services. >> >> Java as a language is not appropriate for low-level network >> configuration, but it is very useful for service registration and >> discovery. 
JmDNS provides easy-to-use pure-Java mDNS implementation that >> runs on most JDK1.6 compatible VMs. >> >> The code is released under the Apache 2.0 license so that it can be >> freely incorporated into other products and services. >> >> On Sat, Feb 20, 2016 at 10:11 AM, Sanjiva Weerawarana <sanj...@wso2.com> >> wrote: >> >>> No no not using Hazelcast >>> >>> On Sat, Feb 20, 2016 at 10:07 AM, Srinath Perera <srin...@wso2.com> >>> wrote: >>> >>>> Hi Sanjiva, >>>> >>>> I think we did though Hazelcast. AFAIK, we have not done it for DAS >>>> discovery yet. >>>> >>>> If we use Hazelcast, it is trivial to do. But that will add Hazelcast >>>> to all our products. ( or Maybe we can find and borrow that part of the >>>> code). >>>> >>>> --Srinath >>>> >>>> On Sat, Feb 20, 2016 at 10:00 AM, Sanjiva Weerawarana <sanj...@wso2.com >>>> > wrote: >>>> >>>>> Guys we also need the servers to discover each other when on the same >>>>> machine or LAN. Have we done that yet? That's very easy to do [1] and IIRC >>>>> we used it before for something. >>>>> >>>>> [1] https://en.wikipedia.org/wiki/Zero-configuration_networking >>>>> >>>>> On Fri, Feb 19, 2016 at 7:05 PM, Malith Dhanushka <mal...@wso2.com> >>>>> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Fri, Feb 19, 2016 at 5:00 PM, Anjana Fernando <anj...@wso2.com> >>>>>> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> On Fri, Feb 19, 2016 at 4:54 PM, Srinath Perera <srin...@wso2.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Kasun, Nuwan >>>>>>>> >>>>>>>> All product needs to get DAS server location from one place. >>>>>>>> >>>>>>>> >>>>>>>>1. Do we have a place for that? Otherwise, we need something >>>>>>>>like conf/das-client.xml and create a component to read it and use >>>>>>>> it with >>>>>>>>API and ESB when they want to send events to DAS ( Anjana, can ESB >>>>>>>>analytics guys do it?) >>>>>>>> >>>>>>>> Yeah, we can check on that. As I remember, there were some >>>>>>> discussions
Re: [Architecture] What should be the default MySQL engine to be used in DAS?
Hi, So actually, the case we need to solve is this: even though, by default, we can use the "write_read_optimized" mode of the record store (which automatically switches the queries used to create the database tables from the templates), in some cases the default event store needs to run in "write_optimized" mode (where, in MySQL, it uses MyISAM). For example, in the ESB analytics case, we can use this for the raw event storage, since there aren't many continuous reads done on it, such as running a Spark job on it (that is done by CEP now). So if someone installs the ESB analytics features on a base DAS distribution, as of now, it will be using the "EVENT_STORE" record store, which is by default set to "write_read_optimized" mode. So what I suggest is creating two record stores to replace the current single "EVENT_STORE", say "EVENT_STORE_WO" and "EVENT_STORE_WRO", which would represent "write_optimized" and "write_read_optimized" backed configurations ("PROCESSED_STORE" will anyway be "write_read_optimized"). In a MySQL setup, this would actually come into effect when creating the database tables; in a setup like HBase, both data sources would possibly point to a single database server, and the same type of tables would be created. So basically, what we achieve at the end is that we can write all our analytics scenarios in a portable way, without worrying about the behavior of the data storing/retrieval, as long as we use the default record store names that come with a typical DAS, and only data-source-level changes would be done when needed. P.S. Also, can we rename "write_read_optimized" in the configurations to "read_write_optimized"? The latter reads more naturally. Cheers, Anjana. On Wed, May 25, 2016 at 8:10 PM, Inosh Goonewardena <in...@wso2.com> wrote: > Hi, > > At the moment, DAS supports both MyISAM and InnoDB, but it is configured to use > MyISAM by default. 
> > There are several differences between MyISAM and InnoDB, but what is most > relevant with regard to DAS is the difference in concurrency. Basically, > MyISAM uses table-level locking and InnoDB uses row-level locking. So, with > MyISAM, if we are running Spark queries while publishing data to DAS, at > higher TPS this can lead to issues, due to the DAL layer being unable to obtain the > table lock to insert data while Spark is reading > from the same table. > > On the other hand, with InnoDB the write speed is considerably slower > (because it is designed to support transactions), so it will affect the > receiver performance. > > One option we have in DAS is to use two DBs to keep incoming > records and processed records, i.e., EVENT_STORE and PROCESSED_DATA_STORE. > > For ESB Analytics, we can configure MyISAM for EVENT_STORE and > InnoDB for PROCESSED_DATA_STORE. This is because, in ESB analytics, > summarizing up to the minute level is done by real-time analytics, and the Spark > queries will read and process data using the minutely (and higher) tables, which > we can keep in PROCESSED_DATA_STORE. Since the raw table (which the data receiver > writes data to) is not being used by Spark queries, the receiver > performance will not be affected. > > However, in most cases, Spark queries may be written to read data directly > from the raw tables. As mentioned above, with MyISAM this could lead to > performance issues if data publishing and Spark analytics happen in > parallel. So, considering that, I think we should change the default > configuration to use InnoDB. WDYT? > > -- > Thanks & Regards, > > Inosh Goonewardena > Associate Technical Lead- WSO2 Inc. > Mobile: +94779966317 > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
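Since the discussion above is about choosing between MyISAM and InnoDB for the DAS record tables, it is worth noting that in MySQL the storage engine is selected per table, so different record stores can back tables with different engines, and an existing table can be switched with a single DDL statement. The table and column names below are illustrative, not the actual DAL schema:

```sql
-- MyISAM: table-level locking, fast bulk inserts, no transactions;
-- suits a write-mostly raw event store.
CREATE TABLE EVENT_DATA_SAMPLE (
  record_id VARCHAR(50) NOT NULL,
  record_timestamp BIGINT,
  payload BLOB,
  PRIMARY KEY (record_id)
) ENGINE=MyISAM;

-- InnoDB: row-level locking, so Spark reads and receiver writes can run
-- concurrently, at the cost of slower inserts; ALTER TABLE rebuilds the
-- table under the new engine in place.
ALTER TABLE EVENT_DATA_SAMPLE ENGINE=InnoDB;
```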
Re: [Architecture] [Analytics] Removing FACET from Indexing data types
On Fri, Apr 22, 2016 at 12:30 AM, Gimantha Bandara <giman...@wso2.com> wrote: > Hi Isuru, > > The older FACET keyword is also supported. Yes, we are planning to add -f to > denote a facet attribute. > > @Anjana/Niranda WDYT? > +1. Cheers, Anjana. > > > On Friday, April 22, 2016, Isuru Wijesinghe <isur...@wso2.com> wrote: > >> Hi Gimantha, >> >> How can we denote a given field of any data type as a facet in >> *spark-sql*? Let's say, as an example, I have a field called >> processDefinitionId (string data type) and I need to define it as a facet >> as well (see below example). >> >> CREATE TEMPORARY TABLE PROCESS_USAGE_SUMMARY USING CarbonAnalytics >> OPTIONS (tableName "PROCESS_USAGE_SUMMARY_DATA", >> schema "processDefinitionId string -i *-f*, >> processVersion string -i, >> processInstanceId string -i", >> primaryKeys "processInstanceId" >> ); >> >> Is this the way that we can define it in the newer version? >> >> >> On Fri, Apr 22, 2016 at 2:39 AM, Gimantha Bandara <giman...@wso2.com> >> wrote: >> >>> Hi all, >>> >>> We are planning to remove "FACET" (this type is used to >>> categorize/group, to get unique values, and to drill down) from the indexing >>> data types, and we will introduce an attribute to mark the other data types as a >>> FACET or not. Earlier, FACETs could be defined only for STRING fields, and >>> even if we defined a STRING as a FACET, we would not be able to search >>> it as a STRING field. With this change, a field of any data type can be marked >>> as a FACET, and then the field can be used both as a FACET and as the usual data >>> type. >>> This change will not affect the older DAS capps or event-store >>> configurations; it will be backward compatible with previous DAS versions >>> (3.0.0 and 3.0.1). However, if you try to get the schema of a table using the JS >>> APIs, REST APIs or the web service, the FACET type will not be there. An >>> attribute called "isFacet" is used to identify the faceted fields. See >>> below for an example. 
>>> *Older schema*
>>> {
>>>   "columns" : {
>>>     "logFile" : { "type" : "STRING", "isIndex" : true, "isScoreParam" : false },
>>>     "level" : { "type" : "DOUBLE", "isIndex" : true, "isScoreParam" : false },
>>>     "location" : { "type" : "FACET", "isIndex" : true, "isScoreParam" : false }
>>>   },
>>>   "primaryKeys" : ["logFile", "level"]
>>> }
>>>
>>> *Equivalent new schema* (the FACET field type is removed; "isFacet" marks faceted fields)
>>> {
>>>   "columns" : {
>>>     "logFile" : { "type" : "STRING", "isIndex" : true, "isScoreParam" : false, "isFacet" : false },
>>>     "level" : { "type" : "DOUBLE", "isIndex" : true, "isScoreParam" : false, "isFacet" : false },
>>>     "location" : { "type" : "STRING", "isIndex" : true, "isScoreParam" : false, "isFacet" : true }
>>>   },
>>>   "primaryKeys" : ["logFile", "level"]
>>> }
>>> -- >>> >>> ___ Architecture mailing list >>> Architecture@wso2.org >>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>> >> >> -- >> Isuru Wijesinghe >> *Software Engineer* >> WSO2 inc : http://wso2.com >> lean.enterprise.middleware >> Mobile: 0710933706 >> isur...@wso2.com >> > > -- > Gimantha Bandara > Software Engineer > WSO2. Inc : http://wso2.com > Mobile : +94714961919 > > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] [analytics-esb] Summary Stat Generation Mechanism
Hi, Good progress Supun! .. do keep pushing the parameters to find the limits we can go to. @Suho, the idea was to altogether eliminate the batch script and just store/index the data for later lookup, doing the computation purely in Siddhi. I don't think we will get a big scaling problem, since the data that needs to be kept in memory becomes smaller as we go to the upper layers of summarization, and it stops at yearly granularity. So, at any point, we would hold in memory roughly the last year's worth of summary data for a specific artifact: the last 12 monthly summary records, the last 30 daily entries, and so on. The growth of the data thus slows immensely, and it also has an upper limit, which I guess should fit comfortably within usual memory capacity. So if we can figure out a proper checkpoint and replay mechanism for the processed data, we can do everything in CEP, and then we don't have the complexity of maintaining two mechanisms for doing the processing. Cheers, Anjana. On Wed, Apr 20, 2016 at 12:11 PM, Sriskandarajah Suhothayan <s...@wso2.com> wrote: > I think it will make more sense to run the seconds and minutes summarization from Siddhi, > and run Spark every hour; when there is lots of data in the system > this will be much more scalable. > > WDYT? > > Regards > Suho > > On Wed, Apr 20, 2016 at 11:50 AM, Supun Sethunga <sup...@wso2.com> wrote: > >> Hi, >> >> This is a follow-up mail to [1], to give an update on the status of the >> performance issue [2]. As mentioned in the previous mail, having the >> Spark script do the summary stat generation as a batch process creates a >> bottleneck at a higher TPS. More precisely, from our findings, it cannot >> handle a throughput of more than 30 TPS as a batch process (i.e., events >> published to DAS within 10 mins at a TPS of 30 take more than 10 mins to >> process; this means that if we schedule the script every 10 mins, the backlog of events to be processed grows over time). 
>> >> To overcome this, we thought of doing the summarization up to a certain extent >> (up to the second-wise summary) using Siddhi, and generating the remaining >> stats (per-minute/hour/day/month) using Spark. With this enhancement, we ran >> some load tests locally to evaluate the approach, and the results are as >> follows. >> >> Backend DB: MySQL >> ESB analytics nodes: 1 >> >> With InnoDB >> >>- With *80 TPS* (script scheduled every 1 min): Avg time taken for >>completion of the script = ~ *20 sec*. >>- With *500 TPS* (script scheduled every 2 min): Avg time taken for >>completion of the script = ~ *45 sec*. >> >> With MyISAM >> >>- With *80 TPS* (script scheduled every 1 min): Avg time taken for >>completion of the script = ~ *24 sec*. >>- With *80 TPS* (script scheduled every 2 min): Avg time taken for >>completion of the script = ~ *20 sec*. >>- With *500 TPS* (script scheduled every 2 min): Avg time taken for >>completion of the script = ~ *35 sec*. >> >> As a further improvement, we will try doing the summarization up to the >> minute/hour level (and eventually do all the summarization using Siddhi). >> >> [1] [Dev] ESB Analytics - Verifying the common production use cases >> [2] https://wso2.org/jira/browse/ANLYESB-15 >> >> Thanks, >> Supun >> >> -- >> *Supun Sethunga* >> Software Engineer >> WSO2, Inc. >> http://wso2.com/ >> lean | enterprise | middleware >> Mobile : +94 716546324 >> >> ___ >> Architecture mailing list >> Architecture@wso2.org >> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >> >> > > > -- > > *S. Suhothayan* > Technical Lead & Team Lead of WSO2 Complex Event Processor > *WSO2 Inc. *http://wso2.com > lean . enterprise . 
middleware > > > *cell: (+94) 779 756 757 | blog: > http://suhothayan.blogspot.com/ | twitter: > http://twitter.com/suhothayan | linked-in: > http://lk.linkedin.com/in/suhothayan* > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
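As a side note on the figures in the thread above, the reported batch bottleneck (30 TPS arrival, 10-minute schedule) can be sanity-checked with simple arithmetic. A minimal sketch — the 30 TPS and 10-minute window come from the mail, while the effective processing rates (25 and 40 TPS) are hypothetical stand-ins to show the effect:

```python
# Back-of-envelope check of the batch-script bottleneck described above.
# Events arrive at 30 TPS; the script is scheduled every 10 minutes.
# The processing rates used below are illustrative, not measured values.

ARRIVAL_TPS = 30
WINDOW_SEC = 10 * 60  # the script runs once per 10-minute window

def backlog_after(runs, processing_tps):
    """Events left unprocessed after `runs` script executions."""
    backlog = 0
    for _ in range(runs):
        backlog += ARRIVAL_TPS * WINDOW_SEC                    # new events in the window
        backlog -= min(backlog, processing_tps * WINDOW_SEC)   # what one run clears
    return backlog

# If the script processes slower than events arrive, the backlog grows
# without bound -- exactly the behaviour reported in the thread.
print(backlog_after(6, 25))   # grows by (30-25)*600 = 3000 events per window -> 18000
print(backlog_after(6, 40))   # faster-than-arrival processing keeps the backlog at 0
```

This is why pushing the fine-grained summarization into Siddhi (so the batch job only sees pre-aggregated rows) removes the bottleneck: the batch input rate no longer tracks the raw event rate.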
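The two-tier approach discussed in the thread above (second-level summaries in Siddhi, coarser rollups in Spark) can be sketched as follows. This is a plain-Python illustration of the rollup idea only, with made-up data and function names — not the actual Siddhi/Spark implementation:

```python
from collections import defaultdict

# Tier 1 (the "Siddhi" role): fold raw events into per-second summaries.
# Tier 2 (the "Spark" role): periodically roll per-second rows up to
# per-minute rows, so the batch job's input stays small regardless of
# the raw event rate.

def summarize_seconds(events):
    """events: iterable of (timestamp_sec, value) -> {second: [count, total]}"""
    out = defaultdict(lambda: [0, 0.0])
    for ts, value in events:
        out[ts][0] += 1
        out[ts][1] += value
    return out

def rollup_minutes(per_second):
    """Aggregate per-second summaries into per-minute summaries."""
    out = defaultdict(lambda: [0, 0.0])
    for sec, (count, total) in per_second.items():
        minute = sec // 60
        out[minute][0] += count
        out[minute][1] += total
    return out

raw = [(0, 1.0), (0, 2.0), (59, 3.0), (60, 4.0)]  # invented sample events
mins = rollup_minutes(summarize_seconds(raw))
print(dict(mins))   # {0: [3, 6.0], 1: [1, 4.0]}
```

The same rollup shape repeats upward (minute to hour, hour to day, and so on), which is what bounds the in-memory state as described in the reply above.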
[Architecture] Cross Tenant Data Reading from Spark Queries in DAS
Hi, We've implemented an approach for the super tenant to read data from all the tenants in the system, where the tables read from the tenants must have the same table name. So now, with the following syntax, you will be given an aggregated view of all the data records from all the tenants. create temporary table T1 using CarbonAnalytics OPTIONS (tableName "T1", schema "d1 int, d2 string, _tenantId int", globalTenantRead "true"); There is a new analytics provider property introduced, "globalTenantRead": when this is set to "true", it will go through all the tenants, aggregating the records of a table named "T1" in each tenant. Also, a new special table schema attribute "_tenantId" is introduced, which is automatically populated for each record based on the record's actual origin tenant. So this "_tenantId" field can be used for further filtering/grouping in the Spark queries. With this new feature, there is a change in the way DAS stores the metadata of each analytics table. Because of this, there is a migration step when going from DAS v3.0.x to v3.1.0+. Since it is just a table metadata format change, not a change to the data itself, the migration process is a very quick one. The migration process has been incorporated into the DAS data backup tool [1], the migration guide in the docs is updated here [2], and the general docs on $subject are updated here [3]. [1] https://docs.wso2.com/pages/viewpage.action?pageId=50505847 [2] https://docs.wso2.com/pages/viewpage.action?pageId=50505762 [3] https://docs.wso2.com/display/DAS310/Spark+Query+Language#SparkQueryLanguage-WSO2DASSQLguide Cheers, Anjana. -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
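The logical effect of the "globalTenantRead" view described above can be shown with a small simulation. This is a hedged plain-Python sketch with invented tenant data — the real feature lives in the CarbonAnalytics Spark relation provider, not in code like this:

```python
from collections import Counter

# Simulated per-tenant copies of a logically-identical table "T1".
# The global read view unions them and stamps each record with the
# synthetic "_tenantId" attribute, as described in the mail above.

tenant_tables = {
    1: [{"d1": 10, "d2": "a"}],
    2: [{"d1": 20, "d2": "b"}, {"d1": 30, "d2": "b"}],
}

def global_tenant_read(tables):
    """Union all tenants' records, adding _tenantId to each one."""
    for tenant_id, records in tables.items():
        for rec in records:
            yield {**rec, "_tenantId": tenant_id}

# "_tenantId" can then be used for grouping, as in the Spark queries:
counts = Counter(r["_tenantId"] for r in global_tenant_read(tenant_tables))
print(counts)   # Counter({2: 2, 1: 1})
```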
Re: [Architecture] Support Efficient Cross Tenant Analytics in DAS
Hi Srinath, I'm not sure if this is something we would have to "fix". It was a clear design decision we took in order to isolate tenant data, so that no one can access another tenant's data. So in Spark virtual tables too, each tenant maps directly to its own analytics tables. If we allow, say, the super tenant to access other tenants' data, it can be seen as a security threat. The idea should be that no single tenant has any special access to another tenant's data. So setting aside the physical representation (which has other complications, like adding another index for tenantId and so on, which would have to be supported by all data sources), if we are to do this, we need a special view for super tenant tables in the Spark virtual tables, in order for them to have access to the "tenantId" property of the table. And in other tenants' tables, we need to hide this and not let them use it, of course. This looks like a bit of a hack to implement a specific scenario we have. As I know, this requirement mainly came from APIM analytics, where its in-built analytics publishes all tenants' data to the super tenant's tables and the data is processed from there. So if we are doing this, this data is only used internally, and cannot be shown to the respective tenants for their own analytics. If each tenant needs to do their own analytics, they should configure it to get the data for their tenant space, and write their own analytics scripts. This may in the end mean some type of data duplication, but that is how it should be, because two different users are doing their different processing. And IMO, we should not try to share any possible common data they may have and hack the system. At the end, the point is, we should not take lightly what we try to achieve in having multi-tenancy, and compromise its fundamentals. 
At the moment, the idea should be that each tenant has its own data and its own analytics scripts, and if you need to scale accordingly, have separate hardware for those tenants. And running separate queries for different tenants does not necessarily make things very slow, since the data load will be divided between the tenants; the only extra overhead would be possible ramp-up times for query executions. Cheers, Anjana. On Thu, Mar 31, 2016 at 11:45 AM, Srinath Perera <srin...@wso2.com> wrote: > Hi Anjana, > > Currently we keep different Hbase/ RDBMS table per tenant. In > multi-tenant, environment, this is very expensive as we will have to run a > query per tenant. > > How can we fix this? e.g. if we keep tenant as field in the table, that > let us do a "group by". > > --Srinath > > -- > > Blog: http://srinathsview.blogspot.com twitter:@srinath_perera > Site: http://home.apache.org/~hemapani/ > Photos: http://www.flickr.com/photos/hemapani/ > Phone: 0772360902 > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] [Analytics] Improvements to Lucene based Aggregate functions (Installing Aggregates as OSGI components)
m aggregate functions through Javascript API* >>> >>> var queryInfo = { >>> tableName:"Students", //table name on which the aggregation is >>> performed >>> searchParams : { >>> groupByField:"location", //grouping field if any >>> query : "Grade:10", //additional filtering query >>> aggregateFields:[ >>> { >>> fields:["Height", "Weight"], //fields necessary for >>> aggregate function >>> aggregate:"CUSTOM_AGGREGATE", //unique name of the aggregate >>> function, this is what we return using "getAggregateName" method above. >>> alias:"aggregated_result" //Alias for the result of the >>> aggregate function >>> }] >>> } >>> } >>> >>> client.searchWithAggregates(queryInfo, function(data) { >>> console.log (data["message"]); >>> }, function(error) { >>> console.log("error occurred: " + error["message"]); >>> }); >>> >>> >>> *Note that the order of elements in the "fields" attribute will be the same >>> as the order of the aggregateFields parameter's elements in the above process method. >>> That is, Height will be aggregateFields[0] and Weight will be >>> aggregateFields[1] in the process method. Based on that order, >>> "CUSTOM_AGGREGATE" should be implemented.* >>> >>> >>> >>> *Aggregates REST APIs* This is the same as the Javascript API. >>> >>> POST https://localhost:9443/analytics/aggregates >>> { >>> "tableName":"Students", >>> "groupByField":"location", >>> "aggregateFields":[ >>>{ >>> "fields":["Height", "Weight"], >>> "aggregate":"CUSTOM_AGGREGATE", >>> "alias":"aggregated_result" >>>}] >>> } >>> >>> [1] >>> https://docs.wso2.com/display/DAS301/Retrieving+Aggregated+Values+of+Given+Records+via+REST+API >>> [2] >>> https://docs.wso2.com/display/DAS301/Retrieving+Aggregated+Values+of+Given+Records+via+JS+API >>> -- >>> Gimantha Bandara >>> Software Engineer >>> WSO2. 
Inc : http://wso2.com >>> Mobile : +94714961919 >>> >>> ___ >>> Architecture mailing list >>> Architecture@wso2.org >>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>> >>> >> >> >> -- >> *Sinthuja Rajendran* >> Associate Technical Lead >> WSO2, Inc.:http://wso2.com >> >> Blog: http://sinthu-rajan.blogspot.com/ >> Mobile: +94774273955 >> >> >> >> ___ >> Architecture mailing list >> Architecture@wso2.org >> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >> >> > > > -- > Gimantha Bandara > Software Engineer > WSO2. Inc : http://wso2.com > Mobile : +94714961919 > > ___ > Architecture mailing list > Architecture@wso2.org > https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture > > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
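For reference, the REST payload from the quoted mail can also be built programmatically. A hedged sketch — no request is actually sent here, and the endpoint and Basic-auth detail are assumptions about how such a call would typically be made, so only the payload itself is verified:

```python
import json

# Build the same body as the POST /analytics/aggregates example above.
payload = {
    "tableName": "Students",
    "groupByField": "location",
    "aggregateFields": [
        {
            "fields": ["Height", "Weight"],    # inputs to the aggregate function
            "aggregate": "CUSTOM_AGGREGATE",   # name returned by getAggregateName()
            "alias": "aggregated_result",
        }
    ],
}

body = json.dumps(payload)
# An actual call would POST `body` to https://localhost:9443/analytics/aggregates
# with appropriate credentials; here we only check the payload round-trips intact.
print(json.loads(body)["aggregateFields"][0]["alias"])   # aggregated_result
```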
Re: [Architecture] Embedding Log Analyzer with Analytics distribution of Products
Hi, The initial use case I had for Log Analyzer was as a general log analysis tool, where users can just point to a log location, whether WSO2 or non-WSO2 logs, and run queries against it / create dashboards. The concern I have with integrating the log analyzer with our new analytics distributions is whether we will have considerable overlapping functionality between the two. The DAS4X analytics effort basically creates the mostly static dashboards that would be there (maybe with alerts), which can be done successfully by internally publishing all the events required for those. But then, if we also say you can/should use the log analyzer (which is a different UI/experience altogether) to create the dashboards/queries that we missed in the earlier effort, that does not sound right. So the point is, as I see it, if we do the pure DAS4X solution right for a product, users do not have an immediate need to use the log analysis features again to do any custom analysis. But of course, if they nevertheless want to process the logs as well, they can set up the log analyzer product and do it, for example, as a replacement for syslog, for centralized log storage. Cheers, Anjana. On Mon, Feb 1, 2016 at 2:04 PM, Srinath Perera <srin...@wso2.com> wrote: > Hi All, > > I believe we should integrate Log Analyzer with analytics distributions of > the products. > > It is true some of the information you can take from Log analyzer is > already available under normal analytics. For those, we do not need to use > Log analyzer. > > However, log analyzer let us find and understand use cases that is not > already instrumented. For example, when we see a error, we might check has > a similar error happened before. Basically we can check ad-hoc dynamic use > cases via log analyzer. Example of this is analytics done by our Cloud > team. > > In general, log analyzer will be used by advanced users who will > understand inner workings for the product. It will be a very powerful > debugging tool. 
> > However, if we want to embed the log analyzer, then it is challenging due > to ruby based log stash we use with log analyzer. I think in that case, we > also need a java based log agent. > > Please comment. > > Thanks > Srinath > -- > > Blog: http://srinathsview.blogspot.com twitter:@srinath_perera > Site: http://people.apache.org/~hemapani/ > Photos: http://www.flickr.com/photos/hemapani/ > Phone: 0772360902 > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] Embedding Log Analyzer with Analytics distribution of Products
On Mon, Feb 1, 2016 at 2:35 PM, Srinath Perera <srin...@wso2.com> wrote: > > > On Mon, Feb 1, 2016 at 2:21 PM, Anjana Fernando <anj...@wso2.com> wrote: > >> Hi, >> >> The initial use case I had with Log Analyzer was, as a general log >> analysis tool, where users can just point to a log location, can be >> WSO2/non-WSO2 logs, and run queries against it / create dashboards. The >> concern I've with integrating log analyzer also with our new analytics >> distributions is, whether we will have some considering overlapping >> functionality between the two. The DAS4X analytics effort is to basically >> create mostly the static dashboards that would be there (maybe with >> alerts), which can be successfully done by internally publishing all the >> events required for those. But then, if we also say, you can/should use log >> analyzer (which is a different UI/experience altogether) to create >> dashboards/queries, that we missed from the earlier effort, that does not >> sound right. >> > > Anjana, point is dynamic/ad-hoc query use cases. E.g. > 1) You see a new error, and want to check has it happend before. > 2) You see two error happening together. You need to know it has happend > together before. > True. the use cases are there. I was just thinking, if it will fit the flow with the other analytics operations we do. Anyways, on second thought, even if it's totally separate also, having searchable (analyzable) logs readily available, after we install the full analytics solution for a product, would be useful. Cheers, Anjana. > > >> >> So the point is, as I see, if we do the pure DAS4X solution right for a >> product, they do not have an immediate need to use the log analysis >> features again to do any custom analysis. But of course, if they want to >> process the logs also nevertheless, they can setup the log analyzer product >> and do it, for example, as a replacement to syslog, for centralized log >> storage. >> >> Cheers, >> Anjana. 
>> >> On Mon, Feb 1, 2016 at 2:04 PM, Srinath Perera <srin...@wso2.com> wrote: >> >>> Hi All, >>> >>> I believe we should integrate Log Analyzer with analytics distributions >>> of the products. >>> >>> It is true some of the information you can take from Log analyzer is >>> already available under normal analytics. For those, we do not need to use >>> Log analyzer. >>> >>> However, log analyzer let us find and understand use cases that is not >>> already instrumented. For example, when we see a error, we might check has >>> a similar error happened before. Basically we can check ad-hoc dynamic use >>> cases via log analyzer. Example of this is analytics done by our Cloud >>> team. >>> >>> In general, log analyzer will be used by advanced users who will >>> understand inner workings for the product. It will be a very powerful >>> debugging tool. >>> >>> However, if we want to embed the log analyzer, then it is challenging >>> due to ruby based log stash we use with log analyzer. I think in that case, >>> we also need a java based log agent. >>> >>> Please comment. >>> >>> Thanks >>> Srinath >>> -- >>> >>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera >>> Site: http://people.apache.org/~hemapani/ >>> Photos: http://www.flickr.com/photos/hemapani/ >>> Phone: 0772360902 >>> >> >> >> >> -- >> *Anjana Fernando* >> Senior Technical Lead >> WSO2 Inc. | http://wso2.com >> lean . enterprise . middleware >> > > > > -- > > Blog: http://srinathsview.blogspot.com twitter:@srinath_perera > Site: http://people.apache.org/~hemapani/ > Photos: http://www.flickr.com/photos/hemapani/ > Phone: 0772360902 > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] Adding streams and scripts to DAS using an API
Great! .. Cheers, Anjana. On Thu, Jan 7, 2016 at 12:57 PM, Chathura Ekanayake <chath...@wso2.com> wrote: > Thanks Anjana. Yes, we can use admin services. > > On Thu, Jan 7, 2016 at 12:10 PM, Anjana Fernando <anj...@wso2.com> wrote: > >> Hi Chathura, >> >> We don't have like any special external APIs for this. But we do have the >> admin services that does these operations. So is it possible to use the >> admin services for these operations? .. You will of course need to store >> the credentials for these services in a configuration file in Process >> Center, and use them with the admin service calls. >> >> Cheers, >> Anjana. >> >> On Thu, Jan 7, 2016 at 11:33 AM, Chathura Ekanayake <chath...@wso2.com> >> wrote: >> >>> Process Center needs to add new streams and scripts to DAS when users >>> configure new KPIs on processes. These KPI configurations can be performed >>> by process center users at runtime, therefore I think the best method is to >>> add corresponding streams/scripts using an API. For example, users can >>> select which process variables to publish and how to summarize them to >>> construct KPIs, so that an event stream and required scripts have to be >>> added at runtime. >>> >>> Is this supported by DAS? If not, what is the best approach to do this? >>> >>> Regards, >>> Chathura >>> >> >> >> >> -- >> *Anjana Fernando* >> Senior Technical Lead >> WSO2 Inc. | http://wso2.com >> lean . enterprise . middleware >> > > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] Notebook Support Use cases for DAS
Hi Srinath, I'm afraid, we couldn't do any work on this yet, because at the moment, everyone is occupied on working for the DAS 3.0.1 release and the Log Analyzer work. I just had a chat with Miyuru, he mentioned he is checking CEP specific functionality for notebooks. I guess, the batch analytics integration with the notebook approach is somewhat straightforward, where what we basically have in Spark Console is a subset of that approach. So according to the current plan we made for next year, we planned on checking that for DAS 3.1.0, with the change to C5, where we would be changing all the UIs, which would be removing all the current functionality from the admin console and unifying the UIs. So in that effort, we can integrate this aspect too. Miyuru suggested that we'll have a quick chat on Friday, let's talk more then. Cheers, Anjana. On Tue, Dec 8, 2015 at 9:18 AM, Srinath Perera <srin...@wso2.com> wrote: > Anjana, how is this thread progressing? Who is looking at/ thinking about > notebooks? > > On Thu, Nov 26, 2015 at 9:19 AM, Anjana Fernando <anj...@wso2.com> wrote: > >> Hi Srinath, >> >> On Thu, Nov 26, 2015 at 9:08 AM, Srinath Perera <srin...@wso2.com> wrote: >> >>> Hi Anjana, >>> >>> Great!! I think the next step is deciding whether we do this with >>> Zeppelin and or we build it from scratch. >>> >>> Pros of Zeppelin >>> >>>1. We get lot of features OOB >>>2. Code maintained by community, patches etc. >>>3. New features will get added and it will evolve >>>4. We get to contribute to an Apache project and build recognition >>> >>> Cons >>> >>>1. Real deep integration might be lot of work ( we get initial >>>version very fast, but integrating details .. e.g. make our UIs work >>>in Zeppelin, or get Zeppelin to post to UES) might be tricky. >>>2. Zeppelin is still in incubator >>>3. Need to assess community >>> >>> I suggest you guys have a detailed chat with MiyuruD, who looked at it >>> in detail, try out things, thing about it and report back. 
>>> >> >> +1, we'll work with Miyuru also and see how to go forward. >> >> >>> >>> >>> On Thu, Nov 26, 2015 at 3:12 AM, Anjana Fernando <anj...@wso2.com> >>> wrote: >>> >>>> Hi Srinath, >>>> >>>> The story looks good. For that part about, the "user can play with the >>>> data interactively", to make it more functional, we should probably >>>> consider integration of Scala scripts to the mix, rather than only having >>>> Spark SQL. Spark SQL maybe limited in functionality on certain data >>>> operations, and with Scala, we should be able to use all the functionality >>>> of Spark. For example, it would be easier to integrate ML operations with >>>> other batch operations etc.. to create a more natural flow of operations. >>>> The implementation may be tricky though, considering clustering, >>>> multi-tenancy etc.. >>>> >>> Lets keep Scala version post MVP. >>> >> >> Sure. >> >> >>> >>> >>>> >>>> Also, I would like to also bring up the question on, are most batch >>>> jobs actually meant to be scheduled as such repeatedly, for a data set that >>>> actually grows always? .. or is it mostly a thing where we execute >>>> something once and get the results and that's it. Maybe this is a different >>>> discussion though. But, for scheduled batch jobs as such, I guess >>>> incremental processing would be critical, which no one seems to bother that >>>> much though. >>>> >>> I think it is mostly scheduled batches as we have. Shall we take this up >>> in a different thread? >>> >> >> Yep, sure. >> >> >>> >>> >>>> >>>> Cheers, >>>> Anjana. >>>> >>>> On Mon, Nov 23, 2015 at 2:57 PM, Srinath Perera <srin...@wso2.com> >>>> wrote: >>>> >>>>> Hi All, >>>>> >>>>> I tried to write down the use cases, to start thinking about this >>>>> starting from what we discussed in the meeting. Please comment. ( doc is >>>>> at >>>>> https://docs.google.com/document/d/1355YEXbhcd2fvS-zG_CiMigT-iTncxYn3DTHlJRTYyo/edit# >>>>> ( same content is below). >>>>> >>>>> Thanks >>>>> Srinath &
Re: [Architecture] Notebook Support Use cases for DAS
Hi Srinath, On Thu, Nov 26, 2015 at 9:08 AM, Srinath Perera <srin...@wso2.com> wrote: > Hi Anjana, > > Great!! I think the next step is deciding whether we do this with Zeppelin > and or we build it from scratch. > > Pros of Zeppelin > >1. We get lot of features OOB >2. Code maintained by community, patches etc. >3. New features will get added and it will evolve >4. We get to contribute to an Apache project and build recognition > > Cons > >1. Real deep integration might be lot of work ( we get initial version >very fast, but integrating details .. e.g. make our UIs work in Zeppelin, >or get Zeppelin to post to UES) might be tricky. >2. Zeppelin is still in incubator >3. Need to assess community > > I suggest you guys have a detailed chat with MiyuruD, who looked at it in > detail, try out things, thing about it and report back. > +1, we'll work with Miyuru also and see how to go forward. > > > On Thu, Nov 26, 2015 at 3:12 AM, Anjana Fernando <anj...@wso2.com> wrote: > >> Hi Srinath, >> >> The story looks good. For that part about, the "user can play with the >> data interactively", to make it more functional, we should probably >> consider integration of Scala scripts to the mix, rather than only having >> Spark SQL. Spark SQL maybe limited in functionality on certain data >> operations, and with Scala, we should be able to use all the functionality >> of Spark. For example, it would be easier to integrate ML operations with >> other batch operations etc.. to create a more natural flow of operations. >> The implementation may be tricky though, considering clustering, >> multi-tenancy etc.. >> > Lets keep Scala version post MVP. > Sure. > > >> >> Also, I would like to also bring up the question on, are most batch jobs >> actually meant to be scheduled as such repeatedly, for a data set that >> actually grows always? .. or is it mostly a thing where we execute >> something once and get the results and that's it. Maybe this is a different >> discussion though. 
But, for scheduled batch jobs as such, I guess >> incremental processing would be critical, which no one seems to bother that >> much though. >> > I think it is mostly scheduled batches as we have. Shall we take this up > in a different thread? > Yep, sure. > > >> >> Cheers, >> Anjana. >> >> On Mon, Nov 23, 2015 at 2:57 PM, Srinath Perera <srin...@wso2.com> wrote: >> >>> Hi All, >>> >>> I tried to write down the use cases, to start thinking about this >>> starting from what we discussed in the meeting. Please comment. ( doc is at >>> https://docs.google.com/document/d/1355YEXbhcd2fvS-zG_CiMigT-iTncxYn3DTHlJRTYyo/edit# >>> ( same content is below). >>> >>> Thanks >>> Srinath >>> Batch, interactive, and Predictive Story >>> >>>1. >>> >>>Data is uploaded to the system or send as a data stream and >>>collected for some time ( in DAS) >>>2. >>> >>>Data Scientist come in and select a data set, and look at schema of >>>data and do standard descriptive statistics like Mean, Max, Percentiles >>> and >>>standard deviation about the data. >>>3. >>> >>>Data Scientist cleans up the data using series of transformations. >>>This might include combining multiple data sets into one data set. >>> [Notebooks] >>>4. >>> >>>He can play with the data interactively >>>5. >>> >>>He visualize the data in several ways [Notebooks] >>>6. >>> >>>If he need descriptive statistics, he can export the data mutations >>>in the notebooks as a script and schedule it. >>>7. >>> >>>If what he needs is machine learning, he can initialize and run the >>>ML Wizard from the Notebooks and create a model. >>>8. >>> >>>He can export the model he created and any data mutation operations >>>he did as a script and deploy both the model and data mutation operations >>>in the CEP ( Realtime Pipeline). This is the actual transaction flow. >>>9. >>> >>>He can export the data mutation operations and machine learning >>>model building logic as a script and schedule it to run periodically. 
>>> This >>>is the >>> >>> >>> >>> [image: NotebookPipeline.png] >&
Re: [Architecture] Notebook Support Use cases for DAS
Hi Srinath, The story looks good. For the part about the "user can play with the data interactively", to make it more functional, we should probably consider adding Scala scripts to the mix, rather than only having Spark SQL. Spark SQL may be limited in functionality for certain data operations, and with Scala, we should be able to use all the functionality of Spark. For example, it would be easier to integrate ML operations with other batch operations etc., to create a more natural flow of operations. The implementation may be tricky though, considering clustering, multi-tenancy etc.. Also, I would like to bring up the question: are most batch jobs actually meant to be scheduled repeatedly like that, over a data set that always grows? .. or is it mostly a case where we execute something once, get the results, and that's it? Maybe this is a different discussion though. But for scheduled batch jobs like that, I guess incremental processing would be critical, which no one seems to bother with much though. Cheers, Anjana. On Mon, Nov 23, 2015 at 2:57 PM, Srinath Perera <srin...@wso2.com> wrote: > Hi All, > > I tried to write down the use cases, to start thinking about this starting > from what we discussed in the meeting. Please comment. ( doc is at > https://docs.google.com/document/d/1355YEXbhcd2fvS-zG_CiMigT-iTncxYn3DTHlJRTYyo/edit# > ( same content is below). > > Thanks > Srinath > Batch, interactive, and Predictive Story > >1. > >Data is uploaded to the system or send as a data stream and collected >for some time ( in DAS) >2. > >Data Scientist come in and select a data set, and look at schema of >data and do standard descriptive statistics like Mean, Max, Percentiles and >standard deviation about the data. >3. > >Data Scientist cleans up the data using series of transformations. >This might include combining multiple data sets into one data set. > [Notebooks] >4. > >He can play with the data interactively >5. 
> >He visualize the data in several ways [Notebooks] >6. > >If he need descriptive statistics, he can export the data mutations in >the notebooks as a script and schedule it. >7. > >If what he needs is machine learning, he can initialize and run the ML >Wizard from the Notebooks and create a model. >8. > >He can export the model he created and any data mutation operations he >did as a script and deploy both the model and data mutation operations in >the CEP ( Realtime Pipeline). This is the actual transaction flow. >9. > >He can export the data mutation operations and machine learning model >building logic as a script and schedule it to run periodically. This is the > > > > [image: NotebookPipeline.png] > > > > Realtime Story > > Realtime story also we can start with a data set, write realtime queries, > test them by replaying the data, and then only we deploy queries. ( We do > this event now). We can do the same. > > >1. > >User start with a dataset. >2. > >He write a set of queries using dataset as a stream. Streams and >dataset shares the same record format. For example, consider the following >data set. > > > We can consider this as a batch data set by taking it as a whole or as a > stream by taking record by record. > > For example, if we run query > > select * from CountryData where GDP>35000 > > it will provide following results. > > > > >1. > >Tables created by replay data with CEP queries, we can visualize like >other data. ( except that time is special) >2. > >When Data Scientist is happy, Data Scientist can click a button and >export the CEP queries as a execution plan and any charts as a realtime >gadgets. 
( one complication is time is special, and we need to transform >from any visualization to time based visualization) > > > -- > > Blog: http://srinathsview.blogspot.com twitter:@srinath_perera > Site: http://people.apache.org/~hemapani/ > Photos: http://www.flickr.com/photos/hemapani/ > Phone: 0772360902 > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
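The "same data set as a table or as a stream" idea in the quoted mail can be shown with a toy example. The country/GDP figures below are invented; the predicate is the `GDP>35000` query from the mail:

```python
# The same records can be treated as a batch (filter the whole set at once)
# or as a stream (evaluate the query record by record, as if replaying the
# data set as events). Values are made up for illustration.

country_data = [
    {"country": "A", "GDP": 50000},
    {"country": "B", "GDP": 20000},
    {"country": "C", "GDP": 40000},
]

# Batch view: select * from CountryData where GDP > 35000
batch_result = [r for r in country_data if r["GDP"] > 35000]

# Stream view: the same predicate applied per event during replay
stream_result = []
for event in country_data:
    if event["GDP"] > 35000:
        stream_result.append(event)

assert batch_result == stream_result   # both views agree on a finite replay
print([r["country"] for r in batch_result])   # ['A', 'C']
```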
Re: [Architecture] [LogAnalyzer] How the user can configure log publishing agent
Hi Anuruddha, On Tue, Nov 10, 2015 at 7:04 PM, Anuruddha Premalal <anurud...@wso2.com> wrote: > Hi Anjana, > > What was meant by " log stash log configuration files" is its > configuration format, not that we are making use of logstash to publish > data, of course we are writing our own agent based on similar config format. > Yeah, I know, it is the configuration format I told to review carefully, to see if the semantics defined there is enough for our use cases. Cheers, Anjana. > > On Wed, Nov 11, 2015 at 1:54 AM, Anjana Fernando <anj...@wso2.com> wrote: > >> >>>1. If log stash log configuration files are well done, can we do the >>>same formats? >>> >>> Yes, this has already been discussed in architecture mail "Component >> level description of the log analyzer tool" >> >> Please check this with Sachith also, he has some experience in working >> with logstash (he did a logstash adapter earlier), and he will know the >> limitations/benefits in using it to map to our events, starting from >> arbitrary field support etc.. We should check the balance of creating >> something on our own vs living with the limitations/annoyances of logstash >> would have which would not directly map to our use cases. >> >> Cheers, >> Anjana. >> >> Thanks >>> Srinath >>> >>> p.s. above are opinions only, please shout if disagree. >>> >>> >>> >>> >>> On Fri, Nov 6, 2015 at 6:33 PM, Malith Dhanushka <mal...@wso2.com> >>> wrote: >>> > >>> > Yes I agree with the complication on applying agent configs in large >>> clusters. But centralized config management using a message broker is a >>> critical decision to take as it weighs maintenance effort. That decision >>> depends on how big the cluster is and how frequently the log configs are >>> getting changed. 
>>> > >>> > On Fri, Nov 6, 2015 at 3:22 PM, Inosh Goonewardena <in...@wso2.com> >>> wrote: >>> >> >>> >> Hi Anurudda, >>> >> >>> >> >>> >> On Fri, Nov 6, 2015 at 3:06 PM, Anuruddha Premalal < >>> anurud...@wso2.com> wrote: >>> >>> >>> >>> Hi Inosh, >>> >>> >>> >>> Can you be specific on the added complexities of managed >>> configuration mode? I have explained in the sequence diagram how this will >>> function. Manage configuration mode is actually a user choice, if the >>> deployment is quite simple user can use default agent side configurations >>> (as in logstash). >>> >> >>> >> >>> >> As Malith pointed out, my idea was to avoiding configuring the log >>> agent remotely and publishing the config. But yes, in a larger cluster, >>> configuring each of the agent won't be practical and managed config mode is >>> the better approach. If the user has the choice he/she can select depending >>> on his/her preference. >>> >> >>> >>> >>> >>> >>> >>> Managed config mode addresses a major lacking feature which agent >>> config mode doesn't have; If a user needs to change/ update configs for a >>> large cluster, configuring them each won't be practical. >>> >>> >>> >>> In terms of the overhead concern of splitting an event at the agent >>> side over master side, since a single log event usually have less amount of >>> characters, it won't cost much to perform the filtering; if we consider >>> master side, there won't only be a single log stream so it obviously adds >>> more overhead to the master. Because of this we shouldn't do filtering >>> never on master side. >>> >>> >>> >>> We are writing the agent using python, which doesn't consume more >>> resources as a jvm, and it will absolutely be an advantage for a smooth run. 
>>> >>> >>> >>> >>> >>> On Fri, Nov 6, 2015 at 2:43 PM, Inosh Goonewardena <in...@wso2.com> >>> wrote: >>> >>>> >>> >>>> Hi, >>> >>>> >>> >>>> On Fri, Nov 6, 2015 at 1:48 PM, Sachith Withana <sach...@wso2.com> >>> wrote: >>> >>>>> >>> >>>>> Hi Malith, >>> >>>>> >>> >>>>> In terms of the 1st option, >>> >>>>> - the o
Re: [Architecture] [DAS] Java Agent to monitor server activities
Hi Udani, Can you please explain a bit more on how the field names of the streams will be derived? That is, for example, what will an event look like when a method-before scenario gets hit, method-after, insert-at and so on. Basically, give some sample event payloads for each scenario. Also, ideally later on, we should be able to copy new configuration files for new scenarios to a specific folder of the agent, and the agent should pick up all the configuration files, load up all the scenarios at agent startup and execute them. So we can create these configuration files for specific scenarios and install them when needed. For example, database monitoring scenario, JMS event monitoring scenario configuration files etc. Cheers, Anjana. On Wed, Oct 28, 2015 at 1:28 AM, Udani Weeraratne <ud...@wso2.com> wrote: > Hi, > > I am working on a Java agent which can be used to monitor different > activities carried out within DAS. The main concept of a Java agent is to modify the > bytecode of classes before they load onto the JVM (bytecode instrumentation). > This provides the ability to inject code into classes according to our > requirement. > > Currently we are trying to implement a simple agent, which can monitor > method calls and parameters passed under a given scenario and publish them > to a stream in DAS. The architecture of this approach will be as follows. > > [image: Inline image 1] > > > We will provide a simple configuration file, where the user has to specify the > class name, method name with signature, parameters to monitor and the > location to be inserted (using Javassist we can insert code at the top, at the > bottom and at a specific line of the method). Then the agent will be > initialized based on the user requirement and instrument the requested > methods before the respective classes load onto the JVM. 
(Javassist will be the > library used in the instrumentation process.) Once the classes are > instrumented before the server starts running, we will be able to publish > events containing the intercepted data to a stream in DAS. Using the > ability to publish arbitrary fields in DAS, we are trying to provide the > ability to index and store events with the intercepted data. This can be used > as a profiler to monitor the activities of the server. > > Layout of configuration file: > > (the XML sample did not survive archiving; of its contents, only the target method > signature="(Ljava/lang/String;)Ljava/sql/PreparedStatement;" and the intercepted > parameter reference $1 remain) > > This is the overall idea about the Java agent we are working on. Hope this > will be able to add value to the product. Appreciate any suggestions on > this. > > > Thanks, > > Udani > > -- > *Udani Weeraratne* > Software Engineer - Intern > WSO2 Inc.: www.wso2.com > lean.enterprise.middleware > > Email: ud...@wso2.com > Mobile: +94 775437714 > LinkedIn: *https://lk.linkedin.com/in/udaniweeraratne > <https://lk.linkedin.com/in/udaniweeraratne>* > Blog : https://udaniweeraratne.wordpress.com/ > > ___ > Architecture mailing list > Architecture@wso2.org > https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture > > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
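To make the before/after insertion concrete, here is a plain-Java sketch of the effect the injected code has on an instrumented method. The real agent injects equivalent bytecode with Javassist at class-load time; the method body, hook strings, and class names here are hypothetical, chosen only to illustrate the idea.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the *effect* of instrumentation: code inserted at the top of the
// method captures the first parameter ($1 in Javassist notation), and code
// inserted at the bottom records the method exit. In the real agent these
// lines are injected into the loaded class's bytecode, not written by hand.
public class InterceptSketch {
    static final List<String> EVENTS = new ArrayList<>();

    static String prepareStatement(String sql) {
        EVENTS.add("before prepareStatement: " + sql); // injected at method top
        String stmt = "stmt<" + sql + ">";             // stands in for the original body
        EVENTS.add("after prepareStatement");          // injected at method bottom
        return stmt;
    }

    public static void main(String[] args) {
        prepareStatement("SELECT * FROM T");
        System.out.println(EVENTS);
    }
}
```

In the agent, the recorded events would be published to a DAS stream instead of an in-memory list.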
Re: [Architecture] [Dev] [VOTE] Release WSO2 DSS 3.5.0 RC2
Hi, I tested the following:- * OData functionality - Read/write/update/delete - Reading metadata - Reading with conditions * New boxcarring functionality (request_box) - Multiple operation execution - Transaction commit/rollback on success/error * Verified RC1 blocker. [X] Stable - go ahead and release Cheers, Anjana. On Sat, Oct 24, 2015 at 1:33 AM, Rajith Vitharana <raji...@wso2.com> wrote: > Hi, > > This is the second release candidate of WSO2 DSS 3.5.0 > > This release fixes the following issues: > *https://wso2.org/jira/issues/?filter=12469 > <https://wso2.org/jira/issues/?filter=12469>* > > Please download, test and vote. The vote will be open for 72 hours or as > needed. > > Source & binary distribution files: > https://svn.wso2.org/repos/wso2/scratch/DSS/3.5.0/RC2/wso2dss-3.5.0.zip > <https://svn.wso2.org/repos/wso2/scratch/DSS/3.5.0/RC1/wso2dss-3.5.0.zip> > > JavaDocs > https://svn.wso2.org/repos/wso2/scratch/DSS/3.5.0/RC2/javaDocs/index.html > > Maven staging repo: > *http://maven.wso2.org/nexus/content/repositories/orgwso2dss-058/ > <http://maven.wso2.org/nexus/content/repositories/orgwso2dss-058/>* > > The tag to be voted upon: > *https://github.com/wso2/product-dss/tree/v3.5.0-RC2 > <https://github.com/wso2/product-dss/tree/v3.5.0-RC2>* > > > [ ] Broken - do not release (explain why) > [ ] Stable - go ahead and release > > Thanks, > The WSO2 DSS Team > > _______ > Architecture mailing list > Architecture@wso2.org > https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture > > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] [Dev] [VOTE] Release WSO2 DSS 3.5.0 RC2
Hi, Please note that the earlier mail's distribution link target is wrong; it actually points to the RC1 link (which is what you get when you click it; you will have to copy and paste the link text to get the correct one). Anyway, the correct link can be found below:- https://svn.wso2.org/repos/wso2/scratch/DSS/3.5.0/RC2/wso2dss-3.5.0.zip Cheers, Anjana. On Sat, Oct 24, 2015 at 1:33 AM, Rajith Vitharana <raji...@wso2.com> wrote: > Hi, > > This is the second release candidate of WSO2 DSS 3.5.0 > > This release fixes the following issues: > *https://wso2.org/jira/issues/?filter=12469 > <https://wso2.org/jira/issues/?filter=12469>* > > Please download, test and vote. The vote will be open for 72 hours or as > needed. > > Source & binary distribution files: > https://svn.wso2.org/repos/wso2/scratch/DSS/3.5.0/RC2/wso2dss-3.5.0.zip > <https://svn.wso2.org/repos/wso2/scratch/DSS/3.5.0/RC1/wso2dss-3.5.0.zip> > > JavaDocs > https://svn.wso2.org/repos/wso2/scratch/DSS/3.5.0/RC2/javaDocs/index.html > > Maven staging repo: > *http://maven.wso2.org/nexus/content/repositories/orgwso2dss-058/ > <http://maven.wso2.org/nexus/content/repositories/orgwso2dss-058/>* > > The tag to be voted upon: > *https://github.com/wso2/product-dss/tree/v3.5.0-RC2 > <https://github.com/wso2/product-dss/tree/v3.5.0-RC2>* > > > [ ] Broken - do not release (explain why) > [ ] Stable - go ahead and release > > Thanks, > The WSO2 DSS Team > > _______ > Architecture mailing list > Architecture@wso2.org > https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture > > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] [Dev] [VOTE] Release WSO2 DSS 3.5.0 RC1
-1. Discovered the following issue [1]. Even though we can work around it, it is a significant user experience issue, so we must fix it. [1] https://wso2.org/jira/browse/DS-1128 Cheers, Anjana. On Fri, Oct 23, 2015 at 2:38 AM, Rajith Vitharana <raji...@wso2.com> wrote: > Hi, > > This is the first release candidate of WSO2 DSS 3.5.0 > > This release fixes the following issues: > https://wso2.org/jira/browse/DS-1126?filter=12469 > > Please download, test and vote. The vote will be open for 72 hours or as > needed. > > Source & binary distribution files: > https://svn.wso2.org/repos/wso2/scratch/DSS/3.5.0/RC1/wso2dss-3.5.0.zip > > JavaDocs > https://svn.wso2.org/repos/wso2/scratch/DSS/3.5.0/RC1/javaDocs/index.html > > Maven staging repo: > http://maven.wso2.org/nexus/content/repositories/orgwso2dss-045/ > > The tag to be voted upon: > https://github.com/wso2/product-dss/tree/v3.5.0-RC1 > > > [ ] Broken - do not release (explain why) > [ ] Stable - go ahead and release > > Thanks, > The WSO2 DSS Team > > ___ > Architecture mailing list > Architecture@wso2.org > https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture > > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] [DAS][BPS]Business process monitoring dashboard
Hi Srinath, We should not generally recommend outputting data to a standard RDBMS and querying it using SQL; then we lose the portability of the functionality we have with DAS when using its DAL. That is, if you change it to some other database server, e.g. Cassandra, HBase etc., we would not be able to do that, whereas if we use the standard APIs exposed by DAS, they will always be available. Cheers, Anjana. On Thu, Oct 1, 2015 at 2:42 PM, Srinath Perera <srin...@wso2.com> wrote: > I chatted with Chathura. > > We can use Spark to aggregate data grouped by user and task-id and save it > to a SQL DB. Then we can use a SQL query (called from the UI) to get the data > for a specific task-id. > > Thanks > Srinath > > On Thu, Oct 1, 2015 at 1:21 PM, Anjana Fernando <anj...@wso2.com> wrote: > >> Hi Chathura, >> >> The only way you can pass a parameter to a query as such in a script >> would be to use a UDF. How to do it is mentioned in the docs. But >> I'm wondering if this would be proper. Since these are scheduled >> batch scripts, they will most probably take some time to start >> executing again and finish. So, for a user to set these >> parameters from a UI, I'm not sure if it's practical. Like, it cannot be used in a >> dashboard, where the results are expected quickly. You may also want to >> check out the indexing functionality, where you can most probably use a static >> query for the batch operation, and when inserting the resultant summarized >> data, you can index it, so you can quickly look it up using time ranges and >> so on. Also, there is a possibility to bypass Spark SQL altogether using >> the aggregate features in our indexing functionality. >> >> @Gimantha, is [1] the only documentation we have on the indexing >> aggregation features? .. if so, please update it to be more comprehensive. >> It is better if we can give side-by-side solutions on how we do >> aggregates in SQL, and the comparable approach we would take with our indexing >> features. 
>> >> [1] >> https://docs.wso2.com/display/DAS300/Retrieving+Aggregated+Values+of+Given+Records+via+JS+API >> >> Cheers, >> Anjana. >> >> On Wed, Sep 30, 2015 at 10:43 PM, Chathura Ekanayake <chath...@wso2.com> >> wrote: >> >>> The process monitoring graphs in [1] were proposed to give some level of >>> top-to-bottom analysis. For example, a business analyst may first identify slow >>> performing processes using graph number 2. Then he can analyze the >>> bottleneck tasks of those slow processes from graph number 10, where he >>> has to generate graph 10 for each slow process. Then he can further analyze >>> the users who performed bottleneck tasks frequently by generating graph >>> number 11 for each slow task. Therefore, the ability to execute parameterized >>> queries is critical for these process monitoring features. >>> >>> a.) Is that possible on the DAS side? >>>> >>>> eg: SELECT processDefinitionId, COUNT(processInstanceId) AS >>>> processInstanceCount, AVG(duration) AS avgExecutionTime FROM >>>> BPMNProcessInstances WHERE date BETWEEN *"fromDate" *AND* "toDate" *GROUP >>>> BY processDefinitionId; >>>> (here *fromDate* and *toDate* are variables that need to be passed at >>>> runtime) >>>> >>>> b.) If not, we can store the summarized data with the primary and secondary >>>> filters mentioned in [1] on DAS, and then we can fetch them through the >>>> DAS REST API by passing appropriate parameters. >>>> >>> >>> Isuru, I think approach (b) does not scale. There can be hundreds >>> of processes and thousands of tasks (in all processes). Therefore, it is >>> not practical to pre-compute data for all graphs. >>> >>> The ability to execute parameterized queries or to provide queries at >>> runtime through an API would be helpful to solve this problem. 
>>> >>> [1] >>> https://docs.google.com/a/wso2.com/spreadsheets/d/1pQAK6x4-rL-hQA7-NOaoT2llyjxv_nfc_vUarwUr74w/edit?usp=sharing >>> >>> Regards, >>> Chathura >>> >>> >>> >>> >> >> >> -- >> *Anjana Fernando* >> Senior Technical Lead >> WSO2 Inc. | http://wso2.com >> lean . enterprise . middleware >> > > > > -- > > Srinath Perera, Ph.D. >http://people.apache.org/~hemapani/ >http://srinathsview.blogspot.com/ > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
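To illustrate the indexing alternative discussed above: a fromDate/toDate filter could be served by the index range-query API over the summarized records, instead of a parameterized Spark script. A hypothetical request (the field values and query are assumed for illustration; the payload shape follows the /analytics/searchrange API) might look like:

```json
{
  "tableName" : "BPMNProcessInstances",
  "ranges" : [{
    "label" : "jan-2015",
    "from" : 1420070400000,
    "to" : 1422748800000,
    "minInclusive" : true,
    "maxInclusive" : false
  }],
  "language" : "lucene",
  "query" : "processDefinitionId:orderProcess"
}
```

posted to /analytics/searchrange (or /analytics/rangecount when only the counts per range are needed).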
Re: [Architecture] Feature to send multiple operation requests in a single request
Hi Rajith, Let's use the same "enableBoxcarring" flag for this improvement, since we already have that, and just note that the begin_boxcar etc. operations are deprecated. For the new operation, how about "request_box", since the "box" term is already there in "boxcarring"? Cheers, Anjana. On Fri, Sep 11, 2015 at 5:35 PM, Rajith Vitharana <raji...@wso2.com> wrote: > Hi All, > > We thought of using "request_batch" as the reserved operation name and > "enableRequestBatch" as the parameter in the dbs, but this may confuse end > users as we already have the "enableBatchRequest" parameter in the dbs. So it > would be better if we can change this to a suitable name. Appreciate any > feedback on this. > > Thanks, > > On Fri, Sep 11, 2015 at 4:17 PM, Rajith Vitharana <raji...@wso2.com> > wrote: > >> Hi Vidura, >> >> >> On Fri, Sep 11, 2015 at 4:07 PM, Vidura Gamini Abhaya <vid...@wso2.com> >> wrote: >> >>> Thanks Rajith. >>> >>> Would we still keep the semantics the same? i.e. client calls, >>> >>> stub.begin_requestbox(); >>> stub.operation1(foo, bar); >>> stub.operation2(bar); >>> stub.end_requestbox(); >>> >> No, it's going to be a single call, the one described in the initial >> mail. It will contain all the required operations within it. >> >>> >>> How are we planning to get the code that does the collating on to the >>> client? Would the users be forced to use our tools to generate the stubs? >>> >> I don't think there will be any issue, since we are providing a WSDL with >> the required operations, which will also contain the new "request_box" >> operation as well. >> >> Thanks, >> >> -- >> Rajith Vitharana >> >> Software Engineer, >> WSO2 Inc. : wso2.com >> Mobile : +94715883223 >> Blog : http://lankavitharana.blogspot.com/ >> > > > > -- > Rajith Vitharana > > Software Engineer, > WSO2 Inc. : wso2.com > Mobile : +94715883223 > Blog : http://lankavitharana.blogspot.com/ > -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . 
middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
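For clarity, the single request_box call would wrap the individual operation elements in one payload. A hypothetical request body for the stub example above (the element names and namespace are assumed here, derived from the operation names discussed in the thread):

```xml
<p:request_box xmlns:p="http://ws.wso2.org/dataservice">
   <p:operation1>
      <p:foo>value1</p:foo>
      <p:bar>value2</p:bar>
   </p:operation1>
   <p:operation2>
      <p:bar>value2</p:bar>
   </p:operation2>
</p:request_box>
```

The service would then execute the nested operations in order within the single request, in place of the old begin_boxcar/end_boxcar call sequence.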
Re: [Architecture] How to Ship Fraud Solution?
Hi Srinath, Yeah, we should be able to do that. The dashboard has the capability to add a static web page, so we can put it in like that. So yeah, we can test it now and see how it will work, and we can host the toolbox separately. That is, it doesn't necessarily have to go with the product itself. Cheers, Anjana. On Thu, Aug 13, 2015 at 10:47 AM, Srinath Perera srin...@wso2.com wrote: Hi Anjana, Can we ship it as a car file that people can just download and install to DAS? Would that work in the coming release? Thanks Srinath -- Blog: http://srinathsview.blogspot.com twitter:@srinath_perera Site: http://people.apache.org/~hemapani/ Photos: http://www.flickr.com/photos/hemapani/ Phone: 0772360902 -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] Carbon datasource implementation for Cassandra
Hi Gokul, Thanks, the pull request is merged now. Cheers, Anjana. On Fri, Jul 24, 2015 at 12:31 PM, Gokul Balakrishnan go...@wso2.com wrote: Hi Devs, I've completed the implementation of $subject based on the DataStax Java driver. This component enables connection to a Cassandra cluster through its CQL interface, and provides the client with a com.datastax.driver.core.Cluster reference, based on which com.datastax.driver.core.Session instances could be created for use by the client. CQL native protocol versions v1 through v3 are supported. Provisions have been made for specifying most connection parameters through the datasource configuration, including protocol, pool, socket and query options. A sample configuration would look like the following:

<provider>org.wso2.carbon.datasource.reader.cassandra.CassandraDataSourceReader</provider>
<datasource>
   <name>WSO2_ANALYTICS_EVENT_STORE_CASSANDRA</name>
   <description>The datasource used for analytics record store</description>
   <definition type="CASSANDRA">
      <configuration>
         <contactPoints>192.168.1.1, 192.168.1.2</contactPoints>
         <port>9042</port>
         <username>admin</username>
         <password>admin</password>
         <clusterName>cluster1</clusterName>
         <compression>gzip</compression>
         <poolingOptions>
            <coreConnectionsPerHost hostDistance="LOCAL">8</coreConnectionsPerHost>
            <maxSimultaneousRequestsPerHostThreshold hostDistance="REMOTE">256</maxSimultaneousRequestsPerHostThreshold>
         </poolingOptions>
         <queryOptions>
            <fetchSize>100</fetchSize>
            <consistencyLevel>LOCAL_ONE</consistencyLevel>
            <serialConsistencyLevel>SERIAL</serialConsistencyLevel>
         </queryOptions>
         <socketOptions>
            <keepAlive>true</keepAlive>
            <tcpNoDelay>true</tcpNoDelay>
            <sendBufferSize>15</sendBufferSize>
            <connectTimeoutMillis>12000</connectTimeoutMillis>
            <readTimeoutMillis>12000</readTimeoutMillis>
         </socketOptions>
      </configuration>
   </definition>
</datasource>

I've sent the pull request for $subject at [1]. @DSS team, please review and merge. [1] https://github.com/wso2/carbon-data/pull/19 Thanks, Gokul. -- Gokul Balakrishnan Senior Software Engineer, WSO2, Inc. 
http://wso2.com Mob: +94 77 593 5789 | +1 650 272 9927 -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] [DAS] Changing the name of Message Console
Hi, +1 for Data Explorer for the message console. The name Spark Console is fine the way it is now. Cheers, Anjana. On Sun, Jul 12, 2015 at 7:59 AM, Niranda Perera nira...@wso2.com wrote: Hi all, DAS currently ships a UI component named 'message console'. It can be used to browse data inside the DAS tables. IMO this name, message console, is misleading: a person who's new to DAS would not know the exact use of it just by reading the name. I suggest a more self-explanatory name such as 'data explorer', 'data navigator' etc. WDYT? -- *Niranda Perera* Software Engineer, WSO2 Inc. Mobile: +94-71-554-8430 Twitter: @n1r44 https://twitter.com/N1R44 https://pythagoreanscript.wordpress.com/ -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] [NTask] What are the distinct purposes of setProperties(), init() and execute() methods of Task interface ?
Actually, Madhawa, shall we fix it with a new version of the ntask component? As Sagara mentioned, put just a single method called execute(Map<String, String> properties), and remove all the other methods. Please create a JIRA for it, and fix it. Cheers, Anjana. On Wed, Jun 3, 2015 at 8:58 AM, Anjana Fernando anj...@wso2.com wrote: Hi Sagara, Yes, you're correct; when this was first designed, the thinking was that it would work in the way that properties are set first, init is called once, and execute is called multiple times. But later on, it was discovered that Quartz actually creates new instances of the task implementations and calls these methods every time. At that time, I didn't properly make the changes to reflect this behavior, and I agree it is a bit misleading. This has to be fixed properly eventually. For now, we explicitly have to remember that this is how the flow works. Cheers, Anjana. On Wed, Jun 3, 2015 at 6:42 AM, Sagara Gunathunga sag...@wso2.com wrote: The org.wso2.carbon.ntask.core.Task interface has defined the following 3 methods: setProperties(Map<String, String> map), init(), execute(). According to my understanding, it's obvious to think of setProperties() and init() as the task's lifecycle methods, called only one time during initialization, while the execute() method is called by the scheduler several times depending on the cron expression. I wrote a very simple Registry Task [1] and tested it; it seems all 3 methods run several times. I only expected the execute() method to run N times, but the actual result is that all 3 methods run N times. A little debugging revealed that the TaskQuartzJobAdapter.execute() [2] method calls the above 3 methods one after another as follows. *task.setProperties(properties);* int tenantId = Integer.parseInt(properties.get(TaskInfo.TENANT_ID_PROP)); try { PrivilegedCarbonContext.startTenantFlow(); PrivilegedCarbonContext.getThreadLocalCarbonContext().setTenantId(tenantId, true); *task.init();* *task.execute();* } With this I have the following questions. 1.) 
What are the distinct design objectives of the above 3 methods? 2.) If the TaskQuartzJobAdapter implementation is correct, then why do we need 3 distinct methods? IMO *execute(properties)* can provide all these capabilities. [1] - https://docs.wso2.com/display/Governance460/Scheduled+Task+Sample [2] - https://github.com/wso2/carbon-commons/blob/master/components/ntask/org.wso2.carbon.ntask.core/src/main/java/org/wso2/carbon/ntask/core/impl/TaskQuartzJobAdapter.java Thanks ! -- Sagara Gunathunga Architect; WSO2, Inc.; http://wso2.com V.P Apache Web Services;http://ws.apache.org/ Linkedin; http://www.linkedin.com/in/ssagara Blog ; http://ssagara.blogspot.com -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
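A minimal sketch of the proposed single-method contract, where the scheduler hands the task its properties on every trigger, so there is no separate setProperties()/init() lifecycle for Quartz-created instances to get wrong. The interface and property names here are hypothetical, for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed simplification: one execute(Map) method receives the
// task properties on every trigger, removing the misleading lifecycle methods.
interface SimpleTask {
    void execute(Map<String, String> properties);
}

public class TaskSketch {
    static String lastRun = null;

    public static void main(String[] args) {
        // A task implementation can be a simple lambda over the single method.
        SimpleTask task = properties ->
                lastRun = "ran for tenant " + properties.get("tenantId");

        Map<String, String> props = new HashMap<>();
        props.put("tenantId", "-1234");
        task.execute(props); // one call carries both configuration and trigger
        System.out.println(lastRun);
    }
}
```

With this shape, the adapter that Quartz invokes would simply build the property map and call execute(), matching how instances are actually created per trigger.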
Re: [Architecture] Searching registry artifacts in the enterprise store
Hi, Yeah, we simply use the Lucene query syntax. There was no reason for us to create our own on top of it, because it provides a very powerful syntax to query the data. For example, Elasticsearch also uses the Lucene query language for their solution. I'm not sure whether this is suitable for the registry or not; as in, whether to give the user the full power to query all the indexed attributes, or whether some should be filtered/hidden from the end user. Cheers, Anjana. On Mon, May 25, 2015 at 8:53 AM, Srinath Perera srin...@wso2.com wrote: Shazni, is the backend our code? If so, we can fix it. Or we can translate from the simpler version to the complex version automatically in our code. I also think it should be country=usa. Also, BAM had the same problem and went with the Solr syntax. I am not sure what the right answer is, but pretty sure it should be the same for both. Sagara, Anjana, please talk. --Srinath On Fri, May 22, 2015 at 5:58 PM, Shazni Nazeer sha...@wso2.com wrote: @Manuranga - Fair question. But that's the way the search attribute service in the backend expects it. Further, the query I have given is specifically to query a property in the artifact. So when country=usa is specified, we should internally find out that it's a property that the user is querying. And for your concern that the convenient method is not that convenient, that's what the question is all about: whether to keep the query as it is or to use a different syntax and pass the attribute map to the search service within the method. Shazni Nazeer Mob : +94 37331 LinkedIn : http://lk.linkedin.com/in/shazninazeer Blog : http://shazninazeer.blogspot.com On Fri, May 22, 2015 at 5:29 PM, Manuranga Perera m...@wso2.com wrote: That convenient method is not that convenient. Why propertyName=country&rightOp=eq&rightPropertyValue=usa instead of country=usa? 
___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture -- Srinath Perera, Ph.D. http://people.apache.org/~hemapani/ http://srinathsview.blogspot.com/ -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
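For reference, the Lucene query syntax referred to above already covers the kinds of filters being discussed: field terms, boolean operators, wildcards, and ranges can be combined directly in one query string. Some illustrative queries (the field names are hypothetical):

```
country:usa AND status:active
timestamp:[1425168000000 TO 1427846400000]
name:john* AND NOT country:uk
```

Whether to expose this full syntax to registry end users, or to restrict it to a translated subset like country=usa, is exactly the trade-off raised in the thread.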
Re: [Architecture] Carbon Datasource Reader Implementation for Apache Hadoop
Hi Srinath, Yeah, I had a chat with Gokul yesterday; we are changing this to HDFS and also adding another HBase one as well. I think he has already done the changes. @Gokul, please send the updated information. Cheers, Anjana. On Thu, May 14, 2015 at 1:10 PM, Srinath Perera srin...@wso2.com wrote: Can we call the type HDFS instead of Hadoop? (if we can change that without much trouble) On Tue, May 12, 2015 at 8:38 PM, Gokul Balakrishnan go...@wso2.com wrote: Hi all, As part of the HBase analytics datasource implementation for DAS 3.0, we have come up with $subject, which is envisioned to offer a standardised way to specify connectivity parameters for a remote Hadoop-based instance in a Carbon datasource configuration. The datasource reader will expect the configuration to be specified in a format similar to the standard Apache Commons Configuration [1], as used by both HDFS and HBase. An example datasource definition would look like:

<datasource>
   <name>WSO2_ANALYTICS_FS_DB_HDFS</name>
   <description>The datasource used for analytics file system</description>
   <jndiConfig>
      <name>jdbc/WSO2HDFSDB</name>
   </jndiConfig>
   <definition type="HADOOP">
      <configuration>
         <property>
            <name>fs.default.name</name>
            <value>hdfs://localhost:9000</value>
         </property>
         <property>
            <name>dfs.data.dir</name>
            <value>/dfs/data</value>
         </property>
         <property>
            <name>fs.hdfs.impl</name>
            <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
         </property>
         <property>
            <name>fs.file.impl</name>
            <value>org.apache.hadoop.fs.LocalFileSystem</value>
         </property>
      </configuration>
   </definition>
</datasource>

The definition type for the above is set as HADOOP. The datasource reader implementation is currently hosted at [2], and would be merged with the carbon-data git repo once reviewed. Appreciate your thoughts and suggestions. Thanks, Gokul. 
[1] http://commons.apache.org/proper/commons-configuration/ [2] https://github.com/gokulbs/carbon-data/tree/master/components/data-sources/org.wso2.carbon.datasource.reader.hadoop -- Balakrishnan Gokulakrishnan Senior Software Engineer, WSO2, Inc. http://wso2.com Mob: +94 77 593 5789 | +1 650 272 9927 ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture -- Srinath Perera, Ph.D. http://people.apache.org/~hemapani/ http://srinathsview.blogspot.com/ ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] Analytics Facets APIs in AnalyticsDataService
Gimantha, it is better if you can give some possible use cases for each of the features we have listed here, so people can get a better understanding of what to use where. Cheers, Anjana. On Thu, Mar 19, 2015 at 8:23 PM, Gimantha Bandara giman...@wso2.com wrote: Hi all, The analytics facets APIs provide indexing capabilities for hierarchical categorization of table entries in the new analytics data service (please refer to "[Architecture] BAM 3.0 REST APIs for AnalyticsDataService / Indexing / Search" for more information). Using the facet APIs, a user can define multiple categories as indices for a table, and these can later be used to search table entries based on categories. These APIs will be generic, so the user can assign a weight for each category when indexing and combine a mathematical function to calculate weights. *Facet Counts* As an example in log analysis, consider the following. E.g. log-time: 2015/mar/12 20:30:23, 2015/jan/16 13:34:76, 2015/jan/11 01:34:76 (in 3 different log lines). In the above example, the log time can be defined as a hierarchical facet as year/month/date. Later, if the user wants to get the counts of log entries by year/month, the API would return 2015/jan - Count: 2, 2015/mar - Count: 1. If the user wants to get the total count of log entries by year, the API would return 2015 - Count: 3. If the user wants to get the count of log entries by year/month/date, the API returns 2015/jan/11 - Count: 1, 2015/jan/16 - Count: 1, 2015/mar/12 - Count: 1. *Drill-Down Capabilities* Drill-down capabilities are provided by the facets APIs. The user can drill down through the facet hierarchy of the index and search table entries. The user can also combine a search query to filter the table entries. For example, in the scenario above, the user queries for the total count of log lines in 2015/jan/11 (he gets 1 as the count) and then wants to view the other attributes of the log line (TID, component name, log level, etc.). 
*REST APIs for Facets* Users will be able to use the facets API through REST APIs. Users can create facet indices via the usual Analytics indexing REST APIs and insert hierarchical category information through the Analytics REST APIs. Following are the updated Analytics REST APIs.

1. Drill down through a facet hierarchy: /analytics/drilldown or /analytics/drilldown-count

{
  "tableName" : "<table name>",
  "categories" : [{
    "name" : "<hierarchy name, e.g. Publish date>",
    "categoryPath" : [ "<hierarchy as an array, e.g. 2001, March, 02>" ]
  }],
  "language" : "<lucene or regex>",
  "query" : "<lucene query or regular expression>",
  "scoreFunction" : "<JavaScript function to define the scoring function>",
  "scoreParams" : [ "<docvalue fields used as parameters for the scoring function>" ]
}

2. Querying for ranges (in addition to facets): /analytics/searchrange or /analytics/rangecount

{
  "tableName" : "sample-table-name",
  "ranges" : [{
    "label" : "<label>",
    "from" : "<lower bound>",
    "to" : "<upper bound>",
    "minInclusive" : "<true/false>",
    "maxInclusive" : "<true/false>"
  }],
  "language" : "<lucene or regex>",
  "query" : "<query>"
}

In addition to the existing index types, two more are introduced: FACET and SCOREPARAM. FACET is used to define a hierarchical facet field, and SCOREPARAM is used to define scoring parameters for the score function.

*Adding facet fields and score fields to a table/tables*

Facet fields and score fields need to be defined using the indexing APIs, via /analytics/tables/<table-name>/indices:

{
  "field" : "STRING",
  "facetField" : "FACET",
  "scoreField" : "SCOREPARAM"
}

Later, the user can add facet and score values using a POST to /analytics/tables/<table-name>:

[{
  "values" : {
    "field" : "<value>",
    "facetField" : { "weight" : "<weight>", "categoryPath" : [ ] },
    "scoreField" : "<numeric value>"
  }
}]

or to /analytics/records:

[{
  "tableName" : "<table name>",
  "values" : {
    "field" : "<value>",
    "facetField" : { "weight" : "<weight>", "categoryPath" : [ ] },
    "scoreField" : "<numeric value>"
  }
}]

Feedback and suggestions are appreciated. -- Gimantha Bandara Software Engineer WSO2. Inc : http://wso2.com Mobile : +94714961919 -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . 
middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
[Architecture] WSO2 BAM 3.0 M2 Released!
The WSO2 BAM team is pleased to announce the second milestone release of WSO2 BAM v3.0. The distribution is available at [1]. The release includes the following new features. New Features - [BAM-1957 https://wso2.org/jira/browse/BAM-1957] - Spark Script Scheduling - [BAM-1959 https://wso2.org/jira/browse/BAM-1959] - Support Spark Clustering The documentation for BAM v3.0 can be found at [2]. Your feedback is most welcome, and any issues can be reported to the project at [3]. [1] https://svn.wso2.org/repos/wso2/people/anjana/BAM30/wso2bam-3.0.0-M2.zip [2] https://docs.wso2.com/display/BAM300/WSO2+Business+Activity+Monitor+Documentation [3] https://wso2.org/jira/browse/BAM - WSO2 BAM Team -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
[Architecture] WSO2 BAM 3.0 M1 Released!
The WSO2 BAM team is pleased to announce the first milestone release of WSO2 BAM v3.0. The distribution is available at [1]. The release includes the following new features.

New Features
- [BAM-1948] https://wso2.org/jira/browse/BAM-1948 - Data Abstraction Layer for Analytics
- [BAM-1949] https://wso2.org/jira/browse/BAM-1949 - Spark SQL based Analytics Query Execution
- [BAM-1950] https://wso2.org/jira/browse/BAM-1950 - DataPublisher Rewrite
- [BAM-1951] https://wso2.org/jira/browse/BAM-1951 - RDBMS Datasource Support
- [BAM-1952] https://wso2.org/jira/browse/BAM-1952 - REST APIs for Analytics Data Service
- [BAM-1953] https://wso2.org/jira/browse/BAM-1953 - CLI-like UI interface for Spark Integration

The documentation for BAM v3.0 can be found at [2]. Your feedback is most welcome, and any issues can be reported to the project at [3].

[1] https://svn.wso2.org/repos/wso2/people/gihan/wso2bam-3.0.0-M1.zip
[2] https://docs.wso2.com/display/BAM300/WSO2+Business+Activity+Monitor+Documentation
[3] https://wso2.org/jira/browse/BAM

- *WSO2 BAM Team*
Re: [Architecture] [BAM] [Security] Securing REST API
On Wed, Feb 4, 2015 at 5:15 AM, Prabath Siriwardena prab...@wso2.com wrote:

If you say Basic Auth is easy - then there is no difference in using OAuth too :-) Basically, the resource owner credentials grant type was introduced in OAuth to migrate clients from Basic/Digest authentication to OAuth... Looking at the use case - it's clearly something to do with access delegation. One server needs to access a resource (API) on behalf of another user... it's clearly something to do with OAuth.

Yes, that's true :) .. I guess the simple username/password scenario can also be covered with OAuth, if the requirement comes up. Cheers, Anjana.

Thanks & regards, -Prabath

On Tue, Feb 3, 2015 at 3:21 AM, Anjana Fernando anj...@wso2.com wrote: Yes, I guess we should anyway give users the ability to use the API with something simple like basic auth (if it makes sense for a specific scenario), and also support something like OAuth for other scenarios - like here, where we are talking about using it internally from our dashboards etc. for accessing the backend APIs. Cheers, Anjana.

On Tue, Feb 3, 2015 at 4:44 PM, Isabelle Mauny isabe...@wso2.com wrote: All, Who is going to use those REST APIs? And from where? While I agree with all the discussion about making the APIs secure, it's kind of pointless without a usage context. Generating/managing an OAuth token is not easy from the client side; if the REST APIs are used from a script, for example, OAuth might not be optimal. Would the APIs be exposed externally for any reason (to the general public?) - We had that problem with G-Reg before, with users unable to integrate with G-Reg due to the requirement of an OAuth token. Shouldn't we leave people a choice? Isabelle.
--
*Isabelle Mauny* VP, Product Management; WSO2, Inc.; http://wso2.com/

On Feb 3, 2015, at 11:53 AM, Manuranga Perera m...@wso2.com wrote: Hi Johann, so if a user is logged in using SAML, is there a way we can call an OAuth2 API from the front-end JS (via REST) directly, without going through a proxy?

On Tue, Feb 3, 2015 at 11:22 PM, Johann Nallathamby joh...@wso2.com wrote: The discussion is about how to secure APIs, and OAuth2 is the popular choice here. How to do SSO to the web front end is a separate question, and OpenID Connect can be one possibility. Like others have mentioned in this thread, there can be other ways to log in to the web front end, e.g. SAML2 SSO, username/password, etc. Depending on the login mechanism, there are other grant types you may be able to use to secure APIs with OAuth2, such as SAML2 Bearer, Resource Owner Password, self-issued tokens, etc. OpenID Connect might be the ideal choice, but right now the limitation we have with OpenID Connect is that we don't support the session management protocol, which is required for single logout.

On Tue, Feb 3, 2015 at 5:18 AM, Manuranga Perera m...@wso2.com wrote: Hi Johann, As I understand (from Dulanja), we need OpenID Connect [1] to fully integrate with the web front end, so we can keep the token in the front end (in JS) and make the call using REST. Isn't that the way to go? [1] http://openid.net/connect/

-- Thanks & Regards, *Johann Dilantha Nallathamby* Associate Technical Lead, Product Lead of WSO2 Identity Server, Integration Technologies Team, WSO2, Inc. lean.enterprise.middleware Mobile - *+9476950* Blog - http://nallaa.wordpress.com

-- With regards, *Manu*ranga Perera.
phone : 071 7 70 20 50 mail : m...@wso2.com

-- Thanks & Regards, Prabath
Twitter : @prabath
LinkedIn : http://www.linkedin.com/in/prabathsiriwardena
Mobile : +1 650 625 7950
http://blog.facilelogin.com
http://blog.api-security.org
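The resource owner password credentials grant discussed above lets the dashboard exchange the user's login credentials for an OAuth2 access token. A minimal sketch of the token request, assuming a standard OAuth2 token endpoint: the client id/secret and credentials are placeholders, and the parameter names are the standard RFC 6749 section 4.3 ones, not a WSO2-specific API.

```python
import base64
import urllib.parse

def password_grant_request(username, password, client_id, client_secret):
    """Return (form_body, headers) for a POST to an OAuth2 token endpoint."""
    form = urllib.parse.urlencode({
        "grant_type": "password",
        "username": username,
        "password": password,
    })
    # the client authenticates itself with HTTP Basic over its own credentials
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": "Basic " + creds,
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return form, headers

body, headers = password_grant_request("admin", "admin", "myId", "mySecret")
```

The response (not shown) would carry the access token the dashboard then presents on API calls as a Bearer token.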
Re: [Architecture] [BAM] [Security] Securing REST API
Hi, I guess our admin services are also accessible via basic auth, aren't they? .. We just thought, as a convenience for the end user, they could use their username/password to access our API if required. So basically, if using OAuth, other than using the SAML2 bearer token grant type or something similar, is it possible to use the login username/password for our dashboard UI to generate the access token, maybe with the resource owner credentials grant type? .. Cheers, Anjana.

On Tue, Jan 27, 2015 at 2:42 PM, Supun Malinga sup...@wso2.com wrote: Hi Gihan, IMO using basic auth will make it vulnerable to DoS attacks and less secure, so you need to think this through. There is a possibility of authenticating already-logged-in users via the cookie data, but we would need to write a new cookie-based OAuth grant type for this. AFAIK we don't have such a grant type yet (correct me if I'm wrong). On your latest note, I think you can use the SAML2 grant type [0]. [0] https://docs.wso2.com/display/AM170/Token+API#TokenAPI-ExchangingSAML2bearertokenswithOAuth2(SAMLextensiongranttype) thanks,

On Tue, Jan 27, 2015 at 1:48 PM, Gihan Anuruddha gi...@wso2.com wrote: No. We thought it might be convenient for the end user if we provided basic auth capabilities. We will integrate OAuth functionality for our REST APIs. Regarding our requirement: we have multiple dashboards that validate the user through a single login page. How can we do the backend API communication? Regards, Gihan

On Tue, Jan 27, 2015 at 12:02 PM, Sumedha Rubasinghe sume...@wso2.com wrote: Any particular reason for securing product APIs using Basic Auth? Products like G-Reg and CDM are using OAuth 2.0 tokens for this instead.

On Tue, Jan 27, 2015 at 11:53 AM, Gihan Anuruddha gi...@wso2.com wrote: Hi All, We are going to use a set of REST APIs [1] to communicate with the data layer. Basically, we are securing these REST APIs with basic auth, but we also want to communicate with these REST APIs as an already-logged-in user.
The reason is that we plan to use these REST APIs in our Message Console dashboard, and we want an SSO kind of login solution for these dashboards, without individual login pages. So is it possible to use the existing HTTP session cookie to authenticate REST API calls, or do we have to use OAuth with some specific grant types? Appreciate your input here. [1] - [Architecture] BAM 3.0 REST APIs for AnalyticsDataService / Indexing / Search

-- W.G. Gihan Anuruddha, Senior Software Engineer | WSO2, Inc. M: +94772272595

-- /sumedha m: +94 773017743 b : bit.ly/sumedha

-- Supun Malinga, Senior Software Engineer, WSO2 Inc. http://wso2.com email: sup...@wso2.com mobile: +94 (0)71 56 91 321
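The SAML2 grant type referenced above ([0]) exchanges an existing SAML2 assertion (from the dashboard's SSO login) for an OAuth2 access token. A sketch of the token-request form body, using the standard grant-type URI from the SAML2 bearer profile; the assertion content is a placeholder.

```python
import base64
import urllib.parse

def saml2_bearer_request(saml_assertion_xml):
    """Build the form body for a SAML2 bearer token exchange."""
    return urllib.parse.urlencode({
        "grant_type": "urn:ietf:params:oauth:grant-type:saml2-bearer",
        # the assertion is base64url-encoded per the profile
        "assertion": base64.urlsafe_b64encode(
            saml_assertion_xml.encode()).decode(),
    })

form = saml2_bearer_request("<saml2:Assertion>...</saml2:Assertion>")
```

This keeps a single login page: the dashboard reuses the SSO assertion instead of prompting for credentials again.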
Re: [Architecture] BAM 3.0 Data Layer Implementation / RDBMS / Distributed Indexing / Search
Hi Nirmal, Yeah, it can be re-used, though only if it meets your criteria. There is specific functionality we expect from this data layer; for example, look at the AnalyticsRecordStore interface, which contains the basic record storage, with timestamp and pagination support. Basically, we can't make it too generic either. We can discuss more and see. Cheers, Anjana.

On Mon, Jan 26, 2015 at 11:10 AM, Nirmal Fernando nir...@wso2.com wrote: Hi Anjana, Isn't this a generic interface to talk to a back-end data store? If so, do you think it can be reused in other products? In ML, we have a similar use case where we need to talk to a generic data layer to store the models that are generated.

On Wed, Dec 10, 2014 at 1:37 PM, Anjana Fernando anj...@wso2.com wrote: Hi, I've finished the initial implementation of $subject. This basically contains the standard interfaces we use to plug in different data sources as the back-end record storage, and for indexing purposes. These pluggable data sources are called Analytics Data Sources; from a configuration file, you can give the implementation class and the properties required for initialization. The first implementation, the RDBMS implementation, is done. It stores all the records and other data in a relational database, and any type of database can be supported via a configuration file, which gives the query templates used to define a standard set of actions. At the moment, H2 and MySQL query templates have been tested, and we will be adding templates for the rest of the popular RDBMSs as well. The RDBMS AnalyticsDataSource implementation detects the query template by looking at the database connection information retrieved from the data source (e.g. as mentioned in master-datasources.xml) and automatically switches to that mode, so the user basically doesn't have to do anything when configuring.

Also, the AnalyticsDataSource interface contains a FileSystem interface you need to implement for your data source implementation; it is used for indexing, which is done with Lucene. We use Lucene indexes as index shards for distributed indexing and search. With this sharding approach, we can add more nodes to our cluster to improve indexing performance and to add storage. Basically, provided the backend storage is scalable, the index operations will be scalable in the same manner. The limits we hit first are the processing requirements and the random data access and locking requirements of each shard, so for a typical database system, just by adding new BAM nodes, I'm hoping the indexing performance will increase almost linearly. The AnalyticsDataSource implementations are finally used by a component called AnalyticsDataService, which is the interface seen by clients; it has the indexing-related operations along with the record store functionality exposed through AnalyticsDataSource. This interface can be looked up as an OSGi service, and we plan to also expose this functionality as a JAX-RS service. The general design and documentation on the test cases can be found at [1] and [2], and the source code at [3]. I will be doing some further performance tests by integrating this into the product properly, especially the distributed search, and will provide the results here. For the moment, we have a few performance tests as unit tests in the modules. This implementation will first be used by the log analysis implementation done by Gimantha. And we are planning to write further AnalyticsDataSource implementations, such as MongoDB, HBase etc. There will be separate notes on those.
[1] https://docs.google.com/a/wso2.com/spreadsheets/d/10mHRE6FEgF6wDZ-LSBx18zL8ZcIay5ZIhb8MIk7pfeg/edit#gid=0
[2] https://docs.google.com/a/wso2.com/spreadsheets/d/1iXoZ8BzaefN3EGOL05y5aUX6SLZH7Bu8YM4bF3xOSvQ/edit#gid=0
[3] https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics

Cheers, Anjana.

-- Thanks & regards, Nirmal, Senior Software Engineer - Platform Technologies Team, WSO2 Inc. Mobile: +94715779733 Blog: http://nirmalfdo.blogspot.com/
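The record-store contract mentioned above (basic record storage with timestamp and pagination support) can be sketched roughly as follows. This is a toy in-memory model for illustration only: the real AnalyticsRecordStore is a Java/OSGi interface, and all other names here are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Record:
    table: str
    values: dict[str, Any]
    timestamp: float = field(default_factory=time.time)

class InMemoryRecordStore:
    """Toy record store with time-range reads and pagination."""
    def __init__(self):
        self._tables: dict[str, list[Record]] = {}

    def put(self, records):
        for r in records:
            self._tables.setdefault(r.table, []).append(r)

    def get(self, table, time_from, time_to, start=0, count=100):
        # filter by timestamp range, then page through the results
        hits = [r for r in self._tables.get(table, [])
                if time_from <= r.timestamp < time_to]
        hits.sort(key=lambda r: r.timestamp)
        return hits[start:start + count]

store = InMemoryRecordStore()
store.put([Record("logs", {"msg": "a"}, 10.0),
           Record("logs", {"msg": "b"}, 20.0)])
page = store.get("logs", 0.0, 15.0)
```

The point of keeping the contract this narrow (put, ranged get, paging) is exactly the "can't make it too generic" argument in the reply: each backend only has to implement timestamped record access, not arbitrary queries.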
[Architecture] Replacing BAM Toolbox Format with CAR
Hi everyone, From BAM 3.0, we are thinking of replacing the toolbox packaging with CAR files. The main motivation came from CEP also requiring a packaging format for their artifacts: either they would also need to use our toolbox format, or move to the CAR packaging format, which is used for other artifacts in the platform. As I see it, our artifacts (stream definitions, analytics scripts, UI pages) are in the same category as ESB sequences, proxies, endpoints etc., so if those don't use a new packaging format but rather use CAR, we don't have a special reason to keep a separate one either. For these reasons, and also to avoid having too many packaging formats in the platform, we thought of going with the standard CAR model. CEP has already suggested this for their artifacts in the thread [1]. If there are any concerns, please shout. [1] [Architecture] cApp deployer support for WSO2 CEP Cheers, Anjana.
Re: [Architecture] RFC: Building a Generic Configurable UI Gadget for Analytics
Hi, I guess, for BAM 3.0, this can be the base for our eventual KPI implementation as well. We will just need some additional functionality to set limits on the data/visualizations we have, to show them in an appropriate way, and to trigger alerts etc. Looking forward to checking out the initial implementation, so the BAM team can then enhance it with the other required features. Cheers, Anjana.

On Mon, Dec 8, 2014 at 7:42 PM, Srinath Perera srin...@wso2.com wrote:

Currently, to visualize data, users have to write their own gadgets. This is OK for an advanced user, but not for everyone. Especially things like drill-downs need complicated planning. I believe it is possible to start with data in tabular form and write a generic gadget that lets a user configure and create his own data chart with filters and drill-downs. (Some of the controls can be hidden under a configure button.) Let's work through an example.

1) The key idea is that we always load data into the gadget as a table. The following is example data.

Country    Year  GDP  Population  LifeExpect
Sri Lanka  2004  20   19435000    73
Sri Lanka  2005  24   19644000    73
Sri Lanka  2006  28   19858000    73
Sri Lanka  2007  32   20039000    73

2) When the gadget is loaded, it shows the data as a table. The user can select and add a chart type and fields. Some examples:

1. Line - two numerical fields
2. Bar - one numerical, one categorical field
3. Scatter - two numerical fields
4. Map - location field + categorical or numerical field
5. Graph - two categorical or string fields that provide links

3) Let the user add more information to the chart using other fields in the table:

1. Colour (categorical field) or shade (numerical field) for the plot (e.g. a different colour for each country)
2. Point size - numerical field (e.g. adjust the point size in the scatter plot according to the population)
3. Label - any field

4) Then he can add filters based on a variable. The chart will then have sliders (for numerical data) and tick buttons (for categorical data); when the sliders are changed, they update the chart.

5) The final step is to define drill-downs. Drill-downs use two columns in the table that have a hierarchical relationship (e.g. country and state fields, year and month fields). We need users to select two such fields and tell us about the relationship, and then we can code the drill-down support.

When the above steps are done, the user saves the configuration in the DataViz store as a visualisation, so others can pull it and use it. This will not cover all cases, but IMO it will cover 80%, and it is also a very good tool for demos etc. Please comment.

--Srinath

-- Blog: http://srinathsview.blogspot.com twitter: @srinath_perera Site: http://people.apache.org/~hemapani/ Photos: http://www.flickr.com/photos/hemapani/ Phone: 0772360902
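The configuration steps above (chart type plus field bindings, extra encodings, filters, and a drill-down pair) could be captured as a declarative config that the generic gadget saves to the visualisation store. A sketch under that assumption; all key names here are illustrative, not a proposed schema.

```python
# One possible shape for a saved visualisation, using the example table.
chart_config = {
    "chartType": "scatter",        # two numerical fields
    "x": "GDP",
    "y": "LifeExpect",
    "color": "Country",            # categorical field -> one colour each
    "pointSize": "Population",     # numerical field -> scaled point size
    "filters": [
        {"field": "Year", "kind": "slider", "min": 2004, "max": 2007},
        {"field": "Country", "kind": "tick"},
    ],
    # hierarchical column pair enabling drill-down
    "drillDown": {"from": "Year", "to": "Month"},
}

def validate(config):
    """Minimal sanity check before saving to the store."""
    assert config["chartType"] in {"line", "bar", "scatter", "map", "graph"}
    assert "x" in config and "y" in config
    return True
```

Because the config is pure data, other users can pull it from the store and render the same chart against their own table, which is the sharing model the post describes.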
Re: [Architecture] RFC: Building a Generic Configurable UI Gadget for Analytics
More correctly, most probably BAM 3.1 plans :) .. Cheers, Anjana.

On Mon, Dec 15, 2014 at 7:16 AM, Anjana Fernando anj...@wso2.com wrote: Hi, I guess, for BAM 3.0, this can be the base for our eventual KPI implementation as well. We will just need some additional functionality to set limits on the data/visualizations we have, to show them in an appropriate way, and to trigger alerts etc. Looking forward to checking out the initial implementation, so the BAM team can then enhance it with the other required features. Cheers, Anjana.
[Architecture] BAM 3.0 Data Layer Implementation / RDBMS / Distributed Indexing / Search
Hi, I've finished the initial implementation of $subject. This basically contains the standard interfaces we use to plug in different data sources as the back-end record storage, and for indexing purposes. These pluggable data sources are called Analytics Data Sources; from a configuration file, you can give the implementation class and the properties required for initialization. The first implementation, the RDBMS implementation, is done. It stores all the records and other data in a relational database, and any type of database can be supported via a configuration file, which gives the query templates used to define a standard set of actions. At the moment, H2 and MySQL query templates have been tested, and we will be adding templates for the rest of the popular RDBMSs as well. The RDBMS AnalyticsDataSource implementation detects the query template by looking at the database connection information retrieved from the data source (e.g. as mentioned in master-datasources.xml) and automatically switches to that mode, so the user basically doesn't have to do anything when configuring.

Also, the AnalyticsDataSource interface contains a FileSystem interface you need to implement for your data source implementation; it is used for indexing, which is done with Lucene. We use Lucene indexes as index shards for distributed indexing and search. With this sharding approach, we can add more nodes to our cluster to improve indexing performance and to add storage. Basically, provided the backend storage is scalable, the index operations will be scalable in the same manner. The limits we hit first are the processing requirements and the random data access and locking requirements of each shard, so for a typical database system, just by adding new BAM nodes, I'm hoping the indexing performance will increase almost linearly.

The AnalyticsDataSource implementations are finally used by a component called AnalyticsDataService, which is the interface seen by clients; it has the indexing-related operations along with the record store functionality exposed through AnalyticsDataSource. This interface can be looked up as an OSGi service, and we plan to also expose this functionality as a JAX-RS service. The general design and documentation on the test cases can be found at [1] and [2], and the source code at [3]. I will be doing some further performance tests by integrating this into the product properly, especially the distributed search, and will provide the results here. For the moment, we have a few performance tests as unit tests in the modules. This implementation will first be used by the log analysis implementation done by Gimantha. And we are planning to write further AnalyticsDataSource implementations, such as MongoDB, HBase etc. There will be separate notes on those.

[1] https://docs.google.com/a/wso2.com/spreadsheets/d/10mHRE6FEgF6wDZ-LSBx18zL8ZcIay5ZIhb8MIk7pfeg/edit#gid=0
[2] https://docs.google.com/a/wso2.com/spreadsheets/d/1iXoZ8BzaefN3EGOL05y5aUX6SLZH7Bu8YM4bF3xOSvQ/edit#gid=0
[3] https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics

Cheers, Anjana.
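The post describes Lucene indexes used as shards of a distributed index. One common way to realise that (a sketch, not the actual carbon-analytics code) is to hash each record id onto a fixed shard count, so every node computes the same shard for the same record and adding nodes only redistributes shard ownership.

```python
import hashlib

SHARD_COUNT = 16  # illustrative; the real shard count is a deployment choice

def shard_for(record_id: str, shard_count: int = SHARD_COUNT) -> int:
    """Deterministically map a record id to an index shard.

    Uses a stable hash (md5 here) so that every node in the cluster
    agrees on the placement without coordination."""
    digest = hashlib.md5(record_id.encode()).hexdigest()
    return int(digest, 16) % shard_count

shard = shard_for("rec-42")
```

A search then fans out to all shards and merges the per-shard Lucene results, which is why indexing throughput scales with the node count while each shard keeps its own locking and random-access costs.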
Re: [Architecture] Generic RDBMS Output adapter support
Hi, An approach we can follow is to give query templates in a configuration file, so we don't have to embed/hard-code any queries in the code. I'm using this approach in our new BAM RDBMS connector implementation, where I'm going to use an XML configuration with a separate section of the required queries for each database type. This implementation can be found at [1]; specifically, look at the H2FileDBAnalyticsDataSourceTest, MySQLInnoDBAnalyticsDataSourceTest, and QueryConfiguration classes. QueryConfiguration will be converted to a JAXB mapping class in the future, to represent a section in the source XML file that has the mappings. Reading from the actual XML configuration file is not yet done in my code. [1] https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics Cheers, Anjana.

On Fri, Nov 28, 2014 at 5:23 PM, Sriskandarajah Suhothayan s...@wso2.com wrote: The main issue we have to solve is the ability to handle different syntaxes. Please have a look at the DSS and Hive table definitions (from BAM); they may help. Suho

On Fri, Nov 28, 2014 at 4:49 PM, Damith Wickramasinghe dami...@wso2.com wrote: Hi, Currently we have support only for MySQL, and it has been decided to implement a generic adapter to support any RDBMS database. For now, the adapter implementation will focus on supporting Oracle, MySQL and H2. I will update the thread on the decided architecture for this requirement soon. Any feedback on the requirement will be greatly appreciated. Regards, Damith.

-- Software Engineer, WSO2 Inc.; http://wso2.com lean.enterprise.middleware mobile: +94728671315

-- *S. Suhothayan* Technical Lead, Team Lead of WSO2 Complex Event Processor, WSO2 Inc. http://wso2.com lean . enterprise . middleware cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/ | twitter: http://twitter.com/suhothayan | linked-in: http://lk.linkedin.com/in/suhothayan
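The query-template idea above can be sketched as follows: one section of templates per database type, selected from connection metadata, so no SQL dialect is hard-coded in the adapter. The template strings and the detection key here are illustrative only; the actual implementation reads the sections from an XML configuration via QueryConfiguration.

```python
# Illustrative per-dialect template sections (the real ones live in XML).
QUERY_TEMPLATES = {
    "mysql": {
        "create_table": "CREATE TABLE {table} (record_id VARCHAR(50), "
                        "ts BIGINT, data BLOB) ENGINE=InnoDB",
        "insert": "INSERT INTO {table} VALUES (?, ?, ?)",
    },
    "h2": {
        "create_table": "CREATE TABLE {table} (record_id VARCHAR(50), "
                        "ts BIGINT, data BLOB)",
        "insert": "INSERT INTO {table} VALUES (?, ?, ?)",
    },
}

def templates_for(db_product_name: str) -> dict:
    """Pick a template section using the JDBC-style database product name."""
    key = db_product_name.strip().lower()
    if key not in QUERY_TEMPLATES:
        raise ValueError(f"no query templates for {db_product_name!r}")
    return QUERY_TEMPLATES[key]

sql = templates_for("MySQL")["create_table"].format(table="EVENTS")
```

Supporting a new RDBMS (Oracle, say) then means adding a template section, not touching adapter code.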
Re: [Architecture] RFC: Doing Bulk Events Updates to HDFS instead of Cassandra
Hi Sanjiva, On Thu, Nov 6, 2014 at 4:01 PM, Sanjiva Weerawarana sanj...@wso2.com wrote: Anjana I think the idea was for the file system - HDFS upload to happen via a simple cron job type thing. Even so, we will be just moving the problem to another area, the overall effort done by that hardware is still the same (writing to disk, reading it back, write it to network). That is, even though we can goto very a high throughput initially by writing it to the local disk at first, later on we have to read it back and write it to HDFS via the network, which is the slower part of our operation. So if we continue to load the machine with an extreme throughput, you will eventually lose space in that disk. Cheers, Anjana. On Wed, Nov 5, 2014 at 9:19 AM, Anjana Fernando anj...@wso2.com wrote: Hi Srinath, Wouldn't it better, if we just make the batch size bigger, that is, lets just have a sizable local in-memory store, something probably close to 64MB, which is the default HDFS block size, and only after this is filled, or if the receiver is idle maybe, we can flush the buffer. I was just thinking, writing to the file system first itself will be expensive, where there are additional steps of writing all the records to the local file system and again reading it back, and then finally writing it to HDFS, and of course, again having a network file system would be an overhead, and not to mention the implementation/configuration complications that will come with this. IMHO, we should try to make these scenarios as simple as possible. I'm doing our new BAM data layer implementations here [1], where I'm almost done with an RDBMS implementation, doing some refactoring now (mail on this yet to come :)), I can also do an HDFS one after that and check it. [1] https://github.com/wso2/carbon-analytics/tree/master/components/xanalytics Cheers, Anjana. 
On Tue, Nov 4, 2014 at 6:56 PM, Srinath Perera srin...@wso2.com wrote: Hi All, Following came out of chat with Sanjiva on a scenario involve very large number of events coming into BAM. Currently we use Cassandra to store the events and number we got out of it has not been great and Cassandra need too much attention to get to those number. With Cassandra (or any DB) we write data as records. We can batch it, but still amount of data in one IO operation is small. In comparison, file transfers are much much faster and that is fastest way to get some data from A to B. So I am proposing to write the events that comes into a local file in the Data Receiver, and periodically append them to a HDFS file. We can arrange data in a folder by stream and files by timestamp (e.g. 1h data go to a new file), so we can selectively pull and process data using Hive. (We can use something like https://github.com/OpenHFT/Chronicle-Queue to write data to disk). If user needs avoid losing any messages at all in case of a disk failure, either he can have a SAN or NTFS or can run two replicas of receivers (we should write some code so only one of the receivers will actually put data to HDFS). Coding wise, this should not be too hard. I am sure this will be factor of time faster than Cassandra (of course we need to do a PoC and verify). WDYT? --Srinath -- Blog: http://srinathsview.blogspot.com twitter:@srinath_perera Site: http://people.apache.org/~hemapani/ Photos: http://www.flickr.com/photos/hemapani/ Phone: 0772360902 -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware -- Sanjiva Weerawarana, Ph.D. Founder, Chairman CEO; WSO2, Inc.; http://wso2.com/ email: sanj...@wso2.com; office: (+1 650 745 4499 | +94 11 214 5345) x5700; cell: +94 77 787 6880 | +1 408 466 5099; voip: +1 650 265 8311 blog: http://sanjiva.weerawarana.org/; twitter: @sanjiva Lean . Enterprise . Middleware -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. 
| http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
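The in-memory buffering Anjana suggests above — accumulate events until roughly an HDFS block (64MB) is filled, or the receiver goes idle, then flush in one big write — can be sketched as follows. This is a minimal illustration under stated assumptions, not BAM code: the sink is a plain callback standing in for an HDFS output stream, and all class and method names here are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Sketch: buffer events in RAM and flush a sizable chunk in one write,
 * instead of staging through the local disk. In a real receiver the sink
 * would wrap an HDFS stream and the threshold would be ~64MB.
 */
public class BufferedEventWriter {
    private final int flushThreshold;
    private final Consumer<byte[]> sink;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    public BufferedEventWriter(int flushThreshold, Consumer<byte[]> sink) {
        this.flushThreshold = flushThreshold;
        this.sink = sink;
    }

    /** Append one event; flush automatically once the threshold is reached. */
    public synchronized void write(String event) throws IOException {
        buffer.write(event.getBytes(StandardCharsets.UTF_8));
        buffer.write('\n');
        if (buffer.size() >= flushThreshold) {
            flush();
        }
    }

    /** Would also be called by an idle-timeout timer in a real receiver. */
    public synchronized void flush() {
        if (buffer.size() > 0) {
            sink.accept(buffer.toByteArray());
            buffer.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> chunks = new ArrayList<>();
        // Tiny threshold so the demo flushes quickly; real value ~64MB.
        BufferedEventWriter w = new BufferedEventWriter(32, chunks::add);
        for (int i = 0; i < 10; i++) {
            w.write("event-" + i);
        }
        w.flush();
        System.out.println("chunks flushed: " + chunks.size());
    }
}
```

The open question the thread debates is exactly the trade-off this makes visible: anything still in the buffer is lost on a crash, which is why Srinath later suggests running two receiver replicas.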
Re: [Architecture] RFC: Doing Bulk Events Updates to HDFS instead of Cassandra
Hi Srinath, I think that example is a bit flawed :) .. I didn't mean to compare Cassandra with the HDFS case here; I know Cassandra is far more complicated than the HDFS operations, where the data operations in HDFS are very simple. And I've a feeling that, with such small events, it may have turned into a CPU-bound operation rather than an I/O-bound one, because of the processing required for each event (maybe their batch impl. is crappy); that may be why even the bigger batch is also slow. OS-level buffers, you said; yeah, they efficiently batch the physical disk writes in memory and flush them out later. But that's a different thing; here, we are just writing to the disk and reading it back again, so as I see it, we are just using the local disk as a buffer, where we could just do this in RAM. Basically, build up sizable chunks in memory, and write to HDFS. So we lose the, even though comparably little, overhead of writing to and reading from the local disk, where the bottleneck would still be writing the data out over the network, to a remote server's disk somewhere. Simply put, this direct HDFS operation should be able to saturate the network link we have; even if we can't, we can ask ourselves how writing it to the local disk and reading it again would optimize it more. Cheers, Anjana. On Thu, Nov 6, 2014 at 6:15 PM, Srinath Perera srin...@wso2.com wrote: Of course we need to try it out and verify, I am just making a case that we should try it out :) Also, RDBMS should be the default, as most scenarios can be handled with DBs and there is no reason to make everyone's life complicated. --Srinath On Fri, Nov 7, 2014 at 7:44 AM, Srinath Perera srin...@wso2.com wrote: 1) Anjana, you are assuming the bandwidth is the bottleneck. Let me give an example. With sequential reads and writes, an HDD can do 100MB/sec and a 1G network can do 50 MB/sec. But the best BAM number we have seen is about 40k events/sec (that is with 4 machines or so; let's assume one machine). Let's assume 20-byte events. 
Then it will be doing about 1MB/sec. The problem is Cassandra breaks the data into lots of small operations, losing the OS-level buffer-to-buffer transfers that file transfers can do. I have tried increasing the batch size for Cassandra, which helps with smaller batches. But after about a few thousand operations in the same batch, things start to get much slower. The best numbers will come when we run two receivers instead of NFS. 2) Frank, this is analytics data. So it is read-only, and in most cases we need only time-based queries with low resolution (a 15min smallest resolution is fine for most cases). This is to say, run this batch query on the last hour of data, and so on. However, we have some scenarios where we do ad-hoc queries, for things like activity monitoring. The above would not work for those, and we will have to run a batch job to push that data to an RDBMS or Solr etc. Anjana, we need to discuss this. But there are also a lot of use cases where we need to receive and write the event to disk as soon as possible and later run MapReduce on top of them. For those, the above will work. --Srinath 
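Srinath's back-of-the-envelope figure in point (1) can be checked with the email's own numbers (40k events/sec at ~20 bytes each, against the ~100 MB/s sequential-HDD and ~50 MB/s 1GbE figures he quotes); this is just the arithmetic, not a measurement:

```java
/**
 * Sanity check of the throughput arithmetic in the thread: at 40k
 * 20-byte events per second, the receiver moves well under 1 MB/s,
 * nowhere near saturating either the disk or the network link.
 */
public class ThroughputEstimate {
    public static double mbPerSec(long eventsPerSec, long bytesPerEvent) {
        return (eventsPerSec * bytesPerEvent) / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // prints 0.76 MB/s, which the email rounds up to "doing 1MB/sec"
        System.out.printf("%.2f MB/s%n", mbPerSec(40_000, 20));
    }
}
```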
Re: [Architecture] RFC: Doing Bulk Events Updates to HDFS instead of Cassandra
On Thu, Nov 6, 2014 at 7:19 PM, Srinath Perera srin...@wso2.com wrote: Ah sorry, I misunderstood. Buffering to memory and writing to HDFS will be faster. By writing to disk, you reduce the probability of losing that data, by making it a bit slower. However, if you are running two receivers, the probability you will lose data is low anyway. So I guess buffering in memory and writing to HDFS would be OK. Great!, yeah, true. In either approach, and even now, there's anyway a high probability of losing some events in the case of a failure of the server, because most often there will be a few events in the publisher queue, other in-memory buffers, the OS I/O buffers in the file scenario, etc. To be totally reliable, we would have to use a transport like JMS to achieve that. Cheers, Anjana. --Srinath 
Re: [Architecture] Integrating ntask component into ESB
I hope you understood; what I told is not what you mentioned earlier: you do not have to store anything in the registry, and the ESB does not have to load anything itself. The tasks will be automatically loaded. Cheers, Anjana. On Wed, Oct 1, 2014 at 12:00 PM, Malaka Silva mal...@wso2.com wrote: Hi Anjana, Yes that is the plan. Will be implementing this at the task adapter level. Best Regards, Malaka 
Re: [Architecture] Rule Based Task Location Resolver
On Wed, Oct 1, 2014 at 9:32 AM, Chanika Geeganage chan...@wso2.com wrote: What will happen if the task does not match any of the rules mentioned in the configuration? It will fall back to the first server that is available; basically, the task scheduling will not fail just because a rule is not matched, it will make a best effort. Cheers, Anjana. Thanks On Mon, Sep 29, 2014 at 5:37 AM, Anjana Fernando anj...@wso2.com wrote: Hi, I've added $subject to the ntask component, to give more control over where scheduled tasks can be scheduled in a cluster. TaskLocationResolvers are used in ntask basically to find a location in the available set of nodes, given information about the environment. Earlier we had out-of-the-box task location resolvers like RandomTaskLocationResolver and RoundRobinTaskLocationResolver. The new org.wso2.carbon.ntask.core.impl.RuleBasedLocationResolver has the following configuration to be used in tasks-config.xml:-

<defaultLocationResolver>
  <locationResolverClass>org.wso2.carbon.ntask.core.impl.RuleBasedLocationResolver</locationResolverClass>
  <properties>
    <property name="rule-1">HIVE_TASK,HTTP_SCRIPT*,192.168.1.*</property>
    <property name="rule-2">HIVE_TASK,.*,192.168.2.*</property>
    <property name="rule-5">.*,.*,.*</property>
  </properties>
</defaultLocationResolver>

Basically, here, a rule contains [task-type-pattern],[task-name-pattern],[address-pattern]. A specific task is checked as follows: if its task type matches the task-type-pattern, its task name is then checked against task-name-pattern, and then the available nodes' addresses are checked against address-pattern; if one or many match, it selects one of those addresses in a round-robin manner. The property names denote the sequence in which the rules will be evaluated, i.e. rule-1 is checked before rule-2. With this task location resolver, we can implement scenarios such as executing tasks in a specific zone at first, and only failing over to another zone if the earlier one is not available. 
This code has been added to the 4.2.0 branch and also to GitHub. Cheers, Anjana. 
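The matching logic described above can be illustrated with a small self-contained sketch. The class and method names here are hypothetical, not the actual RuleBasedLocationResolver source; and since this sketch uses plain Java regexes, a glob like HTTP_SCRIPT* from the config is written HTTP_SCRIPT.*. Note the fallback matches the answer at the top of the thread: if no rule selects a node, the first available server is used.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Pattern;

/**
 * Illustrative rule evaluation: each rule is
 * {task-type-pattern, task-name-pattern, address-pattern}. Rules are
 * tried in order; the first rule whose type and name patterns match the
 * task selects round-robin among the node addresses matching its
 * address pattern. Hypothetical names, not the real ntask code.
 */
public class RuleBasedResolverSketch {
    private final List<String[]> rules;
    private final AtomicInteger counter = new AtomicInteger();

    public RuleBasedResolverSketch(List<String[]> rules) {
        this.rules = rules;
    }

    public String resolve(String taskType, String taskName, List<String> nodes) {
        for (String[] rule : rules) {
            if (!Pattern.matches(rule[0], taskType)) continue;
            if (!Pattern.matches(rule[1], taskName)) continue;
            List<String> matched = nodes.stream()
                    .filter(a -> Pattern.matches(rule[2], a))
                    .collect(java.util.stream.Collectors.toList());
            if (!matched.isEmpty()) {
                // Round-robin among the nodes selected by this rule.
                return matched.get(counter.getAndIncrement() % matched.size());
            }
        }
        // Best effort: never fail scheduling just because no rule matched.
        return nodes.isEmpty() ? null : nodes.get(0);
    }

    public static void main(String[] args) {
        RuleBasedResolverSketch r = new RuleBasedResolverSketch(List.of(
                new String[]{"HIVE_TASK", "HTTP_SCRIPT.*", "192\\.168\\.1\\..*"},
                new String[]{"HIVE_TASK", ".*", "192\\.168\\.2\\..*"},
                new String[]{".*", ".*", ".*"}));
        List<String> nodes = List.of("192.168.1.5", "192.168.2.7");
        // rule-1 wins, so the zone-1 node is chosen: prints 192.168.1.5
        System.out.println(r.resolve("HIVE_TASK", "HTTP_SCRIPT_1", nodes));
    }
}
```

This also shows how the zone-failover scenario works: if no 192.168.1.* node is in the available set, rule-1 selects nothing and evaluation falls through to the next rule.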
Re: [Architecture] Integrating ntask component into ESB
Hi Malaka, Kasun asked me about this sometime earlier. Basically, with ntask, the tasks will automatically start up when the server is started up. It does not wait till a tenant is loaded or anything like that; it is automatically handled by ntask. If the task itself wants some tenant-specific functionality, the task implementation can load that. Basically, the ESB has a task adapter implementation which bridges the ntask task interface and the ESB task interfaces; in the adapter, you can write the code to load any tenant information as needed. Cheers, Anjana. On Wed, Oct 1, 2014 at 8:58 AM, Malaka Silva mal...@wso2.com wrote: Hi All, At the time of the inbound EP code review, Azeez identified an issue with the ntask integration in tenant mode. The problem is that when a task is scheduled in tenant mode, it will not run until the tenant is loaded. Following is the solution I'm planning to implement. When a task is scheduled, it'll put an entry in the registry, under a tenant-specific structure. At the time the ESB starts, we are going to load the tenants that have one or more tasks scheduled. The above will solve the task implementation and polling inbound EP issues in tenant mode. But the issue will still exist for listening inbound EPs. Let me know your feedback on this. Best Regards, Malaka On Tue, May 20, 2014 at 5:37 PM, Ishan Jayawardena is...@wso2.com wrote: We have implemented the $subject and it is available in the ESB's git repo. As we initially planned, we will be releasing this new task manager with our next release. Thanks, Ishan. On Mon, Apr 21, 2014 at 5:27 PM, Ishan Jayawardena is...@wso2.com wrote: Today we had a discussion to review the current implementation of $subject. We have developed two task providers/managers to manage quartz and ntask based task types. The correct task manager gets registered according to the synapse configuration, during startup. When a user deploys a new task through the UI, Synapse schedules a task in the registered task manager. Although each task manager is capable of executing its own task type, currently neither task manager can execute tasks of a different type. Due to this, the new ntask task manager cannot execute existing tasks such as the Synapse MessageInjector. We cannot support this yet without Synapse having a dependency on the ntask component. At the moment we are looking into a solution to this problem. At the same time, we are working on the inbound endpoint (VFS) to make it reuse the same ntask provider that we developed. Thanks, Ishan. On Mon, Apr 21, 2014 at 9:42 AM, Ishan Jayawardena is...@wso2.com wrote: Hi Kasun, We managed to solve the issue and are now working on the final stage of the development. We will complete this within the week. Thanks, Ishan. On Tue, Apr 15, 2014 at 9:48 AM, Kasun Indrasiri ka...@wso2.com wrote: Did you check whether the required packages are OSGi-imported properly? On a separate note, what's the ETA of a working deliverable for this? On Sun, Apr 13, 2014 at 12:43 PM, Anjana Fernando anj...@wso2.com wrote: Obviously, check if that class is available and where it is referred from in the code. As I remember, there isn't a package called ntaskint, so check where this is coming from. Cheers, Anjana. On Sat, Apr 12, 2014 at 6:46 AM, Ishan Jayawardena is...@wso2.com wrote: We developed the quartz task manager and we are currently working on the ntask task manager. While developing the task handling component that uses ntask, we observed that we cannot schedule a task in it, due to a class not found error. See the error message below. The ntask component (which is used by the component that we are currently writing) cannot load the actual task implementation. Does anyone know how to get rid of this? 
java.lang.ClassNotFoundException: class org.wso2.carbon.ntaskint.core.Task
    at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:501)
    at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:421)
    at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:412)
    at org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.loadClass(DefaultClassLoader.java:107)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at org.wso2.carbon.ntask.core.impl.TaskQuartzJobAdapter.execute(TaskQuartzJobAdapter.java:58)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
Thanks, Ishan. On Mon, Apr 7, 2014 at 9:11 AM
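Going back to the task adapter Anjana describes at the top of this thread — an ntask task that sets up tenant context and then delegates to an ESB-side task — a rough sketch could look like this. All interfaces here are simplified stand-ins (assumptions), not the real org.wso2.carbon.ntask or Synapse APIs:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of an adapter bridging a quartz/ntask-style task contract to an
 * ESB-side task, loading tenant context lazily when the task fires.
 * Every interface and name here is a simplified stand-in.
 */
public class NtaskEsbAdapterSketch {
    /** Stand-in for the ntask-side task contract. */
    public interface NTask {
        void setProperties(Map<String, String> properties);
        void execute();
    }

    /** Stand-in for the ESB-side task contract. */
    public interface EsbTask {
        void executeTask();
    }

    /** Stand-in for tenant loading (e.g. starting the tenant flow). */
    public interface TenantLoader {
        void loadTenant(String tenantDomain);
    }

    public static class EsbTaskAdapter implements NTask {
        private final TenantLoader tenantLoader;
        private final EsbTask delegate;
        private Map<String, String> properties = new HashMap<>();

        public EsbTaskAdapter(TenantLoader tenantLoader, EsbTask delegate) {
            this.tenantLoader = tenantLoader;
            this.delegate = delegate;
        }

        @Override public void setProperties(Map<String, String> properties) {
            this.properties = properties;
        }

        @Override public void execute() {
            // Tasks live in the super-tenant space; tenant-specific
            // resources are loaded lazily, only when the task fires.
            String tenant = properties.getOrDefault("tenantDomain", "carbon.super");
            tenantLoader.loadTenant(tenant);
            delegate.executeTask();
        }
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        EsbTaskAdapter adapter = new EsbTaskAdapter(
                d -> log.append("loaded:").append(d).append(" "),
                () -> log.append("ran-esb-task"));
        adapter.setProperties(Map.of("tenantDomain", "foo.com"));
        adapter.execute();
        System.out.println(log); // prints: loaded:foo.com ran-esb-task
    }
}
```

The design point is the one Anjana makes in the thread: the scheduler never waits for a tenant to be loaded, because the adapter pulls in the tenant context at execution time.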
Re: [Architecture] BAM Performance tests
Looks good! .. so with this we basically get about 19K TPS on average, I guess .. In this setup, do we use separate partitions for the commit log and for the data? .. that is what is basically recommended for a Cassandra setup. Also, how are we sending data, is it in a client-load-balanced manner? .. or through a single node? .. please test those scenarios separately, to check how they affect the performance. Also, can you reduce the number of Cassandra nodes and check how that affects the performance? Basically, I want to see if we can get higher TPS values if we add more nodes, or, by any chance, if the opposite effect happens. Also, do mention the replication factor used for this. Cheers, Anjana. On Tue, Sep 2, 2014 at 4:26 PM, Thayalan thaya...@wso2.com wrote: Hi All, Please find below the initial receiver perf test results depicted in a graph. I'm in the process of capturing the throughput with the analyzer script running as well. I'll share the results soon. [receiver throughput graphs omitted] Notes: 1. The perf test was performed using the DEBS data set, which contains 69008700 events (records). 2. Throughput was captured for every 100 events. 3. Environment details (Perf Cloud, OpenStack): Deployment pattern as per the BAM cluster guide https://docs.wso2.com/display/CLUSTER420/Fully-Distributed%2C+High-Availability+BAM+Setup#Fully-Distributed,High-AvailabilityBAMSetup-Hadoopcluster Machine configuration: Node1 & Node2: 8 core, 16GB mem, 160GB HDD. Node3, 4 & 5: 8 core, 16GB mem, 160GB HDD, with a 1TB volume separately attached and mounted for the Cassandra data partition. Node1 & Node2: BAM receiver & analyzer nodes, Hadoop master & secondary respectively. On Fri, Aug 15, 2014 at 6:59 PM, Sinthuja Ragendran sinth...@wso2.com wrote: Hi, I have completed up to puppetizing the BAM receiver configurations, and due to support priorities I wasn't able to work full time on this. Hence the Cassandra and analyzer configurations are still pending. 
Anyhow, since I am done with the base configurations, I should be able to provide the complete puppet scripts by next week. Thanks, Sinthuja. On Fri, Aug 15, 2014 at 6:22 PM, Sanjiva Weerawarana sanj...@wso2.com wrote: Is the setup all automated with Puppet? On Fri, Aug 15, 2014 at 11:26 AM, Thayalan thaya...@wso2.com wrote: Hi Srinath, Due to support priorities this has not been started yet. However, I've already done the environment set-up in the performance cloud. I can probably share the initial findings by next week. Thanks, Thayalan On Fri, Aug 15, 2014 at 8:58 AM, Srinath Perera srin...@wso2.com wrote: How are we doing with the subject? 
Re: [Architecture] Invoke ServerStartupHandlers before start transports
On Sat, Aug 23, 2014 at 6:16 AM, Afkham Azeez az...@wso2.com wrote: Some handlers would need to be called after transports are started. So, we could modify the interface to behave like the Axis2ConfigurationContextObserver, and have pre/post transport-initialization methods. +1; as I remember, ntask uses this to schedule the actual tasks at the very last moment, and specific task implementations, like our data services tasks, would require the transports to be available at that time. Cheers, Anjana. On Fri, Aug 22, 2014 at 8:15 PM, Sagara Gunathunga sag...@wso2.com wrote: According to the current StartupFinalizerServiceComponent implementation, it calls registered ServerStartupHandlers after starting the transports, but IMHO it would be better to invoke ServerStartupHandlers before the server starts any transports. We have a requirement to perform a few tasks just before server startup completes, but before the transport listeners get started. Further, by looking at the API-M APIManagerStartupPublisher class (which is one of the implementations of the ServerStartupHandler interface), I think it would be much better to add local APIs before starting the transports. Please refer to the patch here [1]. [1] - https://github.com/wso2-dev/carbon4-kernel/pull/84 Thanks! -- Sagara Gunathunga Senior Technical Lead; WSO2, Inc.; http://wso2.com V.P Apache Web Services; http://ws.apache.org/ Linkedin: http://www.linkedin.com/in/ssagara Blog: http://ssagara.blogspot.com -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. 
| http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
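The pre/post split Azeez suggests could look roughly like this; the names are illustrative only, not the actual carbon4-kernel ServerStartupHandler API:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of startup handlers with pre- and post-transport phases, in the
 * style of Axis2ConfigurationContextObserver. Hypothetical names only.
 */
public class StartupPhasesSketch {
    public interface PhasedStartupHandler {
        void beforeTransportsStart();
        void afterTransportsStart();
    }

    private final List<PhasedStartupHandler> handlers = new ArrayList<>();
    private final List<String> events = new ArrayList<>();

    public void register(PhasedStartupHandler h) { handlers.add(h); }

    /** Mirrors what a startup finalizer would do at the end of server boot. */
    public List<String> completeStartup() {
        for (PhasedStartupHandler h : handlers) h.beforeTransportsStart();
        events.add("transports-started");
        for (PhasedStartupHandler h : handlers) h.afterTransportsStart();
        return events;
    }

    public static void main(String[] args) {
        StartupPhasesSketch finalizer = new StartupPhasesSketch();
        finalizer.register(new PhasedStartupHandler() {
            // e.g. API-M adds local APIs before transports open...
            public void beforeTransportsStart() { finalizer.events.add("apim-local-apis"); }
            // ...while ntask schedules tasks only once transports exist.
            public void afterTransportsStart() { finalizer.events.add("ntask-schedule"); }
        });
        System.out.println(finalizer.completeStartup());
        // prints: [apim-local-apis, transports-started, ntask-schedule]
    }
}
```

This captures both requirements in the thread: API-M's work lands before the listeners open, while ntask's scheduling still sees the transports up.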
Re: [Architecture] Configuring transport and security policy in dataservice config
Hi, Earlier I had the idea that ESB is also doing the same thing, and thought it is easier, where it needed lesser properties, and didn't really think the policy could be re-used by other services, but later I also got to know, that is not the case. So yeah, in that case, lets have another property like policyKey or policyPath to give the path to policy. Cheers, Anjana. On Thu, Aug 21, 2014 at 9:01 AM, Selvaratnam Uthaiyashankar shan...@wso2.com wrote: Why do you prefer convention over explicit policy location. For example, I have Data service 1, 2, 3. Data service 1 and 2 are using policy 1. Data service 3 is using policy 2. With using convention, either you can have 1 policy or 3 policy for above case. You will not be able to have only 2 policy. On Wednesday, August 20, 2014, Anjana Fernando anj...@wso2.com wrote: Hi Chanika, Lets just put enableSec as an attribute in the root element of the data service configuration. Like, data enableSec=true .. and as for the policy file location, I guess there is a standard location the ESB would look up if its not given explicitly, we will also just skip the policy location attribute and just go by convention where the policy file would be located. Cheers, Anjana. On Wed, Aug 20, 2014 at 2:17 PM, Chanika Geeganage chan...@wso2.com wrote: Hi, We recently came across a requirement to support QoS related configurations to .dbs file itself rather than adding a separate services.xml file. Therefore we are going to add the transport and security policy related configurations in the same way that in ESB proxy services configurations. The changes are: 1. Adding transports=https http attribute to configure transport info 2. Adding enableSec tag with the policy key to configure security i.e: policy key=path/to/policy/ enableSec/ In the deployment time these configurations will be extracted. Will this be a good approach to follow? Thanks -- Best Regards.. 
Chanika Geeganage Software Engineer WSO2, Inc.; http://wso2.com -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware -- S.Uthaiyashankar VP Engineering WSO2 Inc. http://wso2.com/ - lean . enterprise . middleware Phone: +94 714897591 ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
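For illustration, the direction agreed in this thread (an enableSec attribute on the root element plus an explicit policyKey-style property giving the path to a shared policy) could look roughly like this in a .dbs file; the attribute names and registry path below are assumptions, not the final syntax:

```xml
<data name="SampleDataService" transports="https http" enableSec="true">
   <!-- hypothetical: explicit path to a policy that several data services can share -->
   <policy key="conf:/repository/policies/secure-policy.xml"/>
</data>
```

With an explicit key like this, data services 1 and 2 can point at the same policy while data service 3 points at another, which covers Shankar's two-policies-for-three-services case.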
Re: [Architecture] [POC] Performance evaluation of Hive vs Shark
Hi Niranda, Excellent analysis of Hive vs Shark! .. This gives a lot of insight into how the two operate in different scenarios. As the next step, we will need to run this in an actual cluster of computers. Since you've used a subset of the dataset of the 2014 DEBS challenge, we should use the full data set in a clustered environment and check this. Gokul is already working on the Hive based setup for this; after that is done, you can create a Shark cluster on the same hardware and run the tests there, to get a clear comparison of how these two match up in a cluster. Until the setup is ready, do continue with your next steps on checking the RDD support and Spark SQL use. After these are done, we should also do a trial run of our own APIM Hive scripts, migrated to Shark. Cheers, Anjana. On Mon, Aug 11, 2014 at 12:21 PM, Niranda Perera nira...@wso2.com wrote: Hi all, I have been evaluating the performance of Shark (a distributed SQL query engine for Hadoop) against Hive, with the objective of exploring the possibility of moving WSO2 BAM data processing (which currently uses Hive) to Shark (and Apache Spark) for improved performance. I am sharing my findings herewith. *AMP Lab Shark* Shark can execute Hive QL queries up to 100 times faster than Hive without any modification to the existing data or queries. It supports Hive's QL, metastore, serialization formats, and user-defined functions, providing seamless integration with existing Hive deployments and a familiar, more powerful option for new ones. [1] *Apache Spark* Apache Spark is an open-source data analytics cluster computing framework. It fits into the Hadoop open-source community, building on top of HDFS, and promises performance up to 100 times faster than Hadoop MapReduce for certain applications. [2] Official documentation: [3] I carried out the comparison between the following Hive and Shark releases, with input files ranging from 100 to 1 billion entries.
QL engine:   Apache Hive 0.11 | Shark 0.9.1 (latest release), which uses Scala 2.10.3, Spark 0.9.1, and AMPLab's Hive 0.9.0
Framework:   Hadoop 1.0.4     | Spark 0.9.1
File system: HDFS             | HDFS

Attached herewith is a report describing in detail the performance comparison between Shark and Hive. hive_vs_shark https://docs.google.com/a/wso2.com/folderview?id=0B1GsnfycTl32QTZqUktKck1Ucjgusp=drive_web hive_vs_shark_report.odt https://docs.google.com/a/wso2.com/file/d/0B1GsnfycTl32X3J5dTh6Slloa0E/edit?usp=drive_web In summary, the following conclusions can be drawn from the evaluation. - Shark is comparable to Hive in DDL operations (CREATE, DROP .. TABLE, DATABASE). Both engines show fairly constant performance as the input size increases. - Shark is comparable to Hive in plain DML operations (LOAD, INSERT), but when a DML operation is combined with a data retrieval operation (e.g. INSERT TBL SELECT PROP FROM TBL), Shark significantly outperforms Hive, with a performance factor of 10x+ (ranging from 10x to 80x in some instances). Shark's performance factor reduces as the input size increases, while Hive's performance is fairly constant. - Shark clearly outperforms Hive in data retrieval operations (FILTER, ORDER BY, JOIN). Hive's performance is fairly constant in the data retrieval operations, while Shark's performance drops as the input size increases; but in every instance Shark outperformed Hive, with a minimum performance factor of 5x+ (ranging from 5x to 80x in some instances). Please refer to the 'hive_vs_shark_report'; it has all the information about the queries and timings, presented graphically. The code repository can be found at https://github.com/nirandaperera/hiveToShark/tree/master/hiveVsShark Moving forward, I am currently working on the following. - Apache Spark's resilient distributed dataset (RDD) abstraction (a collection of elements partitioned across the nodes of the cluster that can be operated on in parallel).
The use of RDDs and their impact on performance. - Spark SQL - Use of Spark SQL over Shark on the Spark framework [1] https://github.com/amplab/shark/wiki [2] http://en.wikipedia.org/wiki/Apache_Spark [3] http://spark.apache.org/docs/latest/ Would love to have your feedback on this. Best regards -- *Niranda Perera* Software Engineer, WSO2 Inc. Mobile: +94-71-554-8430 Twitter: @n1r44 https://twitter.com/N1R44 -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
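The DML-plus-retrieval case called out in the summary (where Shark reportedly showed the 10x to 80x gains) has the following general shape in Hive QL; the table and column names below are illustrative only, not the actual benchmark queries:

```sql
-- Plain DML (LOAD/INSERT): both engines performed comparably here.
LOAD DATA INPATH '/data/events.csv' INTO TABLE events_raw;

-- DML combined with a retrieval: the case where Shark's in-memory
-- execution reportedly outperformed Hive by 10x or more.
INSERT OVERWRITE TABLE events_filtered
SELECT house_id, value FROM events_raw WHERE property = 'load';
```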
Re: [Architecture] [POC] Performance evaluation of Hive vs Shark
On Wed, Aug 13, 2014 at 3:51 PM, Sumedha Rubasinghe sume...@wso2.com wrote: After these are done, we should also do a trial run of our own APIM Hive scripts, migrated to Shark. Do we need to migrate? I thought the existing Hive scripts can run as they are. First of all we need to create a large data set of API stats. Oh yeah, wrong choice of words I guess :) .. we wouldn't have to migrate .. I just meant testing the same APIM Hive scripts in Shark. Cheers, Anjana. On Mon, Aug 11, 2014 at 12:21 PM, Niranda Perera nira...@wso2.com wrote: Hi all, I have been evaluating the performance of Shark (a distributed SQL query engine for Hadoop) against Hive, with the objective of exploring the possibility of moving WSO2 BAM data processing (which currently uses Hive) to Shark (and Apache Spark) for improved performance. I am sharing my findings herewith. AMP Lab Shark: Shark can execute Hive QL queries up to 100 times faster than Hive without any modification to the existing data or queries. It supports Hive's QL, metastore, serialization formats, and user-defined functions, providing seamless integration with existing Hive deployments and a familiar, more powerful option for new ones. [1] Apache Spark: Apache Spark is an open-source data analytics cluster computing framework. It fits into the Hadoop open-source community, building on top of HDFS, and promises performance up to 100 times faster than Hadoop MapReduce for certain applications. [2] Official documentation: [3] I carried out the comparison between the following Hive and Shark releases, with input files ranging from 100 to 1 billion entries. QL engine: Apache Hive 0.11 vs Shark 0.9.1 (latest release), which uses Scala 2.10.3, Spark 0.9.1, and AMPLab's Hive 0.9.0. Framework: Hadoop 1.0.4 vs Spark 0.9.1. File system: HDFS on both. Attached herewith is a report describing in detail the performance comparison between Shark and Hive.
hive_vs_shark hive_vs_shark_report.odt In summary, the following conclusions can be drawn from the evaluation. Shark is comparable to Hive in DDL operations (CREATE, DROP .. TABLE, DATABASE). Both engines show fairly constant performance as the input size increases. Shark is comparable to Hive in plain DML operations (LOAD, INSERT), but when a DML operation is combined with a data retrieval operation (e.g. INSERT TBL SELECT PROP FROM TBL), Shark significantly outperforms Hive, with a performance factor of 10x+ (ranging from 10x to 80x in some instances). Shark's performance factor reduces as the input size increases, while Hive's performance is fairly constant. Shark clearly outperforms Hive in data retrieval operations (FILTER, ORDER BY, JOIN). Hive's performance is fairly constant in the data retrieval operations, while Shark's performance drops as the input size increases; but in every instance Shark outperformed Hive, with a minimum performance factor of 5x+ (ranging from 5x to 80x in some instances). Please refer to the 'hive_vs_shark_report'; it has all the information about the queries and timings, presented graphically. The code repository can be found at https://github.com/nirandaperera/hiveToShark/tree/master/hiveVsShark Moving forward, I am currently working on the following. Apache Spark's resilient distributed dataset (RDD) abstraction (a collection of elements partitioned across the nodes of the cluster that can be operated on in parallel). The use of RDDs and their impact on performance. Spark SQL - Use of Spark SQL over Shark on the Spark framework [1] https://github.com/amplab/shark/wiki [2] http://en.wikipedia.org/wiki/Apache_Spark [3] http://spark.apache.org/docs/latest/ Would love to have your feedback on this. Best regards -- Niranda Perera Software Engineer, WSO2 Inc. Mobile: +94-71-554-8430 Twitter: @n1r44 -- Anjana Fernando Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . 
middleware -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] Proposal: Annotation Based BAM/CEP Data Publisher - Replacement for Async/LoadBalance Data Publishers
+1, looks good to me. Shall we just change the property order to ordinal? I guess that is more suitable. And let's do a code review after this is done; we have to make sure there won't be any performance overhead because of this approach. Cheers, Anjana. On Thu, Jul 24, 2014 at 6:53 AM, Srinath Perera srin...@wso2.com wrote: +1 from me. If everyone is OK, we can get this done soon so other toolboxes can be built on top of this. On Tue, Jul 22, 2014 at 5:55 PM, Chamil Jeewantha cha...@wso2.com wrote: +1 for the optimization concern. In general, annotation based systems use a cache to avoid processing annotations again and again. The first time publish receives the event, it processes the annotations and puts them into a cache; the cache key is the class name. From the next time onwards there is no need to process annotations: just read the values from the POJO and send them to BAM. On Tue, Jul 22, 2014 at 5:16 PM, Sriskandarajah Suhothayan s...@wso2.com wrote: +1, it looks clean; we might need to do some optimisation at the publisher when converting the annotated class to the stream and Databridge Event. Suho On Tue, Jul 22, 2014 at 3:15 PM, Maninda Edirisooriya mani...@wso2.com wrote: +1. This is a very clean way to write a publisher. We have to generalize these annotations to become compatible with other publishers. How do we get the BAM/CEP server connection details? Where are we setting the load-balancing URLs and other async publisher related settings? Maybe we can set them globally per product (in this case, specific to each AS cluster). WDYT? *Maninda Edirisooriya* Senior Software Engineer *WSO2, Inc. *lean.enterprise.middleware. *Blog* : http://maninda.blogspot.com/ *E-mail* : mani...@wso2.com *Skype* : @manindae *Twitter* : @maninda On Tue, Jul 22, 2014 at 2:43 PM, Chamil Jeewantha cha...@wso2.com wrote: This is a proposal to develop an easy to use, readable annotation based data publisher for BAM.
When using the AsyncDataPublisher / LoadBalancingDataPublisher, the programmer must do a significant amount of boilerplate work before he can publish data to the stream. See [1]. This would be really easy if we had an annotation based data publisher which could be used in the following way. We write a POJO, annotated with some stream metadata. *Example:*

@DataStream(name = "stat.data.stream", version = "1.0.0",
            nickName = "nick name", description = "the description")
public class StatDataStreamEvent {

    @Column(name = "serverName", order = 2)
    private String serverName;

    @Column(order = 1) // no column name defined, so the name will be "timestamp"
    private long timestamp;

    @Column(name = "id", type = DataType.STRING, order = 3) // the column data type is String though the field is int (example only)
    private int statId;

    // getters and setters
}

*Publishing:*

StatDataStreamEvent event = new StatDataStreamEvent();
event.setServerName("The server name");
event.setTimestamp(System.currentTimeMillis());
event.setStatId(5000);
DataPublisher.publish(event);

Please improve this with your valuable ideas. [1] http://wso2.com/library/articles/2012/07/creating-custom-agents-publish-events-bamcep/ -- K.D. Chamil Jeewantha Associate Technical Lead WSO2, Inc.; http://wso2.com Mobile: +94716813892 ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture -- *S. Suhothayan* Technical Lead, Team Lead of WSO2 Complex Event Processor *WSO2 Inc.* http://wso2.com lean . enterprise . 
middleware *cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/ | twitter: http://twitter.com/suhothayan | linked-in: http://lk.linkedin.com/in/suhothayan* ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture -- K.D. Chamil Jeewantha Associate Technical Lead WSO2, Inc.; http://wso2.com Mobile: +94716813892 -- Srinath Perera, Ph.D. http://people.apache.org/~hemapani/ http://srinathsview.blogspot.com/ -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware
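Chamil's caching idea (process annotations once per class, keyed by class name) and Anjana's order-to-ordinal rename can be sketched as follows. This is a minimal, self-contained illustration with hypothetical annotation types; the real publisher's annotation names and signatures would come from the actual proposal, not from this sketch:

```java
import java.lang.annotation.*;
import java.lang.reflect.Field;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical annotations mirroring the proposal; exact signatures are assumptions.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
@interface DataStream { String name(); String version(); }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface Column { String name() default ""; int ordinal(); }

public class AnnotationPublisherSketch {

    // Cache keyed by class, so annotations are processed only once per event type.
    private static final Map<Class<?>, List<String>> CACHE = new ConcurrentHashMap<>();

    static List<String> columnsOf(Class<?> type) {
        return CACHE.computeIfAbsent(type, t -> {
            List<Field> fields = new ArrayList<>();
            for (Field f : t.getDeclaredFields()) {
                if (f.isAnnotationPresent(Column.class)) fields.add(f);
            }
            // Order columns by the declared ordinal, per Anjana's suggestion.
            fields.sort(Comparator.comparingInt((Field f) -> f.getAnnotation(Column.class).ordinal()));
            List<String> names = new ArrayList<>();
            for (Field f : fields) {
                Column c = f.getAnnotation(Column.class);
                // Fall back to the field name when no column name is given.
                names.add(c.name().isEmpty() ? f.getName() : c.name());
            }
            return names;
        });
    }

    // Example event type modeled on the one in the thread.
    @DataStream(name = "stat.data.stream", version = "1.0.0")
    static class StatDataStreamEvent {
        @Column(ordinal = 1) private long timestamp;
        @Column(name = "serverName", ordinal = 2) private String serverName;
        @Column(name = "id", ordinal = 3) private int statId;
    }

    public static void main(String[] args) {
        System.out.println(StatDataStreamEvent.class.getAnnotation(DataStream.class).name());
        System.out.println(columnsOf(StatDataStreamEvent.class));
    }
}
```

The reflection work happens only on the first publish of each event class; subsequent publishes are a cache lookup plus plain field reads, which is where the performance concern raised in the review would be checked.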
Re: [Architecture] Standardize defining of BAM server profiles for Carbon products
On Wed, Jul 23, 2014 at 12:02 AM, Srinath Perera srin...@wso2.com wrote: How about event-publisher.xml? I think we do not usually put "config" in our config file names; we need to be consistent about this. Yeah, true, no need for the "config" part, +1 for event-publisher.xml. Cheers, Anjana. +1 for giving an id for each publisher and a default. As Anjana said, dataSourceName should not be here. Shall we add a publisher class when a customer asks for it? +1 to create an OSGi service to find the current publisher (Nandika also proposed this yesterday). --Srinath On Tue, Jul 22, 2014 at 11:07 PM, Sriskandarajah Suhothayan s...@wso2.com wrote: Hi, IMHO we should not restrict data publishing to WSO2 BAM and CEP; our servers should be able to publish to other analytics servers as well. So I believe adding the PublisherClass will be a good option, and this can be an optional field. Regards, Suho On Tue, Jul 22, 2014 at 8:50 PM, Anjana Fernando anj...@wso2.com wrote: Hi Sagara, Maybe we can have a default publisher that will be used by the products if a specific id is not given, and if needed, clients can give a specific ID, as you said, if we have separate BAM and CEP servers and so on. And we should not have dataSourceName; it's an implementation specific property for how someone does analytics, and shouldn't be part of the publisher config. Also, I'm not sure what this PublisherClass is; we shouldn't have that, I guess it's an APIM specific thing. Cheers, Anjana. On Tue, Jul 22, 2014 at 11:16 AM, Sagara Gunathunga sag...@wso2.com wrote: Please find a draft format for analytics.xml or event-publisher-config.xml.

<event-publisher-config>
   <publisher>
      <id>bam</id>
      <enabled>true</enabled>
      <protocol>thrift</protocol>
      <serverURL>tcp://[BAM host IP]:7614/</serverURL>
      <username>admin</username>
      <password>admin</password>
      <dataSourceName>jdbc/WSO2AM_STATS_DB</dataSourceName>
   </publisher>
</event-publisher-config>

- It is possible to uniquely refer to each publisher from product specific configurations such as a mediator, Valve etc.
- In a given product it is possible to configure both CEP and BAM servers separately (or two BAM/CEP servers). - As we host dashboards with each product now, I included dataSourceName to refer to the stat database. - API-M uses a PublisherClass to refer to the publisher implementation class; if the same thing is possible with all products, we can add a PublisherClass element too. Please suggest additions and removals for the above format. @Maninda, can you please elaborate on where we configure publisher throttling constraints today, and the current format? Maybe we can leverage those settings as well. Thanks! On Tue, Jul 22, 2014 at 7:44 PM, Anjana Fernando anj...@wso2.com wrote: Now, since this is just to contain the publisher information, shouldn't it be something like event-publisher-config.xml? .. when we say analytics.xml, it gives the idea that it's a configuration for the whole of analytics operations, like a config for analysis operation settings. Anyway, this will just contain the settings required to connect to an event receiver, that is, the hosts, the secure/non-secure ports etc. After this, we can create an OSGi service, which will expose an API to just create a DataPublisher for you. Cheers, Anjana. On Tue, Jul 22, 2014 at 6:26 AM, Sagara Gunathunga sag...@wso2.com wrote: On Tue, Jul 22, 2014 at 2:06 PM, Afkham Azeez az...@wso2.com wrote: analytics.xml seems like a better name. +1 On Tue, Jul 22, 2014 at 1:51 PM, Srinath Perera srin...@wso2.com wrote: These events can go to BAM or CEP. Shall we go with an analytics.xml file instead of a bam.xml file? Sagara, can you send the content of the current bam.xml file to this thread so we can finalise the content. The current bam.xml file is only used with AS, and contains the following two lines to control AS service/web-app stat publishing at a global level:

<WebappDataPublishing>disable</WebappDataPublishing>
<ServiceDataPublishing>disable</ServiceDataPublishing>

I will send a draft design for the new analytics.xml file soon. Thanks!
that will mean BPS, ESB, and API-M need to fix this (maybe with BAM toolbox improvements). Also, when decided, Shammi, the MB training project needs to use this too. WDYT? --Srinath On Tue, Jul 22, 2014 at 1:43 PM, Afkham Azeez az...@wso2.com wrote: The correct approach is to introduce a bam.xml config. BAM is optional, hence we should avoid adding BAM specific configs to carbon.xml. Azeez On Mon, Jul 21, 2014 at 9:52 PM, Sagara Gunathunga sag...@wso2.com wrote: Right now each of our products uses its own way to define BAM server profiles; it would be nice if we could follow a unified process for configuring BAM servers and for enabling/disabling server level data publishing. FYI, these are some of the approaches used by our products. ESB - Through the BAM server profile UI, with no configuration file. AS - Use bam.xml
Re: [Architecture] Fwd: Create CQL data source from master-datasources.xml
Yeah, the format looks good .. I hope you used JAXB to represent this model in the code in the DataSourceReader, rather than parsing raw DOM or something. Also, what is the data source object you're using here? I guess it would be the Session object that you need to return, to be used by the clients. Cheers, Anjana. On Tue, Jul 22, 2014 at 2:44 AM, Prabath Abeysekera praba...@wso2.com wrote: Hi Dhanuka, This looks good and comprehensive! Let's delve further into this and see whether there are any other parameters available in the CQL driver configuration which one might find useful in a production setup. If we come across any, we can consider supporting them in the proposed datasource configuration structure too. Cheers, Prabath On Tue, Jul 22, 2014 at 12:02 PM, Dhanuka Ranasinghe dhan...@wso2.com wrote: looping architecture *Dhanuka Ranasinghe* Senior Software Engineer WSO2 Inc. ; http://wso2.com lean . enterprise . middleware phone : +94 715381915 -- Forwarded message -- From: Dhanuka Ranasinghe dhan...@wso2.com Date: Tue, Jul 22, 2014 at 12:00 PM Subject: Create CQL data source from master-datasources.xml To: WSO2 Developers' List d...@wso2.org Cc: Prabath Abeysekera praba...@wso2.com, Hasitha Hiranya hasit...@wso2.com, Anjana Fernando anj...@wso2.com, Deependra Ariyadewa d...@wso2.com, Bhathiya Jayasekara bhath...@wso2.com, Shani Ranasinghe sh...@wso2.com, Poshitha Dabare poshi...@wso2.com, Harsha Kumara hars...@wso2.com Hi, While working on $Subject, I found there are a lot of configuration options available in the CQL driver. Most of them are the same as the Hector client configurations, and we have identified that some of them are critical for performance and reliability. Below is the sample data source configuration that we came up with after analyzing the CQL driver. Please let me know your thoughts regarding this.
<datasource>
   <name>WSO2_CASSANDRA_DB</name>
   <description>The datasource used for cassandra</description>
   <jndiConfig>
      <name>CassandraRepo</name>
   </jndiConfig>
   <definition type="CASSANDRA">
      <configuration>
         <async>false</async>
         <clusterName>TestCluster</clusterName>
         <compression>SNAPPY</compression>
         <concurrency>100</concurrency>
         <username>admin</username>
         <password encrypted="true">admin</password>
         <port>9042</port>
         <maxConnections>100</maxConnections>
         <hosts>
            <host>192.1.1.0</host>
            <host>192.1.1.1</host>
         </hosts>
         <loadBalancePolicy>
            <exclusionThreshold>2.5</exclusionThreshold>
            <latencyAware>true</latencyAware>
            <minMeasure>100</minMeasure>
            <policyName>RoundRobinPolicy</policyName>
            <retryPeriod>10</retryPeriod>
            <scale>2</scale>
         </loadBalancePolicy>
         <poolOptions>
            <coreConnectionsForLocal>10</coreConnectionsForLocal>
            <coreConnectionsForRemote>10</coreConnectionsForRemote>
            <maxConnectionsForLocal>10</maxConnectionsForLocal>
            <maxConnectionsForRemote>10</maxConnectionsForRemote>
            <maxSimultaneousRequestsForLocal>10</maxSimultaneousRequestsForLocal>
            <maxSimultaneousRequestsForRemote>10</maxSimultaneousRequestsForRemote>
            <minSimultaneousRequestsForLocal>10</minSimultaneousRequestsForLocal>
            <minSimultaneousRequestsForRemote>10</minSimultaneousRequestsForRemote>
         </poolOptions>
         <reconnectPolicy>
            <baseDelayMs>100</baseDelayMs>
            <policyName>ConstantReconnectionPolicy</policyName>
         </reconnectPolicy>
         <socketOptions>
            <connectTimeoutMillis>200</connectTimeoutMillis>
            <keepAlive>true</keepAlive>
            <readTimeoutMillis>200</readTimeoutMillis>
            <tcpNoDelay>true</tcpNoDelay>
         </socketOptions>
      </configuration>
   </definition>
</datasource>

Cheers, *Dhanuka Ranasinghe* Senior Software Engineer WSO2 Inc. ; http://wso2.com lean . enterprise . middleware phone : +94 715381915 ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture -- Prabath Abeysekara Associate Technical Lead, Data TG. WSO2 Inc. Email: praba...@wso2.com Mobile: +94774171471 -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. 
| http://wso2.com lean . enterprise . middleware ___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] Fwd: Create CQL data source from master-datasources.xml
Hi Dhanuka, On Tue, Jul 22, 2014 at 9:03 AM, Dhanuka Ranasinghe dhan...@wso2.com wrote: *Dhanuka Ranasinghe* Senior Software Engineer WSO2 Inc. ; http://wso2.com lean . enterprise . middleware phone : +94 715381915 On Tue, Jul 22, 2014 at 5:47 PM, Anjana Fernando anj...@wso2.com wrote: Yeah, the format looks good .. I hope you used JAXB to represent this model in the code in the DataSourceReader, rather than parsing raw DOM or something. Yes, same as the RDBMS component, we used JAXB. Also, what is the data source object you're using here? I guess it would be the Session object that you need to return, to be used by the clients. com.datastax.driver.core.Cluster Be mindful when using Cluster here, because when you create a Session out of it, the connection pool resides in the Session object [1]. So if the Cluster object is what you expose here, when multiple applications look this up, they will create their own Session objects and will have their own separate connection pools etc., so the data source defined in your datasources.xml doesn't mean that globally all the applications use only the number of connections defined there. For example, RDBMS by default shares a single javax.sql.DataSource object, so everyone shares the connection pool. So maybe also consider using a Session object here; with that, you would also need to give the specific keyspace being used since, as they say, a session has to be used with only one keyspace. [1] http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/quick_start/qsSimpleClientAddSession_t.html Cheers, Anjana. On Tue, Jul 22, 2014 at 2:44 AM, Prabath Abeysekera praba...@wso2.com wrote: Hi Dhanuka, This looks good and comprehensive! Let's delve further into this and see whether there are any other parameters available in the CQL driver configuration which one might find useful in a production setup.
If we come across any, we can consider supporting them in the proposed datasource configuration structure too. Cheers, Prabath On Tue, Jul 22, 2014 at 12:02 PM, Dhanuka Ranasinghe dhan...@wso2.com wrote: looping architecture *Dhanuka Ranasinghe* Senior Software Engineer WSO2 Inc. ; http://wso2.com lean . enterprise . middleware phone : +94 715381915 -- Forwarded message -- From: Dhanuka Ranasinghe dhan...@wso2.com Date: Tue, Jul 22, 2014 at 12:00 PM Subject: Create CQL data source from master-datasources.xml To: WSO2 Developers' List d...@wso2.org Cc: Prabath Abeysekera praba...@wso2.com, Hasitha Hiranya hasit...@wso2.com, Anjana Fernando anj...@wso2.com, Deependra Ariyadewa d...@wso2.com, Bhathiya Jayasekara bhath...@wso2.com, Shani Ranasinghe sh...@wso2.com, Poshitha Dabare poshi...@wso2.com, Harsha Kumara hars...@wso2.com Hi, While working on $Subject, I found there are a lot of configuration options available in the CQL driver. Most of them are the same as the Hector client configurations, and we have identified that some of them are critical for performance and reliability. Below is the sample data source configuration that we came up with after analyzing the CQL driver. Please let me know your thoughts regarding this.
<datasource>
   <name>WSO2_CASSANDRA_DB</name>
   <description>The datasource used for cassandra</description>
   <jndiConfig>
      <name>CassandraRepo</name>
   </jndiConfig>
   <definition type="CASSANDRA">
      <configuration>
         <async>false</async>
         <clusterName>TestCluster</clusterName>
         <compression>SNAPPY</compression>
         <concurrency>100</concurrency>
         <username>admin</username>
         <password encrypted="true">admin</password>
         <port>9042</port>
         <maxConnections>100</maxConnections>
         <hosts>
            <host>192.1.1.0</host>
            <host>192.1.1.1</host>
         </hosts>
         <loadBalancePolicy>
            <exclusionThreshold>2.5</exclusionThreshold>
            <latencyAware>true</latencyAware>
            <minMeasure>100</minMeasure>
            <policyName>RoundRobinPolicy</policyName>
            <retryPeriod>10</retryPeriod>
            <scale>2</scale>
         </loadBalancePolicy>
         <poolOptions>
            <coreConnectionsForLocal>10</coreConnectionsForLocal>
            <coreConnectionsForRemote>10</coreConnectionsForRemote>
            <maxConnectionsForLocal>10</maxConnectionsForLocal>
            <maxConnectionsForRemote>10</maxConnectionsForRemote>
            <maxSimultaneousRequestsForLocal>10</maxSimultaneousRequestsForLocal>
            <maxSimultaneousRequestsForRemote>10</maxSimultaneousRequestsForRemote>
            <minSimultaneousRequestsForLocal>10</minSimultaneousRequestsForLocal>
            <minSimultaneousRequestsForRemote>10</minSimultaneousRequestsForRemote>
         </poolOptions>
         <reconnectPolicy>
            <baseDelayMs>100</baseDelayMs>
            <policyName>ConstantReconnectionPolicy</policyName>
         </reconnectPolicy>
         <socketOptions>
            <connectTimeoutMillis>200</connectTimeoutMillis>
            <keepAlive>true</keepAlive>
            <readTimeoutMillis>200</readTimeoutMillis>
            <tcpNoDelay>true</tcpNoDelay>
         </socketOptions>
      </configuration>
   </definition>
</datasource>

Cheers, *Dhanuka Ranasinghe* Senior Software Engineer WSO2 Inc. ; http://wso2.com lean . enterprise . middleware phone : +94 715381915 ___ Architecture mailing list
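Anjana's pooling concern can be sketched in plain Java. The Cluster and Session classes below are simplified stand-ins for the DataStax driver's types (in the real driver, each Session carries its own connection pool): if the data source hands out the Cluster, every application that looks it up ends up with its own Session and pool, whereas caching one Session per keyspace keeps a single shared pool. The data source class and its methods are hypothetical, for illustration only:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-ins for the DataStax driver's Cluster/Session types.
class Cluster {
    final AtomicInteger sessionsCreated = new AtomicInteger();
    Session connect(String keyspace) { // each call would create a new connection pool
        sessionsCreated.incrementAndGet();
        return new Session(keyspace);
    }
}

class Session {
    final String keyspace;
    Session(String keyspace) { this.keyspace = keyspace; }
}

// Hypothetical data source: expose one shared Session per keyspace,
// so all applications that look up the data source share one pool.
public class CassandraDataSourceSketch {
    private final Cluster cluster = new Cluster();
    private final ConcurrentMap<String, Session> sessions = new ConcurrentHashMap<>();

    public Session getSession(String keyspace) {
        return sessions.computeIfAbsent(keyspace, cluster::connect);
    }

    public int poolCount() { return cluster.sessionsCreated.get(); }

    public static void main(String[] args) {
        CassandraDataSourceSketch ds = new CassandraDataSourceSketch();
        // Two lookups of the same keyspace return the same Session, i.e. one pool.
        System.out.println(ds.getSession("ks1") == ds.getSession("ks1"));
    }
}
```

This also makes the keyspace an explicit part of the lookup, matching the point that a Session is tied to a single keyspace.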
Re: [Architecture] Standardize defining of BAM server profiles for Carbon products
Now, since this is just to contain the publisher information, shouldn't it be something like event-publisher-config.xml? .. when we say analytics.xml, it gives the idea that it's a configuration for the whole of analytics operations, like a config for analysis operation settings. Anyway, this will just contain the settings required to connect to an event receiver, that is, the hosts, the secure/non-secure ports etc. After this, we can create an OSGi service, which will expose an API to just create a DataPublisher for you. Cheers, Anjana. On Tue, Jul 22, 2014 at 6:26 AM, Sagara Gunathunga sag...@wso2.com wrote: On Tue, Jul 22, 2014 at 2:06 PM, Afkham Azeez az...@wso2.com wrote: analytics.xml seems like a better name. +1 On Tue, Jul 22, 2014 at 1:51 PM, Srinath Perera srin...@wso2.com wrote: These events can go to BAM or CEP. Shall we go with an analytics.xml file instead of a bam.xml file? Sagara, can you send the content of the current bam.xml file to this thread so we can finalise the content. The current bam.xml file is only used with AS, and contains the following two lines to control AS service/web-app stat publishing at a global level:

<WebappDataPublishing>disable</WebappDataPublishing>
<ServiceDataPublishing>disable</ServiceDataPublishing>

I will send a draft design for the new analytics.xml file soon. Thanks! that will mean BPS, ESB, and API-M need to fix this (maybe with BAM toolbox improvements). Also, when decided, Shammi, the MB training project needs to use this too. WDYT? --Srinath On Tue, Jul 22, 2014 at 1:43 PM, Afkham Azeez az...@wso2.com wrote: The correct approach is to introduce a bam.xml config. BAM is optional, hence we should avoid adding BAM specific configs to carbon.xml. Azeez On Mon, Jul 21, 2014 at 9:52 PM, Sagara Gunathunga sag...@wso2.com wrote: Right now each of our products uses its own way to define BAM server profiles; it would be nice if we could follow a unified process for configuring BAM servers and for enabling/disabling server level data publishing.
FYI, these are some of the approaches used by our products. ESB - Through the BAM server profile UI, with no configuration file. AS - Use bam.xml to enable/disable server level data publishing, and the Webapp/Service Data Publishing UI for server configuration. BPS - Through bps.xml and writing a BAMServerProfile.xml file. API-M - Through the api-manager.xml file. IMHO we can unify this process among all the servers to some extent, for example: 1. Configuring BAM server details - URLs, user name, password 2. Globally enabling and disabling data publishing 3. Name of the stat database 4. Publishing protocol and its configuration I have two suggestions on this. a.) As BAM publishing is common to most of the products, define a new element called Analytic under carbon.xml to hold the above common configurations. b.) Alternatively, define a bam.xml file to hold the above common configurations. WDYT? NOTE - I only considered BAM, but I guess we can consider CEP as well. Thanks! -- Sagara Gunathunga Senior Technical Lead; WSO2, Inc.; http://wso2.com V.P. Apache Web Services; http://ws.apache.org/ Linkedin: http://www.linkedin.com/in/ssagara Blog: http://ssagara.blogspot.com -- *Afkham Azeez* Director of Architecture; WSO2, Inc.; http://wso2.com Member; Apache Software Foundation; http://www.apache.org/ email: az...@wso2.com | cell: +94 77 3320919 | blog: http://blog.afkham.org | twitter: http://twitter.com/afkham_azeez | linked-in: http://lk.linkedin.com/in/afkhamazeez *Lean . Enterprise . Middleware* -- Director, Research, WSO2 Inc. 
Visiting Faculty, University of Moratuwa Member, Apache Software Foundation Research Scientist, Lanka Software Foundation Blog: http://srinathsview.blogspot.com twitter: @srinath_perera Site: http://people.apache.org/~hemapani/ Photos: http://www.flickr.com/photos/hemapani/ Phone: 0772360902 -- *Afkham Azeez* Director of Architecture; WSO2, Inc.; http://wso2.com Member; Apache Software Foundation; http://www.apache.org/ email: az...@wso2.com | cell: +94 77 3320919 | blog: http://blog.afkham.org | twitter: http://twitter.com/afkham_azeez | linked-in: http://lk.linkedin.com/in/afkhamazeez *Lean . Enterprise . Middleware* -- Sagara Gunathunga Senior Technical Lead; WSO2, Inc.; http://wso2.com V.P. Apache Web Services; http://ws.apache.org/ Linkedin: http://www.linkedin.com/in/ssagara Blog: http://ssagara.blogspot.com -- *Anjana Fernando* Senior Technical Lead WSO2 Inc. | http://wso2.com lean . enterprise . middleware
Re: [Architecture] Machine Learning using Apache Mahout for WSO2 BAM
Hi,

I was simply thinking that the UDF could be directly mapped to some basic Mahout operation it implements, and the input/output would be given as parameters to the UDF. So we could publish some input data beforehand to Cassandra etc. and give the location of that data to the UDF, and the UDF would, when it is called, create the map/reduce jobs and execute them.

Cheers,
Anjana.

On Tue, Jul 1, 2014 at 9:18 AM, Srinath Perera srin...@wso2.com wrote:

+1, we wanted to explore that more. However, it is not a simple UDF, as this is a stateful op where we feed a lot of data and start a separate map-reduce process. Anjana, do you have any thoughts on how it can be done?

On Tue, Jul 1, 2014 at 5:37 AM, Anjana Fernando anj...@wso2.com wrote:

Hi, I'm just wondering if we have any way to integrate this into Hive itself (a UDF?), to get the results of an ML algorithm run into a result there. A similar scenario is possible in the Shark/MLlib integration. Cheers, Anjana.

On Mon, Jun 30, 2014 at 12:28 PM, Supun Sethunga sup...@wso2.com wrote:

Hi, I'm working on the $subject, and the objective is to apply Machine Learning algorithms on the data stored by WSO2 BAM. Apache Mahout will be used as the ML tool for this purpose. As per the discussion I had with Srinath, the procedure for $subject would be:

- Test a Machine Learning algorithm using the Mahout libraries within Java.
- Implement a RESTful service which provides the above functionality.
- Since Mahout also uses Hadoop, the above service can send Map Reduce jobs to the Hadoop built into BAM.
- Deploy the service as a Carbon Component on WSO2 BAM.

The first step is completed for now. Any feedback is highly appreciated.

Thanks,
Supun

--
*Supun Sethunga*
Software Engineer
WSO2, Inc.
lean | enterprise | middleware
Mobile: +94 716546324

--
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware

--
Director, Research, WSO2 Inc.
Visiting Faculty, University of Moratuwa
Member, Apache Software Foundation
Research Scientist, Lanka Software Foundation
Blog: http://srinathsview.blogspot.com
Twitter: @srinath_perera
Site: http://people.apache.org/~hemapani/
Photos: http://www.flickr.com/photos/hemapani/
Phone: 0772360902

--
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware

___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
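The UDF-as-dispatcher idea discussed above could be sketched roughly as follows. This is a plain-Java illustration only: a real implementation would extend Hive's UDF class and launch the corresponding Mahout map/reduce job, and the class name MahoutOpUdf and its evaluate signature here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for a Hive UDF that dispatches a Mahout operation.
// evaluate(operation, input, output) collects the parameters the way the
// proposal describes: the data is published beforehand (e.g. to Cassandra),
// and the UDF is told which operation to run and where the data lives.
class MahoutOpUdf {

    public String evaluate(String operation, String inputLocation, String outputLocation) {
        Map<String, String> jobSpec = new HashMap<>();
        jobSpec.put("op", operation);         // e.g. "kmeans"
        jobSpec.put("input", inputLocation);  // location of pre-published input data
        jobSpec.put("output", outputLocation);
        // A real UDF would build Mahout driver arguments from jobSpec and
        // submit the map/reduce job here.
        return jobSpec.get("output");         // caller reads results from here
    }
}
```

Returning the output location lets the surrounding Hive query (or a follow-up query) read the results back from the data store once the job completes.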
Re: [Architecture] Machine Learning using Apache Mahout for WSO2 BAM
I see, sure, I was thinking of doing all the operations, including the training operations, using a UDF. Will come and meet you.

Cheers,
Anjana.

On Tue, Jul 1, 2014 at 9:56 AM, Srinath Perera srin...@wso2.com wrote:

No, we need to get the data, preprocess them using Hive, and send all the data (not 1-2 values, rather say 10 million values) to the training phase. Let's chat f2f a bit.

--Srinath

[earlier quoted messages trimmed; they repeat the previous message in this thread verbatim]

--
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
Re: [Architecture] Implementing Datasource Deployer
Re: [Architecture] Data Storage Architecture Change in BAM
Hi Srinath,

On Mon, Jun 9, 2014 at 10:31 AM, Srinath Perera srin...@wso2.com wrote:

Hi Anjana,

* No support for other data stores for storing events

Yes, we need to support RDBMS and Hive.

* Toolboxes being bound to certain types of data sources

Need to fix this.

* Transports

IMO, for this we can depend on ESB to support other transports for the near future. We need to make sure the ESB Thrift mediator is working smoothly. Please note that the above should go in the release following the toolboxes, not the immediate release which will add the toolboxes. We should only allocate people AFTER the toolboxes are out.

Product toolboxes are a matter of coordinating with the product teams, which we are starting to do, and the product teams will be owning their respective toolboxes. But the BAM team itself can be allocated to product features. So anyway, we can start building the product toolboxes with the Cassandra storage handler etc. and migrate to the new architecture when it is finalized.

Cheers,
Anjana.

--Srinath

On Fri, Jun 6, 2014 at 3:31 PM, Anjana Fernando anj...@wso2.com wrote:

Hi,

The BAM team has been looking into ways of improving how we handle the operations in the data layer. So I will explain here the issues we have because of the current BAM architecture, and propose a solution to remedy them.

Issues
======

* No support for other data stores for storing events

At the moment, we are strictly limited to storing events in Cassandra, but there has been strong interest in using other types of data stores such as MongoDB, RDBMS etc., especially because of the ease, for some users, of using their existing databases and so on. Also, in order for BAM functionality to be embeddable in other products, this support is critical; for example, as a light-weight analytics solution, people should be able to use an RDBMS-based setup.
* Toolboxes being bound to certain types of data sources

This is the case where we assume we always retrieve data from Cassandra and write to some specific RDBMS. This approach does not scale, especially for the WSO2 product related toolboxes we have / are going to have, because then the toolboxes are limited to one specific combination of databases, and we would need to support different versions of each toolbox for each database combination, which is not practical to maintain; a huge effort would also be spent on testing each of these every time.

* Multi-tenancy limitations

At the moment, we use our own MT Cassandra to store the events tenant-wise, and because of this, we cannot use any other Cassandra distribution out there to implement MT features. So effectively, anyone who uses their own Cassandra installation cannot use MT features, which makes the BAM product inconsistent in its features. Ideally, we should support anyone having their own Cassandra, or actually any supported type of database, without any special modifications for MT.

* Transports

CEP introduced a new architecture for defining transports/data formats in the system, and there are many transports such as HTTP/JMS etc. with data types such as XML/Text/JSON available to get events in. But BAM is limited to the Thrift transport, because we explicitly need authentication support from the transport, since that is how we authenticate to the Cassandra data store. So we cannot use any other transport, because we cannot authenticate to our data store. But ideally, what we need is a way to have a default system user for a tenant, whereby, just by figuring out which tenant a request belongs to, we can write the events to the data store. For example, we could use a JMS queue, and use the data from that to write to the super-tenant's space.
Also, in toolboxes, the stream definitions need to contain a username/password pair to create streams and their respective representation in the data store; ideally, we should just identify the tenant the toolbox is deployed for, and do the required data operations internally.

Solution
========

So the proposed solution is to create a clear data abstraction layer for BAM. Rather than just having Cassandra and some other RDBMS for the storage of events and analyzed data, we propose a single interface called AnalyticsDataStore to keep all the required data and its metadata. This would be the store used for all the events coming into BAM and also the place to put summarized data. So basically, AnalyticsDataStore will have several implementations, with backing data stores such as Cassandra, MongoDB and RDBMS. The data bridge connector for BAM will be implemented to simply write data to AnalyticsDataStore, and we will also have a Hive storage handler called AnalyticsDataStoreStorageHandler which reads and writes data to our common data store. So basically, users will have no idea about
store for summarized data as well; it won't go to the usual RDBMS-based tables. The earlier rationale there was that many existing tools can be used to visualize data from RDBMS tables, but this requirement will be reduced since, in BAM itself, we are going to provide rich visualization support with UES. The AnalyticsDataStore functionality will also be exposed through a well defined REST API and a Java API, so external tools can also access this data if needed. Functionality such as data archival will likewise use this interface, rather than going directly to the back-end data store.

Also, because of this centralized API-based data access, multi-tenancy can be handled as an implementation detail, where we are free to store the data in any structure we want internally; for example, for Cassandra, we can keep a single admin user in a configuration file and store all the tenant-based data in a single space. Users will no longer go directly to the backend data store to browse data; they will simply use the API with the proper user credentials to retrieve/update data. So we should also remove data store specific tools such as the Cassandra explorer from BAM, because browsing the raw data there may not make sense to the users; and anyway, we should not keep any data store specific tools, since we will be supporting many.

So in the end, the aim is to solve all the issues mentioned earlier with the suggested layered approach, to ultimately create a much more stable and functional BAM. Any comments on this idea are appreciated.

Cheers,
Anjana.

--
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
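A minimal sketch of the proposed abstraction, assuming hypothetical method names (put/get and a tenant-id parameter; the actual interface was not specified in this thread), with an in-memory stand-in where real implementations would back onto Cassandra, an RDBMS, MongoDB, etc.:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the proposed data abstraction layer; names are illustrative.
interface AnalyticsDataStore {
    void put(int tenantId, String table, Map<String, Object> record);
    List<Map<String, Object>> get(int tenantId, String table);
}

// In-memory stand-in; real implementations would write to Cassandra,
// an RDBMS, MongoDB, etc. Tenant isolation is an implementation detail:
// here it is simply part of the internal key.
class InMemoryAnalyticsDataStore implements AnalyticsDataStore {
    private final Map<String, List<Map<String, Object>>> data = new HashMap<>();

    public void put(int tenantId, String table, Map<String, Object> record) {
        data.computeIfAbsent(tenantId + ":" + table, k -> new ArrayList<>()).add(record);
    }

    public List<Map<String, Object>> get(int tenantId, String table) {
        return data.getOrDefault(tenantId + ":" + table, Collections.emptyList());
    }
}
```

Note the tenant id is threaded through every call, matching the point above that the API layer, not the backing store, enforces multi-tenancy.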
Re: [Architecture] Grouping options for BAM Message Tracer
Hi Manoj,

Yeah, that's a good idea. Basically, I guess there are two parts. The first is how we propagate the correlation property / activity id (grouping value), that is, where in the message between nodes we set it. At the moment, for HTTP we have a specific HTTP header, and for HL7 we use a specific attribute in the HL7 message. So probably we can make it customizable; for example, for XML messages, we could give an XPath expression to extract a specific element value as the activity id. So possibly we can put that as a feature in the message tracer agent we have. The other part is in the stream definition: we can let the user select which property is to be used as the activity id, rather than using a single well-known property id called activityId. So in the end, the stream events can have a meaningful property name for the correlation id, such as orderId and so on. I've created JIRAs [1] and [2] to track these.

[1] https://wso2.org/jira/browse/BAM-1575
[2] https://wso2.org/jira/browse/BAM-1576

Cheers,
Anjana.

On Sat, Apr 19, 2014 at 1:22 PM, Manoj Fernando man...@wso2.com wrote:

Folks,

Following a chat I had with Anjana and Shankar, some thoughts on improving the BAM Message Tracer. As of now, a message is traced by a unique ID that is assigned by the tracer handler (if not already set). This unique ID is what the BAM Activity Dashboard uses for grouping all correlated messages (in/out). However, there can be situations where we might need to use a more business-related parameter for grouping (take a mobile device ID + session ID, for example), which is likely to exist in the header or the message body. To support this, we have two options.

1. Provide a feature on Message Tracer to let users specify expressions to extract this unique ID or a combination of them.
2. Let users specify which parameters are 'group-able' (either from header or payload), and use them in the stream definition so that on the Dashboard we can support multiple grouping options.
We created an Axis2 module to do something similar (closer to Option 1) for a PoC, so we can reuse some of that stuff.

Regards,
Manoj

--
Manoj Fernando
Director - Solutions Architecture
Contact: LK - +94 112 145345
Mob: +94 773 759340
www.wso2.com

--
*Anjana Fernando*
Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
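Option 1 above (extracting the activity id via a user-specified XPath expression) might look something like this. This is a sketch only, using the JDK's built-in XPath support; the class name and its wiring into the tracer agent are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Extracts a correlation/activity id from an XML payload using a
// configurable XPath expression, as the tracer-agent feature proposes.
class ActivityIdExtractor {
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private final String expression;

    ActivityIdExtractor(String expression) {
        this.expression = expression; // e.g. "/order/orderId/text()"
    }

    String extract(String xmlPayload) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlPayload.getBytes(StandardCharsets.UTF_8)));
        return xpath.evaluate(expression, doc);
    }
}
```

With an expression of /order/orderId/text(), an order payload would yield its orderId element value as the grouping value, giving the stream a meaningful correlation property name as suggested above.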
Re: [Architecture] Integrating ntask component into ESB
Obviously, check if that class is available and where it is referred from in the code. As I remember, there isn't a package called ntaskint, so check where this is coming from.

Cheers,
Anjana.

On Sat, Apr 12, 2014 at 6:46 AM, Ishan Jayawardena is...@wso2.com wrote:

We developed the quartz task manager and we are currently working on the ntask task manager. While developing the task handling component that uses ntask, we observed that we cannot schedule a task in it due to a class not found error. See the error message below. The ntask component (which is used by the component that we are currently writing) cannot load the actual task implementation. Does anyone know how to get rid of this?

java.lang.ClassNotFoundException: class org.wso2.carbon.ntaskint.core.Task
    at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:501)
    at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:421)
    at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:412)
    at org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.loadClass(DefaultClassLoader.java:107)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at org.wso2.carbon.ntask.core.impl.TaskQuartzJobAdapter.execute(TaskQuartzJobAdapter.java:58)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)

Thanks,
Ishan.

On Mon, Apr 7, 2014 at 9:11 AM, Anjana Fernando anj...@wso2.com wrote:

Hi Paul, Task Server is actually another server itself.
The ntask component is the task scheduling component we put into all our Carbon servers when we need distributed task scheduling functionality. That component supports scheduling tasks in a standalone manner (in a single server), in a clustered mode for the distributed case (it does the coordination using Hazelcast), or in a remote mode where it can interface with an external Task Server. So basically, the full required functionality of distributed tasks can be achieved with the ntask component working in the clustered mode, where it identifies all the participating servers in the cluster and does the proper fail-over / load-balanced scheduling of the scheduled tasks; the servers schedule the tasks themselves using their internal Quartz functionality. With TS, all the task triggering is offloaded to the TS, which will send HTTP messages to each server telling it to execute the tasks. This should happen through the LB, as I explained in the earlier mail. So basically, Task Server = ntask component + remote tasks component. All any other Carbon server needs is the ntask component for full task scheduling functionality.

Cheers,
Anjana.

On Sat, Apr 5, 2014 at 1:43 PM, Paul Fremantle p...@wso2.com wrote:

Can someone clarify? I'm lost, and I really don't understand why we are creating any approach other than the task server. It is the only approach that scales clearly. Is our task server code too heavyweight?

Paul

On 5 April 2014 08:47, Chanaka Fernando chana...@wso2.com wrote:

Hi Kasun/Anjana,

I think what Anjana mentioned and what Ishan mentioned somewhat converge on the same idea (even though they look different). What we have discussed and agreed was that we are developing a separate carbon-component which is used for executing the ntask component. Since we need a common interface to support both the existing quartz-based synapse-tasks implementation and the ntask component, we have defined the TaskManager interface.
When the ESB is loading the synapse configuration, it will create an object of type TaskManager according to the task provider mentioned in the configuration. This task manager object will delegate the scheduling and other task-related work to the respective implementation of TaskManager (which can be either QuartzTaskManager or NTaskManager). @Kasun/Anjana: are we missing something here?

Thanks,
Chanaka

On Sat, Apr 5, 2014 at 9:32 AM, Kasun Indrasiri ka...@wso2.com wrote:

On Sat, Apr 5, 2014 at 9:22 AM, Anjana Fernando anj...@wso2.com wrote:

Hi Ishan,

On Sat, Apr 5, 2014 at 7:33 AM, Ishan Jayawardena is...@wso2.com wrote:

Currently, we have developed the following design and started to work on it. Synapse will define the TaskManager and Task interfaces, whose implementations will provide the concrete tasks and the management of those tasks depending on the scheduler (i.e. quartz or ntask). For instance, for the in-built quartz-based task scheduling, we will refactor and develop a quartz task manager, and a task type while
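The provider-based delegation described above could be sketched as follows. This is a stand-in only: the real Synapse TaskManager interface and the Quartz/ntask implementations have different signatures, and the names here (schedule, isScheduled, TaskManagerFactory) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Common task-management contract (illustrative; the real interface differs).
interface TaskManager {
    void schedule(String name, Runnable task);
    boolean isScheduled(String name);
    String provider();
}

// Shared in-memory registry standing in for the real scheduling backends.
abstract class AbstractTaskManager implements TaskManager {
    private final Map<String, Runnable> tasks = new HashMap<>();
    public void schedule(String name, Runnable task) { tasks.put(name, task); }
    public boolean isScheduled(String name) { return tasks.containsKey(name); }
}

// Real versions would delegate to Quartz and to the ntask component.
class QuartzTaskManager extends AbstractTaskManager {
    public String provider() { return "quartz"; }
}

class NTaskTaskManager extends AbstractTaskManager {
    public String provider() { return "ntask"; }
}

// Picks the implementation from the task provider named in the configuration.
class TaskManagerFactory {
    static TaskManager forProvider(String provider) {
        switch (provider) {
            case "quartz": return new QuartzTaskManager();
            case "ntask":  return new NTaskTaskManager();
            default: throw new IllegalArgumentException("Unknown task provider: " + provider);
        }
    }
}
```

The factory is the piece the ESB would invoke while loading the synapse configuration; everything after that goes through the common interface.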
Re: [Architecture] Shipping Streaming JSON Builder/Formatter by Default in Kernel axis2.xml
Hi Manoj,

Attached the new patch in the issue and also sent a pull request on GitHub.

Cheers,
Anjana.

On Fri, Mar 21, 2014 at 2:49 PM, Manoj Kumara ma...@wso2.com wrote:

Hi Anjana, Yes, the tenant-axis2.xml file. Sorry for that. Thanks, Manoj

*Manoj Kumara*
Software Engineer
WSO2 Inc. http://wso2.com/
*lean.enterprise.middleware*
Mobile: +94713448188

On Fri, Mar 21, 2014 at 2:36 PM, Anjana Fernando anj...@wso2.com wrote:

Hi Manoj, Sure, will do, and I'm guessing you mean tenant-axis2.xml, since we are not doing this change to axis2_client.xml. Cheers, Anjana.

On Fri, Mar 21, 2014 at 2:20 PM, Manoj Kumara ma...@wso2.com wrote:

Hi Anjana, Can you please add the diff relevant to axis2_client.xml as well, and please send a pull request to the wso2-dev repo at [1]. [1] https://github.com/wso2-dev/carbon4-kernel Thanks, Manoj

On Wed, Mar 19, 2014 at 10:56 AM, Anjana Fernando anj...@wso2.com wrote:

Hi Manoj, Not the axis2_client.xml, since ESB is using it, and other servers like DSS and AS will not be using it, which is what this is mainly aimed at, so let's not change that now. As for tenant-axis2.xml, what does that do? Is it the same as the standard axis2.xml for tenants, or something? Cheers, Anjana.

On Wed, Mar 19, 2014 at 10:50 AM, Manoj Kumara ma...@wso2.com wrote:

Hi Anjana, I committed the fix relevant to axis2.xml to patch0006 with r198653. Should we apply this change to the axis2_client.xml and tenant-axis2.xml configuration files also? Thanks, Manoj

On Tue, Mar 18, 2014 at 7:46 PM, Anjana Fernando anj...@wso2.com wrote:

Hi Sameera / Carbon Team, Can you please apply the patch at [1] to patch0006 in the Turing branch, Carbon 4.3.0 and the trunk. [1] https://wso2.org/jira/browse/CARBON-14738 Cheers, Anjana.
On Tue, Mar 18, 2014 at 7:15 PM, Sagara Gunathunga sag...@wso2.com wrote:

On Tue, Mar 18, 2014 at 7:06 PM, Anjana Fernando anj...@wso2.com wrote:

Hi, In an offline chat with Sameera, Shameera and Sagara, we decided we will put it in the kernel's axis2.xml, since many products can benefit from the new message builder/receiver; for ESB, for the moment, they will retain the older settings in their own axis2.xml and later possibly come up with a solution for both scenarios to work.

The proposed new JSON Builder/Formatter is much more effective if the underlying server is the final destination, but for ESB this is not the case, hence we don't need to apply this change to ESB. Thanks!

Cheers,
Anjana.

[earlier quoted messages trimmed; they repeat other messages in this thread verbatim]
[Architecture] Shipping Streaming JSON Builder/Formatter by Default in Kernel axis2.xml
Hi,

We've added JSON mapping support for DSS, which is mentioned in the mail with the subject JSON Mapping Support for Data Services. For this, I've used the GSON-based streaming JSON message builder/formatter, which was needed for correct JSON message generation by looking at the service schema. There were some fixes done by Shameera lately, and this is now working properly for all of the scenarios I've tested. So shall we ship this message builder/formatter by default from the axis2.xml in the kernel, so that all the products, including AS and DSS, will get this feature? It will be specifically required by AS, as it still contains the data services features. As for ESB, I'm not sure how the new message builder/formatter would work, since they will not always have correct service schemas in proxy services etc., so I guess those scenarios may fail; maybe Shameera can give some more insight on this. Anyway, the ESB has its own axis2.xml, so it will not be affected.

So shall we go ahead in updating the kernel's axis2.xml to contain the following sections?

..
<messageFormatter contentType="application/json"
                  class="org.apache.axis2.json.gson.JsonFormatter"/>
<messageBuilder contentType="application/json"
                class="org.apache.axis2.json.gson.JsonBuilder"/>

Cheers,
Anjana

--
*Anjana Fernando*
Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
Re: [Architecture] Shipping Streaming JSON Builder/Formatter by Default in Kernel axis2.xml
Hi,

Yeah, in the ESB case it will be a bit tricky: the WSDL they create by default for proxy services actually creates a mediate operation and all, so unless the incoming message actually has a mediate wrapper, the message builder will fail. So maybe we should have an axis2.xml parameter to say, for these types of axis2 services, ignore the schema definition; but then again, the streaming message builder fully depends on the schema to do the streaming and build the message, so I'm not sure how feasible this would be. Maybe the new message builder can revert back to the older message builder's implementation if it can see that service dispatching has already happened earlier, probably through the URL-based dispatcher, and if it can find out that, for this service/service-type, it is not supposed to use schema-based parsing of the message.

Cheers,
Anjana.

On Tue, Mar 18, 2014 at 3:46 PM, Sameera Jayasoma same...@wso2.com wrote:

Hi Anjana/Shameera,

Great stuff. Now we have proper JSON support in Axis2. But we need to think carefully before adding this formatter and builder as the defaults for the application/json content type. I think we need to fix this JSON support to work in ESB as well; otherwise users will not be able to deploy data services features in ESB. If we improve this JSON support to handle the xsd:any type, then we should be able to support the proxy services case. Let's fix this to work in ESB as well, and then we can commit it to the kernel.

Thanks,
Sameera.

On Tue, Mar 18, 2014 at 2:28 PM, Shameera Rathnayaka shame...@wso2.com wrote:

Hi Anjana et al,

The new JSON implementation above has been introduced to handle lossless XML <-> JSON transformation, and it depends heavily on the schema definitions, from which it generates the message structure.
In short, for the XML-stream-based JSON implementation to work, we need proper schema definitions for the in and out messages; otherwise it won't work. In addition to the above entries, we need to make the following changes to the axis2.xml file in order to integrate this implementation: remove the RequestURIOperationDispatcher handler from the Dispatch phase and place it as the last handler in the Transport phase (IMO it is OK to move RequestURIOperationDispatcher to the Transport phase, as we are dealing with the URI), then add the new JSONMessageHandler after the RequestURIOperationDispatcher. Finally, the Transport phase would look like the following:

<phaseOrder type="InFlow">
    <!-- System predefined phases -->
    <phase name="Transport">
        ...
        <handler name="RequestURIOperationDispatcher"
                 class="org.apache.axis2.dispatchers.RequestURIOperationDispatcher"/>
        <handler name="JSONMessageHandler"
                 class="org.apache.axis2.json.gson.JSONMessageHandler"/>
    </phase>
</phaseOrder>

Thanks,
Shameera.

On Tue, Mar 18, 2014 at 1:40 PM, Anjana Fernando anj...@wso2.com wrote:

[quoted original message trimmed; it repeats the original post of this thread verbatim]

--
*Shameera Rathnayaka*
Software Engineer - WSO2 Inc.
email: shameera AT wso2.com, shameera AT apache.org
phone: +9471 922 1454
Linkedin: http://lk.linkedin.com/pub/shameera-rathnayaka/1a/661/561
Twitter: https://twitter.com/Shameera_R

--
Sameera Jayasoma,
Software Architect, WSO2, Inc. (http://wso2.com)
email: same...@wso2.com
blog: http://sameera.adahas.org
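The fallback idea raised in this thread (revert to the older builder when schema-based parsing is not applicable for the dispatched service) could be sketched as follows. This is purely illustrative: the class names and the boolean flag standing in for the dispatch information are hypothetical, not the actual Axis2 API.

```java
// Sketch of the proposed fallback between the schema-driven streaming JSON
// builder and the legacy builder; names are hypothetical stand-ins.
interface JsonBuilder {
    String build(String payload);
}

class StreamingSchemaJsonBuilder implements JsonBuilder {
    public String build(String payload) { return "schema-parsed:" + payload; }
}

class LegacyJsonBuilder implements JsonBuilder {
    public String build(String payload) { return "legacy-parsed:" + payload; }
}

class FallbackJsonBuilder implements JsonBuilder {
    private final JsonBuilder streaming = new StreamingSchemaJsonBuilder();
    private final JsonBuilder legacy = new LegacyJsonBuilder();
    // In a real builder this would be derived from the dispatch info:
    // has URL-based dispatching already resolved a service that is not
    // supposed to use schema-based parsing (e.g. an ESB proxy)?
    private final boolean schemaBasedService;

    FallbackJsonBuilder(boolean schemaBasedService) {
        this.schemaBasedService = schemaBasedService;
    }

    public String build(String payload) {
        return (schemaBasedService ? streaming : legacy).build(payload);
    }
}
```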
Re: [Architecture] Shipping Streaming JSON Builder/Formatter by Default in Kernel axis2.xml
Hi, OK, so for now, I will put the changes in for the DSS product. Sagara, shall we put the same changes in for AS as well? I guess AS functionality will not be affected by the new builder/formatter. As for the ESB having data services features, there is no straightforward way to make it work now, so we can say that if proper JSON mapping is needed for data services, either DSS or AS has to be used, and it won't be possible to embed this in the ESB. Cheers, Anjana.

On Tue, Mar 18, 2014 at 5:06 PM, Ishan Jayawardena is...@wso2.com wrote: Several basic ESB mediators depend on the message built by the ESB's existing JSON message builder (implemented in Synapse), so switching to this new message builder will break them. If we need to make DSS features work in the ESB, we have to rebuild the message for DSS after it has first been built by the ESB's builder. Similarly, we have to handle the formatter flow. Thanks, Ishan.

On Tue, Mar 18, 2014 at 3:58 PM, Anjana Fernando anj...@wso2.com wrote: Hi, Yeah, but in the ESB case it will be a bit tricky: the WSDL they create by default for proxy services actually creates a mediate operation and so on, so unless the incoming message actually has a mediate wrapper, the message builder will fail. So maybe we should have an axis2.xml parameter to say, for these types of Axis2 services, ignore the schema definition; but then again, the streaming message builder fully depends on the schema to do the streaming and to build the message, so I'm not sure how feasible this would be. Maybe the new message builder can revert to the older message builder's implementation if it can see that service dispatching has already happened earlier, probably through the URL-based dispatcher, and if it can find out that, for this service/service-type, it is not supposed to use schema-based parsing of the message. Cheers, Anjana.
On Tue, Mar 18, 2014 at 3:46 PM, Sameera Jayasoma same...@wso2.com wrote: Hi Anjana/Shameera, Great stuff. Now we have proper JSON support in Axis2. But we need to think carefully before adding this formatter and builder as the default builder/formatter for the application/json content type. I think we need to fix this JSON support to work in the ESB as well; otherwise, users will not be able to deploy data services features in the ESB. If we improve this JSON support to handle the xsd:any type, then we should be able to support the proxy services case. Let's fix this to work in the ESB as well, and then we can commit it to the kernel. Thanks, Sameera.

On Tue, Mar 18, 2014 at 2:28 PM, Shameera Rathnayaka shame...@wso2.com wrote: Hi Anjana et al, The above new JSON implementation has been introduced to handle lossless XML <-> JSON transformation, and it depends heavily on the schema definitions, generating the message structure by reading these schemas. Thanks, Shameera.
Re: [Architecture] Shipping Streaming JSON Builder/Formatter by Default in Kernel axis2.xml
Hi, In an offline chat with Sameera, Shameera and Sagara, we decided we will put it in the kernel's axis2.xml, since many products can benefit from the new message builder/formatter; the ESB, for the moment, will retain the older settings in its own axis2.xml, and later possibly come up with a solution for both scenarios to work. Cheers, Anjana.
Re: [Architecture] Shipping Streaming JSON Builder/Formatter by Default in Kernel axis2.xml
Hi Sameera / Carbon Team, Can you please apply the patch at [1] to patch0006 in the Turing branch, Carbon 4.3.0, and the trunk.

[1] https://wso2.org/jira/browse/CARBON-14738

Cheers, Anjana.

On Tue, Mar 18, 2014 at 7:15 PM, Sagara Gunathunga sag...@wso2.com wrote: The proposed new JSON builder/formatter is most effective when the underlying server is the final destination, but for the ESB this is not the case; hence we don't need to apply this change to the ESB. Thanks!
[Architecture] JSON Mapping Support for Data Services
Hi, I've implemented JSON mapping support for data services: basically, rather than defining the XML elements in the result, we can now give a JSON message template that specifies what the JSON representation of the result should look like. The following is a sample of a data services result element for JSON mapping:

    <query id="customersInBostonSQL" useConfig="default">
        <sql>select * from Customers where city = 'Boston' and country = 'USA'</sql>
        <result outputType="json">
            {
              "customers": {
                "customer": [
                  {
                    "phone": "$phone",
                    "city": "$city",
                    "contact": {
                      "customer-name": "$customerName",
                      "contact-last-name": "$contactLastName",
                      "contact-first-name": "$contactFirstName"
                    }
                  }
                ]
              }
            }
        </result>
    </query>

So here, the result element's outputType value is set to "json" (we had "xml" and "rdf", defaulting to "xml"). To refer to the result set's columns / query params, we use the convention of prefixing the name of the parameter with "$", which signals that we are looking up a value. Also, since this is a template-based approach, other special properties, such as output fields' data types and required roles (used for content filtering), need to be specified in a special way, encoded in the value of a field. This is done in the following way:

    { "age" : "$age(type:integer;requiredRoles:r1,r2)" }

So the first part is the looked-up variable (column), and the section enclosed by "(" and ")" specifies the extended attributes. Another feature to support here is nested queries; that is, from the JSON mapping, we should be able to specify a query to be executed and have its result replace the place where it was invoked. For this, I've implemented it at the moment in the following way:

    { "phone": "$phone", "@employeesInOfficeSQL": "$officeCode-officeCode,$param2-param2" }
So basically, calling a query is symbolized by prefixing the field name with an "@", followed by the name of the target query, and the value of the field contains the parameter mappings; that is, our column values are mapped to the target query's params, connected with a "-" operator. I personally prefer a short symbol like "@" to denote the nested query option, rather than a keyword like "operation", as this is more compact. This mapping works fully when used with the GSON-based streaming JSON implementation [1]; that is, if we say the records are in a JSON array, it will always return an array, even if the result gives out just a single object. This is done by the JSON message formatter looking at the XML schema created. But this does not work with nested queries, where only the default JSON message formatter works properly. I've verified that the generated XML schema does conform to the messages returned by the service calls, so this seems like a bug in the new JSON message formatter. I'm attaching the data service I'm using, and also its WSDL. Below are some sample requests you can run against the data service (run all with the HTTP header "Accept: application/json", and HTTP GET):

* http://10.100.1.45:9763/services/JSONSample/boston_customers
* http://10.100.1.45:9763/services/JSONSample/employee/1002
* http://10.100.1.45:9763/services/JSONSample/offices (nested query request)

A DSS build with the JSON mapping features can be found at [2]; download it and configure the streaming JSON message formatter as explained at [1]. @Shameera, can you please try this out and figure out what the issue might be with the complex result outputs; if it's a bug in the JSON message formatter, I'd appreciate it if you could provide a fix for it.

[1] https://builds.apache.org/job/Axis2/javadoc/docs/json_gson_user_guide.html
[2] https://svn.wso2.org/repos/wso2/people/anjana/tmp/wso2dss-3.2.0-20140227.zip

Cheers, Anjana.
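To make the "$name(key:value;...)" extended-attribute convention described above concrete, here is a small illustrative parser for it. This is only a sketch of the convention as described in this mail, not the actual DSS implementation; the class and method names (FieldRef, parse) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating the "$name(key:value;...)" field
// convention described above; a sketch, not the actual DSS parser.
public class FieldRef {
    final String column;                 // looked-up column / query-param name
    final Map<String, String> attrs;     // extended attributes, e.g. type, requiredRoles

    FieldRef(String column, Map<String, String> attrs) {
        this.column = column;
        this.attrs = attrs;
    }

    // Parses values such as "$age(type:integer;requiredRoles:r1,r2)"
    // or a plain "$phone" reference (assumes well-formed input).
    static FieldRef parse(String value) {
        if (!value.startsWith("$")) {
            throw new IllegalArgumentException("not a field reference: " + value);
        }
        String body = value.substring(1);
        Map<String, String> attrs = new LinkedHashMap<>();
        int open = body.indexOf('(');
        if (open < 0) {
            return new FieldRef(body, attrs);   // plain "$column" reference
        }
        String column = body.substring(0, open);
        String inner = body.substring(open + 1, body.lastIndexOf(')'));
        for (String pair : inner.split(";")) {
            String[] kv = pair.split(":", 2);   // split attribute into key and value
            attrs.put(kv[0], kv[1]);
        }
        return new FieldRef(column, attrs);
    }

    public static void main(String[] args) {
        FieldRef ref = FieldRef.parse("$age(type:integer;requiredRoles:r1,r2)");
        System.out.println(ref.column);                      // age
        System.out.println(ref.attrs.get("type"));           // integer
        System.out.println(ref.attrs.get("requiredRoles"));  // r1,r2
    }
}
```

The same split-on-";" / split-on-":" approach would extend to any future extended attributes, since the convention keeps everything inside one string value and the template stays valid JSON.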
--
Anjana Fernando
Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware

JSONSample.dbs (attachment: binary data)
JSONSample WSDL (attachment, truncated; omitted)
Re: [Architecture] Dynamically load data sources defined on master-datasources.xml on startup
___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] Dynamically load data sources defined on master-datasources.xml on startup
Hi Manoj, Just having a dependency in the pom.xml does not guarantee the ordering; it is a build-time dependency. Maybe you're using some other OSGi service which in turn transitively uses the ndatasource OSGi service somewhere, or else, most probably, it is by chance that your bundle is getting activated after ndatasource. Cheers, Anjana.

On Fri, Feb 21, 2014 at 8:57 AM, Manoj Fernando man...@wso2.com wrote: I didn't really face that issue, but thanks for pointing it out. Since I used the dependency config from the previous throttle component in my pom.xml, the activation order would have worked correctly. Regards, Manoj

On Fri, Feb 21, 2014 at 8:45 AM, Amila Maha Arachchi ami...@wso2.com wrote: On Friday, February 21, 2014, Kasun Gajasinghe kas...@wso2.com wrote: I believe the issue here is that the JNDI context won't be available to a Carbon component/bundle until the org.wso2.carbon.ndatasource.core bundle is activated during server startup. Any bundle that gets activated before this bundle won't see the JNDI contexts. I think Manoj is facing that issue here. AFAIK, there isn't a way to specify the bundle order. So, one option is to write an o.w.c.core.ServerStartupHandler, which will get invoked after the server starts up successfully. By that time, the JNDI contexts etc. will be available. Is there any other option? You can delay the bundle activation by using an OSGi service dependency on ndatasource core (if it has registered one). Then the bundle won't get activated until ndatasource core is active. Regards, KasunG

On Fri, Feb 21, 2014 at 1:19 AM, Anjana Fernando anj...@wso2.com wrote: Hi guys, Yeah, the existing data source component does exactly that.
When you mention a data source in a *-datasources.xml file, you can make it available as a JNDI resource; that is what the following section of a data source configuration does:

    <jndiConfig>
        <name>{RES_NAME}</name>
        <!-- optional properties -->
        <environment>
            <property name="java.naming.factory.initial">{ICS}</property>
            <property name="java.naming.provider.url">{PROVIDER_URL}</property>
        </environment>
    </jndiConfig>

And as Senaka mentioned, this is how the registry and user manager look up their data sources when the server is starting up. Hope this is what Manoj is looking for. Cheers, Anjana.

On Fri, Feb 21, 2014 at 1:00 AM, Senaka Fernando sen...@wso2.com wrote: Hi Manoj, Please find the responses inline. On Thu, Feb 20, 2014 at 8:25 PM, Manoj Fernando man...@wso2.com wrote: Hi Senaka, What I meant was the scenario of me, as an outside developer, wanting to add a new datasource for my own Carbon component. Right now, just adding the datasource to the XML doesn't make it available as a JNDI resource; you need to do the extra step of reading the XML and attaching it to the InitialContext (AFAIK). It would be much nicer, IMO, to have those datasources added to the InitialContext during bootstrap, so that any component needing to use them can simply reference them by the JNDI key. IINM, you should not be doing this. The JNDI referencing should work from any component, webapp, etc. We have done nothing special in the registry kernel, for instance, and if the JNDI referencing works there, it should work elsewhere too. Copied Anjana to get this clarified. The convenience of system properties would be similar: we can have a config file under repository/conf that gets automatically loaded as system properties for any component that might need them. Yes, I know we can pass them as startup parameters, but it was basically a suggestion for sysadmin/developer convenience. Nothing major... just for convenience.
IMHO, it can be a convenience with regards to some use cases and an inconvenience with regards to some others. I think we need to consider the pros and cons. There are things like clustering, environment separation, which overrides what (i.e., JAVA_OPTS vs. this file), etc. that we need to think about. Will add some points later. Thanks, Senaka. Regards, Manoj

On Thu, Feb 20, 2014 at 11:14 PM, Senaka Fernando sen...@wso2.com wrote: Hi Manoj, Datasources can be referenced by the JNDI key even now; this is how it works in the registry kernel and UM. Is it done in some other way in Carbon components? And, for system properties, you can pass these through wso2server.sh/bat. I see no benefit in having a separate component to do just that. Am I missing something here? Thanks, Senaka. On Thu, Feb 20, 2014 at 6:37 PM, Manoj Fernando

--
Kasun Gajasinghe
Software Engineer; WSO2 Inc.; http://wso2.com
blog: http://kasunbg.org
Re: [Architecture] CEP UI re-factoring and adding much more functionality
Noted. Cheers, Anjana. On Thu, Jan 30, 2014 at 3:26 PM, Srinath Perera srin...@wso2.com wrote: Mohan, for listing and editing streams, could you look at integrating WSO2 Store? This component MUST be used by both BAM and CEP to list and edit streams. Anjana, please note also. --Srinath

On Wed, Jan 22, 2014 at 11:32 AM, Sriskandarajah Suhothayan s...@wso2.com wrote: On Wed, Jan 22, 2014 at 11:18 AM, Lasantha Fernando lasan...@wso2.com wrote: Hi Mohan, +1 for the design. IMO, the in-flow and out-flow UI will be very useful for getting an idea of how the events are flowing, which is currently a bit lacking in CEP, I think. Great addition! Will the user be able to sample events generated in the stream UI to test a flow, or will that part come under a separate component? Based on the current plan, the try-it for streams will become a separate component. In the future, when we have this, we can integrate it with the sample event generation UI. Currently, the use of the sample event generation UI is to allow users to create sample events, edit them, and finally copy and send them via curl, JMS, etc. Suho

Thanks, Lasantha. On 21 January 2014 19:43, Mohanadarshan Vivekanandalingam mo...@wso2.com wrote: Hi All, As you already know, we have done major improvements and changes in CEP 3.0.0 (which is a complete re-write), especially in the UI aspect. But we found there are some gaps that we can fix to improve the usability experience further. These changes are targeted for the next CEP release, version 3.1.0. The below UI improvements also target the CEP tooling aspect. Please see the below figures, which are mock-up design flows of the event stream UI and execution plan UI. Based on the below design, we are trying to achieve the default-event concept, while also giving the opportunity for advanced event configurations. Appreciate any ideas and suggestions on this... Thanks Regards, Mohan

--
V. Mohanadarshan
Software Engineer, Data Technologies Team, WSO2, Inc.
http://wso2.com | lean . enterprise . middleware | email: mo...@wso2.com

--
Lasantha Fernando
Software Engineer - Data Technologies Team, WSO2 Inc. (http://wso2.com)
email: lasan...@wso2.com

--
S. Suhothayan
Associate Technical Lead, WSO2 Inc. (http://wso2.com)
blog: http://suhothayan.blogspot.com/

--
Srinath Perera, Ph.D.
Director, Research, WSO2 Inc.
Visiting Faculty, University of Moratuwa
Member, Apache Software Foundation
Blog: http://srinathsview.blogspot.com/

--
Anjana Fernando
Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
Re: [Architecture] [C5] Clustering API
Hi Azeez, Well, I used the word 'could' loosely there... I gave the reasons for the group functionality :) I just think it would be useful functionality for many use cases. Cheers, Anjana.

On Fri, Jan 17, 2014 at 9:30 AM, Afkham Azeez az...@wso2.com wrote: On Fri, Jan 17, 2014 at 10:42 PM, Anjana Fernando anj...@wso2.com wrote: Hi, Yeah, most probably the task-related functionality should not be part of this API, but the group functionality I mentioned *could* be useful, as I explained. The golden rule of API design is, when in doubt, leave it out (http://www.infoq.com/articles/API-Design-Joshua-Bloch). Cheers, Anjana.

On Fri, Jan 17, 2014 at 4:08 AM, Kishanthan Thangarajah kishant...@wso2.com wrote: IMO, the task-related APIs are not part of the kernel or the clustering APIs provided by the kernel. Since this is more of a use case on functions provided by Hazelcast, we can expose the underlying Hazelcast instance as an OSGi service, which can then be used for the above purpose.

On Fri, Jan 17, 2014 at 12:42 PM, Sriskandarajah Suhothayan s...@wso2.com wrote: I'm OK with having a separate API to handle the task stuff, but in that case, will it have access to Hazelcast or other internal stuff? And should it be a part of the kernel? I'm not sure what bits and pieces we need from Hazelcast to create this API, and exposing all of them will make the Caching API ugly :) Regards, Suho

On Fri, Jan 17, 2014 at 11:44 AM, Supun Malinga sup...@wso2.com wrote: Hi, Also, we should consider the use cases of OC here as well, IMO. thanks,

On Fri, Jan 17, 2014 at 11:24 AM, Afkham Azeez az...@wso2.com wrote: I think this is making clustering more specific to running tasks. Handling tasks should be implemented at a layer above clustering.

On Fri, Jan 17, 2014 at 11:06 AM, Sriskandarajah Suhothayan s...@wso2.com wrote: Based on Anjana's suggestions, to support different products having different ways of coordination.
My suggestion is as follows:

    // This has to be a *one-time* thing; I'm not sure how we should design the API for this!
    // ID is the Task or Group ID.
    // Algorithm-class can be a class, or a name registered in Carbon (TBD).
    void performElection(ID, Algorithm-class);

    // Register the current node to do/join the task denoted by the ID.
    void registerAsTaskWorker(ID);

    // Check whether the current node is the coordinator.
    boolean isCoordinator(ID);

    // Get the coordinator for the ID.
    NodeID getCoordinator(ID);

We also need a listener for coordinator changes:

    CoordinatorListener {
        void coordinatorChanged(ID, NodeID);
    }

WDYT? Suho

On Thu, Jan 16, 2014 at 8:32 PM, Anjana Fernando anj...@wso2.com wrote: Hi, On Thu, Jan 16, 2014 at 5:10 AM, Sriskandarajah Suhothayan s...@wso2.com wrote: We also need an election API. E.g., for certain tasks, only one or a few nodes can be responsible, and if such a node dies, someone else needs to take over that task. Here, the user should be able to give the task key and should be able to find out whether he is responsible for the task. It is also important that the election logic be pluggable based on the task. The task scenarios are similar to what we do in our scheduled tasks component. I'm not sure if that type of functionality should be included in this API, or did you mean you need the election API to build on top of it? Also, another requirement we have is creating groups within a cluster. That is, when we work on the cluster, sometimes we need a node in a specific group/groups, and each group will have its own coordinator. So then there wouldn't be a single coordinator for the full physical cluster. I know we can build this functionality at a layer higher than this API, but then, effectively, the isCoordinator for the full cluster will not be used, and each component that uses similar group functionality will roll its own implementation of this. So I'm thinking that if we build some robust group features into this API itself, it will be very convenient for its consumers.
So what I suggest is: while a member joins the full cluster automatically, can we have another API method like joinGroup(groupId); then later, when we register a membership listener, we can give the groupId as an optional parameter to register the membership listener for a specific group. And as for the isCoordinator functionality, we can also overload that method to take a groupId, or else, in the membership listener itself, we can have an additional method like coordinatorChanged(String memberId) or, maybe more suitably, assumedCoordinatorRole() or something like that, to simply say you just became the coordinator of the full cluster/group. Cheers, Anjana. Regards Suho On Thu, Jan 16, 2014 at 4:56 PM, Afkham Azeez az...@wso2.com wrote: On Thu, Jan 16, 2014 at 4:55 PM, Kishanthan Thangarajah kishant...@wso2.com wrote: Adding more. Since we will follow
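The group-aware election API discussed in this thread can be sketched in plain Java. The following is a minimal in-memory illustration only: the method names (joinGroup, isCoordinator, getCoordinator, coordinatorChanged) follow the proposal above, while the "lowest node ID wins" election policy is just one example of a pluggable algorithm; none of this is the actual Carbon/Hazelcast implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.BiConsumer;

// In-memory sketch of a group-scoped coordinator election registry.
// Election policy: the lexicographically smallest node ID in a group
// is its coordinator (an example of a pluggable algorithm).
public class GroupCoordination {
    private final Map<String, SortedSet<String>> groups = new HashMap<>();
    // Stand-in for CoordinatorListener.coordinatorChanged(ID, NodeID)
    private final BiConsumer<String, String> coordinatorListener;

    public GroupCoordination(BiConsumer<String, String> listener) {
        this.coordinatorListener = listener;
    }

    public void joinGroup(String groupId, String nodeId) {
        groups.computeIfAbsent(groupId, k -> new TreeSet<>()).add(nodeId);
        coordinatorListener.accept(groupId, getCoordinator(groupId));
    }

    public void leaveGroup(String groupId, String nodeId) {
        SortedSet<String> members = groups.get(groupId);
        if (members != null && members.remove(nodeId) && !members.isEmpty()) {
            // Re-run the election and notify the listener of the new coordinator.
            coordinatorListener.accept(groupId, getCoordinator(groupId));
        }
    }

    public boolean isCoordinator(String groupId, String nodeId) {
        return nodeId.equals(getCoordinator(groupId));
    }

    public String getCoordinator(String groupId) {
        SortedSet<String> members = groups.get(groupId);
        return (members == null || members.isEmpty()) ? null : members.first();
    }
}
```

With this shape, a component registers interest in one group only, and a coordinator failure in one group never disturbs the coordinators of other groups, which is the main point argued for above.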
Re: [Architecture] [C5] Clustering API
___ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
Re: [Architecture] BAM Notifications From Hive
Hi Sanjiva, Deleting works by giving the row key of the record, so it will not conflict with anything else. The scenario is where, after a batch job is run, let's say some of the results are processed and need to be sent as a notification to someone. For example, after the mediation stats are processed, we can check if a specific service is overloaded with requests or anything like that, and we send that information out to a stream using this feature. And we can define a flow where we get an event from that stream and send out an email/SMS notification. The coordination across the Hadoop cluster works such that the result will only be written by a single operation, similar to records being written to a database. And the task that processes this data also runs in a fail-over aware manner, where if the task goes down, it will be started on another node. Cheers, Anjana. On Thu, Oct 24, 2013 at 1:56 PM, Sanjiva Weerawarana sanj...@wso2.com wrote: Anjana, given that Cassandra is not transactional, how does deleting work when someone else may be writing at the same time? I'm a bit unclear why it's critical to send events out of Hive itself. Can you elaborate the scenario please? How do you coordinate that across a potentially large Hadoop cluster? Sanjiva. On Tue, Oct 22, 2013 at 3:00 PM, Anjana Fernando anj...@wso2.com wrote: Hi Srinath, Yeah, the data is always cleaned up when the task is run. Basically, the task reads all the data in the column family, sends each event to the target stream, and at the same time deletes it from the data store. The notification is totally customizable by the user; what this simply does is send any arbitrary data to a stream, at the point when some insert statement is executed from the Hive script, which can be at the end of the script or anywhere. After the event comes to a stream, the user can do anything with it: either run a CEP query against it, or directly pass it through to some transport like email or SMS.
Cheers, Anjana. On Tue, Oct 22, 2013 at 2:30 PM, Srinath Perera srin...@wso2.com wrote: Hi Anjana, Basically, we are polling the Cassandra location. I think it is OK, but we need to make sure we clean up these tasks when we detect that the job has finished. What does the notification say? Does it say the job has finished, or can the user give a condition for when to send the notification? We eventually need that. --Srinath On Tue, Oct 22, 2013 at 2:03 PM, Anjana Fernando anj...@wso2.com wrote: Hi, For BAM notifications, the approach we have at the moment is using CEP, which we now ship by default with BAM. But there is another limitation, where we cannot trigger any notifications from Hive scripts, which is what is used mostly. So the requirement is that, somehow, we should be able to send messages from Hive to a stream to send out notifications; that is, when messages come to a stream, we can use CEP's message builders/formatters to send out email/SMS etc. So I've implemented a simple mechanism to do this, where, when Hive wants to send out a message to a stream, it writes a data row to a pre-defined Cassandra CF (bam_notification_messages), with a column named streamId and other columns (which map to the payload section of a stream). And then, in the BAM server, there is a scheduled task running which polls the data in that CF (at 5 second intervals) to get the existing rows, and reads the streamId and other columns to generate an event to be sent to the target stream; processed rows are then deleted. So with this approach, effectively, we can now send events to a specific stream from Hive. I've tested this feature in BAM, and hope this approach is fine for the requirement. Cheers, Anjana.
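The poll-convert-delete cycle described in this thread can be sketched as follows. This is a minimal sketch in plain Java, with an in-memory map standing in for the bam_notification_messages column family and a list standing in for the target stream publisher; the class and method names are hypothetical, and the real feature works against Cassandra and the BAM event streams.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the scheduled notification task: each run reads all rows in
// the notification column family, publishes each row as an event to the
// stream named by its streamId column, and deletes the processed rows.
public class NotificationPoller {
    // rowKey -> columns; each row carries a "streamId" column plus payload columns
    private final Map<String, Map<String, String>> columnFamily = new LinkedHashMap<>();
    private final List<String> publishedEvents = new ArrayList<>();

    // Stand-in for Hive writing a data row into the CF
    public void writeRow(String rowKey, Map<String, String> columns) {
        columnFamily.put(rowKey, columns);
    }

    // One run of the scheduled task (the real task fires every 5 seconds)
    public void poll() {
        Iterator<Map.Entry<String, Map<String, String>>> it =
                columnFamily.entrySet().iterator();
        while (it.hasNext()) {
            Map<String, String> row = it.next().getValue();
            String streamId = row.get("streamId");
            publishedEvents.add(streamId + ":" + row.getOrDefault("payload", ""));
            // Deleting by row key is what avoids clashing with concurrent writers,
            // as discussed in the thread.
            it.remove();
        }
    }

    public List<String> published() { return publishedEvents; }

    public int pendingRows() { return columnFamily.size(); }
}
```

Since each row is addressed and deleted by its own row key, a writer inserting new rows during a poll run is simply picked up on the next run rather than conflicting with the delete.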
Re: [Architecture] BAM Data Archival Feature improvements
Hi Dipesh, Thank you for the ideas. Actually, yeah, we can also support archiving a general CF without filtering the records on fields like stream version and so on. It should be straightforward functionality, without changing much in the backend. As for using Hive, you've got a point there. We actually considered it earlier too, but decided not to go with that approach, thinking it has some limitations, where we can't address the data when the column names are not known in Cassandra. But by looking into it more now, we identified that it can actually be done. So yeah, we will now look again into using Hive to do the processing. And with that, we can easily support archiving from/to several data sources, such as RDBMS, Cassandra, and HDFS. Also, for the indexing concerns, we were going to use a custom index based approach; but now, if Hive is used, we will most probably straight away use the functionality provided by incremental processing, which already contains the indexing features for timestamps. So with these features tied in, hopefully it will be a solid implementation. Cheers, Anjana. On Wed, Sep 4, 2013 at 11:44 AM, Dipesh Chheda wrote: Hi Malith, The current (Hive-based) solution (and it seems the proposed solution) only handles column families (CFs) created/maintained by BAM (based on the stream definition). A couple of improvements would really help: - Currently, the archiving configuration is per 'CF + stream-def-version'. Is it possible to have just one archive configuration that takes care of a given CF irrespective of the stream-def-version? - The archiving feature should support 'any CF' that exists in a given Cassandra cluster. We are currently using Cassandra (instead of an RDBMS like MySQL) to store analyzed data. Of course, the configuration would need to have the name of the 'timestamp' column for each CF, based on which the data would be filtered for archiving.
For a Hector-based implementation, I would imagine that 'non-secondary' indexing on the 'timestamp' column would be required to efficiently filter and archive the data. If you agree, how do you folks plan to handle this? If not required, how would the solution scale/perform better without indexing? Also, in addition to archiving data from Cassandra (ActiveStore) to Cassandra (ArchiveStore), shouldn't it support archiving to traditional SAN-like storage options, HDFS, etc.? I think these other options could be easily/naturally supported by Hive itself, where the Hive result could be streamed as key-value pairs to these types of archive stores. Regards, Dipesh Malith Dhanushka wrote: Hi folks, We (the BAM team, Sumedha) had a discussion about the $Subject, and the following are the suggested improvements for the Cassandra data archival feature in BAM:

- Remove Hive script based archiving and use the Hector API to directly issue archive queries to Cassandra (the current implementation is based on Hive, where it generates a Hive script and the archiving process uses map-reduce jobs to achieve the task; it has the limitation of discarding custom key-value pairs in the column family)
- Use the Task component for scheduling purposes
- Archive data to an external Cassandra ring
- Major UI improvements: list the current archiving tasks; edit, remove, and schedule archiving tasks; add a new archiving task

If there are any additional requirements, please raise them. Thanks, Malith
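The timestamp-based selection Dipesh describes reduces to moving rows whose configured timestamp column is older than a cutoff from the active store to the archive store. The following is a hypothetical sketch of only that selection logic, with plain maps standing in for the Cassandra stores; the real feature would do this through Hive or Hector against Cassandra.

```java
import java.util.Iterator;
import java.util.Map;

// Sketch of one archival pass: rows whose timestamp column is older than
// the cutoff are copied to the archive store and removed from the active
// store. Rows lacking the timestamp column are left untouched.
public class ArchiveTask {
    public static int archive(Map<String, Map<String, Long>> activeStore,
                              Map<String, Map<String, Long>> archiveStore,
                              String timestampColumn, long cutoff) {
        int moved = 0;
        Iterator<Map.Entry<String, Map<String, Long>>> it =
                activeStore.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Map<String, Long>> row = it.next();
            Long ts = row.getValue().get(timestampColumn);
            if (ts != null && ts < cutoff) {
                archiveStore.put(row.getKey(), row.getValue());
                it.remove();
                moved++;
            }
        }
        return moved;
    }
}
```

Making the timestamp column name a parameter, rather than hard-coding it, is what allows the same task to handle arbitrary CFs, per the improvement requested above.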
Re: [Architecture] NTask component updated to use Hazelcast instead of ZooKeeper
Hi Amila, As for BAM: BAM required ZK for ntask, so since ntask doesn't need ZK anymore, BAM will not need ZK anymore. And yeah, TS can be used; the idea of TS is to be used in large deployments where features like tenant partitioning are used. Otherwise, the usual clustered mode is enough. Cheers, Anjana. On Mon, Aug 19, 2013 at 11:16 AM, Amila Maha Arachchi ami...@wso2.com wrote: Cloud deployment was planning to use ZooKeeper for the BAM setup. We will use TS instead. On Mon, Aug 19, 2013 at 11:01 AM, Anjana Fernando anj...@wso2.com wrote: Hi, I had a chat with Dimuthu, and she said they are not using ZooKeeper in AF, it seems. Cheers, Anjana. On Mon, Aug 19, 2013 at 10:49 AM, Anjana Fernando anj...@wso2.com wrote: Hi Sanjiva, Yeah, sure, will schedule a review, and will talk to the app-factory guys. Cheers, Anjana. On Mon, Aug 19, 2013 at 6:31 AM, Sanjiva Weerawarana sanj...@wso2.com wrote: Excellent! Can we do a review too before this is final? Ref AF use of ZK - please help them to undo it ASAP .. we need to totally drop ZooKeeper. Sanjiva. On Sun, Aug 18, 2013 at 2:46 AM, Anjana Fernando anj...@wso2.com wrote: Hi everyone, I've changed the ntask component to use Hazelcast for its coordination / group communication activities. This is because the earlier ZooKeeper based coordination component was too troublesome to use: it takes a whole separate ZooKeeper cluster to be set up to properly cluster a Carbon server which has scheduled tasks. Also, ZooKeeper has little support for proper error handling, and it's hard/not possible to prevent some edge cases. With the Hazelcast integration, you will not have to install a different server, since it just works in a peer-to-peer fashion inside the Carbon server itself. And since Hazelcast is also used in Axis2 clustering, the integration is seamless. Scheduled tasks have three main modes they can work in: STANDALONE, CLUSTERED, and REMOTE.
I've introduced a new setting called AUTO, set in tasks-config.xml as the default, which automatically checks if clustering is enabled in the system and switches to CLUSTERED mode if so, or falls back to STANDALONE mode. So in a typical setup, no additional settings need to be changed for distributed tasks to work properly (other than the startup task server count, which is set to 2 by default). With this change, I've removed the coordination (ZK based) components from the products which used them for ntask. The following are the changes I did in branch/trunk, building the ones that were possible:

DSS:- Branch/Trunk
AS:- Branch/Trunk, cannot build branch because of a Jaggery version problem
ELB:- Trunk, coordination-server also removed
GREG:- Branch/Trunk, cannot build branch - Jaggery version problem
Manager:- Trunk
AppFactory:- Trunk
BAM:- Trunk
BPS:- Trunk

SS also uses the coordination-core feature, which they seem to use for other purposes, not for scheduled tasks. I'd recommend, if possible, re-writing that part of the code to use Hazelcast instead. Cheers, Anjana.
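The AUTO mode resolution described in this thread amounts to a small decision rule, sketched below in plain Java. The enum constants follow the mail; the class and method names are hypothetical and only illustrate the behaviour, not the actual ntask code.

```java
// Sketch of how the AUTO task mode collapses to a concrete mode:
// AUTO becomes CLUSTERED when clustering is enabled, STANDALONE otherwise;
// explicitly configured modes are honoured as-is.
public class TaskMode {
    public enum Mode { STANDALONE, CLUSTERED, REMOTE, AUTO }

    public static Mode resolve(Mode configured, boolean clusteringEnabled) {
        if (configured != Mode.AUTO) {
            return configured; // explicit settings are never overridden
        }
        return clusteringEnabled ? Mode.CLUSTERED : Mode.STANDALONE;
    }
}
```

This is why a typical clustered deployment needs no tasks-config.xml changes: the default AUTO value resolves to the right mode at startup.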
Re: [Architecture] Improving activity ID propagation in activity data publishing
Re: [Architecture] Annotation scheme for Hive scripts