Re: [Architecture] Tenant-specific MQTT receivers in DAS not listening to topics once tenant gets unloaded

2017-03-20 Thread Anjana Fernando
Hi,

On Mon, Mar 20, 2017 at 11:02 AM, Sinthuja Ragendran <sinth...@wso2.com>
wrote:

> Hi,
>
> As the receiver configurations are deployable artefacts, they become
> active only when the tenant is loaded. One approach is to keep all tenants
> loaded indefinitely, but I think this will have a high memory cost. We
> therefore internally discussed the approach below to handle this problem.
>
> Instead of having multiple MQTT receiver configurations per tenant to
> handle this, implement a specialised/privileged MQTT event receiver that
> can handle multiple subscriptions on behalf of tenants and is only
> deployable in super-tenant mode. This event receiver would have a topic
> URI with a {tenantDomain} placeholder, which is used to subscribe to the
> tenant-specific topics. Then, based on the topic on which an event
> arrives, the tenant flow is started and the event is inserted into that
> tenant's space. This way, only the tenants that are actively sending
> events get loaded, rather than all tenants.
>
> Please share your thoughts on this. Also, AFAIR we had a similar
> requirement for task execution. @Anjana, how are we handling that?
>

Yes, the tasks and their definitions are stored in the super tenant space.
So they get triggered appropriately, and, as required, any tenant-specific
resources would be loaded by the task implementation.
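
To illustrate the privileged-receiver proposal above, here is a minimal
sketch (my own illustration, not an implemented design). It assumes Eclipse
Paho and a "carbon/{tenantDomain}/events" topic layout; the tenant-flow
calls are hypothetical, so they are shown only as comments.

import org.eclipse.paho.client.mqttv3.*;

public class MultiTenantMqttReceiver {

    // '+' is the single-level MQTT wildcard standing in for {tenantDomain}.
    private static final String TOPIC_PATTERN = "carbon/+/events";

    public void start() throws MqttException {
        MqttClient client =
                new MqttClient("tcp://localhost:1883", "das-st-receiver");
        client.setCallback(new MqttCallback() {
            public void messageArrived(String topic, MqttMessage message)
                    throws Exception {
                // e.g. topic "carbon/foo.com/events" -> tenant "foo.com"
                String tenantDomain = topic.split("/")[1];
                // Hypothetical calls: start the tenant flow (loading the
                // tenant on demand) and hand the payload to that tenant's
                // event stream:
                // PrivilegedCarbonContext.startTenantFlow();
                // ... set tenantDomain, publish message.getPayload() ...
                // PrivilegedCarbonContext.endTenantFlow();
            }
            public void connectionLost(Throwable cause) {
                // reconnect/backoff logic would go here
            }
            public void deliveryComplete(IMqttDeliveryToken token) {
            }
        });
        client.connect();
        client.subscribe(TOPIC_PATTERN);
    }
}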

Cheers,
Anjana.


>
> Thanks,
> Sinthuja.
>
> On Mon, Mar 20, 2017 at 10:50 AM, Jasintha Dasanayake <jasin...@wso2.com>
> wrote:
>
>> Hi All,
>>
>> When DAS is working in tenant mode and a particular tenant has MQTT
>> receivers, those cannot be activated once the tenant gets unloaded. For
>> example, if I restart DAS, those tenant-specific MQTT receivers are not
>> loaded unless we explicitly load that particular tenant. IMO, the
>> expected behavior would be for those receivers to be loaded and
>> subscribed to their topics without loading the tenants explicitly.
>>
>> Is there any known mechanism to address this particular problem?
>>
>> Thanks and Regards
>> /jasintha
>>
>> --
>>
>> *Jasintha Dasanayake*
>> *Associate Technical Lead*
>>
>> *WSO2 Inc. | http://wso2.com*
>> *lean . enterprise . middleware*
>>
>> *mobile :- 0711-368-118*
>>
>
>
>
> --
> *Sinthuja Rajendran*
> Technical Lead
> WSO2, Inc.:http://wso2.com
>
> Blog: http://sinthu-rajan.blogspot.com/
> Mobile: +94774273955
>
>
>


-- 
*Anjana Fernando*
Associate Director / Architect
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Opentracing

2017-01-31 Thread Anjana Fernando
Hi Srinath,

Looks interesting. @Gokul, can you please have a look and give a summary?
Maybe we can submit a GSoC project for this, if it's actually worth doing.
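
For quick context, the OpenTracing API essentially standardises a
vendor-neutral span model. Below is a minimal sketch using the in-memory
MockTracer from the opentracing-mock artifact (the artifact name and exact
API are from my recollection, so treat them as assumptions):

import io.opentracing.Span;
import io.opentracing.mock.MockTracer;

public class TracingExample {
    public static void main(String[] args) {
        MockTracer tracer = new MockTracer(); // in-memory, for testing
        Span span = tracer.buildSpan("handle-request").start();
        try {
            span.setTag("component", "esb-mediator"); // arbitrary example tag
            // ... the traced work would happen here ...
        } finally {
            span.finish();
        }
        System.out.println("recorded spans: " + tracer.finishedSpans().size());
    }
}

A real backend (e.g. Zipkin) would plug in by swapping the Tracer
implementation while the instrumentation code stays the same.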

Cheers,
Anjana.

On Wed, Feb 1, 2017 at 10:36 AM, Srinath Perera <srin...@wso2.com> wrote:

> They are trying to build an open standard (or so they say).
> It seems to come from Zipkin.
> Having one would solve a lot of problems.
>
>- http://opentracing.io/
>- https://medium.com/opentracing/distributed-tracing-in-10-minutes-51b378ee40f1#.5rfk4tfwa
>- https://medium.com/opentracing/towards-turnkey-distributed-tracing-5f4297d1736#.xiy7fet0j
>
> Anjana, could you have a look? If it makes sense, maybe we can support it.
>
> --
> 
> Srinath Perera, Ph.D.
>http://people.apache.org/~hemapani/
>http://srinathsview.blogspot.com/
>



-- 
*Anjana Fernando*
Associate Director / Architect
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] RDBMS based coordinator election algorithm for MB

2016-11-06 Thread Anjana Fernando
Hi Ramith,

Sure. Actually, I was talking with SameeraR about taking this over and
creating a common component that has the required coordination
functionality. The idea is to create a component where the providers can be
plugged in, such as the RDBMS-based one, ZooKeeper, or any other
container-specific provider that may be out there.
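
As a rough sketch, the pluggable interface could look something like the
following (the names are invented for illustration; this is not an existing
WSO2 component API):

public interface CoordinationProvider {

    /** Try to become the coordinator; returns true if this node now leads. */
    boolean tryAcquireLeadership(String groupId, String nodeId);

    /** Refresh this node's heartbeat / leadership lease. */
    void heartbeat(String groupId, String nodeId);

    /** Register for notifications when the coordinator changes. */
    void addLeadershipListener(LeadershipListener listener);
}

interface LeadershipListener {
    void coordinatorChanged(String newCoordinatorNodeId);
}

An RDBMS-based provider, a ZooKeeper-based one, or a container-native one
would then just be alternative implementations of the same interface.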

Cheers,
Anjana.

On Mon, Nov 7, 2016 at 12:38 PM, Ramith Jayasinghe <ram...@wso2.com> wrote:

> this might require some work.. shall we have a chat?
>
> On Thu, Nov 3, 2016 at 3:52 PM, Anjana Fernando <anj...@wso2.com> wrote:
>
>> Ping! ..
>>
>> On Wed, Nov 2, 2016 at 5:03 PM, Anjana Fernando <anj...@wso2.com> wrote:
>>
>>> Hi,
>>>
>>> On Wed, Nov 2, 2016 at 3:14 PM, Asanka Abeyweera <asank...@wso2.com>
>>> wrote:
>>>
>>>> Hi Anjana,
>>>>
>>>> Currently, the implementation is part of the MB code (not a common
>>>> component).
>>>>
>>>
>>> Okay, can we please get it as a common component.
>>>
>>> Cheers,
>>> Anjana.
>>>
>>>
>>>>
>>>> On Wed, Nov 2, 2016 at 3:00 PM, Anjana Fernando <anj...@wso2.com>
>>>> wrote:
>>>>
>>>>> Hi Asanka/Ramith,
>>>>>
>>>>> So for the C5-based Streaming Analytics solution, we need coordination
>>>>> functionality there as well. Is the functionality mentioned here
>>>>> created as a common component, or baked into the MB code? If it is
>>>>> baked in, can we please get it implemented as a generic component, so
>>>>> other products can also use it.
>>>>>
>>>>> Cheers,
>>>>> Anjana.
>>>>>
>>>>> On Tue, Aug 9, 2016 at 3:10 PM, Anjana Fernando <anj...@wso2.com>
>>>>> wrote:
>>>>>
>>>>>> Great! ..
>>>>>>
>>>>>> Cheers,
>>>>>> Anjana.
>>>>>>
>>>>>> On Tue, Aug 9, 2016 at 1:49 PM, Asanka Abeyweera <asank...@wso2.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Anjana,
>>>>>>>
>>>>>>> Thank you for the suggestion. We have already done a similar thing:
>>>>>>> we add a backoff time after creating the leader entry, and we check
>>>>>>> whether the leader entry is the one created by this node before
>>>>>>> announcing the leader change.
>>>>>>>
>>>>>>> On Tue, Aug 9, 2016 at 12:27 PM, Anjana Fernando <anj...@wso2.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I see, thanks for the clarification, looks good! One small thing to
>>>>>>>> consider: we should avoid the situation where the current leader
>>>>>>>> goes away and two other nodes compete to become the leader. If both
>>>>>>>> read the table to check the last heartbeat and conclude at the same
>>>>>>>> time that the leader is outdated, the first one deletes the entry
>>>>>>>> and inserts its own, and then the second one also deletes the
>>>>>>>> existing entry and inserts its own, so both will think they became
>>>>>>>> the leader, since both succeeded in adding their entry without an
>>>>>>>> error. This can probably be fixed by checking back after a short
>>>>>>>> delay whether the current leader entry actually belongs to this
>>>>>>>> node, which works well probabilistically if that delay is
>>>>>>>> sufficiently larger than the typical database transaction a node
>>>>>>>> needs for the earlier operations. Otherwise, we should make sure the
>>>>>>>> database transaction isolation level used in this scenario is at
>>>>>>>> least REPEATABLE_READ, where reading the record locks it throughout
>>>>>>>> the transaction. Some DBMSs do not support REPEATABLE_READ; in that
>>>>>>>> case, we should be able to use SERIALIZABLE, which most of them
>>>>>>>> support.

Re: [Architecture] RDBMS based coordinator election algorithm for MB

2016-11-03 Thread Anjana Fernando
Ping! ..

On Wed, Nov 2, 2016 at 5:03 PM, Anjana Fernando <anj...@wso2.com> wrote:

> Hi,
>
> On Wed, Nov 2, 2016 at 3:14 PM, Asanka Abeyweera <asank...@wso2.com>
> wrote:
>
>> Hi Anjana,
>>
>> Currently, the implementation is part of the MB code (not a common
>> component).
>>
>
> Okay, can we please get it as a common component.
>
> Cheers,
> Anjana.
>
>
>>
>> On Wed, Nov 2, 2016 at 3:00 PM, Anjana Fernando <anj...@wso2.com> wrote:
>>
>>> Hi Asanka/Ramith,
>>>
>>> So for the C5-based Streaming Analytics solution, we need coordination
>>> functionality there as well. Is the functionality mentioned here created
>>> as a common component, or baked into the MB code? If it is baked in, can
>>> we please get it implemented as a generic component, so other products
>>> can also use it.
>>>
>>> Cheers,
>>> Anjana.
>>>
>>> On Tue, Aug 9, 2016 at 3:10 PM, Anjana Fernando <anj...@wso2.com> wrote:
>>>
>>>> Great! ..
>>>>
>>>> Cheers,
>>>> Anjana.
>>>>
>>>> On Tue, Aug 9, 2016 at 1:49 PM, Asanka Abeyweera <asank...@wso2.com>
>>>> wrote:
>>>>
>>>>> Hi Anjana,
>>>>>
>>>>> Thank you for the suggestion. We have already done a similar thing: we
>>>>> add a backoff time after creating the leader entry, and we check
>>>>> whether the leader entry is the one created by this node before
>>>>> announcing the leader change.
>>>>>
>>>>> On Tue, Aug 9, 2016 at 12:27 PM, Anjana Fernando <anj...@wso2.com>
>>>>> wrote:
>>>>>
>>>>>> I see, thanks for the clarification, looks good! One small thing to
>>>>>> consider: we should avoid the situation where the current leader goes
>>>>>> away and two other nodes compete to become the leader. If both read
>>>>>> the table to check the last heartbeat and conclude at the same time
>>>>>> that the leader is outdated, the first one deletes the entry and
>>>>>> inserts its own, and then the second one also deletes the existing
>>>>>> entry and inserts its own, so both will think they became the leader,
>>>>>> since both succeeded in adding their entry without an error. This can
>>>>>> probably be fixed by checking back after a short delay whether the
>>>>>> current leader entry actually belongs to this node, which works well
>>>>>> probabilistically if that delay is sufficiently larger than the
>>>>>> typical database transaction a node needs for the earlier operations.
>>>>>> Otherwise, we should make sure the database transaction isolation
>>>>>> level used in this scenario is at least REPEATABLE_READ, where reading
>>>>>> the record locks it throughout the transaction. Some DBMSs do not
>>>>>> support REPEATABLE_READ; in that case, we should be able to use
>>>>>> SERIALIZABLE, which most of them support.
>>>>>>
>>>>>> Cheers,
>>>>>> Anjana.
>>>>>>
>>>>>> On Tue, Aug 9, 2016 at 11:11 AM, Maninda Edirisooriya <
>>>>>> mani...@wso2.com> wrote:
>>>>>>
>>>>>>> Hi Anjana,
>>>>>>>
>>>>>>> After having an offline chat with Asanka, what I understood is that
>>>>>>> leader election is done entirely via the database, with no network
>>>>>>> communication. The leader is recorded in the database first. Then the
>>>>>>> leader updates its node data periodically in the database. If some
>>>>>>> node notices that the data in the DB is outdated, that means the
>>>>>>> leader was disconnected. That node will then look at the created
>>>>>>> timestamp of the leader entry. If it is not very recent, that means
>>>>>>> no new leader was elected recently, so the node will try to update
>>>>>>> the leader entry with its own ID. As I understand it, the leader
>>>>>>> entry uses the leader ID and the timestamp as the primary key.

Re: [Architecture] RDBMS based coordinator election algorithm for MB

2016-11-02 Thread Anjana Fernando
Hi,

On Wed, Nov 2, 2016 at 3:14 PM, Asanka Abeyweera <asank...@wso2.com> wrote:

> Hi Anjana,
>
> Currently, the implementation is part of the MB code (not a common
> component).
>

Okay, can we please get it as a common component.

Cheers,
Anjana.


>
> On Wed, Nov 2, 2016 at 3:00 PM, Anjana Fernando <anj...@wso2.com> wrote:
>
>> Hi Asanka/Ramith,
>>
>> So for the C5-based Streaming Analytics solution, we need coordination
>> functionality there as well. Is the functionality mentioned here created
>> as a common component, or baked into the MB code? If it is baked in, can
>> we please get it implemented as a generic component, so other products
>> can also use it.
>>
>> Cheers,
>> Anjana.
>>
>> On Tue, Aug 9, 2016 at 3:10 PM, Anjana Fernando <anj...@wso2.com> wrote:
>>
>>> Great! ..
>>>
>>> Cheers,
>>> Anjana.
>>>
>>> On Tue, Aug 9, 2016 at 1:49 PM, Asanka Abeyweera <asank...@wso2.com>
>>> wrote:
>>>
>>>> Hi Anjana,
>>>>
>>>> Thank you for the suggestion. We have already done a similar thing: we
>>>> add a backoff time after creating the leader entry, and we check
>>>> whether the leader entry is the one created by this node before
>>>> announcing the leader change.
>>>>
>>>> On Tue, Aug 9, 2016 at 12:27 PM, Anjana Fernando <anj...@wso2.com>
>>>> wrote:
>>>>
>>>>> I see, thanks for the clarification, looks good! One small thing to
>>>>> consider: we should avoid the situation where the current leader goes
>>>>> away and two other nodes compete to become the leader. If both read
>>>>> the table to check the last heartbeat and conclude at the same time
>>>>> that the leader is outdated, the first one deletes the entry and
>>>>> inserts its own, and then the second one also deletes the existing
>>>>> entry and inserts its own, so both will think they became the leader,
>>>>> since both succeeded in adding their entry without an error. This can
>>>>> probably be fixed by checking back after a short delay whether the
>>>>> current leader entry actually belongs to this node, which works well
>>>>> probabilistically if that delay is sufficiently larger than the
>>>>> typical database transaction a node needs for the earlier operations.
>>>>> Otherwise, we should make sure the database transaction isolation
>>>>> level used in this scenario is at least REPEATABLE_READ, where reading
>>>>> the record locks it throughout the transaction. Some DBMSs do not
>>>>> support REPEATABLE_READ; in that case, we should be able to use
>>>>> SERIALIZABLE, which most of them support.
>>>>>
>>>>> Cheers,
>>>>> Anjana.
>>>>>
>>>>> On Tue, Aug 9, 2016 at 11:11 AM, Maninda Edirisooriya <
>>>>> mani...@wso2.com> wrote:
>>>>>
>>>>>> Hi Anjana,
>>>>>>
>>>>>> After having an offline chat with Asanka, what I understood is that
>>>>>> leader election is done entirely via the database, with no network
>>>>>> communication. The leader is recorded in the database first. Then the
>>>>>> leader updates its node data periodically in the database. If some
>>>>>> node notices that the data in the DB is outdated, that means the
>>>>>> leader was disconnected. That node will then look at the created
>>>>>> timestamp of the leader entry. If it is not very recent, that means
>>>>>> no new leader was elected recently, so the node will try to update
>>>>>> the leader entry with its own ID. As I understand it, the leader
>>>>>> entry uses the leader ID and the timestamp as the primary key. Even
>>>>>> if several nodes try to do this simultaneously, only one node will
>>>>>> successfully update the entry, with the help of the atomicity
>>>>>> provided by the DB. The other members will note that the leader
>>>>>> timestamp was updated and will accept the first node that updated it
>>>>>> as the leader.

Re: [Architecture] RDBMS based coordinator election algorithm for MB

2016-11-02 Thread Anjana Fernando
Hi Asanka/Ramith,

So for the C5-based Streaming Analytics solution, we need coordination
functionality there as well. Is the functionality mentioned here created as
a common component, or baked into the MB code? If it is baked in, can we
please get it implemented as a generic component, so other products can
also use it.
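
For reference, the claim-and-verify election step discussed further down
this thread could look roughly like the following JDBC sketch (the table
and column names are assumptions): under SERIALIZABLE isolation the
delete-then-insert sequences of two competing nodes cannot interleave, so
only one claim can commit.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class RdbmsLeaderElection {

    static boolean tryClaimLeadership(Connection conn, String nodeId,
                                      long staleAfterMillis) throws SQLException {
        conn.setAutoCommit(false);
        conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
        long now = System.currentTimeMillis();
        try {
            // Remove the leader entry only if its heartbeat has gone stale.
            try (PreparedStatement del = conn.prepareStatement(
                    "DELETE FROM COORDINATOR WHERE GROUP_ID = 'mb'"
                            + " AND HEARTBEAT_TS < ?")) {
                del.setLong(1, now - staleAfterMillis);
                del.executeUpdate();
            }
            // Claim: the primary key on GROUP_ID guarantees at most one row.
            try (PreparedStatement ins = conn.prepareStatement(
                    "INSERT INTO COORDINATOR (GROUP_ID, NODE_ID, HEARTBEAT_TS)"
                            + " VALUES ('mb', ?, ?)")) {
                ins.setString(1, nodeId);
                ins.setLong(2, now);
                ins.executeUpdate();
            }
            conn.commit();
            return true;  // insert committed: this node is the leader
        } catch (SQLException e) {
            conn.rollback();  // duplicate key or serialization conflict:
            return false;     // another node won the race
        }
    }
}

The back-off variant Asanka describes achieves the same end without strict
isolation: insert first, wait, then re-read and verify that the surviving
entry is your own.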

Cheers,
Anjana.

On Tue, Aug 9, 2016 at 3:10 PM, Anjana Fernando <anj...@wso2.com> wrote:

> Great! ..
>
> Cheers,
> Anjana.
>
> On Tue, Aug 9, 2016 at 1:49 PM, Asanka Abeyweera <asank...@wso2.com>
> wrote:
>
>> Hi Anjana,
>>
>> Thank you for the suggestion. We have already done a similar thing: we
>> add a backoff time after creating the leader entry, and we check whether
>> the leader entry is the one created by this node before announcing the
>> leader change.
>>
>> On Tue, Aug 9, 2016 at 12:27 PM, Anjana Fernando <anj...@wso2.com> wrote:
>>
>>> I see, thanks for the clarification, looks good! One small thing to
>>> consider: we should avoid the situation where the current leader goes
>>> away and two other nodes compete to become the leader. If both read the
>>> table to check the last heartbeat and conclude at the same time that the
>>> leader is outdated, the first one deletes the entry and inserts its own,
>>> and then the second one also deletes the existing entry and inserts its
>>> own, so both will think they became the leader, since both succeeded in
>>> adding their entry without an error. This can probably be fixed by
>>> checking back after a short delay whether the current leader entry
>>> actually belongs to this node, which works well probabilistically if
>>> that delay is sufficiently larger than the typical database transaction
>>> a node needs for the earlier operations. Otherwise, we should make sure
>>> the database transaction isolation level used in this scenario is at
>>> least REPEATABLE_READ, where reading the record locks it throughout the
>>> transaction. Some DBMSs do not support REPEATABLE_READ; in that case, we
>>> should be able to use SERIALIZABLE, which most of them support.
>>>
>>> Cheers,
>>> Anjana.
>>>
>>> On Tue, Aug 9, 2016 at 11:11 AM, Maninda Edirisooriya <mani...@wso2.com>
>>> wrote:
>>>
>>>> Hi Anjana,
>>>>
>>>> After having an offline chat with Asanka, what I understood is that
>>>> leader election is done entirely via the database, with no network
>>>> communication. The leader is recorded in the database first. Then the
>>>> leader updates its node data periodically in the database. If some node
>>>> notices that the data in the DB is outdated, that means the leader was
>>>> disconnected. That node will then look at the created timestamp of the
>>>> leader entry. If it is not very recent, that means no new leader was
>>>> elected recently, so the node will try to update the leader entry with
>>>> its own ID. As I understand it, the leader entry uses the leader ID and
>>>> the timestamp as the primary key. Even if several nodes try to do this
>>>> simultaneously, only one node will successfully update the entry, with
>>>> the help of the atomicity provided by the DB. The other members will
>>>> note that the leader timestamp was updated and will accept the first
>>>> node that updated it as the leader. Even after the leader is elected,
>>>> the leader will only publish node data by updating the DB instead of
>>>> making network calls. Other nodes will just observe it and check the
>>>> latest timestamps of the entry.
>>>>
>>>>
>>>> *Maninda Edirisooriya*
>>>> Senior Software Engineer
>>>>
>>>> *WSO2, Inc.*lean.enterprise.middleware.
>>>>
>>>> *Blog* : http://maninda.blogspot.com/
>>>> *E-mail* : mani...@wso2.com
>>>> *Skype* : @manindae
>>>> *Twitter* : @maninda
>>>>
>>>> On Tue, Aug 9, 2016 at 10:13 AM, Anjana Fernando <anj...@wso2.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I just noticed this thread. I have some concerns about this
>>>>> implementation. First of all, I don't think the statement mentioned
>>>>> here, saying that an external service such as ZooKeeper doesn't work,
>>>>> is correct. Because, if you have a ZK cluster (it is supposed to be
>>>>> used as a cluster), you will not have a

Re: [Architecture] Multidimensional Space Search with Lucene 6 - Possible Scenarios and the API

2016-09-13 Thread Anjana Fernando
e specified
>as polygons.
>- Get the number of points in each bucket where buckets are specified
>by the distance from a given location.
>
> * Composite polygons are possible.
> *Scenarios*
>
> *Airport Scenario *
> If we index the set of airports in the world as GeoPoints, the following
> queries become possible. (Here is the test code I implemented as an
> example.)
> <https://github.com/janakact/test_lucene/blob/master/src/test/java/TestMultiDimensionalQueries.java>
>
>- Find the closest set of airports to a given town.
>- Find the set of airports within a given radius from a particular
>town.
>- Find the set of airports inside a country. (The country can be given
>as a polygon.)
>- Find the set of airports within a given range of latitudes and
>longitudes, i.e., a latitude/longitude box query. (For example: airports
>close to the equator.)
>- Find the set of airports close to a given path. (The path can be
>something like a road, e.g., find the airports that are less than 50 km
>away from a given highway.)
>- Count the airports in each country by giving country maps as
>polygons.
>
> *Indexing airplane paths*
>
>- It is possible to query for paths which go through an area of
>interest.
>
> The above example covers most of the functionality that Lucene spatial
> search provides.
> Here are some other examples:
>
>- The number of television users a satellite can cover (by indexing
>receivers' locations).
>- The number of stationary telescopes that can be used to observe a
>solar eclipse (by indexing telescope locations; the area where the solar
>eclipse is visible can be represented as a polygon:
>http://eclipse.gsfc.nasa.gov/SEplot/SEplot2001/SE2016Sep01A.GIF).
>
> So, that's it.
> Thank you.
>
> Regards,
> Janaka Chathuranga
>
> --
> Janaka Chathuranga
> *Software Engineering Intern*
> Mobile : *+94 (071) 3315 725*
> jana...@wso2.com
>
> <https://wso2.com/signature>
>
>
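
For reference, a minimal Lucene 6 sketch of the distance and box queries in
the airport scenarios quoted above (this assumes the LatLonPoint API, which
lived in the sandbox module in early 6.x releases):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.LatLonPoint;
import org.apache.lucene.search.Query;

public class AirportQueries {

    // Index an airport as a geo point on the "location" field.
    static Document indexAirport(double lat, double lon) {
        Document doc = new Document();
        doc.add(new LatLonPoint("location", lat, lon));
        return doc;
    }

    // Airports within 50 km of a given town.
    static Query within50Km(double townLat, double townLon) {
        return LatLonPoint.newDistanceQuery("location", townLat, townLon, 50_000);
    }

    // Airports inside a latitude/longitude box (e.g. near the equator).
    static Query nearEquator() {
        return LatLonPoint.newBoxQuery("location", -5.0, 5.0, -180.0, 180.0);
    }
}

Polygon queries (country borders, eclipse visibility areas) follow the same
pattern via LatLonPoint.newPolygonQuery.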


-- 
*Anjana Fernando*
Associate Director / Architect
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] DB event listener for ESB

2016-09-05 Thread Anjana Fernando
t;>>> LinkedIn : http://www.linkedin.com/pub/malaka-silva/6/33/77
>>>> Blog : http://mrmalakasilva.blogspot.com/
>>>>
>>>> WSO2, Inc.
>>>> lean . enterprise . middleware
>>>> https://wso2.com/signature
>>>> http://www.wso2.com/about/team/malaka-silva/
>>>> https://store.wso2.com/store/
>>>>
>>>> Don't make Trees rare, we should keep them with care
>>>>
>>>> ___
>>>> Architecture mailing list
>>>> Architecture@wso2.org
>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>
>>>>
>>>
>>>
>>> --
>>> 
>>> Srinath Perera, Ph.D.
>>>http://people.apache.org/~hemapani/
>>>http://srinathsview.blogspot.com/
>>>
>>> ___
>>> Architecture mailing list
>>> Architecture@wso2.org
>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>
>>>
>>
>> ___
>> Architecture mailing list
>> Architecture@wso2.org
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>>
>
>
> --
> 
> Srinath Perera, Ph.D.
>http://people.apache.org/~hemapani/
>http://srinathsview.blogspot.com/
>
> ___
> Architecture mailing list
> Architecture@wso2.org
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
>


-- 
*Anjana Fernando*
Associate Director / Architect
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] How do we get DAS server location?

2016-07-06 Thread Anjana Fernando
chat earlier, the initial plan is to locate
>>>> the Thrift endpoint through mDNS service discovery, considering the
>>>> host and port first.
>>>>
>>>> I have used the JmDNS library pointed out by Nirmal to do a PoC on this
>>>> scenario, and I've already incorporated the logic into the databridge
>>>> Thrift server to enable service registration through a system property
>>>> the user can set (-DenableDiscovery). The corresponding client code
>>>> goes into the publisher OSGi service initialisation. This too is
>>>> controlled by the same system property, set on the Thrift client (which
>>>> will be the product talking to DAS/CEP).
>>>>
>>>> I'm doing some testing on the entire scenario, and once completed, I'll
>>>> commit the changes into the relevant repos and send an update to this
>>>> thread.
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> On Thursday, 30 June 2016, Srinath Perera <srin...@wso2.com> wrote:
>>>>
>>>>> Resending as it hits a filter rule.
>>>>>
>>>>> Gokul, please give an update on this?
>>>>>
>>>>> --Srinath
>>>>>
>>>>
>>>>
>>>> --
>>>> Gokul Balakrishnan
>>>> Senior Software Engineer,
>>>> WSO2, Inc. http://wso2.com
>>>> M +94 77 5935 789 | +44 7563 570502
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> 
>>> Srinath Perera, Ph.D.
>>>http://people.apache.org/~hemapani/
>>>http://srinathsview.blogspot.com/
>>>
>>
>>
>>
>> --
>> Gokul Balakrishnan
>> Senior Software Engineer,
>> WSO2, Inc. http://wso2.com
>> M +94 77 5935 789 | +44 7563 570502
>>
>>
>
>
> --
> Gokul Balakrishnan
> Senior Software Engineer,
> WSO2, Inc. http://wso2.com
> M +94 77 5935 789 | +44 7563 570502
>
>


-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] In-Tenant Data restriction in the DAS platform

2016-06-30 Thread Anjana Fernando
Anyway, as per the other discussions we had, we cannot promise these
features; we would have to carefully check the feasibility and then make
the decisions.
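
For illustration only, the internal-role-per-dashboard model Sinthuja
describes below boils down to a check like the following (the role naming
convention is invented, and the Carbon user-core calls are assumptions, not
a reviewed design):

import java.util.Arrays;

import org.wso2.carbon.user.api.UserRealm;
import org.wso2.carbon.user.api.UserStoreException;

public class DashboardAuthorizer {

    public boolean canView(UserRealm realm, String username, String dashboardId)
            throws UserStoreException {
        // Hypothetical per-dashboard internal role, created at dashboard
        // creation time and assigned to the dashboard owner by default.
        String viewerRole = "Internal/dashboard-" + dashboardId + "-viewer";
        String[] roles = realm.getUserStoreManager().getRoleListOfUser(username);
        return Arrays.asList(roles).contains(viewerRole);
    }
}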

Cheers,
Anjana.

On Thu, Jun 30, 2016 at 2:24 PM, Anjana Fernando <anj...@wso2.com> wrote:

> Hi Dulitha,
>
> Your points are valid; we will check on these for an upcoming release,
> most probably DAS v3.2.0. We just have to carefully check all the
> scenarios for how this will work out. Some scenarios can be tricky, but we
> should be able to figure them out.
>
> Cheers,
> Anjana.
>
> On Thu, Jun 30, 2016 at 12:40 PM, Sinthuja Ragendran <sinth...@wso2.com>
> wrote:
>
>> Hi Dulitha,
>>
>> On Wed, Jun 29, 2016 at 10:24 PM, Dulitha Wijewantha <duli...@wso2.com>
>> wrote:
>>
>>> Hi guys,
>>> Below are somethings I noted when I was writing dashboards for an
>>> analytics solution.
>>>
>>> 1) oAuth protected APIs should be used to retrieve data for gadgets
>>>
>>> 2) There should be a way to restrict data for users inside a tenant
>>>
>>
>> +1 for the above two. I too think we should bring a more fine-grained
>> authorization model to the DAS layer, at least at the table/stream level,
>> such that only role-A can access a given table rather than everyone. And
>> again, there could be different levels of access per stream/table: some
>> users can only fetch data, some can only send, and only some can delete.
>>
>> We had a similar requirement on the dashboard server to protect a
>> dashboard. There, we came up with a model that creates some internal
>> roles per dashboard at dashboard creation time and assigns the user who
>> creates the dashboard to those internal roles by default. Hence, only
>> he/she can perform any actions on the dashboard, and it is private to
>> him/her. If the user would like to share the dashboard, then he/she
>> assigns users independently to the internal roles created, or assigns a
>> new role for the particular action.
>>
>> I think we can handle the tables similarly as well.
>>
>>>
>>> 3) If the user doesn't have authorization to view the data - he
>>> shouldn't be able to view the corresponding visualization on the dashboard
>>> server and vice versa.
>>>
>>
>> This is a bit tricky, as the authorization from the dashboard page is
>> only required if analytics-related gadgets have been included in the
>> dashboard page; for others this will not be an issue. We need to handle
>> this case properly if we include such a feature.
>>
>> Thanks,
>> Sinthuja.
>>
>>
>>>
>>> Cheers~
>>> --
>>> Dulitha Wijewantha (Chan)
>>> Software Engineer - Mobile Development
>>> WSO2 Inc
>>> Lean.Enterprise.Middleware
>>>  * ~Email   duli...@wso2.com <duli...@wso2mobile.com>*
>>> *  ~Mobile +94712112165*
>>> *  ~Website   dulitha.me <http://dulitha.me>*
>>> *  ~Twitter @dulitharw <https://twitter.com/dulitharw>*
>>>   *~Github @dulichan <https://github.com/dulichan>*
>>>   *~SO @chan <http://stackoverflow.com/users/813471/chan>*
>>>
>>
>>
>>
>> --
>> *Sinthuja Rajendran*
>> Technical Lead
>> WSO2, Inc.:http://wso2.com
>>
>> Blog: http://sinthu-rajan.blogspot.com/
>> Mobile: +94774273955
>>
>>
>>
>
>
> --
> *Anjana Fernando*
> Senior Technical Lead
> WSO2 Inc. | http://wso2.com
> lean . enterprise . middleware
>



-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] In-Tenant Data restriction in the DAS platform

2016-06-30 Thread Anjana Fernando
Hi Dulitha,

Your points are valid; we will check on these for an upcoming release, most
probably DAS v3.2.0. We just have to carefully check all the scenarios for
how this will work out. Some scenarios can be tricky, but we should be able
to figure them out.

Cheers,
Anjana.

On Thu, Jun 30, 2016 at 12:40 PM, Sinthuja Ragendran <sinth...@wso2.com>
wrote:

> Hi Dulitha,
>
> On Wed, Jun 29, 2016 at 10:24 PM, Dulitha Wijewantha <duli...@wso2.com>
> wrote:
>
>> Hi guys,
>> Below are somethings I noted when I was writing dashboards for an
>> analytics solution.
>>
>> 1) oAuth protected APIs should be used to retrieve data for gadgets
>>
>> 2) There should be a way to restrict data for users inside a tenant
>>
>
> +1 for the above two. I too think we should bring a more fine-grained
> authorization model to the DAS layer, at least at the table/stream level,
> such that only role-A can access a given table rather than everyone. And
> again, there could be different levels of access per stream/table: some
> users can only fetch data, some can only send, and only some can delete.
>
> We had a similar requirement on the dashboard server to protect a
> dashboard. There, we came up with a model that creates some internal roles
> per dashboard at dashboard creation time and assigns the user who creates
> the dashboard to those internal roles by default. Hence, only he/she can
> perform any actions on the dashboard, and it is private to him/her. If the
> user would like to share the dashboard, then he/she assigns users
> independently to the internal roles created, or assigns a new role for the
> particular action.
>
> I think we can handle the tables similarly as well.
>
>>
>> 3) If the user doesn't have authorization to view the data - he shouldn't
>> be able to view the corresponding visualization on the dashboard server and
>> vice versa.
>>
>
> This is a bit tricky, as the authorization from the dashboard page is only
> required if analytics-related gadgets have been included in the dashboard
> page; for others this will not be an issue. We need to handle this case
> properly if we include such a feature.
>
> Thanks,
> Sinthuja.
>
>
>>
>> Cheers~
>> --
>> Dulitha Wijewantha (Chan)
>> Software Engineer - Mobile Development
>> WSO2 Inc
>> Lean.Enterprise.Middleware
>>  * ~Email   duli...@wso2.com <duli...@wso2mobile.com>*
>> *  ~Mobile +94712112165*
>> *  ~Website   dulitha.me <http://dulitha.me>*
>> *  ~Twitter @dulitharw <https://twitter.com/dulitharw>*
>>   *~Github @dulichan <https://github.com/dulichan>*
>>   *~SO @chan <http://stackoverflow.com/users/813471/chan>*
>>
>
>
>
> --
> *Sinthuja Rajendran*
> Technical Lead
> WSO2, Inc.:http://wso2.com
>
> Blog: http://sinthu-rajan.blogspot.com/
> Mobile: +94774273955
>
>
>


-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] [Analytics] Allowing analytics data to be published to super tenant space

2016-06-23 Thread Anjana Fernando
Hi Amila,

On Thu, Jun 23, 2016 at 11:52 AM, Amila Maha Arachchi <ami...@wso2.com>
wrote:

> All,
>
> 1. We should allow to decide whether to publish data in super tenant mode
> or tenant mode
>

This is possible, but the problem is that it complicates the ESB analytics
solution, where we would have to maintain two different versions
implementing the two scenarios. So IMO, it would be better to follow the
single approach, discussed earlier, that gives the best overall
flexibility: publish and manage data in the tenants, but execute the Spark
scripts in the super tenant.


> 2. If its the ST mode, we deploy the car in ST space.
> 3. Data gets published and stored in one table. i.e. no table per tenant
>

The current connectors, e.g. RDBMS / HBase, create a physical table per
analytics table per tenant. In those connectors, this behavior cannot be
changed; there are also technical difficulties in doing so, such as
filtering out different tenants' data. Anyway, database systems usually
place no limit on the number of physical tables created. Also, you will not
access these tables directly, but will communicate via the REST APIs if
required.


> 4. Spark runs against that table
>

With the new improvements, we get a similar kind of interface anyway, where
the Spark script will automatically read data from all the tenants and
process the data in one go.


> 5. Dashboard should be a SaaS app which filters data from the analyzed
> table.
>

I guess that may be possible, but it will need some changes in the current
dashboards. @Dunith, can you comment on this please? Also, is there any
issue with deploying a dashboard per tenant, which is the current situation?

Cheers,
Anjana.


>
> Can above be facilitated?
>
> Regards,
> Amila.
>
> --
> *Amila Maharachchi*
> Senior Technical Lead
> WSO2, Inc.; http://wso2.com
>
> Blog: http://maharachchi.blogspot.com
> Mobile: +94719371446
>
>


-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Cross Tenant Data Reading from Spark Queries in DAS

2016-06-22 Thread Anjana Fernando
Hi,

I've improved the functionality here so that tenant tables can also be
written to from the super tenant space. With this, I've renamed the earlier
flag to "globalTenantAccess". If you define a table with this flag enabled,
then when you write records to it, the system looks at the incoming
record's "_tenantId" field value and routes the record to the respective
tenant's table. Now, with both read and write functionality, we can
seamlessly read the data from each tenant and write the results back to
that tenant itself, using a script residing in the super tenant.
An example on how this work is shown below:-

* Create several tenants, create a stream called "S1" in each, and add
some records to each

* Execute the following in the super tenant:

create temporary table S1 using CarbonAnalytics options (tableName "S1",
schema "name STRING, count INTEGER, _tenantId INTEGER", globalTenantAccess
"true");

Reading from this "S1" table, e.g. "select * from S1", will show all the
records from all the tenants.

Now, create another table "S2" in the super tenant space with the
globalTenantAccess flag:

create temporary table S2 using CarbonAnalytics options (tableName "S2",
schema "name STRING, count INTEGER, _tenantId INTEGER", globalTenantAccess
"true");

Now we run the command "insert into table S2 select * from S1".

The above command will make the system create table S2 in all the available
tenants (basically, the tenants mentioned in S1's _tenantId field values),
and write the data to them. At the end, each tenant will have two tables,
"S1" and "S2", with identical data. This basically shows how the full data
set is collected together, and how the same data can be written back
tenant-wise.

Cheers,
Anjana.

On Mon, Apr 18, 2016 at 4:55 AM, Anjana Fernando <anj...@wso2.com> wrote:

> Hi Chan,
>
> On Mon, Apr 18, 2016 at 4:47 AM, Dulitha Wijewantha <duli...@wso2.com>
> wrote:
>
>>
>> There is a new analytics provider property introduced, which is
>>> "globalTenantRead", where when this is set to "true", it will go through
>>> all the tenants in aggregating records of a table named "T1" in that
>>> tenant. Also a new special table schema attribute "_tenantId" is
>>> introduced, which is an automatically populated value for a record based on
>>> the actual origin tenant of the record. So this "_tenantId" field can be
>>> used for further filtering/grouping in the Spark queries.
>>>
>>
>> So the tenant ID of the event is persisted in the record store when
>> events are received by DAS. Does this happen through the authorization?
>> In the case of Thrift, the login username has to be prefixed with the
>> tenant domain?
>>
>
> DAS already had proper tenant isolation. And yes, it is handled with the
> authorization (so yes, the username has to include the domain for
> tenants). When we store events, a tenant has its own space in our data
> layer; now, when retrieving data, we just additionally expose which tenant
> each record belongs to in the result, since the results can contain data
> from multiple tenants.
>
>
>>
>>
>> Does this impact indexes that have been set up?
>>
>
> No, it does not affect the existing indexes or the raw data stored.
>
> Cheers,
> Anjana.
>
>
>>
>>
>>>
>>> [1] https://docs.wso2.com/pages/viewpage.action?pageId=50505847
>>> [2] https://docs.wso2.com/pages/viewpage.action?pageId=50505762
>>> [3]
>>> https://docs.wso2.com/display/DAS310/Spark+Query+Language#SparkQueryLanguage-WSO2DASSQLguide
>>>
>>> Cheers,
>>> Anjana.
>>> --
>>> *Anjana Fernando*
>>> Senior Technical Lead
>>> WSO2 Inc. | http://wso2.com
>>> lean . enterprise . middleware
>>>
>>> ___
>>> Architecture mailing list
>>> Architecture@wso2.org
>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>
>>>
>>
>>
>> --
>> Dulitha Wijewantha (Chan)
>> Software Engineer - Mobile Development
>> WSO2 Inc
>> Lean.Enterprise.Middleware
>>  * ~Email   duli...@wso2.com <duli...@wso2mobile.com>*
>> *  ~Mobile +94712112165 <%2B94712112165>*
>> *  ~Website   dulitha.me <http://dulitha.me>*
>> *  ~Twitter @dulitharw <https://twitter.com/dulitharw>*
>>   *~Github @dulichan <https://github.com/dulichan>*
>>   *~SO @chan <http://stackoverflow.com/users/813471/chan>*
>>
>> ___
>> Architecture mailing list
>> Architecture@wso2.org
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>>
>
>
> --
> *Anjana Fernando*
> Senior Technical Lead
> WSO2 Inc. | http://wso2.com
> lean . enterprise . middleware
>



-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Caching Support for Analytics Event Tables

2016-06-20 Thread Anjana Fernando
Hi,

I'm not sure it'll be easy to merge the code like that, especially
considering that the two implementations are very different at the code
level; the points at which the data is cached would be a bit different. In
the future, a better option would be to cache the data from CEP itself,
rather than in each individual event table implementation. Anyway, let's
check the two implementations and see, at least at the configuration level.

Cheers,
Anjana.

On Mon, Jun 20, 2016 at 8:36 PM, Mohanadarshan Vivekanandalingam <
mo...@wso2.com> wrote:

>
>
> On Mon, Jun 20, 2016 at 8:17 PM, Sriskandarajah Suhothayan <s...@wso2.com>
> wrote:
>
>> Is this in line with the RDBMS implementation? Else it will be confusing
>> to the users.
>> Shall we have a chat and merge the caching code?
>>
>
> Yes, let's have a chat..
>
>>
>> @Mohan can you work with Anjana
>>
>
> sure...
>
>
>>
>> Regards
>> Suho
>>
>> On Mon, Jun 20, 2016 at 12:49 PM, Anjana Fernando <anj...@wso2.com>
>> wrote:
>>
>>> Hi,
>>>
>>> With a chat we had with Srinath, we've decided to set the default cache
>>> timeout to 10 seconds, so from this moment, it is set to 10 seconds by
>>> default in the code.
>>>
>>> Cheers,
>>> Anjana.
>>>
>>> On Wed, Jun 15, 2016 at 1:57 PM, Nirmal Fernando <nir...@wso2.com>
>>> wrote:
>>>
>>>> Great! Thanks Anjana!
>>>>
>>>> On Wed, Jun 15, 2016 at 11:26 AM, Anjana Fernando <anj...@wso2.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We've added the $subject. Basically, a local cache is now maintained
>>>>> in each event table, where it will store the most recently used data items
>>>>> in the cache, up to a certain given cache size, for a maximum given
>>>>> lifetime. The format is as follows:-
>>>>>
>>>>>  @from(eventtable = 'analytics.table' , table.name = 'name', *caching*
>>>>> = 'true', *cache.timeout.seconds* = '10', *cache.size.bytes* =
>>>>> '10')
>>>>>
>>>>> The cache.timeout.seconds and cache.size.bytes values are optional,
>>>>> with default values of 60 (1 minute) and 1024 * 1024 * 10 (10 MB)
>>>>> respectively.
>>>>>
>>>>> Also, there are some debug logs available in the component, if you
>>>>> want to check for explicit cache hit/miss situations and record lookup
>>>>> timing, basically enable debug logs for the class
>>>>> "org.wso2.carbon.analytics.eventtable.AnalyticsEventTable".
>>>>>
>>>>> So basically, if you use analytics event tables in performance
>>>>> sensitive areas in your CEP execution plans, do consider using caching if
>>>>> it is possible to do so.
>>>>>
>>>>> The unit tests are updated with caching, and the updated docs can be
>>>>> found here [1].
>>>>>
>>>>> [1]
>>>>> https://docs.wso2.com/display/DAS310/Understanding+Event+Streams+and+Event+Tables#UnderstandingEventStreamsandEventTables-AnalyticseventtableAnalyticseventtable
>>>>>
>>>>> Cheers,
>>>>> Anjana.
>>>>> --
>>>>> *Anjana Fernando*
>>>>> Senior Technical Lead
>>>>> WSO2 Inc. | http://wso2.com
>>>>> lean . enterprise . middleware
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Thanks & regards,
>>>> Nirmal
>>>>
>>>> Team Lead - WSO2 Machine Learner
>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>> Mobile: +94715779733
>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> *Anjana Fernando*
>>> Senior Technical Lead
>>> WSO2 Inc. | http://wso2.com
>>> lean . enterprise . middleware
>>>
>>
>>
>>
>> --
>>
>> *S. Suhothayan*
>> Technical Lead & Team Lead of WSO2 Complex Event Processor
>> *WSO2 Inc.* http://wso2.com
>> lean . enterprise . middleware
>>
>>
>> *cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/
>> twitter: http://twitter.com/suhothayan | linked-in:
>> http://lk.linkedin.com/in/suhothayan*
>>
>
>
>
> --
> *V. Mohanadarshan*
> *Associate Tech Lead,*
> *Data Technologies Team,*
> *WSO2, Inc. http://wso2.com <http://wso2.com> *
> *lean.enterprise.middleware.*
>
> email: mo...@wso2.com
> phone:(+94) 771117673
>



-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Caching Support for Analytics Event Tables

2016-06-20 Thread Anjana Fernando
Hi,

With a chat we had with Srinath, we've decided to set the default cache
timeout to 10 seconds, so from this moment, it is set to 10 seconds by
default in the code.

Cheers,
Anjana.

On Wed, Jun 15, 2016 at 1:57 PM, Nirmal Fernando <nir...@wso2.com> wrote:

> Great! Thanks Anjana!
>
> On Wed, Jun 15, 2016 at 11:26 AM, Anjana Fernando <anj...@wso2.com> wrote:
>
>> Hi,
>>
>> We've added the $subject. Basically, a local cache is now maintained in
>> each event table, where it will store the most recently used data items in
>> the cache, up to a certain given cache size, for a maximum given lifetime.
>> The format is as follows:-
>>
>>  @from(eventtable = 'analytics.table' , table.name = 'name', *caching* =
>> 'true', *cache.timeout.seconds* = '10', *cache.size.bytes* = '10')
>>
>> The cache.timeout.seconds and cache.size.bytes values are optional, with
>> default values of 60 (1 minute) and 1024 * 1024 * 10 (10 MB) respectively.
>>
>> Also, there are some debug logs available in the component, if you want
>> to check for explicit cache hit/miss situations and record lookup timing,
>> basically enable debug logs for the class
>> "org.wso2.carbon.analytics.eventtable.AnalyticsEventTable".
>>
>> So basically, if you use analytics event tables in performance sensitive
>> areas in your CEP execution plans, do consider using caching if it is
>> possible to do so.
>>
>> The unit tests are updated with caching, and the updated docs can be
>> found here [1].
>>
>> [1]
>> https://docs.wso2.com/display/DAS310/Understanding+Event+Streams+and+Event+Tables#UnderstandingEventStreamsandEventTables-AnalyticseventtableAnalyticseventtable
>>
>> Cheers,
>> Anjana.
>> --
>> *Anjana Fernando*
>> Senior Technical Lead
>> WSO2 Inc. | http://wso2.com
>> lean . enterprise . middleware
>>
>
>
>
> --
>
> Thanks & regards,
> Nirmal
>
> Team Lead - WSO2 Machine Learner
> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
> Mobile: +94715779733
> Blog: http://nirmalfdo.blogspot.com/
>
>
>


-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


[Architecture] Caching Support for Analytics Event Tables

2016-06-14 Thread Anjana Fernando
Hi,

We've added the $subject. Basically, a local cache is now maintained in
each event table, where it will store the most recently used data items in
the cache, up to a certain given cache size, for a maximum given lifetime.
The format is as follows:-

 @from(eventtable = 'analytics.table' , table.name = 'name', *caching* =
'true', *cache.timeout.seconds* = '10', *cache.size.bytes* = '10')

The cache.timeout.seconds and cache.size.bytes values are optional, with
default values of 60 (1 minute) and 1024 * 1024 * 10 (10 MB) respectively.

Also, there are some debug logs available in the component, if you want to
check for explicit cache hit/miss situations and record lookup timing,
basically enable debug logs for the class
"org.wso2.carbon.analytics.eventtable.AnalyticsEventTable".

So basically, if you use analytics event tables in performance sensitive
areas in your CEP execution plans, do consider using caching if it is
possible to do so.

The unit tests are updated with caching, and the updated docs can be found
here [1].

[1]
https://docs.wso2.com/display/DAS310/Understanding+Event+Streams+and+Event+Tables#UnderstandingEventStreamsandEventTables-AnalyticseventtableAnalyticseventtable

Cheers,
Anjana.
-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] What should be the default MySQL engine to be used in DAS?

2016-05-31 Thread Anjana Fernando
>>>>>>>> incoming records and processed records, i.e., EVENT_STORE and
>>>>>>>>>>> PROCESSED_DATA_STORE.
>>>>>>>>>>>
>>>>>>>>>>> For ESB Analytics, we can configure MyISAM for the EVENT_STORE
>>>>>>>>>>> and InnoDB for the PROCESSED_DATA_STORE. This is because, in
>>>>>>>>>>> ESB analytics, summarization up to the minute level is done by
>>>>>>>>>>> real-time analytics, and Spark queries will read and process
>>>>>>>>>>> data using the minutely (and higher) tables, which we can keep
>>>>>>>>>>> in the PROCESSED_DATA_STORE. Since the raw table (to which the
>>>>>>>>>>> data receiver writes) is not used by Spark queries, the receiver
>>>>>>>>>>> performance will not be affected.
>>>>>>>>>>>
>>>>>>>>>>> However, in most cases, Spark queries may be written to read
>>>>>>>>>>> data directly from the raw tables. As mentioned above, with
>>>>>>>>>>> MyISAM this could lead to performance issues if data publishing
>>>>>>>>>>> and Spark analytics happen in parallel. So, considering that, I
>>>>>>>>>>> think we should change the default configuration to use InnoDB.
>>>>>>>>>>> WDYT?
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Thanks & Regards,
>>>>>>>>>>>
>>>>>>>>>>> Inosh Goonewardena
>>>>>>>>>>> Associate Technical Lead- WSO2 Inc.
>>>>>>>>>>> Mobile: +94779966317
>>>>>>>>>>>
>>>>>>>>>>> ___
>>>>>>>>>>> Architecture mailing list
>>>>>>>>>>> Architecture@wso2.org
>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ___
>>>>>>>>>> Architecture mailing list
>>>>>>>>>> Architecture@wso2.org
>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> W.G. Gihan Anuruddha
>>>>>>>>> Senior Software Engineer | WSO2, Inc.
>>>>>>>>> M: +94772272595
>>>>>>>>>
>>>>>>>>> ___
>>>>>>>>> Architecture mailing list
>>>>>>>>> Architecture@wso2.org
>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks & Regards,
>>>>>>>>
>>>>>>>> Inosh Goonewardena
>>>>>>>> Associate Technical Lead- WSO2 Inc.
>>>>>>>> Mobile: +94779966317
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Thanks & regards,
>>>>>>> Nirmal
>>>>>>>
>>>>>>> Team Lead - WSO2 Machine Learner
>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>> Mobile: +94715779733
>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks & Regards,
>>>>>>
>>>>>> Inosh Goonewardena
>>>>>> Associate Technical Lead- WSO2 Inc.
>>>>>> Mobile: +94779966317
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Thanks & regards,
>>>>> Nirmal
>>>>>
>>>>> Team Lead - WSO2 Machine Learner
>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>> Mobile: +94715779733
>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks & Regards,
>>>>
>>>> Inosh Goonewardena
>>>> Associate Technical Lead- WSO2 Inc.
>>>> Mobile: +94779966317
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> Thanks & regards,
>>> Nirmal
>>>
>>> Team Lead - WSO2 Machine Learner
>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>> Mobile: +94715779733
>>> Blog: http://nirmalfdo.blogspot.com/
>>>
>>>
>>>
>>> ___
>>> Architecture mailing list
>>> Architecture@wso2.org
>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>
>>>
>>
>>
>> --
>> Gokul Balakrishnan
>> Senior Software Engineer,
>> WSO2, Inc. http://wso2.com
>> M +94 77 5935 789 | +44 7563 570502
>>
>>
>> ___
>> Architecture mailing list
>> Architecture@wso2.org
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>>
>
>
> --
>
> Thanks & regards,
> Nirmal
>
> Team Lead - WSO2 Machine Learner
> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
> Mobile: +94715779733
> Blog: http://nirmalfdo.blogspot.com/
>
>
>
> ___
> Architecture mailing list
> Architecture@wso2.org
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
>


-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] How do we get DAS server location?

2016-05-29 Thread Anjana Fernando
Hi Srinath,

Yeah, we were doing some work on this, first Malith and then Gokul. But due
to other priorities, we couldn't work on it much, and we ran into some
issues with how its configuration was done. We were planning further
discussions on how practical it would be. For example, if we auto-discover
servers, we would only get the server locations, and obviously not the user
credentials needed to talk to those servers. We could maybe provide the
default admin credentials shipped with the server out of the box and make
the user edit a configuration file, but that would diminish the goal of
making the setup easier.

Anyways, @Gokul, can you please give an update on to what extent we got
with this work.
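
For reference, a minimal JmDNS sketch of the registration/discovery flow
(the service type and names here are illustrative, not the actual DAS
conventions):

import java.net.InetAddress;

import javax.jmdns.JmDNS;
import javax.jmdns.ServiceEvent;
import javax.jmdns.ServiceInfo;
import javax.jmdns.ServiceListener;

public class DiscoveryExample {
    public static void main(String[] args) throws Exception {
        JmDNS jmdns = JmDNS.create(InetAddress.getLocalHost());

        // Server side: advertise the Thrift receiver endpoint.
        jmdns.registerService(ServiceInfo.create(
                "_dasthrift._tcp.local.", "das-node-1", 7611, "DAS data bridge"));

        // Client side: discover advertised receivers.
        jmdns.addServiceListener("_dasthrift._tcp.local.", new ServiceListener() {
            public void serviceAdded(ServiceEvent event) {
                jmdns.requestServiceInfo(event.getType(), event.getName());
            }
            public void serviceResolved(ServiceEvent event) {
                System.out.println("Found DAS at "
                        + event.getInfo().getHostAddresses()[0] + ":"
                        + event.getInfo().getPort());
            }
            public void serviceRemoved(ServiceEvent event) {
            }
        });
        Thread.sleep(5000); // give discovery a moment in this demo
    }
}

Note this only solves locating the endpoint; as discussed above, the
credential problem still needs a configuration step.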

Cheers,
Anjana.

On Mon, May 30, 2016 at 10:48 AM, Srinath Perera <srin...@wso2.com> wrote:

> Anjana, Have we done this?
>
> I think Gokul started working on this.
>
> --Srinath
>
> On Sat, Feb 20, 2016 at 6:03 PM, Nirmal Fernando <nir...@wso2.com> wrote:
>
>> There's a library called JmDNS http://jmdns.sourceforge.net/index.html
>> which would probably help us here.
>>
>> JmDNS is a Java implementation of multi-cast DNS and can be used for
>> service registration and discovery in local area networks. JmDNS is fully
>> compatible with Apple's Bonjour
>> <http://developer.apple.com/macosx/rendezvous/>.
>>
>> The Zeroconf <http://www.zeroconf.org/> working group is working towards
>> zero configuration IP networking. Multi-cast DNS
>> <http://www.multicastdns.org/> and DNS service discovery
>> <http://www.dns-sd.org/> provide a convient ways for devices and
>> services to register themselves, and to discover other network-based
>> services without relying on centrally administered services.
>>
>> Java as a language is not appropriate for low-level network
>> configuration, but it is very useful for service registration and
>> discovery. JmDNS provides easy-to-use pure-Java mDNS implementation that
>> runs on most JDK1.6 compatible VMs.
>>
>> The code is released under the Apache 2.0 license so that it can be
>> freely incorporated into other products and services.
>>
>> On Sat, Feb 20, 2016 at 10:11 AM, Sanjiva Weerawarana <sanj...@wso2.com>
>> wrote:
>>
>>> No no not using Hazelcast
>>>
>>> On Sat, Feb 20, 2016 at 10:07 AM, Srinath Perera <srin...@wso2.com>
>>> wrote:
>>>
>>>> Hi Sanjiva,
>>>>
>>>> I think we did it through Hazelcast. AFAIK, we have not done it for DAS
>>>> discovery yet.
>>>>
>>>> If we use Hazelcast, it is trivial to do. But that would add Hazelcast
>>>> to all our products (or maybe we can find and borrow that part of the
>>>> code).
>>>>
>>>> --Srinath
>>>>
>>>> On Sat, Feb 20, 2016 at 10:00 AM, Sanjiva Weerawarana <sanj...@wso2.com
>>>> > wrote:
>>>>
>>>>> Guys we also need the servers to discover each other when on the same
>>>>> machine or LAN. Have we done that yet? That's very easy to do [1] and IIRC
>>>>> we used it before for something.
>>>>>
>>>>> [1] https://en.wikipedia.org/wiki/Zero-configuration_networking
>>>>>
>>>>> On Fri, Feb 19, 2016 at 7:05 PM, Malith Dhanushka <mal...@wso2.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 19, 2016 at 5:00 PM, Anjana Fernando <anj...@wso2.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Fri, Feb 19, 2016 at 4:54 PM, Srinath Perera <srin...@wso2.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Kasun, Nuwan
>>>>>>>>
>>>>>>>> All products need to get the DAS server location from one place.
>>>>>>>>
>>>>>>>>
>>>>>>>>1. Do we have a place for that? Otherwise, we need something
>>>>>>>>like conf/das-client.xml and create a component to read it and use 
>>>>>>>> it with
>>>>>>>>API and ESB when they want to send events to DAS ( Anjana, can ESB
>>>>>>>>analytics guys do it?)
>>>>>>>>
>>>>>>>> Yeah, we can check on that. As I remember, there were some
>>>>>>> discussions

Re: [Architecture] What should be the default MySQL engine to be used in DAS?

2016-05-26 Thread Anjana Fernando
Hi,

So actually, we need to solve the case of, even though by default, we can
use the "write_read_optimized" mode of the record store (which will
automatically switch the queries used to create the database tables from
the templates), but for some cases, the default event store we use, we need
it to run in "write_optimized" mode, (where in MySQL, it uses MyISAM), for
example, in ESB analytics case, for the raw event storing, we can use this,
since there aren't many continuous reads done on it, like running a Spark
job on it (it's done by CEP now). So if someone installs ESB analytics
features to a base DAS distribution, as of now, it will be using the
"EVENT_STORE" record store, which is by default set to
"write_read_optimized" mode.

So what I suggest is creating two record stores to represent the current
single "EVENT_STORE", say "EVENT_STORE_WO" and "EVENT_STORE_WRO", which
would represent the "write_optimized" and "write_read_optimized" backed
configurations ("PROCESSED_STORE" will anyway be "write_read_optimized").
In a MySQL setup, this would actually come into effect when creating the
database tables; in a setup like HBase, the data sources would possibly
point to a single database server, and the same type of tables would be
created. So what we achieve in the end is that we can write all our
analytics scenarios in a portable way, without worrying about the behavior
of the data storing/retrieval, as long as we use the default record store
names that come with a typical DAS, and only data source level changes
would be done when needed.

P.S. Also, can we rename "write_read_optimized" in the configurations to
"read_write_optimized"? The latter reads more naturally.

Cheers,
Anjana.

On Wed, May 25, 2016 at 8:10 PM, Inosh Goonewardena <in...@wso2.com> wrote:

> Hi,
>
> At the moment DAS supports both MyISAM and InnoDB, but it is configured to use
> MyISAM by default.
>
> There are several differences between MyISAM and InnoDB, but what is most
> relevant with regard to DAS is the difference in concurrency. Basically,
> MyISAM uses table-level locking and InnoDB uses row-level locking. So, with
> MyISAM, if we are running Spark queries while publishing data to DAS, at
> higher TPS this can lead to issues, because the DAL layer cannot obtain the
> table lock to insert data into the table while Spark is reading from the
> same table.
>
> However, on the other hand, with InnoDB the write speed is considerably
> slower (because it is designed to support transactions), so it will affect
> the receiver performance.
>
> One option we have in DAS is to use two databases to keep incoming records
> and processed records, i.e., EVENT_STORE and PROCESSED_DATA_STORE.
>
> For ESB Analytics, we can configure to use MyISAM for EVENT_STORE and
> InnoDB for PROCESSED_DATA_STORE. It is because in ESB analytics,
> summarizing up to minute level is done by real time analytics and Spark
> queries will read and process data using minutely (and higher) tables which
> we can keep in PROCESSED_DATA_STORE. Since the raw table (to which the data
> receiver writes) is not being used by Spark queries, the receiver performance
> will not be affected.
>
> However, in most cases, Spark queries may be written to read data directly
> from the raw tables. As mentioned above, with MyISAM this could lead to
> performance issues if data publishing and Spark analytics happen in
> parallel. So considering that, I think we should change the default
> configuration to use InnoDB. WDYT?
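>
> For illustration, the difference boils down to the engine clause in the
> table-creation templates; a sketch only, with made-up table and column
> names, not the actual DAS templates:
>
> CREATE TABLE AN_EVENT_DATA (record_id VARCHAR(50) PRIMARY KEY,
>   event_timestamp BIGINT, data BLOB) ENGINE=MyISAM; -- table-level locks
>
> CREATE TABLE AN_EVENT_DATA (record_id VARCHAR(50) PRIMARY KEY,
>   event_timestamp BIGINT, data BLOB) ENGINE=InnoDB; -- row-level locks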
>
> --
> Thanks & Regards,
>
> Inosh Goonewardena
> Associate Technical Lead- WSO2 Inc.
> Mobile: +94779966317
>



-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] [Analytics] Removing FACET from Indexing data types

2016-04-23 Thread Anjana Fernando
On Fri, Apr 22, 2016 at 12:30 AM, Gimantha Bandara <giman...@wso2.com>
wrote:

> Hi Isuru,
>
> The older FACET keyword is also supported. Yes, we are planning to add -f to
> denote facet attribute.
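>
> For example, both of these schema declarations would then be accepted (a
> sketch: the older style uses FACET as the type itself, while the new style
> keeps the usual type and adds the -f flag):
>
> schema "location facet -i"
> schema "location string -i -f"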
>
> @Anjana/Niranda WDYT?
>

+1.

Cheers,
Anjana.


>
>
> On Friday, April 22, 2016, Isuru Wijesinghe <isur...@wso2.com> wrote:
>
>> Hi Gimantha,
>>
>> How can we denote a given field in any data type as a facet in
>> *spark-sql.* Lets say as an example I have a field called
>> processDefinitionId (string data-type) and I need to define it as a facet
>> as well (see below example).
>>
>> CREATE TEMPORARY TABLE PROCESS_USAGE_SUMMARY USING CarbonAnalytics
>> OPTIONS (tableName "PROCESS_USAGE_SUMMARY_DATA",
>> schema "processDefinitionId string -i *-f*,
>> processVersion string -i,
>> processInstanceId string -i",
>> primaryKeys "processInstanceId"
>> );
>>
>> Is this the way we can define it in the newer version?
>>
>>
>> On Fri, Apr 22, 2016 at 2:39 AM, Gimantha Bandara <giman...@wso2.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> We are planning to remove "FACET" (this type is used to categorize/group,
>>> to get unique values, and to drill down) from the indexing data types, and
>>> we will introduce an attribute to mark fields of other data types as a
>>> FACET or not. Earlier, FACETs could be defined only for STRING fields, and
>>> even if we defined a STRING as a FACET, we would not be able to search it
>>> as a STRING field. With this change, a field of any data type can be marked
>>> as a FACET, and the field can then be used both as a FACET and as its usual
>>> data type.
>>> This change will not affect older DAS C-Apps or event-store
>>> configurations; it will be backward compatible with previous DAS versions
>>> (3.0.0 and 3.0.1). However, if you try to get the schema of a table using
>>> the JS APIs, REST APIs, or the web service, the FACET type will not be
>>> there. An attribute called "isFacet" is used to identify the FACETed
>>> fields. See below for an example.
>>>
>>>
>>>
>>> *Older schema*
>>> {
>>> "columns" : {
>>>"logFile" : { "type" : "STRING", "isIndex" : true,
>>> "isScoreParam" : false },
>>>"level" : { "type" : "DOUBLE", "isIndex" : true,
>>> "isScoreParam" : false },
>>>"location" : { "type" : "FACET", "isIndex" : true,
>>> "isScoreParam" : false } },
>>> "primaryKeys" : ["logFile", "level"]
>>> }
>>>
>>>
>>> *Equivalent new schema*
>>>
>>> {
>>> "columns" : {
>>>    "logFile" : { "type" : "STRING", "isIndex" : true,
>>> "isScoreParam" : false, "isFacet" : false },
>>>    "level" : { "type" : "DOUBLE", "isIndex" : true,
>>> "isScoreParam" : false, "isFacet" : false },
>>>    "location" : { "type" : "STRING", "isIndex" : true,
>>> "isScoreParam" : false, "isFacet" : true } }, //FACET field is removed
>>> "primaryKeys" : ["logFile", "level"]
>>> }
>>> --
>>>
>>>
>>> ___
>>> Architecture mailing list
>>> Architecture@wso2.org
>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>
>>>
>>
>>
>> --
>> Isuru Wijesinghe
>> *Software Engineer*
>> WSO2 inc : http://wso2.com
>> lean.enterprise.middleware
>> Mobile: 0710933706
>> isur...@wso2.com
>>
>
>
> --
> Gimantha Bandara
> Software Engineer
> WSO2. Inc : http://wso2.com
> Mobile : +94714961919
>
>


-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] [analytics-esb] Summary Stat Generation Mechanism

2016-04-20 Thread Anjana Fernando
Hi,

Good progress Supun! .. do keep pushing the parameters to find the limits
we can go to.

@Suho, the idea was to altogether eliminate the batch script and just
store/index the data for later lookup, and do the computation purely in
Siddhi. I don't think we will get a big scaling problem, since the data
that needs to be kept in memory gets smaller as we go to the upper layers
of summarization, and stops at yearly granularity. At that point, we would
be holding in memory the last year's worth of data as the last 12 monthly
summary records for a specific artifact, the last month's worth as around
30 daily entries, etc., so the growth of the data slows immensely, and it
also has an upper limit, which I guess should fit comfortably within usual
memory capacity.

So if we can get a proper checkpoint and replay mechanism figured out for
the processed data, we can do all of this in CEP, and then we just don't
have the complexity of maintaining two mechanisms for doing the processing.

Cheers,
Anjana.

On Wed, Apr 20, 2016 at 12:11 PM, Sriskandarajah Suhothayan <s...@wso2.com>
wrote:

> I think it will make more sense to run the seconds and minutes summaries
> from Siddhi, and run Spark every hour; when there is lots of data in the
> system this will be much more scalable.
>
> WDYT?
>
> Regards
> Suho
>
> On Wed, Apr 20, 2016 at 11:50 AM, Supun Sethunga <sup...@wso2.com> wrote:
>
>> Hi,
>>
>> This is a follow-up mail to [1], to give an update on the status of the
>> performance issue [2]. As mentioned in the previous mail, having the Spark
>> script do the summary stat generation as a batch process creates a
>> bottleneck at higher TPS. More precisely, from our findings, it cannot
>> handle a throughput of more than 30 TPS as a batch process (i.e., events
>> published to DAS within 10 mins at a TPS of 30 take more than 10 mins to
>> process; this means that if we schedule the script every 10 mins, the
>> backlog of events to be processed grows over time).
>>
>> To overcome this, we thought of doing the summarization up to a certain
>> extent (up to second-wise summaries) using Siddhi, and generating the
>> remaining stats (per minute/hour/day/month) using Spark (see the roll-up
>> sketch after the results below). With this enhancement, we ran some load
>> tests locally to evaluate this approach, and the results are as follows.
>>
>> Backend DB : MySQL
>> ESB analytics nodes: 1
>>
>>  With InnoDB
>>
>>- With *80 TPS*: (script scheduled every 1 min) : Avg time taken for
>>completion of  the script  = ~ *20 sec*.
>>- With* 500 TPS* (script scheduled every 2 min) : Avg time taken for
>>completion of  the script  = ~ *45 sec*.
>>
>>
>> With MyISAM
>>
>>- With *80 TPS* (script scheduled every 1 min) : Avg time taken for
>>completion of  the script  = ~ *24 sec*.
>>- With *80 TPS *(script scheduled every 2 min) : Avg time taken for
>>completion of  the script  = ~ *20 sec*.
>>- With *500 TPS* (script scheduled every 2 min) : Avg time taken for
>>completion of  the script  = ~ *35 sec*.
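>>
>> A sketch of the kind of Spark roll-up query meant here, going from the
>> second-wise summaries to minute-wise ones; the table and column names are
>> hypothetical, not the actual ESB analytics scripts:
>>
>> INSERT INTO TABLE ESB_STATS_MINUTE
>> SELECT componentId,
>>        (eventTimestamp - (eventTimestamp % 60000)) AS minuteTimestamp,
>>        SUM(requestCount) AS requestCount,
>>        AVG(responseTime) AS avgResponseTime
>> FROM ESB_STATS_SECOND
>> GROUP BY componentId, (eventTimestamp - (eventTimestamp % 60000));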
>>
>> As a further improvement, we will try doing the summarization up to the
>> minute/hour level in Siddhi (eventually doing all the summarization using
>> Siddhi).
>>
>> [1] [Dev] ESB Analytics - Verifying the common production use cases
>> [2] https://wso2.org/jira/browse/ANLYESB-15
>>
>> Thanks,
>> Supun
>>
>> --
>> *Supun Sethunga*
>> Software Engineer
>> WSO2, Inc.
>> http://wso2.com/
>> lean | enterprise | middleware
>> Mobile : +94 716546324
>>
>> ___
>> Architecture mailing list
>> Architecture@wso2.org
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>>
>
>
> --
>
> *S. Suhothayan*
> Technical Lead & Team Lead of WSO2 Complex Event Processor
> *WSO2 Inc. *http://wso2.com
> * <http://wso2.com/>*
> lean . enterprise . middleware
>
>
> *cell: (+94) 779 756 757 <%28%2B94%29%20779%20756%20757> | blog:
> http://suhothayan.blogspot.com/ <http://suhothayan.blogspot.com/>twitter:
> http://twitter.com/suhothayan <http://twitter.com/suhothayan> | linked-in:
> http://lk.linkedin.com/in/suhothayan <http://lk.linkedin.com/in/suhothayan>*
>



-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


[Architecture] Cross Tenant Data Reading from Spark Queries in DAS

2016-04-17 Thread Anjana Fernando
Hi,

We've implemented an approach for the super tenant to read data from all
the tenants in the system, where the tables read from the tenants must have
the same table name.

So now, with the following syntax, you will be given an aggregated view of
all the data records from all the tenants.

create temporary table T1 using CarbonAnalytics OPTIONS (tableName "T1",
schema "d1 int, d2 string, _tenantId int", globalTenantRead "true");

A new analytics provider property, "globalTenantRead", has been introduced;
when this is set to "true", the query will go through all the tenants,
aggregating the records of the table named "T1" in each tenant. Also, a new
special table schema attribute, "_tenantId", is introduced, which is
automatically populated for each record based on the record's actual origin
tenant. This "_tenantId" field can be used for further filtering/grouping
in the Spark queries.
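
For example, with the temporary table defined above, a per-tenant
aggregation could look like this (a sketch):

select _tenantId, count(*) as recordCount, avg(d1) as avgD1
from T1
group by _tenantId;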

With this new feature, there is a change in the way DAS stores the metadata
of each analytics table. Because of this, there is a migration step when
moving from DAS v3.0.x to v3.1.0+. Since it is just a table metadata format
change, not a change to the data itself, the migration is very quick. The
migration process has been incorporated into the DAS data backup tool [1],
the migration guide in the docs is updated here [2], and the general docs
on $subject are updated here [3].

[1] https://docs.wso2.com/pages/viewpage.action?pageId=50505847
[2] https://docs.wso2.com/pages/viewpage.action?pageId=50505762
[3]
https://docs.wso2.com/display/DAS310/Spark+Query+Language#SparkQueryLanguage-WSO2DASSQLguide

Cheers,
Anjana.
-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Support Efficient Cross Tenant Analytics in DAS

2016-03-31 Thread Anjana Fernando
Hi Srinath,

I'm not sure if this is something we would have to "fix". It was a clear
design decision we took to isolate tenant data, so that one tenant cannot
access another tenant's data. So in Spark virtual tables as well, each
tenant maps directly to its own analytics tables. If we allow, say, the
super tenant to access other tenants' data, it can be seen as a security
threat. The idea should be that no single tenant has any special access to
another tenant's data.

Setting aside the physical representation (which has other complications,
like adding another index for tenantId and so on, which would have to be
supported by all data sources), if we are to do this, we need a special
view for super tenant tables in Spark virtual tables, in order for them to
have access to the "tenantId" property of the table. And in other tenants'
tables, we need to hide this and, of course, not let them use it. This
looks like a bit of a hack to implement one specific scenario we have.

As I know, this requirement mainly came from APIM analytics, where its
in-built analytics publishes all tenants' data to the super tenant's tables
and the data is processed from there. If we are doing this, this data is
only used internally, and cannot be shown to the respective tenants for
their own analytics. If each tenant needs to do their own analytics, they
should configure it to get the data for their tenant space and write their
own analytics scripts. In the end, this may mean some data duplication, but
that should happen, because two different users are doing their own
processing. And IMO, we should not hack the system just to share whatever
common data they may have.

In the end, the point is, we should not take lightly what we are trying to
achieve with multi-tenancy and compromise its fundamentals. At the moment,
the idea should be that each tenant has its own data and its own analytics
scripts, and if you need to scale, you allocate separate hardware for those
tenants. Running separate queries for different tenants does not
necessarily make things very slow, since the data load is divided between
the tenants, and the only extra cost would be the possible ramp-up times
for query executions.

Cheers,
Anjana.

On Thu, Mar 31, 2016 at 11:45 AM, Srinath Perera <srin...@wso2.com> wrote:

> Hi Anjana,
>
> Currently we keep a different HBase/RDBMS table per tenant. In a
> multi-tenant environment, this is very expensive, as we will have to run a
> query per tenant.
>
> How can we fix this? E.g., if we keep the tenant as a field in the table,
> that lets us do a "group by".
>
> --Srinath
>
> --
> 
> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
> Site: http://home.apache.org/~hemapani/
> Photos: http://www.flickr.com/photos/hemapani/
> Phone: 0772360902
>



-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] [Analytics] Improvements to Lucene based Aggregate functions (Installing Aggregates as OSGI components)

2016-03-20 Thread Anjana Fernando
>>> *Custom aggregate functions through Javascript API*
>>>
>>> var queryInfo = {
>>> tableName:"Students", //table name on which the aggregation is performed
>>> searchParams : {
>>> groupByField:"location", //grouping field if any
>>> query : "Grade:10", //additional filtering query
>>> aggregateFields:[
>>> {
>>> fields:["Height", "Weight"], //fields necessary for the aggregate function
>>> aggregate:"CUSTOM_AGGREGATE", //unique name of the aggregate function;
>>> //this is what we return from the "getAggregateName" method above
>>> alias:"aggregated_result" //alias for the result of the aggregate function
>>> }]
>>> }
>>> }
>>>
>>> client.searchWithAggregates(queryInfo, function(data) {
>>>   console.log (data["message"]);
>>> }, function(error) {
>>>   console.log("error occured: " + error["message"]);
>>> });
>>>
>>>
>>> *Note that the order of the elements in the "fields" attribute will match
>>> the order of the aggregateFields parameter in the process method above.
>>> That is, Height will be aggregateFields[0] and Weight will be
>>> aggregateFields[1] in the process method. "CUSTOM_AGGREGATE" should be
>>> implemented based on that order.*
>>>
>>>
>>>
>>> *Aggregates REST APIs*
>>>
>>> This is the same as the Javascript API.
>>>
>>> POST https://localhost:9443/analytics/aggregates
>>> {
>>>  "tableName":"Students",
>>>  "groupByField":"location",
>>>  "aggregateFields":[
>>>{
>>>  "fields":["Height", "Weight"],
>>>  "aggregate":"CUSTOM_AGGREGATE",
>>>  "alias":"aggregated_result"
>>>}]
>>> }
>>>
>>> [1]
>>> https://docs.wso2.com/display/DAS301/Retrieving+Aggregated+Values+of+Given+Records+via+REST+API
>>> [2]
>>> https://docs.wso2.com/display/DAS301/Retrieving+Aggregated+Values+of+Given+Records+via+JS+API
>>> --
>>> Gimantha Bandara
>>> Software Engineer
>>> WSO2. Inc : http://wso2.com
>>> Mobile : +94714961919
>>>
>>> ___
>>> Architecture mailing list
>>> Architecture@wso2.org
>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>
>>>
>>
>>
>> --
>> *Sinthuja Rajendran*
>> Associate Technical Lead
>> WSO2, Inc.:http://wso2.com
>>
>> Blog: http://sinthu-rajan.blogspot.com/
>> Mobile: +94774273955
>>
>>
>>
>> ___
>> Architecture mailing list
>> Architecture@wso2.org
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>>
>
>
> --
> Gimantha Bandara
> Software Engineer
> WSO2. Inc : http://wso2.com
> Mobile : +94714961919
>
> ___
> Architecture mailing list
> Architecture@wso2.org
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
>


-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Embedding Log Analyzer with Analytics distribution of Products

2016-02-01 Thread Anjana Fernando
Hi,

The initial use case I had for the Log Analyzer was as a general log
analysis tool, where users can just point to a log location (WSO2 or
non-WSO2 logs) and run queries against it / create dashboards. The concern
I have with integrating the log analyzer with our new analytics
distributions is whether we will have considerable overlapping
functionality between the two. The DAS4X analytics effort is basically to
create the mostly static dashboards that would be there (maybe with
alerts), which can be done by internally publishing all the events required
for them. But then, if we also say you can/should use the log analyzer
(which is a different UI/experience altogether) to create the
dashboards/queries that we missed in the earlier effort, that does not
sound right.

So the point is, as I see it, if we do the pure DAS4X solution right for a
product, users do not have an immediate need to use the log analysis
features to do any custom analysis. But of course, if they nevertheless
want to process the logs as well, they can set up the log analyzer product
and do it, for example, as a replacement for syslog, for centralized log
storage.

Cheers,
Anjana.

On Mon, Feb 1, 2016 at 2:04 PM, Srinath Perera <srin...@wso2.com> wrote:

> Hi All,
>
> I believe we should integrate Log Analyzer with analytics distributions of
> the products.
>
> It is true that some of the information you can get from the log analyzer
> is already available under normal analytics. For those, we do not need to
> use the log analyzer.
>
> However, the log analyzer lets us find and understand use cases that are
> not already instrumented. For example, when we see an error, we might check
> whether a similar error has happened before. Basically, we can check
> ad-hoc, dynamic use cases via the log analyzer. An example of this is the
> analytics done by our Cloud team.
>
> In general, the log analyzer will be used by advanced users who
> understand the inner workings of the product. It will be a very powerful
> debugging tool.
>
> However, if we want to embed the log analyzer, then it is challenging due
> to the Ruby-based Logstash we use with the log analyzer. I think in that
> case, we also need a Java-based log agent.
>
> Please comment.
>
> Thanks
> Srinath
> --
> 
> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
> Site: http://people.apache.org/~hemapani/
> Photos: http://www.flickr.com/photos/hemapani/
> Phone: 0772360902
>



-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Embedding Log Analyzer with Analytics distribution of Products

2016-02-01 Thread Anjana Fernando
On Mon, Feb 1, 2016 at 2:35 PM, Srinath Perera <srin...@wso2.com> wrote:

>
>
> On Mon, Feb 1, 2016 at 2:21 PM, Anjana Fernando <anj...@wso2.com> wrote:
>
>> Hi,
>>
>> The initial use case I had for the Log Analyzer was as a general log
>> analysis tool, where users can just point to a log location (WSO2 or
>> non-WSO2 logs) and run queries against it / create dashboards. The concern
>> I have with integrating the log analyzer with our new analytics
>> distributions is whether we will have considerable overlapping
>> functionality between the two. The DAS4X analytics effort is basically to
>> create the mostly static dashboards that would be there (maybe with
>> alerts), which can be done by internally publishing all the events
>> required for them. But then, if we also say you can/should use the log
>> analyzer (which is a different UI/experience altogether) to create the
>> dashboards/queries that we missed in the earlier effort, that does not
>> sound right.
>>
>
> Anjana, the point is dynamic/ad-hoc query use cases. E.g.,
> 1) You see a new error, and want to check whether it has happened before.
> 2) You see two errors happening together. You need to know whether they
> have happened together before.
>

True, the use cases are there. I was just wondering whether it would fit
the flow of the other analytics operations we do. Anyway, on second
thought, even if it's totally separate, having searchable (analyzable) logs
readily available after we install the full analytics solution for a
product would be useful.

Cheers,
Anjana.


>
>
>>
>> So the point is, as I see it, if we do the pure DAS4X solution right for a
>> product, users do not have an immediate need to use the log analysis
>> features to do any custom analysis. But of course, if they nevertheless
>> want to process the logs as well, they can set up the log analyzer product
>> and do it, for example, as a replacement for syslog, for centralized log
>> storage.
>>
>> Cheers,
>> Anjana.
>>
>> On Mon, Feb 1, 2016 at 2:04 PM, Srinath Perera <srin...@wso2.com> wrote:
>>
>>> Hi All,
>>>
>>> I believe we should integrate Log Analyzer with analytics distributions
>>> of the products.
>>>
>>> It is true that some of the information you can get from the log analyzer
>>> is already available under normal analytics. For those, we do not need to
>>> use the log analyzer.
>>>
>>> However, the log analyzer lets us find and understand use cases that are
>>> not already instrumented. For example, when we see an error, we might check
>>> whether a similar error has happened before. Basically, we can check
>>> ad-hoc, dynamic use cases via the log analyzer. An example of this is the
>>> analytics done by our Cloud team.
>>>
>>> In general, the log analyzer will be used by advanced users who
>>> understand the inner workings of the product. It will be a very powerful
>>> debugging tool.
>>>
>>> However, if we want to embed the log analyzer, then it is challenging
>>> due to the Ruby-based Logstash we use with the log analyzer. I think in
>>> that case, we also need a Java-based log agent.
>>>
>>> Please comment.
>>>
>>> Thanks
>>> Srinath
>>> --
>>> 
>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>> Site: http://people.apache.org/~hemapani/
>>> Photos: http://www.flickr.com/photos/hemapani/
>>> Phone: 0772360902
>>>
>>
>>
>>
>> --
>> *Anjana Fernando*
>> Senior Technical Lead
>> WSO2 Inc. | http://wso2.com
>> lean . enterprise . middleware
>>
>
>
>
> --
> 
> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
> Site: http://people.apache.org/~hemapani/
> Photos: http://www.flickr.com/photos/hemapani/
> Phone: 0772360902
>



-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Adding streams and scripts to DAS using an API

2016-01-07 Thread Anjana Fernando
Great! ..

Cheers,
Anjana.

On Thu, Jan 7, 2016 at 12:57 PM, Chathura Ekanayake <chath...@wso2.com>
wrote:

> Thanks Anjana. Yes, we can use admin services.
>
> On Thu, Jan 7, 2016 at 12:10 PM, Anjana Fernando <anj...@wso2.com> wrote:
>
>> Hi Chathura,
>>
>> We don't have any special external APIs for this. But we do have the
>> admin services that perform these operations. So is it possible to use the
>> admin services for these operations? You will of course need to store
>> the credentials for these services in a configuration file in Process
>> Center and use them with the admin service calls.
>>
>> Cheers,
>> Anjana.
>>
>> On Thu, Jan 7, 2016 at 11:33 AM, Chathura Ekanayake <chath...@wso2.com>
>> wrote:
>>
>>> Process Center needs to add new streams and scripts to DAS when users
>>> configure new KPIs on processes. These KPI configurations can be performed
>>> by Process Center users at runtime; therefore, I think the best method is
>>> to add the corresponding streams/scripts using an API. For example, users
>>> can select which process variables to publish and how to summarize them to
>>> construct KPIs, so an event stream and the required scripts have to be
>>> added at runtime.
>>>
>>> Is this supported by DAS? If not, what is the best approach to do this?
>>>
>>> Regards,
>>> Chathura
>>>
>>
>>
>>
>> --
>> *Anjana Fernando*
>> Senior Technical Lead
>> WSO2 Inc. | http://wso2.com
>> lean . enterprise . middleware
>>
>
>


-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Notebook Support Use cases for DAS

2015-12-08 Thread Anjana Fernando
Hi Srinath,

I'm afraid we couldn't do any work on this yet, because at the moment
everyone is occupied with the DAS 3.0.1 release and the Log Analyzer work.
I just had a chat with Miyuru; he mentioned he is checking CEP-specific
functionality for notebooks. I guess the batch analytics integration with
the notebook approach is somewhat straightforward, where what we basically
have in the Spark Console is a subset of that approach. So according to the
current plan we made for next year, we planned on checking that for DAS
3.1.0, with the change to C5, where we will be redoing all the UIs,
removing all the current functionality from the admin console and unifying
the UIs. In that effort, we can integrate this aspect too. Miyuru suggested
that we have a quick chat on Friday; let's talk more then.

Cheers,
Anjana.

On Tue, Dec 8, 2015 at 9:18 AM, Srinath Perera <srin...@wso2.com> wrote:

> Anjana, how is this thread progressing? Who is looking at/ thinking about
> notebooks?
>
> On Thu, Nov 26, 2015 at 9:19 AM, Anjana Fernando <anj...@wso2.com> wrote:
>
>> Hi Srinath,
>>
>> On Thu, Nov 26, 2015 at 9:08 AM, Srinath Perera <srin...@wso2.com> wrote:
>>
>>> Hi Anjana,
>>>
>>> Great!! I think the next step is deciding whether we do this with
>>> Zeppelin or build it from scratch.
>>>
>>> Pros of Zeppelin
>>>
>>>1. We get lot of features OOB
>>>2. Code maintained by community, patches etc.
>>>3. New features will get added and it will evolve
>>>4. We get to contribute to an Apache project and build recognition
>>>
>>> Cons
>>>
>>>1. Real deep integration might be lot of work ( we get initial
>>>version very fast, but integrating details .. e.g. make our UIs work
>>>in Zeppelin, or get Zeppelin to post to UES) might be tricky.
>>>2. Zeppelin is still in incubator
>>>3. Need to assess community
>>>
>>> I suggest you guys have a detailed chat with MiyuruD, who looked at it
>>> in detail, try out things, thing about it and report back.
>>>
>>
>> +1, we'll work with Miyuru also and see how to go forward.
>>
>>
>>>
>>>
>>> On Thu, Nov 26, 2015 at 3:12 AM, Anjana Fernando <anj...@wso2.com>
>>> wrote:
>>>
>>>> Hi Srinath,
>>>>
>>>> The story looks good. For the part about the "user can play with the
>>>> data interactively", to make it more functional, we should probably
>>>> consider integrating Scala scripts into the mix, rather than only having
>>>> Spark SQL. Spark SQL may be limited in functionality for certain data
>>>> operations, and with Scala, we should be able to use all the functionality
>>>> of Spark. For example, it would be easier to integrate ML operations with
>>>> other batch operations etc. to create a more natural flow of operations.
>>>> The implementation may be tricky though, considering clustering,
>>>> multi-tenancy etc.
>>>>
>>> Let's keep the Scala version post-MVP.
>>>
>>
>> Sure.
>>
>>
>>>
>>>
>>>>
>>>> Also, I would like to bring up the question of whether most batch jobs
>>>> are actually meant to be scheduled repeatedly like this, over a data set
>>>> that keeps growing, or whether it is mostly a case of executing something
>>>> once, getting the results, and that's it. Maybe this is a different
>>>> discussion though. But for scheduled batch jobs like these, I guess
>>>> incremental processing would be critical, which no one seems to bother
>>>> about that much though.
>>>>
>>> I think it is mostly scheduled batches as we have. Shall we take this up
>>> in a different thread?
>>>
>>
>> Yep, sure.
>>
>>
>>>
>>>
>>>>
>>>> Cheers,
>>>> Anjana.
>>>>
>>>> On Mon, Nov 23, 2015 at 2:57 PM, Srinath Perera <srin...@wso2.com>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I tried to write down the use cases, to start thinking about this
>>>>> starting from what we discussed in the meeting. Please comment. ( doc is 
>>>>> at
>>>>> https://docs.google.com/document/d/1355YEXbhcd2fvS-zG_CiMigT-iTncxYn3DTHlJRTYyo/edit#
>>>>> ( same content is below).
>>>>>
>>>>> Thanks
>>>>> Srinath
&

Re: [Architecture] Notebook Support Use cases for DAS

2015-11-25 Thread Anjana Fernando
Hi Srinath,

On Thu, Nov 26, 2015 at 9:08 AM, Srinath Perera <srin...@wso2.com> wrote:

> Hi Anjana,
>
> Great!! I think the next step is deciding whether we do this with Zeppelin
> or build it from scratch.
>
> Pros of Zeppelin
>
>    1. We get a lot of features OOB
>2. Code maintained by community, patches etc.
>3. New features will get added and it will evolve
>4. We get to contribute to an Apache project and build recognition
>
> Cons
>
>    1. Real deep integration might be a lot of work (we get an initial version
>    very fast, but integrating the details, e.g., making our UIs work in Zeppelin
>    or getting Zeppelin to post to UES, might be tricky).
>2. Zeppelin is still in incubator
>3. Need to assess community
>
> I suggest you guys have a detailed chat with MiyuruD, who looked at it in
> detail, try out things, thing about it and report back.
>

+1, we'll work with Miyuru also and see how to go forward.


>
>
> On Thu, Nov 26, 2015 at 3:12 AM, Anjana Fernando <anj...@wso2.com> wrote:
>
>> Hi Srinath,
>>
>> The story looks good. For the part about the "user can play with the
>> data interactively", to make it more functional, we should probably
>> consider integrating Scala scripts into the mix, rather than only having
>> Spark SQL. Spark SQL may be limited in functionality for certain data
>> operations, and with Scala, we should be able to use all the functionality
>> of Spark. For example, it would be easier to integrate ML operations with
>> other batch operations etc. to create a more natural flow of operations.
>> The implementation may be tricky though, considering clustering,
>> multi-tenancy etc.
>>
> Let's keep the Scala version post-MVP.
>

Sure.


>
>
>>
>> Also, I would like to bring up the question of whether most batch jobs
>> are actually meant to be scheduled repeatedly like this, over a data set
>> that keeps growing, or whether it is mostly a case of executing something
>> once, getting the results, and that's it. Maybe this is a different
>> discussion though. But for scheduled batch jobs like these, I guess
>> incremental processing would be critical, which no one seems to bother
>> about that much though.
>>
> I think it is mostly scheduled batches as we have. Shall we take this up
> in a different thread?
>

Yep, sure.


>
>
>>
>> Cheers,
>> Anjana.
>>
>> On Mon, Nov 23, 2015 at 2:57 PM, Srinath Perera <srin...@wso2.com> wrote:
>>
>>> Hi All,
>>>
>>> I tried to write down the use cases, to start thinking about this
>>> starting from what we discussed in the meeting. Please comment. ( doc is at
>>> https://docs.google.com/document/d/1355YEXbhcd2fvS-zG_CiMigT-iTncxYn3DTHlJRTYyo/edit#
>>> ( same content is below).
>>>
>>> Thanks
>>> Srinath
>>> Batch, interactive, and Predictive Story
>>>
>>>1.
>>>
>>>    Data is uploaded to the system or sent as a data stream and
>>>    collected for some time (in DAS)
>>>2.
>>>
>>>    Data Scientist comes in and selects a data set, looks at the schema of
>>>    the data, and does standard descriptive statistics like Mean, Max,
>>>    Percentiles, and standard deviation about the data.
>>>3.
>>>
>>>    Data Scientist cleans up the data using a series of transformations.
>>>This might include combining multiple data sets into one data set.
>>> [Notebooks]
>>>4.
>>>
>>>He can play with the data interactively
>>>5.
>>>
>>>    He visualizes the data in several ways [Notebooks]
>>>6.
>>>
>>>    If he needs descriptive statistics, he can export the data mutations
>>>in the notebooks as a script and schedule it.
>>>7.
>>>
>>>If what he needs is machine learning, he can initialize and run the
>>>ML Wizard from the Notebooks and create a model.
>>>8.
>>>
>>>He can export the model he created and any data mutation operations
>>>he did as a script and deploy both the model and data mutation operations
>>>in the CEP ( Realtime Pipeline). This is the actual transaction flow.
>>>9.
>>>
>>>He can export the data mutation operations and machine learning
>>>model building logic as a script and schedule it to run periodically. 
>>> This
>>>is the
>>>
>>>
>>>
>>> [image: NotebookPipeline.png]
>&

Re: [Architecture] Notebook Support Use cases for DAS

2015-11-25 Thread Anjana Fernando
Hi Srinath,

The story looks good. For the part about the "user can play with the data
interactively", to make it more functional, we should probably consider
integrating Scala scripts into the mix, rather than only having Spark SQL.
Spark SQL may be limited in functionality for certain data operations, and
with Scala, we should be able to use all the functionality of Spark. For
example, it would be easier to integrate ML operations with other batch
operations etc. to create a more natural flow of operations. The
implementation may be tricky though, considering clustering, multi-tenancy
etc.

Also, I would like to bring up the question of whether most batch jobs are
actually meant to be scheduled repeatedly like this, over a data set that
keeps growing, or whether it is mostly a case of executing something once,
getting the results, and that's it. Maybe this is a different discussion
though. But for scheduled batch jobs like these, I guess incremental
processing would be critical, which no one seems to bother about that much
though.

Cheers,
Anjana.

On Mon, Nov 23, 2015 at 2:57 PM, Srinath Perera <srin...@wso2.com> wrote:

> Hi All,
>
> I tried to write down the use cases, to start thinking about this starting
> from what we discussed in the meeting. Please comment. ( doc is at
> https://docs.google.com/document/d/1355YEXbhcd2fvS-zG_CiMigT-iTncxYn3DTHlJRTYyo/edit#
> ( same content is below).
>
> Thanks
> Srinath
> Batch, interactive, and Predictive Story
>
>1.
>
>    Data is uploaded to the system or sent as a data stream and collected
>    for some time (in DAS)
>2.
>
>    Data Scientist comes in and selects a data set, looks at the schema of the
>    data, and does standard descriptive statistics like Mean, Max, Percentiles,
>    and standard deviation about the data.
>3.
>
>    Data Scientist cleans up the data using a series of transformations.
>This might include combining multiple data sets into one data set.
> [Notebooks]
>4.
>
>He can play with the data interactively
>5.
>
>    He visualizes the data in several ways [Notebooks]
>6.
>
>    If he needs descriptive statistics, he can export the data mutations in
>the notebooks as a script and schedule it.
>7.
>
>If what he needs is machine learning, he can initialize and run the ML
>Wizard from the Notebooks and create a model.
>8.
>
>He can export the model he created and any data mutation operations he
>did as a script and deploy both the model and data mutation operations in
>the CEP ( Realtime Pipeline). This is the actual transaction flow.
>9.
>
>He can export the data mutation operations and machine learning model
>building logic as a script and schedule it to run periodically. This is the
>
>
>
> [image: NotebookPipeline.png]
>
>
>
> Realtime Story
>
> For the realtime story also, we can start with a data set, write realtime
> queries, test them by replaying the data, and only then deploy the queries.
> (We do this even now.) We can do the same.
>
>
>1.
>
>    User starts with a dataset.
>2.
>
>    He writes a set of queries using the dataset as a stream. Streams and
>    datasets share the same record format. For example, consider the following
>data set.
>
>
> We can consider this as a batch data set by taking it as a whole or as a
> stream by taking it record by record.
>
> For example, if we run the query
>
> select * from CountryData where GDP>35000
>
> it will provide the following results.
>
>
>
>
>1.
>
>    Tables created by replaying data with CEP queries can be visualized like
>    other data (except that time is special).
>2.
>
>    When the Data Scientist is happy, he can click a button and
>    export the CEP queries as an execution plan and any charts as realtime
>    gadgets. (One complication is that time is special, and we need to transform
>    from any visualization to a time-based visualization.)
>
>
> --
> 
> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
> Site: http://people.apache.org/~hemapani/
> Photos: http://www.flickr.com/photos/hemapani/
> Phone: 0772360902
>



-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] [LogAnalyzer] How the user can configure log publishing agent

2015-11-11 Thread Anjana Fernando
Hi Anuruddha,

On Tue, Nov 10, 2015 at 7:04 PM, Anuruddha Premalal <anurud...@wso2.com>
wrote:

> Hi Anjana,
>
> What was meant by "logstash log configuration files" is its
> configuration format, not that we are making use of Logstash to publish
> data; of course, we are writing our own agent based on a similar config
> format.
>

Yeah, I know; it is the configuration format I asked to review carefully,
to see if the semantics defined there are enough for our use cases.

Cheers,
Anjana.


>
> On Wed, Nov 11, 2015 at 1:54 AM, Anjana Fernando <anj...@wso2.com> wrote:
>
>>
>>>1. If log stash log configuration files are well done, can we do the
>>>same formats?
>>>
>>> Yes,  this has already been discussed in  architecture mail "Component
>> level description of the log analyzer tool"
>>
>> Please check this with Sachith also, he has some experience in working
>> with logstash (he did a logstash adapter earlier), and he will know the
>> limitations/benefits in using it to map to our events, starting from
>> arbitrary field support etc. We should weigh creating something of our
>> own vs. living with the limitations/annoyances logstash would have where
>> it does not directly map to our use cases.
>>
>> Cheers,
>> Anjana.
>>
>> Thanks
>>> Srinath
>>>
>>> p.s. above are opinions only, please shout if disagree.
>>>
>>>
>>>
>>>
>>> On Fri, Nov 6, 2015 at 6:33 PM, Malith Dhanushka <mal...@wso2.com>
>>> wrote:
>>> >
>>> > Yes I agree with the complication on applying agent configs in large
>>> clusters. But centralized config management using a message broker is a
>>> critical decision to take as it weighs maintenance effort. That decision
>>> depends on how big the cluster is and how frequently the log configs are
>>> getting changed.
>>> >
>>> > On Fri, Nov 6, 2015 at 3:22 PM, Inosh Goonewardena <in...@wso2.com>
>>> wrote:
>>> >>
>>> >> Hi Anurudda,
>>> >>
>>> >>
>>> >> On Fri, Nov 6, 2015 at 3:06 PM, Anuruddha Premalal <
>>> anurud...@wso2.com> wrote:
>>> >>>
>>> >>> Hi Inosh,
>>> >>>
>>> >>> Can you be specific on the added complexities of managed
>>> configuration mode? I have explained in the sequence diagram how this will
>>> function. Manage configuration mode is actually a user choice, if the
>>> deployment is quite simple user can use default agent side configurations
>>> (as in logstash).
>>> >>
>>> >>
>>> >> As Malith pointed out, my idea was to avoid configuring the log
>>> agent remotely and publishing the config. But yes, in a larger cluster,
>>> configuring each of the agents won't be practical, and managed config mode is
>>> the better approach. If the user has the choice he/she can select depending
>>> on his/her preference.
>>> >>
>>> >>>
>>> >>>
>>> >>> Managed config mode addresses a major feature that agent-side
>>> config mode lacks; if a user needs to change/update the configs for a
>>> large cluster, configuring each of them won't be practical.
>>> >>>
>>> >>> In terms of the overhead concern of splitting an event on the agent
>>> side versus the master side: since a single log event usually has a small
>>> number of characters, it won't cost much to perform the filtering there;
>>> on the master side, there won't be only a single log stream, so it
>>> obviously adds more overhead to the master. Because of this, we should
>>> never do the filtering on the master side.
>>> >>>
>>> >>> We are writing the agent in Python, which doesn't consume as many
>>> resources as a JVM, and that will definitely be an advantage for a smooth
>>> run.
>>> >>>
>>> >>>
>>> >>> On Fri, Nov 6, 2015 at 2:43 PM, Inosh Goonewardena <in...@wso2.com>
>>> wrote:
>>> >>>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> On Fri, Nov 6, 2015 at 1:48 PM, Sachith Withana <sach...@wso2.com>
>>> wrote:
>>> >>>>>
>>> >>>>> Hi Malith,
>>> >>>>>
>>> >>>>> In terms of the 1st option,
>>> >>>>> - the o

Re: [Architecture] [DAS] Java Agent to monitor server activities

2015-11-03 Thread Anjana Fernando
Hi Udani,

Can you please explain a bit more about how the field names of the streams
will be derived? That is, for example, what will an event look like when a
method-before scenario gets hit, a method-after, an insert-at, and so on?
Basically, give some sample event payloads for each scenario.

Also, ideally later on, we should be able to copy new configuration files
for new scenarios to a specific folder of the agent, and the agent should
pick up all the configuration files, load all the scenarios at agent
startup, and execute them. That way, we can create these configuration
files for specific scenarios and install them when needed; for example,
configuration files for a database monitoring scenario, a JMS event
monitoring scenario, etc.

Cheers,
Anjana.

On Wed, Oct 28, 2015 at 1:28 AM, Udani Weeraratne <ud...@wso2.com> wrote:

> Hi,
>
> I am working on a Java agent which can be used to monitor different
> activities carried out within DAS. The main concept of a Java agent is to
> modify the bytecode of classes before they are loaded into the JVM
> (bytecode instrumentation). This provides the ability to inject code into
> classes according to our requirements.
>
> Currently we are trying to implement a simple agent, which can monitor
> method calls and parameters passed under a given scenario and publish them
> to a stream in DAS. The architecture of this approach will be as follows.
>
> [image: Inline image 1]
>
>
> We will provide a simple configuration file, where the user has to specify
> the class name, the method name with its signature, the parameters to
> monitor, and the location where the code is to be inserted (using Javassist
> we can insert code at the top, at the bottom, or at a specific line of the
> method). Then the agent will be initialized based on the user requirement
> and will instrument the requested methods before the respective classes are
> loaded into the JVM (Javassist will be the library used in the
> instrumentation process). Once the classes are instrumented before the
> server starts running, we will be able to publish events containing the
> intercepted data to a stream in DAS. Using the ability to publish arbitrary
> fields in DAS, we are trying to provide the ability to index and store
> events with the intercepted data. This can be used as a profiler to monitor
> the activities of the server.
>
> Layout of configuration file
>
>
> [The sample configuration file content was stripped by the mail archive;
> the surviving fragments show an entry specifying a target method with the
> signature "(Ljava/lang/String;)Ljava/sql/PreparedStatement;" and a
> monitored parameter reference "$1".]
>
> This is the overall idea of the Java agent we are working on. Hope this
> will add value to the product. I appreciate any suggestions on this.
>
>
> Thanks,
>
> Udani
>
> --
> *Udani Weeraratne*
> Software Engineer - Intern
> WSO2 Inc.: www.wso2.com
> lean.enterprise.middleware
>
> Email: ud...@wso2.com
> Mobile: +94 775437714
> LinkedIn: *https://lk.linkedin.com/in/udaniweeraratne
> <https://lk.linkedin.com/in/udaniweeraratne>*
> Blog : https://udaniweeraratne.wordpress.com/
>
> ___
> Architecture mailing list
> Architecture@wso2.org
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
>


-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] [Dev] [VOTE] Release WSO2 DSS 3.5.0 RC2

2015-10-25 Thread Anjana Fernando
Hi,

I tested the following:-

* OData functionality
  - Read/write/update/delete
  - Reading metadata
  - Reading with conditions

* New boxcarring functionality (request_box)
  - Multiple operation execution
  - Transaction commit/rollback on success/error

* Verified RC1 blocker.

[X] Stable - go ahead and release

Cheers,
Anjana.

On Sat, Oct 24, 2015 at 1:33 AM, Rajith Vitharana <raji...@wso2.com> wrote:

> Hi,
>
> This is the second release candidate of WSO2 DSS 3.5.0
>
> This release fixes the following issues:
> *https://wso2.org/jira/issues/?filter=12469
> <https://wso2.org/jira/issues/?filter=12469>*
>
> Please download, test and vote. The vote will be open for 72 hours or as
> needed.
>
> Source & binary distribution files:
> https://svn.wso2.org/repos/wso2/scratch/DSS/3.5.0/RC2/wso2dss-3.5.0.zip
> <https://svn.wso2.org/repos/wso2/scratch/DSS/3.5.0/RC1/wso2dss-3.5.0.zip>
>
> JavaDocs
> https://svn.wso2.org/repos/wso2/scratch/DSS/3.5.0/RC2/javaDocs/index.html
>
> Maven staging repo:
> *http://maven.wso2.org/nexus/content/repositories/orgwso2dss-058/
> <http://maven.wso2.org/nexus/content/repositories/orgwso2dss-058/>*
>
> The tag to be voted upon:
> *https://github.com/wso2/product-dss/tree/v3.5.0-RC2
> <https://github.com/wso2/product-dss/tree/v3.5.0-RC2>*
>
>
> [ ] Broken - do not release (explain why)
> [ ] Stable - go ahead and release
>
> Thanks,
> The WSO2 DSS Team
>
> _______
> Architecture mailing list
> Architecture@wso2.org
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
>


-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] [Dev] [VOTE] Release WSO2 DSS 3.5.0 RC2

2015-10-25 Thread Anjana Fernando
Hi,

Please note that the earlier mail's distribution link target is wrong; it
actually points to the RC1 artifact (which is what you get when you click
it; you will have to copy and paste the link text to get the correct one).
The correct link can be found below:

https://svn.wso2.org/repos/wso2/scratch/DSS/3.5.0/RC2/wso2dss-3.5.0.zip

Cheers,
Anjana.

On Sat, Oct 24, 2015 at 1:33 AM, Rajith Vitharana <raji...@wso2.com> wrote:

> Hi,
>
> This is the second release candidate of WSO2 DSS 3.5.0
>
> This release fixes the following issues:
> *https://wso2.org/jira/issues/?filter=12469
> <https://wso2.org/jira/issues/?filter=12469>*
>
> Please download, test and vote. The vote will be open for 72 hours or as
> needed.
>
> Source & binary distribution files:
> https://svn.wso2.org/repos/wso2/scratch/DSS/3.5.0/RC2/wso2dss-3.5.0.zip
> <https://svn.wso2.org/repos/wso2/scratch/DSS/3.5.0/RC1/wso2dss-3.5.0.zip>
>
> JavaDocs
> https://svn.wso2.org/repos/wso2/scratch/DSS/3.5.0/RC2/javaDocs/index.html
>
> Maven staging repo:
> *http://maven.wso2.org/nexus/content/repositories/orgwso2dss-058/
> <http://maven.wso2.org/nexus/content/repositories/orgwso2dss-058/>*
>
> The tag to be voted upon:
> *https://github.com/wso2/product-dss/tree/v3.5.0-RC2
> <https://github.com/wso2/product-dss/tree/v3.5.0-RC2>*
>
>
> [ ] Broken - do not release (explain why)
> [ ] Stable - go ahead and release
>
> Thanks,
> The WSO2 DSS Team
>
> _______
> Architecture mailing list
> Architecture@wso2.org
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
>


-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] [Dev] [VOTE] Release WSO2 DSS 3.5.0 RC1

2015-10-23 Thread Anjana Fernando
-1.

Discovered the following issue [1]. Even though we can work around it, it
is a significant user experience issue, so we must fix it.

[1] https://wso2.org/jira/browse/DS-1128

Cheers,
Anjana.

On Fri, Oct 23, 2015 at 2:38 AM, Rajith Vitharana <raji...@wso2.com> wrote:

> Hi,
>
> This is the first release candidate of WSO2 DSS 3.5.0
>
> This release fixes the following issues:
> https://wso2.org/jira/browse/DS-1126?filter=12469
>
> Please download, test and vote. The vote will be open for 72 hours or as
> needed.
>
> Source & binary distribution files:
> https://svn.wso2.org/repos/wso2/scratch/DSS/3.5.0/RC1/wso2dss-3.5.0.zip
>
> JavaDocs
> https://svn.wso2.org/repos/wso2/scratch/DSS/3.5.0/RC1/javaDocs/index.html
>
> Maven staging repo:
> http://maven.wso2.org/nexus/content/repositories/orgwso2dss-045/
>
> The tag to be voted upon:
> https://github.com/wso2/product-dss/tree/v3.5.0-RC1
>
>
> [ ] Broken - do not release (explain why)
> [ ] Stable - go ahead and release
>
> Thanks,
> The WSO2 DSS Team
>
> ___
> Architecture mailing list
> Architecture@wso2.org
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
>


-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] [DAS][BPS]Business process monitoring dashboard

2015-10-01 Thread Anjana Fernando
Hi Srinath,

We should not generally recommend outputting data to a standard RDBMS and
querying it using SQL; we would then lose the portability of the
functionality we have with DAS when using its DAL. That is, if you change
to some other database server, e.g., Cassandra, HBase, etc., we would not
be able to do that, whereas if we use the standard APIs exposed by DAS,
they will always be available.

Cheers,
Anjana.

On Thu, Oct 1, 2015 at 2:42 PM, Srinath Perera <srin...@wso2.com> wrote:

> I chatted with Chathura.
>
> We can use Spark to aggregate data grouped by user and task-id and save it
> to a SQL DB. Then we can use a SQL query (called from the UI) to get the
> data for a specific task-id.
>
> Thanks
> Srinath
>
> On Thu, Oct 1, 2015 at 1:21 PM, Anjana Fernando <anj...@wso2.com> wrote:
>
>> Hi Chathura,
>>
>> The only way you can pass a parameter to a query as such in a script
>> would be to use a UDF; how to do this is mentioned in the docs (see the
>> sketch below). But I'm wondering if this would be proper: since these are
>> scheduled batch scripts, they will most probably take some time to start
>> executing again and finish, so a user setting these parameters from a UI
>> does not seem practical. For instance, it cannot be used in a dashboard,
>> where the results are expected quickly. You may also want to check out the
>> indexing functionality, where you can most probably use a static query for
>> the batch operation, and when inserting the resultant summarized data, you
>> can index it, so you can quickly look it up using time ranges and so on.
>> Also, there is the possibility of bypassing Spark SQL altogether using the
>> aggregate features in our indexing functionality.
>> @Gimantha, is [1] the only documentation we have on the indexing
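>> For illustration, with a UDF the parameterized query in question could
>> look like the following. A sketch only: "fromDate" and "toDate" here are
>> hypothetical UDFs that would read the user-supplied range from some
>> configured location:
>>
>> SELECT processDefinitionId, COUNT(processInstanceId) AS processInstanceCount,
>>        AVG(duration) AS avgExecutionTime
>> FROM BPMNProcessInstances
>> WHERE date BETWEEN fromDate() AND toDate()
>> GROUP BY processDefinitionId;
>>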
>> aggregation features? .. if so, please update it to be more comprehensive.
>> It is better if we can give side by side solutions onto how we do
>> aggregates in SQL, and the comparable approach we would do in our indexing
>> features.
>>
>> [1]
>> https://docs.wso2.com/display/DAS300/Retrieving+Aggregated+Values+of+Given+Records+via+JS+API
>>
>> Cheers,
>> Anjana.
>>
>> On Wed, Sep 30, 2015 at 10:43 PM, Chathura Ekanayake <chath...@wso2.com>
>> wrote:
>>
>>> Process monitoring graphs in [1] were proposed to give some level of top
>>> to bottom analysis. For example, a business analyst may first identify slow
>>> performing processes using the graph number 2. Then he can analyze
>>> bottleneck tasks of those slow processes from the graph number 10, where he
>>> has to generate graph 10 for each slow process. Then he can further analyze
>>> the users who performed bottleneck tasks frequently by generating graph
>>> number 11 for each slow task. Therefore, the ability to execute
>>> parameterized queries is critical for these process monitoring features.
>>>
>>> a.) Is that possible on the DAS side?
>>>>
>>>> eg: SELECT processDefinitionId, COUNT(processInstanceId) AS
>>>> processInstanceCount, AVG(duration) AS avgExecutionTime FROM
>>>> BPMNProcessInstances WHERE date BETWEEN *"fromDate" *AND* "toDate" *GROUP
>>>> BY processDefinitionId;
>>>> (here *fromDate* and *toDate* are variables that need to be passed at
>>>> runtime)
>>>>
>>>> b.) If not we can store the summarized data with primary and secondary
>>>> filters which mentioned in [1]  on DAS and then we can fetch them through
>>>> DAS REST API by passing appropriate parameters.
>>>>
>>>
>>> Isuru, I think the approach (b) does not scale.  There can be hundreds
>>> of processes and thousands of tasks (in all processes). Therefore, it is
>>> not practical to pre-compute data for all graphs.
>>>
>>> The ability to execute parameterized queries or to provide queries at
>>> runtime through an API would be helpful to solve this problem.
>>>
>>> [1]
>>> https://docs.google.com/a/wso2.com/spreadsheets/d/1pQAK6x4-rL-hQA7-NOaoT2llyjxv_nfc_vUarwUr74w/edit?usp=sharing
>>>
>>> Regards,
>>> Chathura
>>>
>>>
>>>
>>>
>>
>>
>> --
>> *Anjana Fernando*
>> Senior Technical Lead
>> WSO2 Inc. | http://wso2.com
>> lean . enterprise . middleware
>>
>
>
>
> --
> 
> Srinath Perera, Ph.D.
>http://people.apache.org/~hemapani/
>http://srinathsview.blogspot.com/
>



-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Feature to send multiple operation requests in a single request

2015-09-14 Thread Anjana Fernando
Hi Rajith,

Let's use the same "enableBoxcarring" flag for this improvement, since we
already have that, and just note that the begin_boxcar etc. operations are
deprecated. For the new operation, how about "request_box", since the "box"
term is already there in "boxcarring"?
Cheers,
Anjana.

On Fri, Sep 11, 2015 at 5:35 PM, Rajith Vitharana <raji...@wso2.com> wrote:

> Hi All,
>
> We thought of using "request_batch" as the reserved operation name and
> "enableRequestBatch" as the parameter in dbs, But this may confuse end
> users as we already have "enableBatchRequest" parameter in the dbs. So it
> would be better if we can change this to suitable name. Appreciate any
> feedback on this.
>
> Thanks,
>
> On Fri, Sep 11, 2015 at 4:17 PM, Rajith Vitharana <raji...@wso2.com>
> wrote:
>
>> Hi Vidura,
>>
>>
>> On Fri, Sep 11, 2015 at 4:07 PM, Vidura Gamini Abhaya <vid...@wso2.com>
>> wrote:
>>
>>> Thanks Rajith.
>>>
>>> Would we still keep the semantics the same? i.e. client calls,
>>>
>>> stub.begin_requestbox();
>>> stub.operation1(foo, bar);
>>> stub.operation2(bar);
>>> stub.end_requestbox();
>>>
>> No, it's going to be a single call, the one shown in the initial
>> mail. It will contain all the required operations within it.
>>
>>>
>>> How are we planning to get the code that does the collating on to the
>>> client?  Would the users be forced to use a our tools to generate the stubs?
>>>
>> I don't think there will be any issue, since we are providing a WSDL with
>> the required operations, which will also contain the new "request_box"
>> operation.
>>
>> Thanks,
>>
>> --
>> Rajith Vitharana
>>
>> Software Engineer,
>> WSO2 Inc. : wso2.com
>> Mobile : +94715883223
>> Blog : http://lankavitharana.blogspot.com/
>>
>
>
>
> --
> Rajith Vitharana
>
> Software Engineer,
> WSO2 Inc. : wso2.com
> Mobile : +94715883223
> Blog : http://lankavitharana.blogspot.com/
>



-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] How to Ship Fraud Solution?

2015-08-13 Thread Anjana Fernando
Hi Srinath,

Yeah, we should be able to do that. The dashboard has the capability to
add a static web page, so we can put it in like that. So yeah, we can test
it now and see how it will work, and we can host the toolbox separately.
That is, it doesn't necessarily have to go with the product itself.

Cheers,
Anjana.

On Thu, Aug 13, 2015 at 10:47 AM, Srinath Perera srin...@wso2.com wrote:

 Hi Anjana,

 Can we ship it as a car file that people can just download and install to
 DAS? Would that work in coming release?

 Thanks
 Srinath

 --
 
 Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
 Site: http://people.apache.org/~hemapani/
 Photos: http://www.flickr.com/photos/hemapani/
 Phone: 0772360902




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Carbon datasource implementation for Cassandra

2015-07-24 Thread Anjana Fernando
Hi Gokul,

Thanks, the pull request is merged now.

Cheers,
Anjana.

On Fri, Jul 24, 2015 at 12:31 PM, Gokul Balakrishnan go...@wso2.com wrote:

 Hi Devs,

 I've completed implementation of $subject based on the DataStax Java
 driver. This component enables connection to a Cassandra cluster through
 its CQL interface, and provides the client with a
 com.datastax.driver.core.Cluster reference based on which
 com.datastax.driver.core.Session instances could be created for use by the
 client. CQL native protocol versions v1 through v3 are supported.

 Provisions have been made for specifying most connection parameters
 through the datasource configuration, including protocol, pool, socket and
 query options. A sample configuration would look like the following:

 <provider>org.wso2.carbon.datasource.reader.cassandra.CassandraDataSourceReader</provider>

 <datasource>
     <name>WSO2_ANALYTICS_EVENT_STORE_CASSANDRA</name>
     <description>The datasource used for analytics record store</description>
     <definition type="CASSANDRA">
         <configuration>
             <contactPoints>192.168.1.1, 192.168.1.2</contactPoints>
             <port>9042</port>
             <username>admin</username>
             <password>admin</password>
             <clusterName>cluster1</clusterName>
             <compression>gzip</compression>
             <poolingOptions>
                 <coreConnectionsPerHost hostDistance="LOCAL">8</coreConnectionsPerHost>
                 <maxSimultaneousRequestsPerHostThreshold hostDistance="REMOTE">256</maxSimultaneousRequestsPerHostThreshold>
             </poolingOptions>
             <queryOptions>
                 <fetchSize>100</fetchSize>
                 <consistencyLevel>LOCAL_ONE</consistencyLevel>
                 <serialConsistencyLevel>SERIAL</serialConsistencyLevel>
             </queryOptions>
             <socketOptions>
                 <keepAlive>true</keepAlive>
                 <tcpNoDelay>true</tcpNoDelay>
                 <sendBufferSize>15</sendBufferSize>
                 <connectTimeoutMillis>12000</connectTimeoutMillis>
                 <readTimeoutMillis>12000</readTimeoutMillis>
             </socketOptions>
         </configuration>
     </definition>
 </datasource>
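
 As a minimal usage sketch (note: the config above defines no JNDI name, so
 the lookup name "jdbc/WSO2CassandraDB" here is purely illustrative and
 assumes a jndiConfig section is added):

     import javax.naming.InitialContext;
     import com.datastax.driver.core.Cluster;
     import com.datastax.driver.core.ResultSet;
     import com.datastax.driver.core.Session;

     // obtain the Cluster built by the datasource reader, then open a Session
     Cluster cluster = (Cluster) new InitialContext().lookup("jdbc/WSO2CassandraDB");
     Session session = cluster.connect();
     ResultSet rs = session.execute("SELECT release_version FROM system.local");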

 I've sent the pull request for $subject at [1]. @DSS team, please review
 and merge.

 [1] https://github.com/wso2/carbon-data/pull/19

 Thanks,
 Gokul.

 --
 Gokul Balakrishnan
 Senior Software Engineer,
 WSO2, Inc. http://wso2.com
 Mob: +94 77 593 5789 | +1 650 272 9927




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] [DAS] Changing the name of Message Console

2015-07-12 Thread Anjana Fernando
Hi,

+1 for Data Explorer for message console. The name Spark Console is
fine the way it is now.

Cheers,
Anjana.

On Sun, Jul 12, 2015 at 7:59 AM, Niranda Perera nira...@wso2.com wrote:

 Hi all,

 DAS currently ships a UI component named 'message console'. It can be used
 to browse data inside the DAS tables.
 IMO the name 'message console' is misleading; a person who's new to
 DAS would not know its exact use just by reading the name.

 I suggest a more self-explanatory name such as, 'data explorer', 'data
 navigator' etc

 WDYT?

 --
 *Niranda Perera*
 Software Engineer, WSO2 Inc.
 Mobile: +94-71-554-8430
 Twitter: @n1r44 https://twitter.com/N1R44
 https://pythagoreanscript.wordpress.com/




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] [NTask] What are the distinct purposes of setProperties(), init() and execute() methods of Task interface ?

2015-06-03 Thread Anjana Fernando
Actually, Madhawa, shall we fix it with a new version of the ntask component?
As Sagara mentioned, put in just a single method, execute(Map<String,
String> properties), and remove all the other methods. Please create a
JIRA for it, and fix it.
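
As a quick sketch of what the simplified contract could look like (this is
just the proposed shape, not the final API; names are illustrative):

    import java.util.Map;

    public interface Task {
        /**
         * Called on every trigger; Quartz creates a fresh task instance
         * each time, so there is no separate init/lifecycle step.
         */
        void execute(Map<String, String> properties);
    }

    class CleanupTask implements Task {
        public void execute(Map<String, String> properties) {
            String target = properties.get("target"); // property name illustrative
            // ... do the scheduled work ...
        }
    }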

Cheers,
Anjana.

On Wed, Jun 3, 2015 at 8:58 AM, Anjana Fernando anj...@wso2.com wrote:

 Hi Sagara,

 Yes, you're correct. When this was first designed, the thinking was that it
 would work that way: properties set first, init called once, and execute
 called multiple times. But it was later discovered that Quartz creates new
 instances of the task implementations and calls all of these every time. At
 that time, I didn't properly make the changes to reflect this behavior, and
 I agree it is a bit misleading. This has to be fixed properly eventually;
 for now, we explicitly have to remember that this is how the flow works.

 Cheers,
 Anjana.

 On Wed, Jun 3, 2015 at 6:42 AM, Sagara Gunathunga sag...@wso2.com wrote:


 org.wso2.carbon.ntask.core.Task interface has defined following 3
 methods.

 setProperties(Map map)
 init()
 execute()

 According to my understanding, it's natural to think of setProperties() and
 init() as the task's lifecycle methods, called only once during
 initialization, while the execute() method is called by the scheduler
 several times depending on the cron expression.

 I wrote a very simple Registry Task [1] and tested it; it seems all 3
 methods run several times. I only expected the execute() method to run N
 times, but the actual result is that all 3 methods run N times. A little
 debugging revealed that the TaskQuartzJobAdapter:execute() [2] method calls
 the above 3 methods one after another, as follows.


 task.setProperties(properties);
 int tenantId = Integer.parseInt(properties.get(TaskInfo.TENANT_ID_PROP));
 try {
     PrivilegedCarbonContext.startTenantFlow();
     PrivilegedCarbonContext.getThreadLocalCarbonContext().setTenantId(tenantId, true);
     task.init();
     task.execute();
 }

 With this I have following questions.

 1.) What are the distinct design objectives of the above 3 methods?

 2.) If the TaskQuartzJobAdapter implementation is correct, then why do we
 need 3 distinct methods? IMO execute(properties) can provide all these
 capabilities.


 [1] - https://docs.wso2.com/display/Governance460/Scheduled+Task+Sample
 [2] -
 https://github.com/wso2/carbon-commons/blob/master/components/ntask/org.wso2.carbon.ntask.core/src/main/java/org/wso2/carbon/ntask/core/impl/TaskQuartzJobAdapter.java


 Thanks !
 --
 Sagara Gunathunga

 Architect; WSO2, Inc.;  http://wso2.com
 V.P Apache Web Services;http://ws.apache.org/
 Linkedin; http://www.linkedin.com/in/ssagara
 Blog ;  http://ssagara.blogspot.com




 --
 *Anjana Fernando*
 Senior Technical Lead
 WSO2 Inc. | http://wso2.com
 lean . enterprise . middleware




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Searching registry artifacts in the enterprise store

2015-05-24 Thread Anjana Fernando
Hi,

Yeah, we simply use the Lucene query syntax. There was no reason for us to
create our own on top of it, because it provides a very powerful syntax to
query the data. For example, Elasticsearch also uses the Lucene query
language in its solution. I'm not sure whether this is suitable for the
registry or not, as in, whether to give the user the full power to query all
the indexed attributes, or whether some should be filtered/hidden from the
end user.
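
For example, standard Lucene syntax already gives boolean logic, ranges and
wildcards for free (the field names below are purely illustrative):

    country:usa AND status:Production
    createdDate:[20150101 TO 20150501]
    name:payment*

so a simplified form like country=usa would be strictly less expressive and
would need to be translated into the full syntax underneath anyway.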

Cheers,
Anjana.

On Mon, May 25, 2015 at 8:53 AM, Srinath Perera srin...@wso2.com wrote:

 Shazni, is the backend our code? If so, we can fix it. Or we can translate
 from the simpler version to the complex version automatically in our code.
 I also think it should be country=usa.

 Also, BAM had the same problem and went with the Solr syntax. I am not sure
 what the right answer is, but I'm pretty sure it should be the same for both.
 Sagara, Anjana, please talk.

 --Srinath



 On Fri, May 22, 2015 at 5:58 PM, Shazni Nazeer sha...@wso2.com wrote:

 @Manuranga - Fair question. But that's the format the search attribute
 service in the backend expects. Further, the query I have given is
 specifically to query a property in the artifact. So for country=usa, we
 would internally have to find out that it's a property the user is
 querying. And as for your concern that the convenience method is not that
 convenient, that's what the question is all about: whether to keep the
 query as it is, or to use a different syntax and pass the attribute map to
 the search service within the method.

 Shazni Nazeer
 Mob : +94 37331
 LinkedIn : http://lk.linkedin.com/in/shazninazeer
 Blog : http://shazninazeer.blogspot.com

 On Fri, May 22, 2015 at 5:29 PM, Manuranga Perera m...@wso2.com wrote:

 That convenient method is not that convenient.

 Why
 propertyName=country&rightOp=eq&rightPropertyValue=usa
 instead of
 country=usa
 ?

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture



 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --
 
 Srinath Perera, Ph.D.
http://people.apache.org/~hemapani/
http://srinathsview.blogspot.com/




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Carbon Datasource Reader Implementation for Apache Hadoop

2015-05-14 Thread Anjana Fernando
Hi Srinath,

Yeah, I had a chat with Gokul yesterday; we are changing this to HDFS and
also adding another HBase one as well. I think he has already done the
changes. @Gokul, please send the updated information.

Cheers,
Anjana.

On Thu, May 14, 2015 at 1:10 PM, Srinath Perera srin...@wso2.com wrote:

 Can we call type HDFS instead of Hadoop? ( if we can change that without
 much trouble)

 On Tue, May 12, 2015 at 8:38 PM, Gokul Balakrishnan go...@wso2.com
 wrote:

 Hi all,

 As part of the HBase analytics datasource implementation for DAS 3.0, we
 have come up with $subject which is envisioned to offer a standardised way
 to specify connectivity parameters for a remote Hadoop-based instance in a
 Carbon datasource configuration.

 The datasource reader will expect the configuration to be specified in a
 format similar to that of standard Apache Commons Configuration [1], as
 used by both HDFS and HBase. An example datasource definition would look
 like:

 <datasource>
     <name>WSO2_ANALYTICS_FS_DB_HDFS</name>
     <description>The datasource used for analytics file system</description>
     <jndiConfig>
         <name>jdbc/WSO2HDFSDB</name>
     </jndiConfig>
     <definition type="HADOOP">
         <configuration>
             <property>
                 <name>fs.default.name</name>
                 <value>hdfs://localhost:9000</value>
             </property>
             <property>
                 <name>dfs.data.dir</name>
                 <value>/dfs/data</value>
             </property>
             <property>
                 <name>fs.hdfs.impl</name>
                 <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
             </property>
             <property>
                 <name>fs.file.impl</name>
                 <value>org.apache.hadoop.fs.LocalFileSystem</value>
             </property>
         </configuration>
     </definition>
 </datasource>

 The definition type for the above is set as HADOOP. The datasource
 reader implementation is currently hosted at [2], and would be merged with
 the carbon-data git repo once reviewed.
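
 A minimal usage sketch (assuming the reader hands back a Hadoop
 Configuration object via the JNDI name above; the exact returned type is up
 to the reader implementation, so treat this as an assumption):

     import javax.naming.InitialContext;
     import org.apache.hadoop.conf.Configuration;
     import org.apache.hadoop.fs.FileSystem;
     import org.apache.hadoop.fs.Path;

     // JNDI lookup name taken from the jndiConfig section above
     Configuration conf = (Configuration) new InitialContext().lookup("jdbc/WSO2HDFSDB");
     FileSystem fs = FileSystem.get(conf); // connects to hdfs://localhost:9000
     fs.mkdirs(new Path("/analytics/events"));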

 Appreciate your thought and suggestions.

 Thanks,
 Gokul.

 [1] http://commons.apache.org/proper/commons-configuration/

 [2]
 https://github.com/gokulbs/carbon-data/tree/master/components/data-sources/org.wso2.carbon.datasource.reader.hadoop

 --
 Balakrishnan Gokulakrishnan
 Senior Software Engineer,
 WSO2, Inc. http://wso2.com
 Mob: +94 77 593 5789 | +1 650 272 9927

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --
 
 Srinath Perera, Ph.D.
http://people.apache.org/~hemapani/
http://srinathsview.blogspot.com/

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Analytics Facets APIs in AnalyticsDataService

2015-03-24 Thread Anjana Fernando
Gimantha, it is better if you can give some possible use cases for each of
the features we have listed here, so people can get a better understanding
of what to use where.

Cheers,
Anjana.

On Thu, Mar 19, 2015 at 8:23 PM, Gimantha Bandara giman...@wso2.com wrote:

 Hi all,

 Analytics facets APIs provide indexing capabilities for hierarchical
 categorization of table entries in New analytics data service (Please refer
 to [Architecture] BAM 3.0 REST APIs for AnalyticsDataService / Indexing /
 Search for more information). Using facet APIs, an user can define
 multiple categories as indices for a table and later can be used to search
 table entries based on categories. These APIs will be generic, so the user
 can assign a weight for each category when indexing, combine a mathematical
 function to calculate weights,

 *Facet Counts*

 As an example in log analysis, consider the following log-time values:
 2015/mar/12 20:30:23, 2015/jan/16 13:34:76, 2015/jan/11
 01:34:76 (in 3 different log lines)

 In the above example, the log time can be defined as a hierarchical facet
 as year/month/date. Later, if the user wants to get the counts of log
 entries by year/month, the API would return

 2015/jan - Count: 2
 2015/mar - Count: 1

 If the user wants to get the total count of log entries by year, the API
 would return

 2015 - Count: 3

 If the user wants to get the count of log entries by year/month/date, the
 API returns

 2015/jan/11 - Count: 1
 2015/jan/16 - Count: 1
 2015/mar/12 - Count: 1

 *Drill-Down capabilities*

 Drill-down capabilities are provided by the facets APIs. The user can
 drill down through the facet hierarchy of the index and search table
 entries, and can also combine a search query to filter the table entries.
 For instance, in the above example, the user queries for the total count of
 log lines in 2015/jan/11 (getting 1 as the count) and then wants to view
 the other attributes of that log line (TID, component name, log level, etc.).


 *REST APIs for Facets*

 Users will be able to use the facets API through REST APIs. Users can
 create facet indices via the usual Analytics indexing REST APIs and insert
 hierarchical category information through the Analytics REST APIs. The
 following are the updated Analytics REST APIs.

 1. Drill-down through a facets hierarchy

 /analytics/drilldown or /analytics/drilldown-count

 {
     "tableName" : "...",
     "categories" : [{
         "name" : "...",            // hierarchy name, e.g. "Publish date"
         "categoryPath" : [ ]       // hierarchy as an array, e.g. ["2001", "March", "02"]
     }],
     "language" : "...",            // "lucene" or "regex"
     "query" : "...",               // lucene query or regular expression
     "scoreFunction" : "...",       // JavaScript function defining the scoring function
     "scoreParams" : [ ]            // array of doc-value fields used as parameters for the scoring function
 }
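
 For instance, using the log-time example above, a drill-down into 2015/jan
 combined with a lucene query could look like this (the table name and the
 query value are illustrative):

     POST /analytics/drilldown

     {
         "tableName" : "LogEventsTable",
         "categories" : [{
             "name" : "log-time",
             "categoryPath" : ["2015", "jan"]
         }],
         "language" : "lucene",
         "query" : "logLevel:ERROR"
     }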


 2. Querying for Ranges (Additional to facets)

 /analytics/searchrange or /analytics/rangecount

 {
     "tableName" : "sample-table-name",
     "ranges" : [{
         "label" : "...",
         "from" : ...,
         "to" : ...,
         "minInclusive" : ...,
         "maxInclusive" : ...
     }],
     "language" : "...",
     "query" : "..."
 }


 In addition to the existing index types, two more are introduced: "FACET"
 and "SCOREPARAM". "FACET" is used to define a hierarchical facet field, and
 "SCOREPARAM" is used to define scoring parameters for the score function.

 *Adding facet fields and score fields to a table/tables*

 Facet fields and score fields need to be defined using the indexing APIs.

 /analytics/tables/<table-name>/indices

 {
     "field" : "STRING",
     "facetField" : "FACET",
     "scoreField" : "SCOREPARAM"
 }

 Later the user can add facet and score fields using a POST to

 /analytics/tables/<table-name>

 [
     {
         "values" : {
             "field" : "value",
             "facetField" : {
                 "weight" : ...,
                 "categoryPath" : [ ]
             },
             "scoreField" : ...   // numeric value
         }
     }
 ]

 or /analytics/records

 [
     {
         "tableName" : "...",
         "values" : {
             "field" : "value",
             "facetField" : {
                 "weight" : ...,
                 "categoryPath" : [ ]
             },
             "scoreField" : ...   // numeric value
         }
     }
 ]

 Feedback and suggestions are appreciated.

 --
 Gimantha Bandara
 Software Engineer
 WSO2. Inc : http://wso2.com
 Mobile : +94714961919




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


[Architecture] WSO2 BAM 3.0 M2 Released!

2015-03-21 Thread Anjana Fernando
The WSO2 BAM team is pleased to announce the second milestone release of
WSO2 BAM v3.0. The distribution is available at [1]. The release includes
the following new features.
New Features

   - [BAM-1957 https://wso2.org/jira/browse/BAM-1957] - Spark Script
   Scheduling
   - [BAM-1959 https://wso2.org/jira/browse/BAM-1959] - Support Spark
   Clustering


The documentation for BAM v3.0 can be found at [2]. Your feedback is most
welcome, and any issues can be reported to the project at [3].

[1] https://svn.wso2.org/repos/wso2/people/anjana/BAM30/wso2bam-3.0.0-M2.zip
[2]
https://docs.wso2.com/display/BAM300/WSO2+Business+Activity+Monitor+Documentation
[3] https://wso2.org/jira/browse/BAM

- WSO2 BAM Team

-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


[Architecture] WSO2 BAM 3.0 M1 Released!

2015-02-26 Thread Anjana Fernando
The WSO2 BAM team is pleased to announce the first milestone release of
WSO2 BAM v3.0. The distribution is available at [1]. The release includes
the following new features.
New Features

   - [BAM-1948 https://wso2.org/jira/browse/BAM-1948] - Data Abstraction
   Layer for Analytics
   - [BAM-1949 https://wso2.org/jira/browse/BAM-1949] - Spark SQL based
   Analytics Query Execution
   - [BAM-1950 https://wso2.org/jira/browse/BAM-1950] - DataPublisher
   Rewrite
   - [BAM-1951 https://wso2.org/jira/browse/BAM-1951] - RDBMS Datasource
   Support
   - [BAM-1952 https://wso2.org/jira/browse/BAM-1952] - REST APIs for
   Analytics Data Service
   - [BAM-1953 https://wso2.org/jira/browse/BAM-1953] - CLI like UI
   interface for Spark Integration

The documentation for BAM v3.0 can be found at [2]. Your feedback is most
welcome, and any issues can be reported to the project at [3].

[1] https://svn.wso2.org/repos/wso2/people/gihan/wso2bam-3.0.0-M1.zip
[2]
https://docs.wso2.com/display/BAM300/WSO2+Business+Activity+Monitor+Documentation
[3] https://wso2.org/jira/browse/BAM

- *WSO2 BAM Team*
-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] [BAM] [Security] Securing REST API

2015-02-04 Thread Anjana Fernando
On Wed, Feb 4, 2015 at 5:15 AM, Prabath Siriwardena prab...@wso2.com
wrote:

 If you say Basic Auth is easy - then there is no difference in using OAuth
 too:-)

 Basically the resource owner credentials grant type was introduced in
 OAuth to migrate clients from Basic/Digest authentication into OAuth...

 Looking at the use case, it's clearly something to do with access
 delegation. One server needs to access a resource (API) on behalf of
 another user, so it's clearly a fit for OAuth.


Yes, that's true :) .. I guess the simple username/password scenario can
also be covered with OAuth, if the requirement comes.

Cheers,
Anjana.



 Thanks  regards,
 -Prabath


 On Tue, Feb 3, 2015 at 3:21 AM, Anjana Fernando anj...@wso2.com wrote:

 Yes, I guess, we should anyway give the ability for users to use the API
 with something simple like basic auth (if it makes sense for a specific
 scenario), and then also support something like OAuth for other scenarios,
 like here, we are talking about, internally using it from our dashboards
 etc.. for accessing the backend APIs.

 Cheers,
 Anjana.

 On Tue, Feb 3, 2015 at 4:44 PM, Isabelle Mauny isabe...@wso2.com wrote:

 All,

 Who is going to use those REST APIs? And from where? While I agree with
 all the discussion about making the APIs secure, it's kind of pointless
 without a usage context.
 Generating/managing an OAuth token is not easy from the client side; if
 the REST APIs are used from a script, for example, OAuth might not be
 optimal. Would the APIs be exposed externally for any reason (to the
 general public?)? We had that problem with G-Reg before, with users unable
 to integrate with G-Reg due to the requirement of an OAuth token.
 Shouldn't we leave people a choice?

 Isabelle.
 __


 *Isabelle Mauny*VP, Product Management; WSO2, Inc.;  http://wso2.com/

 On Feb 3, 2015, at 11:53 AM, Manuranga Perera m...@wso2.com wrote:

 Hi Johann,
 so if a user is logged in using SAML, is there a way we can call an OAuth2
 API from the front-end JS (via REST) directly, without going through a proxy?

 On Tue, Feb 3, 2015 at 11:22 PM, Johann Nallathamby joh...@wso2.com
 wrote:

 The discussion is about how to secure APIs, and OAuth2 is the popular
 choice here.

 How to do SSO to the web front end is a separate question and OpenID
 Connect can be one possibility. Like others have mentioned in this thread
 above, there can be other ways to login to the web front end, e.g. SAML2
 SSO, username/password, etc. Depending on the login mechanism there are
 other grant types you may be able to use to secure APIs using OAuth2 such
 as SAML2 Bearer, Resource Owner Password, self-issued tokens, etc.

 OpenID Connect might be the ideal choice, but right now the limitation
 we have with OpenID Connect is that we don't support the session management
 protocol which is required for single logout.

 On Tue, Feb 3, 2015 at 5:18 AM, Manuranga Perera m...@wso2.com wrote:

 Hi Johann,

 As I understand (form Dulanja) we need OpenID Connect [1] to fully
 integrate with web front-end. so we can keep the token in fount end (in 
 JS)
 and do the call using REST. isn't that the way to go?

 [1] http://openid.net/connect/




 --
 Thanks  Regards,

 *Johann Dilantha Nallathamby*
 Associate Technical Lead  Product Lead of WSO2 Identity Server
 Integration Technologies Team
 WSO2, Inc.
 lean.enterprise.middleware

 Mobile - *+9476950*
 Blog - *http://nallaa.wordpress.com http://nallaa.wordpress.com/*




 --
 With regards,
 *Manu*ranga Perera.

 phone : 071 7 70 20 50
 mail : m...@wso2.com
  ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture



 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --
 *Anjana Fernando*
 Senior Technical Lead
 WSO2 Inc. | http://wso2.com
 lean . enterprise . middleware

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --
 Thanks  Regards,
 Prabath

 Twitter : @prabath
 LinkedIn : http://www.linkedin.com/in/prabathsiriwardena

 Mobile : +1 650 625 7950

 http://blog.facilelogin.com
 http://blog.api-security.org

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] [BAM] [Security] Securing REST API

2015-01-27 Thread Anjana Fernando
Hi,

I guess our admin services are also accessible via basic auth, aren't they? ..
We just thought that, as a convenience for the end user, they could use
their username/password to access our API if required. So basically, if
using OAuth, other than using the SAML2 bearer token grant type or anything
similar, would it be possible to use the login username/password of our
dashboard UI to generate the access token, with the resource owner
credentials grant type maybe? ..
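
For reference, the resource owner password credentials grant is just a token
request carrying the user's credentials (per RFC 6749; the endpoint and the
client credentials below are illustrative):

    POST /oauth2/token HTTP/1.1
    Host: localhost:9443
    Authorization: Basic <base64(clientId:clientSecret)>
    Content-Type: application/x-www-form-urlencoded

    grant_type=password&username=admin&password=admin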

Cheers,
Anjana.

On Tue, Jan 27, 2015 at 2:42 PM, Supun Malinga sup...@wso2.com wrote:

 Hi Gihan,

 IMO using basic auth will make it vulnerable to DoS attacks and less
 secure. So you need to think this through.

 There is a possibility of authenticating already-logged-in users via the
 cookie data, but we would need to write a new cookie-based OAuth grant type
 for this. AFAIK we don't have such a grant type yet (correct me if I'm
 wrong).

 On your latest note I think you can use the SAML2 grant type [0].

 [0]
 https://docs.wso2.com/display/AM170/Token+API#TokenAPI-ExchangingSAML2bearertokenswithOAuth2(SAMLextensiongranttype)

 thanks,

 On Tue, Jan 27, 2015 at 1:48 PM, Gihan Anuruddha gi...@wso2.com wrote:

 No. We thought it might be convenient for the end user if we provide basic
 auth capabilities. We will integrate OAuth functionality for our REST
 APIs.

 Regarding our requirement: we have multiple dashboards that validate the
 user through a single login page. How can we do the backend API
 communication?

 Regards,
 Gihan

 On Tue, Jan 27, 2015 at 12:02 PM, Sumedha Rubasinghe sume...@wso2.com
 wrote:

 Any particular reason for securing product APIs using Basic Auth?

 Products like G-Reg, CDM are using OAuth 2.0 tokens for this instead.

 On Tue, Jan 27, 2015 at 11:53 AM, Gihan Anuruddha gi...@wso2.com
 wrote:

 Hi All,

 We are going to use a set of REST APIs [1] to communicate with the data
 layer. Basically, we are securing these REST APIs with basic auth, but we
 wanted to call these REST APIs as an already-logged-in user as well. The
 reason is that we plan to use these REST APIs in our Message Console
 dashboard, and we want to have an SSO kind of login solution for these
 dashboards, without any individual login pages.

 So is it possible to use the existing HTTP session cookie to authenticate
 REST API calls, or do we have to use OAuth with some specific grant types?

 Appreciate your inputs here?



 ​[1] - [Architecture] BAM 3.0 REST APIs for AnalyticsDataService /
 Indexing / Search
 --
 W.G. Gihan Anuruddha
 Senior Software Engineer | WSO2, Inc.
 M: +94772272595




 --
 /sumedha
 m: +94 773017743
 b :  bit.ly/sumedha




 --
 W.G. Gihan Anuruddha
 Senior Software Engineer | WSO2, Inc.
 M: +94772272595

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --
 Supun Malinga,

 Senior Software Engineer,
 WSO2 Inc.
 http://wso2.com
 email: sup...@wso2.com sup...@wso2.com
 mobile: +94 (0)71 56 91 321

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] BAM 3.0 Data Layer Implementation / RDBMS / Distributed Indexing / Search

2015-01-25 Thread Anjana Fernando
Hi Nirmal,

Yeah, it can be re-used, if it meets your criteria, though. There is
specific functionality we expect from this data layer; for example, look at
the AnalyticsRecordStore interface, which contains the basic record storage,
with timestamp and pagination support. Basically, we can't make it too
generic either. We can discuss more and see.

Cheers,
Anjana.

On Mon, Jan 26, 2015 at 11:10 AM, Nirmal Fernando nir...@wso2.com wrote:

 Hi Anjana,

 Isn't this a generic interface to talk to a back-end data store? If so, do
 you think this can be reused in other products?  In ML, we have a similar
 use-case where we need to talk to a generic data layer to store the models
 that are generated.

 On Wed, Dec 10, 2014 at 1:37 PM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

 I've finished the initial implementation of $subject. This basically
 contains the standard interfaces we use to plug-in different data sources
 as the back-end record storage, and for indexing purposes. These pluggable
 data sources are called Analytics Data Sources here, where from a
 configuration file, you can give the implementation class and the
 properties required for the initialization. The first implementation of
 this is done, which is the RDBMS implementation. It basically stores all
 the records and other data in a relational database, and any type of
 database can be supported via a configuration file, which gives the query
 templates used to define a standard set of actions. At the moment, H2 and
 MySQL query templates have been tested, and we will be adding the rest of
 popular RDBMS templates as well. The RDBMS AnalyticsDataSource
 implementation detects the query template by looking at the database
 connection information, retrieved from the data source (e.g. mentioned in
 master-datasources.xml), and automatically switches to that mode, so the
 user basically doesn't have to do anything when configuring.

 Also, inside the AnalyticsDataSource interface, there is a FileSystem
 interface you need to implement for your data source implementation, which
 is basically used for indexing, which is done by Lucene. We use Lucene
 indexes as index shards for a distributed index and search. So with the
 sharding approach, we can add more nodes to our cluster to improve the
 indexing performance, and for storage addition. Basically, provided the
 backend storage is scalable, the index operations also would be scalable in
 the same manner. But the limit we first hit is the processing requirements,
 and the random data access and locking requirements for each shard, so for
 a typical database system, just by adding new BAM nodes, I'm hoping the
 indexing performance will almost increase linearly.

 The AnalyicsDataSource implementations are finally used by a component
 called AnalyticsDataService, which is the interface seen by clients, and
 has the indexing related operations with the record store functionality
 exposed through AnalyticsDataSource. This interface can be looked up as an
 OSGi service, and we plan on also exposing these functionality as a JAX-RS
 service.

 The general design, and documentation on the test cases can be found here
 at [1] and [2], and the source code at [3]. I will be doing some further
 performance tests, by integrating this to the product properly, specially
 the distributed search, and will provide the results here. For the moment,
 we have a few performance tests as unit tests in the modules. This
 implementation will be first used by the log analysis implementation done
 by Gimantha. And we are planning on writing further AnalyticsDataSource
 implementations for this, such as MongoDB, HBase etc.. There will be
 separate notes on those.

 [1]
 https://docs.google.com/a/wso2.com/spreadsheets/d/10mHRE6FEgF6wDZ-LSBx18zL8ZcIay5ZIhb8MIk7pfeg/edit#gid=0
 [2]
 https://docs.google.com/a/wso2.com/spreadsheets/d/1iXoZ8BzaefN3EGOL05y5aUX6SLZH7Bu8YM4bF3xOSvQ/edit#gid=0
 [3]
 https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics

 Cheers,
 Anjana.
 --
 *Anjana Fernando*
 Senior Technical Lead
 WSO2 Inc. | http://wso2.com
 lean . enterprise . middleware

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --

 Thanks  regards,
 Nirmal

 Senior Software Engineer- Platform Technologies Team, WSO2 Inc.
 Mobile: +94715779733
 Blog: http://nirmalfdo.blogspot.com/





-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] BAM 3.0 REST APIs for AnalyticsDataService / Indexing / Search

2015-01-20 Thread Anjana Fernando




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


[Architecture] Replacing BAM Toolbox Format with CAR

2015-01-16 Thread Anjana Fernando
Hi everyone,

From BAM 3.0, we are thinking of replacing the toolbox packaging with CAR
files. The main motive for this came from CEP also requiring a packaging
format for their artifacts: either they also needed to use our toolbox
format, or else go to the CAR packaging format, which is used with other
artifacts in the platform.

So basically, as I see it, our artifacts such as stream definitions,
analytics scripts and UI pages are in the same category as ESB's sequences,
proxies, endpoints etc., and since those don't use a new packaging format
but rather use CAR, we don't have a special reason to have a separate one
either. So for these reasons, and also to avoid having too many packaging
formats in the platform, we thought of going with the standard CAR model.

CEP have already suggested this for their artifacts in the thread [1].

If there are any concerns, please shout.

[1] [Architecture] cApp deployer support for WSO2 CEP

Cheers,
Anjana.
-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] RFC: Building a Generic Configurable UI Gadget for Analytics

2014-12-14 Thread Anjana Fernando
Hi,

I guess, for BAM 3.0, this can be the base for our eventual KPI
implementation as well. We will just need some additional functionality to
put limits on the data/visualizations we have, to show them in an
appropriate way, and to trigger alerts etc. Looking forward to checking out
the initial implementation of this, so the BAM team can probably enhance it
with the other required features.
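
As a very rough sketch of what a saved visualisation config could look like
(all field names here are purely illustrative, just to make the discussion
below concrete):

    {
        "chartType" : "scatter",
        "x" : "GDP",
        "y" : "LifeExpect",
        "pointSize" : "Population",
        "color" : "Country",
        "filters" : [ { "field" : "Year", "type" : "slider" } ],
        "drillDown" : { "parent" : "Country", "child" : "State" }
    }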

Cheers,
Anjana.

On Mon, Dec 8, 2014 at 7:42 PM, Srinath Perera srin...@wso2.com wrote:

 Currently, to visualize the data, users have to write their own gadgets.
 For an advanced user this is OK, but not for all. Especially things like
 drill-downs need complicated planning.

 I believe it is possible to start with data in tabular form, and write a
 generic gadget that lets the user configure and create his own data chart
 with filters and drill-downs.

 The chart could look like the following (some of the controls can be hidden
 under a configure button).

 Let's work through an example.

 1) The key idea is that we always load data to the gadget as a table.
 The following is example data:

     Country    | Year | GDP | Population | LifeExpect
     Sri Lanka  | 2004 | 20  | 19435000   | 73
     Sri Lanka  | 2005 | 24  | 19644000   | 73
     Sri Lanka  | 2006 | 28  | 19858000   | 73
     Sri Lanka  | 2007 | 32  | 20039000   | 73
 2) When the gadget is loaded, it shows the data as a table. The user can
 select and add a chart type and fields. The following are some examples:

1. Line - two Numerical  fields
2. Bar - one numerical, one categorical field
3. Scatter - two numerical fields
4. Map - Location field + categorical or numerical field
5. Graph - two categorical or string fields that provide links


 3) Let user add more information to the chart using other fields in the
 table

1. Add  color (Categorical field) or shade (numerical field) to the
plot (e.g. Use different color for each country)
2. Point Size - Numerical field (e.g. Adjust the point size in the
scatter plot according to the population)
3. Label - any field

 4) Then he can add filters based on a variable. Then the chart will have
 sliders (for numerical data) and tick buttons (for categorical data). When
 those sliders are changed they will change the chart.

 5) The final step is defining drill-downs. Drill-downs are done using two
 columns in the table that have a hierarchical relationship (e.g. Country
 and State fields, Year and Month fields). We need users to select two such
 fields and tell us about the relationship, and then we can code the support
 for drill-downs.

 When the above steps are done, the user saves the config in the DataViz
 store as a visualisation, so others can pull it and use it.

 This will not cover all cases, but IMO it will cover 80%, and it will also
 be a very good tool for demos etc.

 Please comment

 --Srinath



 --
 
 Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
 Site: http://people.apache.org/~hemapani/
 Photos: http://www.flickr.com/photos/hemapani/
 Phone: 0772360902



-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] RFC: Building a Generic Configurable UI Gadget for Analytics

2014-12-14 Thread Anjana Fernando
More correctly, most probably BAM 3.1 plans :) ..

Cheers,
Anjana.

On Mon, Dec 15, 2014 at 7:16 AM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

 I guess, for BAM 3.0, this can be the base for our eventual KPI
 implementation as well. We will just need some additional functionality to
 provide some limits to the data/visualizations we are having, and to show
 it in an appropriate way, and to trigger alerts etc.. Looking forward to
 checking out the initial implementation of this, so probably the BAM team
 can enhance it with the other required features.

 Cheers,
 Anjana.

 On Mon, Dec 8, 2014 at 7:42 PM, Srinath Perera srin...@wso2.com wrote:

 Currently to visualize the data, users have to write their own gadgets.
 If a advanced user this is OK, but not for all. Specially, things like
 drill downs need complicated planning.

 I believe it is possible to start with data in tabular form, and write a
 generic Gadget that let user configure and create his own data chart with
 filters and drill downs.

 Chart could look like following ( some of the controls can be hidden
 under configure button)

 Lets work though an example.

 1) Key idea is that we load data to the Gadget as a table (always).
 Following can be a example data.

     Country    | Year | GDP | Population | LifeExpect
     Sri Lanka  | 2004 | 20  | 19435000   | 73
     Sri Lanka  | 2005 | 24  | 19644000   | 73
     Sri Lanka  | 2006 | 28  | 19858000   | 73
     Sri Lanka  | 2007 | 32  | 20039000   | 73
 2) When Gadget is loaded, it shows the data as a table. User can select
 and add a data type and fields.  Following are some example.

1. Line - two Numerical  fields
2. Bar - one numerical, one categorical field
3. Scatter - two numerical fields
4. Map - Location field + categorical or numerical field
5. Graph - two categorical or string fields that provide links


 3) Let user add more information to the chart using other fields in the
 table

1. Add  color (Categorical field) or shade (numerical field) to the
plot (e.g. Use different color for each country)
2. Point Size - Numerical field (e.g. Adjust the point size in the
scatter plot according to the population)
3. Label - any field

 4) Then he can add filters based on a variable. Then the chart will have
 sliders (for numerical data) and tick buttons (for categorical data). When
 those sliders are changed they will change the chart.

 5) Final step is define drill downs. Drill downs are done using two
 columns in the table that has hierarchical relationships. (e.g. Country and
 State fields, Year and month fields) . We need users to select two of those
 fields and tell us about relationships and then we can code the support of
 drill downs.

 When above steps are done, user save configs and save it in the DataViz
 store as a visualisation, so others can pull it and use it.

 This will not cover all cases, but IMO it will cover 80% and also a very
 good tool for demos etc.

 Please comment

 --Srinath



 --
 
 Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
 Site: http://people.apache.org/~hemapani/
 Photos: http://www.flickr.com/photos/hemapani/
 Phone: 0772360902



 --
 *Anjana Fernando*
 Senior Technical Lead
 WSO2 Inc. | http://wso2.com
 lean . enterprise . middleware



-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


[Architecture] BAM 3.0 Data Layer Implementation / RDBMS / Distributed Indexing / Search

2014-12-10 Thread Anjana Fernando
Hi,

I've finished the initial implementation of $subject. This basically
contains the standard interfaces we use to plug-in different data sources
as the back-end record storage, and for indexing purposes. These pluggable
data sources are called Analytics Data Sources here, where from a
configuration file, you can give the implementation class and the
properties required for the initialization. The first implementation of
this is done, which is the RDBMS implementation. It basically stores all
the records and other data in a relational database, and any type of
database can be supported via a configuration file, which gives the query
templates used to define a standard set of actions. At the moment, H2 and
MySQL query templates have been tested, and we will be adding the rest of
popular RDBMS templates as well. The RDBMS AnalyticsDataSource
implementation detects the query template by looking at the database
connection information, retrieved from the data source (e.g. mentioned in
master-datasources.xml), and automatically switches to that mode, so the
user basically doesn't have to do anything when configuring.

Also, inside the AnalyticsDataSource interface, there is a FileSystem
interface you need to implement for your data source implementation, which
is basically used for indexing, which is done by Lucene. We use Lucene
indexes as index shards for a distributed index and search. So with the
sharding approach, we can add more nodes to our cluster to improve the
indexing performance, and for storage addition. Basically, provided the
backend storage is scalable, the index operations also would be scalable in
the same manner. But the limit we first hit is the processing requirements,
and the random data access and locking requirements for each shard, so for
a typical database system, just by adding new BAM nodes, I'm hoping the
indexing performance will almost increase linearly.

The AnalyticsDataSource implementations are finally used by a component
called AnalyticsDataService, which is the interface seen by clients, and
combines the indexing-related operations with the record store functionality
exposed through AnalyticsDataSource. This interface can be looked up as an
OSGi service, and we plan on also exposing this functionality as a JAX-RS
service.
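
As a rough usage sketch (the method names below are hypothetical, purely to
show the shape of the service; see the code at [3] for the actual interface):

    // hypothetical client code; method and type names are illustrative only
    AnalyticsDataService ads = bundleContext.getService(
            bundleContext.getServiceReference(AnalyticsDataService.class));
    ads.insert(records);                    // store a batch of records
    ads.index(tableName, indexDefinitions); // declare the indexed columns
    List<SearchResult> hits = ads.search(tableName, "lucene", "logLevel:ERROR", 0, 10);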

The general design and documentation on the test cases can be found at [1]
and [2], and the source code at [3]. I will be doing some further
performance tests, by integrating this into the product properly, especially
the distributed search, and will provide the results here. For the moment,
we have a few performance tests as unit tests in the modules. This
implementation will be first used by the log analysis implementation done
by Gimantha. And we are planning on writing further AnalyticsDataSource
implementations for this, such as MongoDB, HBase etc.. There will be
separate notes on those.

[1]
https://docs.google.com/a/wso2.com/spreadsheets/d/10mHRE6FEgF6wDZ-LSBx18zL8ZcIay5ZIhb8MIk7pfeg/edit#gid=0
[2]
https://docs.google.com/a/wso2.com/spreadsheets/d/1iXoZ8BzaefN3EGOL05y5aUX6SLZH7Bu8YM4bF3xOSvQ/edit#gid=0
[3]
https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics

Cheers,
Anjana.
-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Generic RDBMS Output adapter support

2014-12-01 Thread Anjana Fernando




 --
 Software Engineer
 WSO2 Inc.; http://wso2.com
 http://www.google.com/url?q=http%3A%2F%2Fwso2.comsa=Dsntz=1usg=AFQjCNEZvyc0uMD1HhBaEGCBxs6e9fBObg
 lean.enterprise.middleware

 mobile: *+94728671315 %2B94728671315*




 --
 Software Engineer
 WSO2 Inc.; http://wso2.com
 http://www.google.com/url?q=http%3A%2F%2Fwso2.comsa=Dsntz=1usg=AFQjCNEZvyc0uMD1HhBaEGCBxs6e9fBObg
 lean.enterprise.middleware

 mobile: *+94728671315 %2B94728671315*


 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Generic RDBMS Output adapter support

2014-11-28 Thread Anjana Fernando
Hi,

An approach we can follow is to supply query templates from a configuration
file, so we don't have to embed/hard-code any queries in the code. I'm
using this approach in our new BAM RDBMS connector implementation, where
I'm going to use an XML configuration with a separate section per database
type containing the required queries. This implementation can be found at
[1]; specifically, look for classes like H2FileDBAnalyticsDataSourceTest,
MySQLInnoDBAnalyticsDataSourceTest and QueryConfiguration.
QueryConfiguration will actually be converted to a JAXB mapping class in the
future, to represent a section of the source XML file that has the mappings.
Reading from the actual XML configuration file is not yet done in my code.

[1]
https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics
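
To make it concrete, the per-database sections could look something like the
following (element and template names are illustrative; the actual file
format is still being finalized):

    <analytics-datasource-configuration>
        <database name="mysql">
            <recordTableInitQuery>CREATE TABLE IF NOT EXISTS {{TABLE_NAME}}
                (record_id VARCHAR(50), timestamp BIGINT, data BLOB,
                PRIMARY KEY (record_id))</recordTableInitQuery>
            <recordInsertQuery>INSERT INTO {{TABLE_NAME}}
                (record_id, timestamp, data) VALUES (?, ?, ?)</recordInsertQuery>
        </database>
        <database name="h2">
            <!-- same queries, with H2-specific syntax where needed -->
        </database>
    </analytics-datasource-configuration>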

Cheers,
Anjana.

On Fri, Nov 28, 2014 at 5:23 PM, Sriskandarajah Suhothayan s...@wso2.com
wrote:

 The main issue we have to solve is the ability to handle the different
 syntaxes.
 Please have a look at the DSS and Hive table definitions (from BAM); they
 may help.

 Suho

 On Fri, Nov 28, 2014 at 4:49 PM, Damith Wickramasinghe dami...@wso2.com
 wrote:

 Hi,

 Currently we have support only for MySQL, and it has been decided to
 implement a generic adapter to support any RDBMS. For now, the adapter
 implementation will focus on supporting Oracle, MySQL and H2.

 I will update the thread with the decided architecture for this requirement
 soon. Any feedback on the requirement will be greatly appreciated.

 Regards,
 Damith.

 --
 Software Engineer
 WSO2 Inc.; http://wso2.com
 http://www.google.com/url?q=http%3A%2F%2Fwso2.comsa=Dsntz=1usg=AFQjCNEZvyc0uMD1HhBaEGCBxs6e9fBObg
 lean.enterprise.middleware

 mobile: *+94728671315 %2B94728671315*




 --

 *S. Suhothayan*
 Technical Lead  Team Lead of WSO2 Complex Event Processor
  *WSO2 Inc. *http://wso2.com
 * http://wso2.com/*
 lean . enterprise . middleware


 *cell: (+94) 779 756 757 %28%2B94%29%20779%20756%20757 | blog:
 http://suhothayan.blogspot.com/ http://suhothayan.blogspot.com/twitter:
 http://twitter.com/suhothayan http://twitter.com/suhothayan | linked-in:
 http://lk.linkedin.com/in/suhothayan http://lk.linkedin.com/in/suhothayan*

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] RFC: Doing Bulk Events Updates to HDFS instead of Cassandra

2014-11-06 Thread Anjana Fernando
Hi Sanjiva,

On Thu, Nov 6, 2014 at 4:01 PM, Sanjiva Weerawarana sanj...@wso2.com
wrote:

 Anjana I think the idea was for the file system - HDFS upload to happen
 via a simple cron job type thing.


Even so, we would just be moving the problem to another area; the overall
work done by the hardware is still the same (writing to disk, reading it
back, writing it to the network). That is, even though we can get to a very
high throughput initially by writing to the local disk first, later on we
have to read it back and write it to HDFS via the network, which is the
slower part of the operation. So if we continue to load the machine at an
extreme throughput, we will eventually run out of space on that disk.

Cheers,
Anjana.



 On Wed, Nov 5, 2014 at 9:19 AM, Anjana Fernando anj...@wso2.com wrote:

 Hi Srinath,

 Wouldn't it be better if we just make the batch size bigger? That is, let's
 just have a sizable local in-memory store, probably close to 64MB, which is
 the default HDFS block size, and only after this is filled, or maybe when
 the receiver is idle, we flush the buffer. I was just thinking that writing
 to the file system first will itself be expensive, since there are the
 additional steps of writing all the records to the local file system and
 reading them back again, before finally writing them to HDFS; and of
 course, having a network file system would again be an overhead, not to
 mention the implementation/configuration complications that would come with
 this. IMHO, we should try to keep these scenarios as simple as possible.

 I'm doing our new BAM data layer implementations here [1], where I'm
 almost done with an RDBMS implementation, doing some refactoring now (mail
 on this yet to come :)), I can also do an HDFS one after that and check it.

 [1]
 https://github.com/wso2/carbon-analytics/tree/master/components/xanalytics

 Cheers,
 Anjana.

 On Tue, Nov 4, 2014 at 6:56 PM, Srinath Perera srin...@wso2.com wrote:

 Hi All,

 Following came out of chat with Sanjiva on a scenario involve very large
 number of events coming into BAM.

 Currently we use Cassandra to store the events, and the numbers we got out
 of it have not been great, and Cassandra needs too much attention to get to
 those numbers.

 With Cassandra (or any DB) we write data as records. We can batch them, but
 the amount of data in one IO operation is still small. In comparison, file
 transfers are much, much faster, and that is the fastest way to get some
 data from A to B.

 So I am proposing to write the events that come in to a local file in
 the Data Receiver, and periodically append them to an HDFS file. We can
 arrange the data in a folder per stream and in files by timestamp (e.g. 1h
 of data goes to a new file), so we can selectively pull and process data
 using Hive. (We can use something like
 https://github.com/OpenHFT/Chronicle-Queue to write the data to disk.)

 If the user needs to avoid losing any messages at all in case of a disk
 failure, he can either have a SAN or NFS, or run two replicas of the
 receivers (we should write some code so that only one of the receivers will
 actually put data to HDFS).

 Coding-wise, this should not be too hard. I am sure this will be a factor
 faster than Cassandra (of course we need to do a PoC and verify).

 WDYT?

 --Srinath






 --
 
 Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
 Site: http://people.apache.org/~hemapani/
 Photos: http://www.flickr.com/photos/hemapani/
 Phone: 0772360902




 --
 *Anjana Fernando*
 Senior Technical Lead
 WSO2 Inc. | http://wso2.com
 lean . enterprise . middleware




 --
 Sanjiva Weerawarana, Ph.D.
 Founder, Chairman  CEO; WSO2, Inc.;  http://wso2.com/
 email: sanj...@wso2.com; office: (+1 650 745 4499 | +94  11 214 5345)
 x5700; cell: +94 77 787 6880 | +1 408 466 5099; voip: +1 650 265 8311
 blog: http://sanjiva.weerawarana.org/; twitter: @sanjiva
 Lean . Enterprise . Middleware




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] RFC: Doing Bulk Events Updates to HDFS instead of Cassandra

2014-11-06 Thread Anjana Fernando
Hi Srinath,

I think that example is a bit flawed :) .. I didn't mean to compare
Cassandra with the HDFS case here; I know Cassandra is far more complicated
than the HDFS operations, where the data operations in HDFS are very simple.
And I have a feeling that, with events that small, it may have turned into a
CPU-bound operation rather than an I/O-bound one, because of the processing
required for each event (maybe their batch implementation is poor); that may
be why even the bigger batch is also slow. The OS-level buffers you
mentioned, yes, they efficiently batch the physical disk writes in memory
and flush them out later. But that's a different thing; here, we are just
writing to the disk and reading it back again, so as I see it, we are just
using the local disk as a buffer, where we could do the same in RAM.
Basically: build up sizable chunks in memory, and write them to HDFS. That
way we lose the, even though comparably little, overhead of writing to and
reading from the local disk, while the bottleneck would still be writing the
data out over the network to a remote server's disk somewhere. Simply put,
this direct HDFS operation should be able to saturate the network link we
have; and even if it can't, we can ask ourselves how writing to the local
disk and reading it back again could optimize it further.
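
A minimal sketch of that in-memory buffering (the class name, threshold and
path layout are illustrative):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BufferedHDFSWriter {

        private static final int FLUSH_THRESHOLD = 64 * 1024 * 1024; // ~ HDFS block size

        private final FileSystem fs;
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        public BufferedHDFSWriter(Configuration conf) throws IOException {
            this.fs = FileSystem.get(conf);
        }

        public synchronized void write(byte[] event) throws IOException {
            buffer.write(event);
            if (buffer.size() >= FLUSH_THRESHOLD) {
                flush();
            }
        }

        public synchronized void flush() throws IOException {
            // one big sequential write per chunk; no local disk round trip
            Path path = new Path("/events/" + System.currentTimeMillis());
            try (FSDataOutputStream out = fs.create(path)) {
                buffer.writeTo(out);
            }
            buffer.reset();
        }
    }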

Cheers,
Anjana.

On Thu, Nov 6, 2014 at 6:15 PM, Srinath Perera srin...@wso2.com wrote:

 Of course we need to try it out and verify, I am just making a case that
 we should try it out :)

 Also, RDBMS should be the default, as most scenarios can be handled with
 DBs and there is no reason to make everyone's life complicated.

 --Srinath

 On Fri, Nov 7, 2014 at 7:44 AM, Srinath Perera srin...@wso2.com wrote:

 1) Anjana, you are assuming the bandwidth is the bottleneck. Let me give an
 example.

 With sequential reads and writes, an HDD can do > 100MB/sec and a 1G
 network can do > 50MB/sec.
 But BAM's best number we have seen is about 40k events/sec (and that with 4
 machines or so; let's assume one machine). Let's assume 20-byte events.
 Then it will be doing < 1MB/sec.

 The problem is Cassandra breaks the data into a lot of small operations,
 losing the OS-level buffer-to-buffer transfers that file transfers can do.
 I have tried increasing the batch size for Cassandra, which helps with
 smaller batches, but after about a few thousand operations in the same
 batch, things start to get much slower.

 Best numbers will come when we run two receivers instead of NFS.

 2) Frank, this is analytics data, so it is read-only, and in most cases we
 need only time-based queries at a coarse resolution (a 15min smallest
 resolution is fine for most cases); that is to say, "run this batch query
 on the last hour of data" and so on.

 However, we have some scenarios where we do ad-hoc queries, for things like
 activity monitoring. This approach would not work for those, and we will
 have to run a batch job to push that data to an RDBMS or Solr etc. Anjana,
 we need to discuss this.

 But there are also a lot of use cases where we need to receive and write
 the event to disk as soon as possible and later run MapReduce on top of
 them. For those, the above will work.

 --Srinath

 On Fri, Nov 7, 2014 at 7:23 AM, Anjana Fernando anj...@wso2.com wrote:

 Hi Sanjiva,

 On Thu, Nov 6, 2014 at 4:01 PM, Sanjiva Weerawarana sanj...@wso2.com
 wrote:

 Anjana, I think the idea was for the file system -> HDFS upload to
 happen via a simple cron job type thing.


 Even so, we will just be moving the problem to another area; the overall
 effort done by that hardware is still the same (writing to disk, reading it
 back, writing it to the network). That is, even though we can go to a very
 high throughput initially by writing to the local disk at first, later on we
 have to read it back and write it to HDFS via the network, which is the
 slower part of our operation. So if we continue to load the machine with an
 extreme throughput, we will eventually run out of space on that disk.

 Cheers,
 Anjana.



 On Wed, Nov 5, 2014 at 9:19 AM, Anjana Fernando anj...@wso2.com
 wrote:

 Hi Srinath,

 Wouldn't it be better if we just make the batch size bigger? That is,
 let's just have a sizable local in-memory store, something probably close
 to 64MB, which is the default HDFS block size, and only after this is filled,
 or maybe if the receiver is idle, we flush the buffer. I was just
 thinking that writing to the file system first will itself be expensive, where
 there are the additional steps of writing all the records to the local file
 system and reading them back again, and then finally writing them to HDFS; and
 of course, having a network file system would again be an overhead, not
 to mention the implementation/configuration complications that will come
 with this. IMHO, we should try to make these scenarios as simple as
 possible.

 I'm doing our new BAM data layer implementations here [1], where I'm
 almost done with an RDBMS implementation, and doing some refactoring now (a mail
 on this is yet to come :)). I can also do an HDFS one after that and check
 it.

 [1]
 https

Re: [Architecture] RFC: Doing Bulk Events Updates to HDFS instead of Cassandra

2014-11-06 Thread Anjana Fernando
On Thu, Nov 6, 2014 at 7:19 PM, Srinath Perera srin...@wso2.com wrote:

 Ah sorry, I misunderstood.

 Buffering to memory and writing to HDFS will be faster. By writing to
 disk, you reduce the probability of losing that data, by making it a bit slower.


 However, if you are running two receivers, the probability that you will lose data
 is less anyway. So I guess buffering in memory and writing to HDFS would be
 OK.


Great! Yeah, true. In either approach, and even now, there's anyway a high
probability of losing some events in the case of a failure of the server,
because, most often, there will be a few events in the publisher queue, other
in-memory buffers, the OS I/O buffers for the file scenario etc.. To be
totally reliable, we will have to use a transport like JMS to achieve that.

Cheers,
Anjana.


 --Srinath








 On Fri, Nov 7, 2014 at 8:24 AM, Anjana Fernando anj...@wso2.com wrote:

 Hi Srinath,

 I think that example is a bit flawed :) .. I didn't mean to compare
 Cassandra with the HDFS case here. I know Cassandra is far more complicated
 than the HDFS operations, where the data operations in HDFS are very simple,
 and I have a feeling that, with that many small events, it may have turned
 into a CPU-bound operation rather than an I/O-bound one, because of the
 processing required for each event (maybe their batch impl. is crappy);
 that may be why even the bigger batch is also slow. OS-level buffers, you
 said; yeah, they efficiently batch the physical disk writes in memory, and
 flush them out later. But that's a different thing; here, we are just
 writing to the disk and reading it back again, so as I see it, we are just
 using the local disk as a buffer, where we could do this in RAM instead.
 Basically, build up sizable chunks in memory, and write them to HDFS. So we
 lose the, even though comparably small, overhead of writing to and reading
 from the local disk, where the bottleneck would still be writing the data out
 over the network, to a remote server's disk somewhere. Simply put, this
 direct HDFS operation should be able to saturate the network link we have;
 and even if it can't, we can ask ourselves how writing to the local disk
 and reading it back again could optimize it any further.

 Cheers,
 Anjana.

 On Thu, Nov 6, 2014 at 6:15 PM, Srinath Perera srin...@wso2.com wrote:

 Of course we need to try it out and verify, I am just making a case that
 we should try it out :)

 Also, RDBMS should be the default, as most scenarios can be handled with DBs
 and there is no reason to make everyone's life complicated.

 --Srinath

 On Fri, Nov 7, 2014 at 7:44 AM, Srinath Perera srin...@wso2.com wrote:

 1) Anjana, you are assuming the bandwidth is the bottleneck. Let me give an
 example.

 With sequential reads and writes, a HDD can do > 100 MB/sec and a 1G
 network can do > 50 MB/sec.
 But the best number we have seen from BAM is about 40k events/sec (that's with 4
 machines or so; let's assume one machine). Let's assume 20-byte events. Then
 it will be doing 1 MB/sec.

 The problem is that Cassandra breaks the data into a lot of small operations,
 losing the OS-level buffer-to-buffer transfers that file transfers can do. I have
 tried increasing the batch size for Cassandra, which helps with smaller batches. But
 after about a few thousand operations in the same batch, things start to get
 much slower.

 Best numbers will come when we run two receivers instead of NFS.

 2) Frank, this is analytics data. So it is read-only, and in most cases we
 need only time-based queries with low resolution (a 15 min smallest
 resolution is fine for most cases). That is to say, run this batch query on the last
 hour of data, and so on.

 However, we have some scenarios where we do ad hoc queries for things
 like activity monitoring. The above would not work for those, and we will have
 to run a batch job to push that data to an RDBMS or Solr etc. Anjana, we need
 to discuss this.

 But there are also a lot of use cases where we need to receive and write the event to
 disk as soon as possible and later run MapReduce on top of them. For those, the
 above will work.

 --Srinath

 On Fri, Nov 7, 2014 at 7:23 AM, Anjana Fernando anj...@wso2.com
 wrote:

 Hi Sanjiva,

 On Thu, Nov 6, 2014 at 4:01 PM, Sanjiva Weerawarana sanj...@wso2.com
 wrote:

 Anjana, I think the idea was for the file system -> HDFS upload to
 happen via a simple cron job type thing.


 Even so, we will just be moving the problem to another area; the
 overall effort done by that hardware is still the same (writing to disk,
 reading it back, writing it to the network). That is, even though we can go to
 a very high throughput initially by writing it to the local disk at first,
 later on we have to read it back and write it to HDFS via the network,
 which is the slower part of our operation. So if we continue to load the
 machine with an extreme throughput, we will eventually run out of space on that
 disk.

 Cheers,
 Anjana.



 On Wed, Nov 5, 2014 at 9:19 AM, Anjana Fernando anj...@wso2.com
 wrote:

 Hi Srinath,

 Wouldn't it be better if we just make the batch size bigger? That is,
 let's just have

Re: [Architecture] Integrating ntask component into ESB

2014-10-01 Thread Anjana Fernando
I hope you understood; what I said is not what you mentioned earlier: you
do not have to store anything in the registry, and the ESB does not have to
load anything itself. The tasks will be loaded automatically.

Cheers,
Anjana.

On Wed, Oct 1, 2014 at 12:00 PM, Malaka Silva mal...@wso2.com wrote:

 Hi Anjana,

 Yes that is the plan. Will be implementing this at the task adapter level.

 Best Regards,
 Malaka

 On Wed, Oct 1, 2014 at 11:23 AM, Anjana Fernando anj...@wso2.com wrote:

 Hi Malaka,

 Kasun asked me about this sometime earlier; basically, with ntask,
 the tasks will automatically start up when the server is started up. It
 does not wait till a tenant is loaded or anything like that; it is
 automatically handled by ntask. If the task itself wants some tenant-specific
 functionality, the task implementation can load that. Basically,
 the ESB has a task adapter implementation, which bridges the ntask task
 interface and the ESB task interfaces; in the adapter, you can write the code
 to load any tenant information as needed.

 Cheers,
 Anjana.

 On Wed, Oct 1, 2014 at 8:58 AM, Malaka Silva mal...@wso2.com wrote:

 Hi All,

 At the time of inbound EP code review Azeez has identified an issue with
 ntask integration in tenant mode.

 The problem is that when a task is scheduled in tenant mode, it will not run
 until the tenant is loaded.

 Following is the solution I'm planning to implement.

 When a task is scheduled, it'll put an entry in the registry, under a
 tenant-specific structure. At the time the ESB starts, we are going to load the
 tenant, if it has one or more tasks scheduled.

 The above will solve the task implementation and polling inbound EP issues
 in tenant mode. But the issue will still exist for listening inbound EPs.

 Let me know your feedback on this?

 Best Regards,
 Malaka

 On Tue, May 20, 2014 at 5:37 PM, Ishan Jayawardena is...@wso2.com
 wrote:

 We have implemented the $subject and it is available in the ESB's git
 repo. As we initially planned we will be releasing this new task manager
 with our next release.

 Thanks,
 Ishan.


 On Mon, Apr 21, 2014 at 5:27 PM, Ishan Jayawardena is...@wso2.com
 wrote:

 Today we had a discussion to review the current implementation of
 $subject.
  We have developed two task providers/managers to manage quartz and
 ntask based task types. The correct task manager gets registered according
 to the synapse configuration, during the startup. When a user deploys a 
 new
 task through the UI, Synapse schedules a task in the registered task
 manager.

 Although each task manager is capable of executing its own task type,
 currently none of the task managers can execute tasks of a different type.
 Due to this, the new ntask task manager cannot execute existing tasks such
 as Synapse MessageInjector. We cannot support this yet without Synapse
 having a dependency to ntask component. At the moment we are looking into 
 a
 solution to this problem.

 At the same time, we are working on the inbound endpoint (VFS) to make
 it reuse the same ntask provider that we developed.

 Thanks,
 Ishan.


 On Mon, Apr 21, 2014 at 9:42 AM, Ishan Jayawardena is...@wso2.com
 wrote:

 Hi Kasun,
 We managed to solve the issue and now we are working on the final
 stage of the development. We will complete this within this week.
 Thanks,
 Ishan.


 On Tue, Apr 15, 2014 at 9:48 AM, Kasun Indrasiri ka...@wso2.com
 wrote:

 Did you check whether the required packages  are osgi imported
 properly?
 On a separate note, what's the ETA of a working deliverable of this?


 On Sun, Apr 13, 2014 at 12:43 PM, Anjana Fernando anj...@wso2.com
 wrote:

 Obviously, check if that class is available and where it is
 referred from in the code. As I remember, there isn't a package called 
 ntaskint,
 so check where this is coming from.

 Cheers,
 Anjana.


 On Sat, Apr 12, 2014 at 6:46 AM, Ishan Jayawardena is...@wso2.com
 wrote:

 We developed the quartz task manager and we are currently working
 on the ntask task manager. While developing the task handling 
 component
 that uses ntask, we observed that we cannot schedule a task in it due 
 to a
 class not found error. See the below error message. The ntask 
 component
 (which is used by the component that we are currently writing) cannot 
 load
 the actual task implementation. Does anyone know how to get rid of 
 this?

 java.lang.ClassNotFoundException: class
 org.wso2.carbon.ntaskint.core.Task
  at
 org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:501)
  at
 org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:421)
 at
 org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:412)
  at
 org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.loadClass(DefaultClassLoader.java:107)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 at
 org.wso2.carbon.ntask.core.impl.TaskQuartzJobAdapter.execute(TaskQuartzJobAdapter.java:58

Re: [Architecture] Rule Based Task Location Resolver

2014-09-30 Thread Anjana Fernando
On Wed, Oct 1, 2014 at 9:32 AM, Chanika Geeganage chan...@wso2.com wrote:

 What will happen if the task is not matched with any of the rules mentioned
 in the configuration?


It will fall back to the first server that is available; basically, the task
scheduling will not fail just because a rule is not matched, it will make
a best effort.

Cheers,
Anjana.



 Thanks

 On Mon, Sep 29, 2014 at 5:37 AM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

 I've added $subject to the ntask component, to give more control over
 where scheduled tasks can be scheduled in a cluster. TaskLocationResolvers
 are used in ntask basically to find a location in the available set of
 nodes, given the information about the environment. Earlier we had
 out-of-the-box task location resolvers like RandomTaskLocationResolver and
 RoundRobinTaskLocationResolver. The new
 org.wso2.carbon.ntask.core.impl.RuleBasedLocationResolver has the
 following configuration to be used in tasks-config.xml:-

 <defaultLocationResolver>
     <locationResolverClass>org.wso2.carbon.ntask.core.impl.RuleBasedLocationResolver</locationResolverClass>
     <properties>
         <property name="rule-1">HIVE_TASK,HTTP_SCRIPT*,192.168.1.*</property>
         <property name="rule-2">HIVE_TASK,.*,192.168.2.*</property>
         <property name="rule-5">.*,.*,.*</property>
     </properties>
 </defaultLocationResolver>

 Basically, here, a rule section contains
 [task-type-pattern],[task-name-pattern],[address-pattern]; a specific
 task is checked to see if its task type matches the task-type-pattern, then its
 task name against the task-name-pattern, and then the available nodes' addresses
 are checked against the address-pattern. If it finds one or many, it selects one
 of those addresses in a round-robin manner. The property names denote the sequence
 in which the rules will be evaluated, i.e. rule-1 is checked before rule-2.

 With this task location resolver, we can implement scenarios such as
 executing tasks in a specific zone at first, and only failing over to another
 zone if the earlier one is not available. This code has been added to the
 4.2.0 branch and also to GitHub.

 Cheers,
 Anjana.
 --
 *Anjana Fernando*
 Senior Technical Lead
 WSO2 Inc. | http://wso2.com
 lean . enterprise . middleware

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --
 Best Regards..

 Chanika Geeganage
 Software Engineer
 WSO2, Inc.; http://wso2.com


 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Integrating ntask component into ESB

2014-09-30 Thread Anjana Fernando
Hi Malaka,

Kasun asked me about this sometime earlier; basically, with ntask, the
tasks will automatically start up when the server is started up. It does
not wait till a tenant is loaded or anything like that; it is automatically
handled by ntask. If the task itself wants some tenant-specific
functionality, the task implementation can load that. Basically, the ESB
has a task adapter implementation, which bridges the ntask task interface
and the ESB task interfaces; in the adapter, you can write the code to load any
tenant information as needed.
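
For illustration, a rough sketch of such an adapter follows; it assumes the
ntask Task interface (setProperties/init/execute) and Carbon's tenant-flow
API, while the adapter class name and the "tenantDomain" property are made
up for the example:

import java.util.Map;
import org.wso2.carbon.context.PrivilegedCarbonContext;
import org.wso2.carbon.ntask.core.Task;

public class TenantAwareTaskAdapter implements Task {

    private Map<String, String> properties;

    @Override
    public void setProperties(Map<String, String> properties) {
        this.properties = properties;
    }

    @Override
    public void init() {
        // One-time initialization; no tenant context is needed here.
    }

    @Override
    public void execute() {
        String tenantDomain = this.properties.get("tenantDomain");
        try {
            // Load the tenant context only when the task actually runs.
            PrivilegedCarbonContext.startTenantFlow();
            PrivilegedCarbonContext.getThreadLocalCarbonContext()
                    .setTenantDomain(tenantDomain, true);
            runTenantSpecificLogic();
        } finally {
            PrivilegedCarbonContext.endTenantFlow();
        }
    }

    private void runTenantSpecificLogic() {
        // The bridged ESB task logic would be invoked here.
    }
}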

Cheers,
Anjana.

On Wed, Oct 1, 2014 at 8:58 AM, Malaka Silva mal...@wso2.com wrote:

 Hi All,

 At the time of inbound EP code review Azeez has identified an issue with
 ntask integration in tenant mode.

 The problem is that when a task is scheduled in tenant mode, it will not run
 until the tenant is loaded.

 Following is the solution I'm planning to implement.

 When a task is scheduled, it'll put an entry in the registry, under a
 tenant-specific structure. At the time the ESB starts, we are going to load the
 tenant, if it has one or more tasks scheduled.

 The above will solve the task implementation and polling inbound EP issues in
 tenant mode. But the issue will still exist for listening inbound EPs.

 Let me know your feedback on this?

 Best Regards,
 Malaka

 On Tue, May 20, 2014 at 5:37 PM, Ishan Jayawardena is...@wso2.com wrote:

 We have implemented the $subject and it is available in the ESB's git
 repo. As we initially planned we will be releasing this new task manager
 with our next release.

 Thanks,
 Ishan.


 On Mon, Apr 21, 2014 at 5:27 PM, Ishan Jayawardena is...@wso2.com
 wrote:

 Today we had a discussion to review the current implementation of
 $subject.
  We have developed two task providers/managers to manage quartz and
 ntask based task types. The correct task manager gets registered according
 to the synapse configuration, during the startup. When a user deploys a new
 task through the UI, Synapse schedules a task in the registered task
 manager.

 Although each task manager is capable of executing its own task type,
 currently none of the task managers can execute tasks of a different type.
 Due to this, the new ntask task manager cannot execute existing tasks such
 as Synapse MessageInjector. We cannot support this yet without Synapse
 having a dependency to ntask component. At the moment we are looking into a
 solution to this problem.

 At the same time, we are working on the inbound endpoint (VFS) to make
 it reuse the same ntask provider that we developed.

 Thanks,
 Ishan.


 On Mon, Apr 21, 2014 at 9:42 AM, Ishan Jayawardena is...@wso2.com
 wrote:

 Hi Kasun,
 We managed to solve the issue and now we are working on the final stage
 of the development. We will complete this within this week.
 Thanks,
 Ishan.


 On Tue, Apr 15, 2014 at 9:48 AM, Kasun Indrasiri ka...@wso2.com
 wrote:

 Did you check whether the required packages  are osgi imported
 properly?
 On a separate note, what's the ETA of a working deliverable of this?


 On Sun, Apr 13, 2014 at 12:43 PM, Anjana Fernando anj...@wso2.com
 wrote:

 Obviously, check if that class is available and where it is referred
 from in the code. As I remember, there isn't a package called ntaskint,
 so check where this is coming from.

 Cheers,
 Anjana.


 On Sat, Apr 12, 2014 at 6:46 AM, Ishan Jayawardena is...@wso2.com
 wrote:

 We developed the quartz task manager and we are currently working on
 the ntask task manager. While developing the task handling component 
 that
 uses ntask, we observed that we cannot schedule a task in it due to a 
 class
 not found error. See the below error message. The ntask component 
 (which is
 used by the component that we are currently writing) cannot load the 
 actual
 task implementation. Does anyone know how to get rid of this?

 java.lang.ClassNotFoundException: class
 org.wso2.carbon.ntaskint.core.Task
  at
 org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:501)
  at
 org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:421)
 at
 org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:412)
  at
 org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.loadClass(DefaultClassLoader.java:107)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 at
 org.wso2.carbon.ntask.core.impl.TaskQuartzJobAdapter.execute(TaskQuartzJobAdapter.java:58)
  at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
 at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
  at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
  at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
  at java.lang.Thread.run(Thread.java:662)
 Thanks,
 Ishan.



 On Mon, Apr 7, 2014 at 9:11 AM

[Architecture] Rule Based Task Location Resolver

2014-09-28 Thread Anjana Fernando
Hi,

I've added $subject to the ntask component, to give more control over where
scheduled tasks can be scheduled in a cluster. TaskLocationResolvers are
used in ntask basically to find a location in the available set of
nodes, given the information about the environment. Earlier we had
out-of-the-box task location resolvers like RandomTaskLocationResolver and
RoundRobinTaskLocationResolver. The new
org.wso2.carbon.ntask.core.impl.RuleBasedLocationResolver has the
following configuration to be used in tasks-config.xml:-

<defaultLocationResolver>
    <locationResolverClass>org.wso2.carbon.ntask.core.impl.RuleBasedLocationResolver</locationResolverClass>
    <properties>
        <property name="rule-1">HIVE_TASK,HTTP_SCRIPT*,192.168.1.*</property>
        <property name="rule-2">HIVE_TASK,.*,192.168.2.*</property>
        <property name="rule-5">.*,.*,.*</property>
    </properties>
</defaultLocationResolver>

Basically, here, a rule section contains
[task-type-pattern],[task-name-pattern],[address-pattern]; a specific
task is checked to see if its task type matches the task-type-pattern, then its
task name against the task-name-pattern, and then the available nodes' addresses
are checked against the address-pattern. If it finds one or many, it selects one
of those addresses in a round-robin manner. The property names denote the sequence
in which the rules will be evaluated, i.e. rule-1 is checked before rule-2.
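
For illustration, the matching described above roughly works like the
following sketch; it treats the patterns as plain Java regexes, and is an
approximation of the behavior, not the actual resolver source:

import java.util.ArrayList;
import java.util.List;

public class RuleMatchingSketch {

    public static class Rule {
        final String taskTypePattern, taskNamePattern, addressPattern;
        public Rule(String spec) {
            // "[task-type-pattern],[task-name-pattern],[address-pattern]"
            String[] parts = spec.split(",", 3);
            this.taskTypePattern = parts[0];
            this.taskNamePattern = parts[1];
            this.addressPattern = parts[2];
        }
    }

    private int roundRobinIndex = 0;

    // Rules are evaluated in property-name order (rule-1 before rule-2, etc.).
    public String resolve(List<Rule> rules, String taskType, String taskName,
                          List<String> nodeAddresses) {
        for (Rule rule : rules) {
            if (!taskType.matches(rule.taskTypePattern)
                    || !taskName.matches(rule.taskNamePattern)) {
                continue;
            }
            List<String> candidates = new ArrayList<>();
            for (String address : nodeAddresses) {
                if (address.matches(rule.addressPattern)) {
                    candidates.add(address);
                }
            }
            if (!candidates.isEmpty()) {
                // Pick one of the matching addresses in a round-robin manner.
                int pick = roundRobinIndex % candidates.size();
                roundRobinIndex = (roundRobinIndex + 1) % Integer.MAX_VALUE;
                return candidates.get(pick);
            }
        }
        // Fall back to the first available node, so scheduling never fails
        // outright just because no rule matched.
        return nodeAddresses.isEmpty() ? null : nodeAddresses.get(0);
    }
}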

With this task location resolver, we can implement scenarios such as
executing tasks in a specific zone at first, and only failing over to another
zone if the earlier one is not available. This code has been added to the
4.2.0 branch and also to GitHub.

Cheers,
Anjana.
-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] BAM Performance tests

2014-09-02 Thread Anjana Fernando
Looks good! .. so with this we basically get around 19K TPS on average, I
guess .. In this setup, do we use separate partitions for the commit log and
for the data? .. that is what's basically recommended for a Cassandra setup.
Also, how are we sending data: is it in a client load-balanced manner, or
through a single node? .. please test those scenarios separately, to check
how they affect the performance. And also, can you reduce the number of
Cassandra nodes and check how that affects the performance? Basically, I
want to see whether we can get higher TPS values if we add more nodes, or,
by any chance, whether the opposite effect happens. And also, do
mention the replication factor used for this.

Cheers,
Anjana.


On Tue, Sep 2, 2014 at 4:26 PM, Thayalan thaya...@wso2.com wrote:

 Hi All,

 Please find below the initial receiver perf test results, depicted in a
 graph. I'm in the process of capturing the throughput with the analyzer script
 running as well. I'll share the results soon.

 [image: Inline image 2]

 [image: Inline image 3]

 Notes:
 1. The perf test was performed using the DEBS data, which contains 69008700 events
 (records)
 2. Throughput was captured for every 100 events
 3. Environment Details: (Perf Cloud, openstack)
 Deployment Pattern: As per the BAM Cluster guide
 https://docs.wso2.com/display/CLUSTER420/Fully-Distributed%2C+High-Availability+BAM+Setup#Fully-Distributed,High-AvailabilityBAMSetup-Hadoopcluster
 Machine Configuration:
 Node1 & Node2: 8 Core, 16GB Mem, 160GB HDD
 Node3, 4 & 5: 8 Core, 16GB Mem, 160GB HDD & 1TB Volume separately attached
 & mounted for the Cassandra Data partition

 Node1 & Node2: BAM Receiver & Analyzer nodes, Hadoop Master & Secondary
 respectively


 On Fri, Aug 15, 2014 at 6:59 PM, Sinthuja Ragendran sinth...@wso2.com
 wrote:

 Hi,

 I have completed up to puppetizing the BAM Receiver configurations; due to
 support priorities I wasn't able to work full time on this, hence the Cassandra
 and analyzer configurations are still pending. Anyhow, since I am done with
 the base configurations, I should be able to provide the complete puppet
 scripts by next week.

 Thanks,
 Sinthuja.


 On Fri, Aug 15, 2014 at 6:22 PM, Sanjiva Weerawarana sanj...@wso2.com
 wrote:

 Is the setup all automated with Puppet?


 On Fri, Aug 15, 2014 at 11:26 AM, Thayalan thaya...@wso2.com wrote:

 Hi Srinath,

 Due to support priorities this has not been started yet. However, I've
 already done the environment set-up in the performance cloud. I can probably
 share the initial findings by next week.

 Thanks,
 Thayalan


 On Fri, Aug 15, 2014 at 8:58 AM, Srinath Perera srin...@wso2.com
 wrote:

 How are we doing with the subject?
 --
  
 Director, Research, WSO2 Inc.
 Visiting Faculty, University of Moratuwa
 Member, Apache Software Foundation
 Research Scientist, Lanka Software Foundation
 Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
 Site: http://people.apache.org/~hemapani/
 Photos: http://www.flickr.com/photos/hemapani/
 Phone: 0772360902

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --
 Sanjiva Weerawarana, Ph.D.
 Founder, Chairman  CEO; WSO2, Inc.;  http://wso2.com/
 email: sanj...@wso2.com; office: (+1 650 745 4499 | +94  11 214 5345)
 x5700; cell: +94 77 787 6880 | +1 408 466 5099; voip: +1 650 265 8311
 blog: http://sanjiva.weerawarana.org/; twitter: @sanjiva
 Lean . Enterprise . Middleware

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --
 *Sinthuja Rajendran*
 Senior Software Engineer http://wso2.com/
 WSO2, Inc.:http://wso2.com

 Blog: http://sinthu-rajan.blogspot.com/
 Mobile: +94774273955



 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --
 Regards,
 Thayalan Sivapaleswararajah
 Associate Technical Lead - QA
 Mob: +94(0)777872485
 Tel : +94(0)(11)2145345
 Fax : +94(0)(11)2145300
 Email: thaya...@wso2.com


 *Disclaimer*: *This communication may contain privileged or other
 confidential information and is intended exclusively for the addressee/s.
 If you are not the intended recipient/s, or believe that you may have
 received this communication in error, please reply to the sender indicating
 that fact and delete the copy you received and in addition, you should not
 print, copy, retransmit, disseminate, or otherwise use the information
 contained in this communication. Internet communications cannot be
 guaranteed to be timely, secure, error or virus-free. The sender does not
 accept liability for any errors or omissions.*




-- 
*Anjana Fernando

Re: [Architecture] Invoke ServerStartupHandlers before start transports

2014-08-24 Thread Anjana Fernando
On Sat, Aug 23, 2014 at 6:16 AM, Afkham Azeez az...@wso2.com wrote:

 Some handlers would need to be called after transports are started. So, we
 could modify the interface to behave like the
 Axis2ConfigurationContextObserver, and have pre & post transport
 initialization methods.


+1, as I remember, ntask uses this to schedule the actual tasks at the very
last moment, and specific task implementations like our data services tasks
would require the transports to be available at that time.
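
To make that concrete, a rough sketch of how the split interface could look
is given below; the method names are only illustrative, not a final kernel
API:

public interface ServerStartupObserver {

    // Invoked when server startup is completing, before any transport
    // listeners are started (e.g. to register local APIs).
    void completingServerStartup();

    // Invoked after all the transports have been started and the server is
    // fully up (e.g. to schedule tasks that need the transports available).
    void completedServerStartup();
}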

Cheers,
Anjana.




 On Fri, Aug 22, 2014 at 8:15 PM, Sagara Gunathunga sag...@wso2.com
 wrote:


  According to the current StartupFinalizerServiceComponent implementation, it
  calls the registered ServerStartupHandlers after starting the transports, but IMHO
  it would be better to invoke the ServerStartupHandlers before the server starts any
  transports.

  We have a requirement to perform a few tasks just before server startup
  completion, but before the transport listeners get started. Further, by looking at
  the API-M APIManagerStartupPublisher class (which is one of the
  implementations of the ServerStartupHandler interface), I think it would be much
  better to add the local APIs before starting the transports.

 Please refer the patch here[1]

 [1] - https://github.com/wso2-dev/carbon4-kernel/pull/84

 Thanks !

 --
 Sagara Gunathunga

 Senior Technical Lead; WSO2, Inc.;  http://wso2.com
 V.P Apache Web Services;http://ws.apache.org/
 Linkedin; http://www.linkedin.com/in/ssagara
 Blog ;  http://ssagara.blogspot.com




 --
 *Afkham Azeez*
 Director of Architecture; WSO2, Inc.; http://wso2.com
 Member; Apache Software Foundation; http://www.apache.org/
 * http://www.apache.org/*
  email: az...@wso2.com
  cell: +94 77 3320919
  blog: http://blog.afkham.org
  twitter: http://twitter.com/afkham_azeez
  linked-in: http://lk.linkedin.com/in/afkhamazeez

 *Lean . Enterprise . Middleware*




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Configuring transport and security policy in dataservice config

2014-08-21 Thread Anjana Fernando
Hi,

Earlier I had the idea that the ESB is also doing the same thing, and thought
it is easier, where it needed fewer properties, and didn't really think
the policy could be re-used by other services; but later I got to
know that is not the case. So yeah, in that case, let's have another
property like policyKey or policyPath to give the path to the policy.
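
For illustration, the resulting .dbs root element could then look something
like the following; the policyKey attribute name and the registry path here
are only illustrative:

<data name="SampleDataService" enableSec="true"
      policyKey="conf:/repository/policies/sample-policy.xml"
      transports="https http">
   ...
</data>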

Cheers,
Anjana.


On Thu, Aug 21, 2014 at 9:01 AM, Selvaratnam Uthaiyashankar 
shan...@wso2.com wrote:

 Why do you prefer convention over an explicit policy location? For example, I
 have Data services 1, 2, 3. Data services 1 and 2 are using policy 1. Data
 service 3 is using policy 2.

 Using convention, you can have either 1 policy or 3 policies for the above
 case. You will not be able to have only 2 policies.


 On Wednesday, August 20, 2014, Anjana Fernando anj...@wso2.com wrote:

 Hi Chanika,

 Let's just put enableSec as an attribute in the root element of the data
 service configuration, like <data enableSec="true"> .. and as for the
 policy file location, I guess there is a standard location the ESB would
 look up if it's not given explicitly; we will also just skip the policy
 location attribute and go by convention for where the policy file would be
 located.

 Cheers,
 Anjana.


 On Wed, Aug 20, 2014 at 2:17 PM, Chanika Geeganage chan...@wso2.com
 wrote:

 Hi,

 We recently came across a requirement to support QoS-related
 configurations in the .dbs file itself, rather than adding a separate
 services.xml file. Therefore we are going to add the transport and security
 policy related configurations in the same way as in ESB proxy service
 configurations. The changes are:
 1. Adding a transports="https http" attribute to configure the transport info
 2. Adding an <enableSec> tag with the policy key to configure security
 i.e:
 <policy key="path/to/policy"/>
 <enableSec/>

 At deployment time these configurations will be extracted. Will this
 be a good approach to follow?

 Thanks

 --
 Best Regards..

 Chanika Geeganage
 Software Engineer
 WSO2, Inc.; http://wso2.com




 --
 *Anjana Fernando*
 Senior Technical Lead
 WSO2 Inc. | http://wso2.com
 lean . enterprise . middleware



 --
 S.Uthaiyashankar
 VP Engineering
 WSO2 Inc.
 http://wso2.com/ - lean . enterprise . middleware

 Phone: +94 714897591


 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] [POC] Performance evaluation of Hive vs Shark

2014-08-13 Thread Anjana Fernando
Hi Niranda,

Excellent analysis of Hive vs Shark! .. This gives a lot of insight into
how both operate in different scenarios. As the next step, we will need to
run this on an actual cluster of computers. Since you've used a subset of
the dataset of the 2014 DEBS challenge, we should use the full data set in a
clustered environment and check this. Gokul is already working on the
Hive-based setup for this; after that is done, you can create a Shark cluster
on the same hardware and run the tests there, to get a clear comparison of how
these two match up in a cluster. Until the setup is ready, do continue with
your next steps on checking the RDD support and Spark SQL use.

After these are done, we should also do a trial run of our own APIM Hive
scripts, migrated to Shark.

Cheers,
Anjana.


On Mon, Aug 11, 2014 at 12:21 PM, Niranda Perera nira...@wso2.com wrote:

 Hi all,

 I have been evaluating the performance of Shark (distributed SQL query
 engine for Hadoop) against Hive. This is with the objective of seeing the
 possibility to move the WSO2 BAM data processing (which currently uses
 Hive) to Shark (and Apache Spark) for improved performance.

 I am sharing my findings herewith.

  *AMP Lab Shark*
 Shark can execute Hive QL queries up to 100 times faster than Hive without
 any modification to the existing data or queries. It supports Hive's QL,
 metastore, serialization formats, and user-defined functions, providing
 seamless integration with existing Hive deployments and a familiar, more
 powerful option for new ones. [1]


 *Apache Spark*Apache Spark is an open-source data analytics cluster
 computing framework. It fits into the Hadoop open-source community,
 building on top of the HDFS and promises performance up to 100 times faster
 than Hadoop MapReduce for certain applications. [2]
 Official documentation: [3]


 I carried out the comparison between the following Hive and Shark releases
 with input files ranging from 100 to 1 billion entries.

                 Hive setup           Shark setup
 QL Engine       Apache Hive 0.11     Shark 0.9.1 (latest release), which uses
                                      Scala 2.10.3, Spark 0.9.1, and AMPLab's Hive 0.9.0
 Framework       Hadoop 1.0.4         Spark 0.9.1
 File system     HDFS                 HDFS

 Attached herewith is a report which describes in detail about the
 performance comparison between Shark and Hive.
 ​
  hive_vs_shark
 https://docs.google.com/a/wso2.com/folderview?id=0B1GsnfycTl32QTZqUktKck1Ucjgusp=drive_web
 ​​
  hive_vs_shark_report.odt
 https://docs.google.com/a/wso2.com/file/d/0B1GsnfycTl32X3J5dTh6Slloa0E/edit?usp=drive_web
 ​​

 In summary,

 From the evaluation, the following conclusions can be derived.

- Shark is on par with Hive in DDL operations (CREATE, DROP .. TABLE,
DATABASE). Both engines show a fairly constant performance as the
input size increases.
- Shark is on par with Hive in plain DML operations (LOAD, INSERT), but
when a DML operation is called in conjunction with a data retrieval operation
(e.g. INSERT <TBL> SELECT <PROP> FROM <TBL>), Shark significantly
outperforms Hive, with a performance factor of 10x+ (ranging from 10x to
80x in some instances). Shark's performance factor reduces as the input
size increases, while Hive's performance stays fairly flat.
- Shark clearly outperforms Hive in data retrieval operations
(FILTER, ORDER BY, JOIN). Hive's performance is fairly flat in the
data retrieval operations, while Shark's performance reduces as the input size
increases. But in every instance Shark outperformed Hive, with a minimum
performance factor of 5x+ (ranging from 5x to 80x in some instances).

 Please refer the 'hive_vs_shark_report', it has all the information about
 the queries and timings pictographically.

 The code repository can also be found in
 https://github.com/nirandaperera/hiveToShark/tree/master/hiveVsShark

 Moving forward, I am currently working on the following.

- Apache Spark's resilient distributed dataset (RDD) abstraction
(which is a collection of elements partitioned across the nodes of the
cluster that can be operated on in parallel). The use of RDDs and its
impact to the performance.
- Spark SQL - Use of this Spark SQL over Shark on Spark framework


 [1] https://github.com/amplab/shark/wiki
 [2] http://en.wikipedia.org/wiki/Apache_Spark
 [3] http://spark.apache.org/docs/latest/



 Would love to have your feedback on this.

 Best regards

 --
  *Niranda Perera*
 Software Engineer, WSO2 Inc.
 Mobile: +94-71-554-8430
 Twitter: @n1r44 https://twitter.com/N1R44




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] [POC] Performance evaluation of Hive vs Shark

2014-08-13 Thread Anjana Fernando
On Wed, Aug 13, 2014 at 3:51 PM, Sumedha Rubasinghe sume...@wso2.com
wrote:

  After these are done, we should also do a trial run of our own APIM Hive
 scripts, migrated to Shark.

  Do we need to migrate? I thought the existing Hive scripts can run as they
  are. First of all we need to create a large data set of API stats.

Oh yeah, wrong choice of words, I guess :) .. we wouldn't have to migrate ..
I just meant testing the same APIM Hive scripts on Shark.

Cheers,
Anjana.

 
  Cheers,
  Anjana.
 
 
  On Mon, Aug 11, 2014 at 12:21 PM, Niranda Perera nira...@wso2.com
 wrote:
 
  Hi all,
 
  I have been evaluating the performance of Shark (distributed SQL query
 engine for Hadoop) against Hive. This is with the objective of seeing the
 possibility to move the WSO2 BAM data processing (which currently uses
 Hive) to Shark (and Apache Spark) for improved performance.
 
  I am sharing my findings herewith.
 
  AMP Lab Shark
  Shark can execute Hive QL queries up to 100 times faster than Hive
 without any modification to the existing data or queries. It supports
 Hive's QL, metastore, serialization formats, and user-defined functions,
 providing seamless integration with existing Hive deployments and a
 familiar, more powerful option for new ones. [1]
 
  Apache Spark
  Apache Spark is an open-source data analytics cluster computing
 framework. It fits into the Hadoop open-source community, building on top
 of the HDFS and promises performance up to 100 times faster than Hadoop
 MapReduce for certain applications. [2]
  Official documentation: [3]
 
 
  I carried out the comparison between the following Hive and Shark
 releases with input files ranging from 100 to 1 billion entries.
 
                  Hive setup           Shark setup
  QL Engine       Apache Hive 0.11     Shark 0.9.1 (latest release), which uses
                                       Scala 2.10.3, Spark 0.9.1, and AMPLab's Hive 0.9.0
  Framework       Hadoop 1.0.4         Spark 0.9.1
  File system     HDFS                 HDFS
 
 
  Attached herewith is a report which describes in detail about the
 performance comparison between Shark and Hive.
  ​
   hive_vs_shark
  ​​
   hive_vs_shark_report.odt

  ​​
 
  In summary,
 
  From the evaluation, the following conclusions can be derived.
  - Shark is on par with Hive in DDL operations (CREATE, DROP .. TABLE,
  DATABASE). Both engines show a fairly constant performance as the input
  size increases.
  - Shark is on par with Hive in plain DML operations (LOAD, INSERT), but when
  a DML operation is called in conjunction with a data retrieval operation (e.g.
  INSERT <TBL> SELECT <PROP> FROM <TBL>), Shark significantly outperforms
  Hive, with a performance factor of 10x+ (ranging from 10x to 80x in some
  instances). Shark's performance factor reduces as the input size increases,
  while Hive's performance stays fairly flat.
  - Shark clearly outperforms Hive in data retrieval operations (FILTER,
  ORDER BY, JOIN). Hive's performance is fairly flat in the data
  retrieval operations, while Shark's performance reduces as the input size
  increases. But in every instance Shark outperformed Hive, with a minimum
  performance factor of 5x+ (ranging from 5x to 80x in some instances).
  Please refer the 'hive_vs_shark_report', it has all the information
 about the queries and timings pictographically.
 
  The code repository can also be found in
  https://github.com/nirandaperera/hiveToShark/tree/master/hiveVsShark
 
  Moving forward, I am currently working on the following.
  Apache Spark's resilient distributed dataset (RDD) abstraction (which
 is a collection of elements partitioned across the nodes of the cluster
 that can be operated on in parallel). The use of RDDs and its impact to the
 performance.
  Spark SQL - Use of this Spark SQL over Shark on Spark framework
 
  [1] https://github.com/amplab/shark/wiki
  [2] http://en.wikipedia.org/wiki/Apache_Spark
  [3] http://spark.apache.org/docs/latest/
 
 
 
  Would love to have your feedback on this.
 
  Best regards
 
  --
  Niranda Perera
  Software Engineer, WSO2 Inc.
  Mobile: +94-71-554-8430
  Twitter: @n1r44
 
 
 
 
  --
  Anjana Fernando
  Senior Technical Lead
  WSO2 Inc. | http://wso2.com
  lean . enterprise . middleware
 
  ___
  Architecture mailing list
  Architecture@wso2.org
  https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
 


 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Proposal: Annotation Based BAM/CEP Data Publisher - Replacement for Async/LoadBalance Data Publishers

2014-07-24 Thread Anjana Fernando
+1, looks good to me. Shall we just change the property "order" to
"ordinal"? I guess that is more suitable. And let's do a code review after
this is done; we have to make sure there won't be any performance overhead
because of this approach.

Cheers,
Anjana.


On Thu, Jul 24, 2014 at 6:53 AM, Srinath Perera srin...@wso2.com wrote:

 +1 from me. If everyone is OK, we can get this done soon so other
 toolboxes can be built on top of this.


 On Tue, Jul 22, 2014 at 5:55 PM, Chamil Jeewantha cha...@wso2.com wrote:

 +1 for optimization concern.

 In general, annotation-based systems use a cache to avoid processing the
 annotations again and again. The first time publish receives the event, it
 processes the annotations and puts them into a cache. The cache key is the
 class name. From the next time onwards there is no need to process the
 annotations; just read the values from the POJO and send them to BAM.
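
To illustrate, a minimal, self-contained sketch of such a cache follows. The
DataStream/Column annotations here are the hypothetical ones from this
proposal, trimmed to the essentials (and with "order" renamed to "ordinal"
as suggested above); this is not an existing WSO2 API:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
@interface DataStream { String name(); String version(); }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface Column { String name() default ""; int ordinal(); }

class StreamMetadataCache {

    // Keyed by event class, so annotations are processed only once per class;
    // subsequent publish() calls just read the field values reflectively.
    private static final Map<Class<?>, List<Field>> CACHE = new ConcurrentHashMap<>();

    static List<Field> columnsOf(Class<?> eventClass) {
        return CACHE.computeIfAbsent(eventClass, clazz -> {
            List<Field> columns = new ArrayList<>();
            for (Field field : clazz.getDeclaredFields()) {
                if (field.isAnnotationPresent(Column.class)) {
                    field.setAccessible(true); // allow reading private POJO fields
                    columns.add(field);
                }
            }
            // Keep the payload attributes in their declared ordinal order.
            columns.sort(Comparator.comparingInt(
                    f -> f.getAnnotation(Column.class).ordinal()));
            return columns;
        });
    }
}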



 On Tue, Jul 22, 2014 at 5:16 PM, Sriskandarajah Suhothayan s...@wso2.com
  wrote:

 +1 it looks clean,
 we might need to do some optimisation at the publisher when converting
 the annotated class to the stream and Databridge Event.

 Suho



 On Tue, Jul 22, 2014 at 3:15 PM, Maninda Edirisooriya mani...@wso2.com
 wrote:

 +1. This is very clean for writing a publisher.
 We have to generalize these annotations to become compatible with other
 publishers. How do we get the BAM/CEP server connection details?
 Where are we setting the load-balancing URLs and other async publisher
 related settings? Maybe we can set them globally per product. (In this
 case, specific to each AS cluster.) WDYT?


 *Maninda Edirisooriya*
 Senior Software Engineer

 *WSO2, Inc. *lean.enterprise.middleware.

 *Blog* : http://maninda.blogspot.com/
 *E-mail* : mani...@wso2.com
 *Skype* : @manindae
 *Twitter* : @maninda


 On Tue, Jul 22, 2014 at 2:43 PM, Chamil Jeewantha cha...@wso2.com
 wrote:

 This is a proposal to develop an easy-to-use, readable annotation-based
 data publisher for BAM.

 When using the AsyncDataPublisher / LoadBalancingDataPublisher, the
 programmer must do a significant amount of boilerplate work before
 publishing data to the stream. See [1].

 This will be really easy if we have an annotation-based data
 publisher which can be used in the following way.

 We write a POJO, annotated with Some Stream Meta Data.

 *Example:*

 @DataStream(name="stat.data.stream", version="1.0.0", nickName="nick name",
             description="the description")
 public class StatDataStreamEvent {

     @Column(name="serverName", order=2)
     private String serverName;

     @Column(order=1)   // no column name defined, so the name will be "timestamp"
     private long timestamp;

     @Column(name="id", type=DataType.STRING, order=3)  // the column data type
     // is String though the field is int (example only)
     private int statId;

     // getters and setters
 }

 *Publishing:*

 StatDataStreamEvent event = new StatDataStreamEvent();

 event.setServerName("The server Name");
 event.setTimestamp(System.currentTimeMillis());
 event.setStatId(5000);

 DataPublisher.publish(event);

 Please improve this with your valuable ideas.

 [1] 
 http://wso2.com/library/articles/2012/07/creating-custom-agents-publish-events-bamcep/

  --
 K.D. Chamil Jeewantha
 Associate Technical Lead
 WSO2, Inc.;  http://wso2.com
 Mobile: +94716813892


 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture



 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --

 *S. Suhothayan *
 Technical Lead & Team Lead of WSO2 Complex Event Processor
  *WSO2 Inc. *http://wso2.com
 * http://wso2.com/*
 lean . enterprise . middleware


 cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/
 twitter: http://twitter.com/suhothayan | linked-in: http://lk.linkedin.com/in/suhothayan

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --
 K.D. Chamil Jeewantha
 Associate Technical Lead
 WSO2, Inc.;  http://wso2.com
 Mobile: +94716813892


 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --
 
 Srinath Perera, Ph.D.
http://people.apache.org/~hemapani/
http://srinathsview.blogspot.com/

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware

Re: [Architecture] Standardize defining of BAM server profiles for Carbon products

2014-07-23 Thread Anjana Fernando
On Wed, Jul 23, 2014 at 12:02 AM, Srinath Perera srin...@wso2.com wrote:

 How about event-publisher.xml? I think we do not usually put "config" in our
 config file names? We need to be consistent about this.


Yeah, true, no need for the config part, +1 for event-publisher.xml.

Cheers,
Anjana.


 +1 for giving an id for each publisher and a default.

 As Anjana said, the dataSourceName should not be here.
 Shall we add a publisher class when a customer asks for it?

 +1 to creating an OSGi service to find the current publisher. (Nandika also
 proposed this yesterday.)

 --Srinath


 On Tue, Jul 22, 2014 at 11:07 PM, Sriskandarajah Suhothayan s...@wso2.com
  wrote:

 Hi

 IMHO we should not restrict data publishing to WSO2 BAM and CEP; our
 servers should be able to publish to other analytics servers as well. So I
 believe adding the PublisherClass will be a good option, and this can be
 an optional field.

 Regards
 Suho


 On Tue, Jul 22, 2014 at 8:50 PM, Anjana Fernando anj...@wso2.com wrote:

 Hi Sagara,

 Maybe we can have a default publisher that will be used by the products
 if a specific id is not given, and if needed, clients can give a specific
 id, as you said, if we have separate BAM and CEP servers and so on. And we
 should not have dataSourceName; it's an implementation-specific property
 for how someone does analytics, and shouldn't be part of the publisher
 config. And also, I'm not sure what this PublisherClass is; we shouldn't
 have that, I guess it's an APIM-specific thing.

 Cheers,
 Anjana.


 On Tue, Jul 22, 2014 at 11:16 AM, Sagara Gunathunga sag...@wso2.com
 wrote:


 Please find draft format for analytics.xml or
 event-publisher-config.xml.

 <event-publisher-config>
     <publisher>
         <id>bam</id>
         <enabled>true</enabled>
         <protocol>thrift</protocol>
         <serverURL>tcp://<BAM host IP>:7614/</serverURL>
         <username>admin</username>
         <password>admin</password>
         <dataSourceName>jdbc/WSO2AM_STATS_DB</dataSourceName>
     </publisher>
 </event-publisher-config>

 - It is possible to uniquely refer to each publisher from product-specific
 configurations such as a mediator, Valve etc.

 - In a given product it is possible to configure both CEP and BAM
 servers separately (or two BAM/CEP servers).

 - As we host dashboards with each product now, I included
 dataSourceName to refer to the stats database.

 - API-M uses a PublisherClass class to refer to the publisher implementation
 class; if the same thing is possible with all products, we can add a
 PublisherClass element too.


 Please suggest additions and removals for above format ?

 @Maninda, can you please elaborate more on where we configure the publisher
 throttling constraints today, and the current format? Maybe we can leverage
 those settings as well.

 Thanks !





 On Tue, Jul 22, 2014 at 7:44 PM, Anjana Fernando anj...@wso2.com
 wrote:

 Now, since this is just to contain the publisher information,
 shouldn't it be something like event-publisher-config.xml? .. When we say
 analytics.xml, it gives the idea that it's a configuration for the whole of the
 analytics operations, like a config for some analyzing operation settings.
 Anyway, this will just contain the settings required to connect to an
 event receiver, that is, the hosts, the secure/non-secure ports etc.. After
 this, we can create an OSGi service, which will expose an API to just
 create a DataPublisher for you.
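
 As a rough sketch, such an OSGi service could expose something like the
 following; the interface and exception names are illustrative, while
 DataPublisher refers to the existing databridge agent class:

 import org.wso2.carbon.databridge.agent.thrift.DataPublisher;

 public interface EventPublisherService {

     // Builds (or returns a shared) DataPublisher from the publisher entry
     // with the given id in event-publisher-config.xml.
     DataPublisher getDataPublisher(String publisherId)
             throws EventPublisherConfigException;

     // Uses the default publisher entry when no specific id is required.
     DataPublisher getDefaultDataPublisher() throws EventPublisherConfigException;
 }

 // A simple exception type for the sketch above.
 class EventPublisherConfigException extends Exception {
     public EventPublisherConfigException(String message) { super(message); }
 }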

 Cheers,
 Anjana.


 On Tue, Jul 22, 2014 at 6:26 AM, Sagara Gunathunga sag...@wso2.com
 wrote:




 On Tue, Jul 22, 2014 at 2:06 PM, Afkham Azeez az...@wso2.com wrote:

 analytics.xml seems like a better name.


 +1



 On Tue, Jul 22, 2014 at 1:51 PM, Srinath Perera srin...@wso2.com
 wrote:

 These events can go to BAM or CEP.

 Shall we go with an analytics.xml file instead of a bam.xml file?
 Sagara, can you send the content of the current bam.xml file to this
 thread so we can finalise the content?


 The current bam.xml file is only used with AS and contains the following two
 lines to control AS service/web-app stat publishing at a global level.

 <WebappDataPublishing>disable</WebappDataPublishing>
 <ServiceDataPublishing>disable</ServiceDataPublishing>

 I will send draft design for new analytics.xml file soon.

 Thanks !




 that will mean BPS, ESB, and API-M need to fix this (maybe with BAM
 toolbox improvements). Also, once this is decided, Shammi, the MB training
 project needs to use this too.

 WDYT?

 --Srinath








 On Tue, Jul 22, 2014 at 1:43 PM, Afkham Azeez az...@wso2.com
 wrote:

 The correct approach is to introduce a bam.xml config. BAM is
 optional, hence we should avoid adding BAM-specific configs to
 carbon.xml.

 Azeez


 On Mon, Jul 21, 2014 at 9:52 PM, Sagara Gunathunga 
 sag...@wso2.com wrote:


 Right now each of our products uses its own way to define BAM
 server profiles; it would be nice if we could follow a unified
 process for configuring BAM servers and for enabling/disabling
 server-level data publishing.
 FYI, these are some of the approaches used by our products.


 ESB  - Through BAM server profile UI and no configuration file.

 AS - Use bam.xml

Re: [Architecture] Fwd: Create CQL data source from master-datasources.xml

2014-07-22 Thread Anjana Fernando
Yeah, the format looks good .. I hope you used JAXB to represent this model
in the code in the DataSourceReader, rather than parsing with raw DOM or
something. Also, what is the data source object you're using here? I guess
it would be the Session object that you need to return, to be used by the
clients.

Cheers,
Anjana.


On Tue, Jul 22, 2014 at 2:44 AM, Prabath Abeysekera praba...@wso2.com
wrote:

 Hi Dhanuka,

 This looks good and comprehensive!

 Let's delve further into this and see whether there are any other parameters
 available in the CQL driver configurations which one might find useful in
 a production setup. If we come across any, we can consider supporting
 them in the proposed datasource configuration structure too.

 Cheers,
 Prabath


 On Tue, Jul 22, 2014 at 12:02 PM, Dhanuka Ranasinghe dhan...@wso2.com
 wrote:

 looping architecture

 *Dhanuka Ranasinghe*

 Senior Software Engineer
 WSO2 Inc. ; http://wso2.com
 lean . enterprise . middleware

 phone : +94 715381915


 -- Forwarded message --
 From: Dhanuka Ranasinghe dhan...@wso2.com
 Date: Tue, Jul 22, 2014 at 12:00 PM
 Subject: Create CQL data source from master-datasources.xml
 To: WSO2 Developers' List d...@wso2.org
 Cc: Prabath Abeysekera praba...@wso2.com, Hasitha Hiranya 
 hasit...@wso2.com, Anjana Fernando anj...@wso2.com, Deependra
 Ariyadewa d...@wso2.com, Bhathiya Jayasekara bhath...@wso2.com,
 Shani Ranasinghe sh...@wso2.com, Poshitha Dabare poshi...@wso2.com,
 Harsha Kumara hars...@wso2.com


 Hi,

 While working on $Subject, I found there are a lot of configuration options
 available in the CQL driver. Most of them are the same as the Hector client
 configurations, and we have identified that some of them are critical for
 performance and reliability.

 Below is the sample data source configuration that we came up with
 after analyzing the CQL driver. Please let me know your thoughts
 regarding this.

 <datasource>
     <name>WSO2_CASSANDRA_DB</name>
     <description>The datasource used for cassandra</description>
     <jndiConfig>
         <name>CassandraRepo</name>
     </jndiConfig>
     <definition type="CASSANDRA">
         <configuration>
             <async>false</async>
             <clusterName>TestCluster</clusterName>
             <compression>SNAPPY</compression>
             <concurrency>100</concurrency>
             <username>admin</username>
             <password encrypted="true">admin</password>
             <port>9042</port>
             <maxConnections>100</maxConnections>
             <hosts>
                 <host>192.1.1.0</host>
                 <host>192.1.1.1</host>
             </hosts>
             <loadBalancePolicy>
                 <exclusionThreshold>2.5</exclusionThreshold>
                 <latencyAware>true</latencyAware>
                 <minMeasure>100</minMeasure>
                 <policyName>RoundRobinPolicy</policyName>
                 <retryPeriod>10</retryPeriod>
                 <scale>2</scale>
             </loadBalancePolicy>
             <poolOptions>
                 <coreConnectionsForLocal>10</coreConnectionsForLocal>
                 <coreConnectionsForRemote>10</coreConnectionsForRemote>
                 <maxConnectionsForLocal>10</maxConnectionsForLocal>
                 <maxConnectionsForRemote>10</maxConnectionsForRemote>
                 <maxSimultaneousRequestsForLocal>10</maxSimultaneousRequestsForLocal>
                 <maxSimultaneousRequestsForRemote>10</maxSimultaneousRequestsForRemote>
                 <minSimultaneousRequestsForLocal>10</minSimultaneousRequestsForLocal>
                 <minSimultaneousRequestsForRemote>10</minSimultaneousRequestsForRemote>
             </poolOptions>
             <reconnectPolicy>
                 <baseDelayMs>100</baseDelayMs>
                 <policyName>ConstantReconnectionPolicy</policyName>
             </reconnectPolicy>
             <socketOptions>
                 <connectTimeoutMillis>200</connectTimeoutMillis>
                 <keepAlive>true</keepAlive>
                 <readTimeoutMillis>200</readTimeoutMillis>
                 <tcpNoDelay>true</tcpNoDelay>
             </socketOptions>
         </configuration>
     </definition>
 </datasource>





 Cheers,
 *Dhanuka Ranasinghe*

 Senior Software Engineer
 WSO2 Inc. ; http://wso2.com
 lean . enterprise . middleware

 phone : +94 715381915


 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --
 Prabath Abeysekara
 Associate Technical Lead, Data TG.
 WSO2 Inc.
 Email: praba...@wso2.com
 Mobile: +94774171471

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Fwd: Create CQL data source from master-datasources.xml

2014-07-22 Thread Anjana Fernando
Hi Dhanuka,

On Tue, Jul 22, 2014 at 9:03 AM, Dhanuka Ranasinghe dhan...@wso2.com
wrote:



 *Dhanuka Ranasinghe*

 Senior Software Engineer
 WSO2 Inc. ; http://wso2.com
 lean . enterprise . middleware

 phone : +94 715381915


 On Tue, Jul 22, 2014 at 5:47 PM, Anjana Fernando anj...@wso2.com wrote:

 Yeah, the format looks good .. I hope you used JAXB to represent this model
 in the code in the DataSourceReader, rather than parsing with raw DOM or
 something.

 Yes, same as the RDBMS component; I used JAXB.

 Also, what is the data source object you're using here, I guess it would
 be the Session object that you need to return, to be used by the clients.

 com.datastax.driver.core.Cluster


Be mindful in using Cluster here, because when you create a session out of
it, the connection pool resides in the Session object [1]. So if the Cluster
object is what you expose here, when multiple applications look it up,
they will create their own Session objects and will have their own
separate connection pools etc.; so the data source defined in your
datasources.xml doesn't mean that, globally, all the applications only use
the number of connections defined there. For example, for RDBMS, by default
we share a single javax.sql.DataSource object, so everyone shares the
connection pool. So maybe also consider using a Session object here, and
with that, you would also need to give the specific keyspace being
used, as they say a session has to be used with only one keyspace.

[1]
http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/quick_start/qsSimpleClientAddSession_t.html
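
To illustrate the difference, a minimal sketch (assuming the DataStax driver
2.0 API and the JNDI name from the sample configuration): every consumer
that looks up the Cluster and calls connect() gets its own Session, and
therefore its own connection pool:

    import javax.naming.InitialContext;
    import javax.naming.NamingException;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class SessionVsClusterSketch {

        public static void main(String[] args) throws NamingException {
            InitialContext ctx = new InitialContext();

            // Two applications resolving the same JNDI name.
            Cluster cluster = (Cluster) ctx.lookup("CassandraRepo");

            // Each connect() creates a new Session with its own connection
            // pool, so the configured pool limits apply per Session, not
            // globally across the server.
            Session appOneSession = cluster.connect("keyspace1");
            Session appTwoSession = cluster.connect("keyspace1");

            // Exposing one shared Session instead (analogous to sharing a
            // single javax.sql.DataSource) gives everyone one pool, but
            // binds the data source to a single keyspace.
        }
    }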

Cheers,
Anjana.


 Cheers,
 Anjana.


 On Tue, Jul 22, 2014 at 2:44 AM, Prabath Abeysekera praba...@wso2.com
 wrote:

 Hi Dhanuka,

 This looks good and comprehensive!

 Let's delve further into this and see whether there are any other
 parameters available in the CQL driver configurations which one might find
 useful in a production setup. If we come across any, we can consider
 supporting them in the proposed datasource configuration structure too.

 Cheers,
 Prabath


 On Tue, Jul 22, 2014 at 12:02 PM, Dhanuka Ranasinghe dhan...@wso2.com
 wrote:

 looping architecture

 *Dhanuka Ranasinghe*

 Senior Software Engineer
 WSO2 Inc. ; http://wso2.com
 lean . enterprise . middleware

 phone : +94 715381915


 -- Forwarded message --
 From: Dhanuka Ranasinghe dhan...@wso2.com
 Date: Tue, Jul 22, 2014 at 12:00 PM
 Subject: Create CQL data source from master-datasources.xml
 To: WSO2 Developers' List d...@wso2.org
 Cc: Prabath Abeysekera praba...@wso2.com, Hasitha Hiranya 
 hasit...@wso2.com, Anjana Fernando anj...@wso2.com, Deependra
 Ariyadewa d...@wso2.com, Bhathiya Jayasekara bhath...@wso2.com,
 Shani Ranasinghe sh...@wso2.com, Poshitha Dabare poshi...@wso2.com,
 Harsha Kumara hars...@wso2.com


 Hi,

 While working on $Subject, we found that there are a lot of configuration
 options available in the CQL driver. Most of them are the same as the
 Hector client configurations, and we have identified that some of them are
 critical for performance and reliability.

 Below is the sample data source configuration that we came up with after
 analyzing the CQL driver. Please let me know your thoughts regarding this.

 <datasource>
     <name>WSO2_CASSANDRA_DB</name>
     <description>The datasource used for cassandra</description>
     <jndiConfig>
         <name>CassandraRepo</name>
     </jndiConfig>
     <definition type="CASSANDRA">
         <configuration>
             <async>false</async>
             <clusterName>TestCluster</clusterName>
             <compression>SNAPPY</compression>
             <concurrency>100</concurrency>
             <username>admin</username>
             <password encrypted="true">admin</password>
             <port>9042</port>
             <maxConnections>100</maxConnections>

             <hosts>
                 <host>192.1.1.0</host>
                 <host>192.1.1.1</host>
             </hosts>

             <loadBalancePolicy>
                 <exclusionThreshold>2.5</exclusionThreshold>
                 <latencyAware>true</latencyAware>
                 <minMeasure>100</minMeasure>
                 <policyName>RoundRobinPolicy</policyName>
                 <retryPeriod>10</retryPeriod>
                 <scale>2</scale>
             </loadBalancePolicy>

             <poolOptions>
                 <coreConnectionsForLocal>10</coreConnectionsForLocal>
                 <coreConnectionsForRemote>10</coreConnectionsForRemote>
                 <maxConnectionsForLocal>10</maxConnectionsForLocal>
                 <maxConnectionsForRemote>10</maxConnectionsForRemote>
                 <maxSimultaneousRequestsForLocal>10</maxSimultaneousRequestsForLocal>
                 <maxSimultaneousRequestsForRemote>10</maxSimultaneousRequestsForRemote>
                 <minSimultaneousRequestsForLocal>10</minSimultaneousRequestsForLocal>
                 <minSimultaneousRequestsForRemote>10</minSimultaneousRequestsForRemote>
             </poolOptions>

             <reconnectPolicy>
                 <baseDelayMs>100</baseDelayMs>
                 <policyName>ConstantReconnectionPolicy</policyName>
             </reconnectPolicy>

             <socketOptions>
                 <connectTimeoutMillis>200</connectTimeoutMillis>
                 <keepAlive>true</keepAlive>
                 <readTimeoutMillis>200</readTimeoutMillis>
                 <tcpNoDelay>true</tcpNoDelay>
             </socketOptions>
         </configuration>
     </definition>
 </datasource>





 Cheers,
 *Dhanuka Ranasinghe*

 Senior Software Engineer
 WSO2 Inc. ; http://wso2.com
 lean . enterprise . middleware

 phone : +94 715381915


 ___
 Architecture mailing list

Re: [Architecture] Standardize defining of BAM server profiles for Carbon products

2014-07-22 Thread Anjana Fernando
Now, since this is just to contain the publisher information, shouldn't it
be something like event-publisher-config.xml? When we say analytics.xml, it
gives the idea that it's a configuration for the whole of the analytics
operations, like a config for some analyzing operation settings. Anyway,
this will just contain the settings required to connect to an event
receiver, that is, the hosts, the secure/non-secure ports etc. After this,
we can create an OSGi service, which will expose an API to just create a
DataPublisher for you.
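
As a rough sketch, the service could be as simple as the following (the
interface and method names are hypothetical; only the data bridge agent's
DataPublisher class is assumed):

    import org.wso2.carbon.databridge.agent.thrift.DataPublisher;

    /**
     * Hypothetical OSGi service: resolves the receiver hosts and the
     * secure/non-secure ports from the publisher configuration file, so
     * callers don't have to know the endpoint details themselves.
     */
    public interface EventPublisherService {

        /** Creates a ready-to-use DataPublisher for the current server/tenant. */
        DataPublisher createDataPublisher() throws Exception;
    }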

Cheers,
Anjana.


On Tue, Jul 22, 2014 at 6:26 AM, Sagara Gunathunga sag...@wso2.com wrote:




 On Tue, Jul 22, 2014 at 2:06 PM, Afkham Azeez az...@wso2.com wrote:

 analytics.xml seems like a better name.


 +1



 On Tue, Jul 22, 2014 at 1:51 PM, Srinath Perera srin...@wso2.com wrote:

 These events can go to BAM or CEP.

 Shall we go with an analytics.xml file instead of a bam.xml file? Sagara,
 can you send the content of the current bam.xml file to this thread so we
 can finalise the content.


 The current bam.xml file is only used with AS, and contains the following
 two lines to control AS service/web-app stat publishing at a global level.

 <WebappDataPublishing>disable</WebappDataPublishing>
 <ServiceDataPublishing>disable</ServiceDataPublishing>

 I will send a draft design for the new analytics.xml file soon.

 Thanks !




 That will mean BPS, ESB, and API-M need to fix this (maybe with BAM
 toolbox improvements). Also, when decided, Shammi, the MB training project
 needs to use this too.

 WDYT?

 --Srinath








 On Tue, Jul 22, 2014 at 1:43 PM, Afkham Azeez az...@wso2.com wrote:

 The correct approach is to introduce a bam.xml config. BAM is optional,
 hence we should avoid adding BAM specific configs to the carbon.xml.

 Azeez


 On Mon, Jul 21, 2014 at 9:52 PM, Sagara Gunathunga sag...@wso2.com
 wrote:


 Right now each of our products uses its own way to define BAM server
 profiles; it would be nice if we could follow a unified process when
 configuring BAM servers and enabling/disabling server level data
 publishing. FYI, these are some of the approaches used by our products.


 ESB  - Through BAM server profile UI and no configuration file.

 AS - Uses bam.xml to enable/disable server level data publishing, and the
 Webapp/Service Data Publishing UI for server configuration.


 BPS - Through bps.xml and writing  a BAMServerProfile.xml file.

 API-M  - Through api-manager.xml file.



 IMHO we can unify this process among all the servers to some extent, as
 an example:

 1. Configuring BAM server details  - urls, user name, password
 2. Globally enable and disable data publishing
 3. Name of the stat database
 4. Publishing protocol and its configuration

 I have two suggestions on this.


 a.) As BAM publishing is common to most of the products, define a new
 element called Analytic under carbon.xml to hold the above common
 configurations.

 b.) Alternatively, define a bam.xml file to hold the above common
 configurations.


 WDYT ?


 NOTE - I only considered BAM but I guess we can consider CEP as well.


 Thanks !
 --
 Sagara Gunathunga

 Senior Technical Lead; WSO2, Inc.;  http://wso2.com
 V.P Apache Web Services;http://ws.apache.org/
 Linkedin; http://www.linkedin.com/in/ssagara
 Blog ;  http://ssagara.blogspot.com




 --
 *Afkham Azeez*
 Director of Architecture; WSO2, Inc.; http://wso2.com
 Member; Apache Software Foundation; http://www.apache.org/
 *email: **az...@wso2.com*
 * cell: +94 77 3320919 blog: *
 *http://blog.afkham.org*
 *twitter: **http://twitter.com/afkham_azeez*
 * linked-in: **http://lk.linkedin.com/in/afkhamazeez*

 *Lean . Enterprise . Middleware*




 --
 
 Director, Research, WSO2 Inc.
 Visiting Faculty, University of Moratuwa
 Member, Apache Software Foundation
 Research Scientist, Lanka Software Foundation
 Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
 Site: http://people.apache.org/~hemapani/
 Photos: http://www.flickr.com/photos/hemapani/
 Phone: 0772360902




 --
 *Afkham Azeez*
 Director of Architecture; WSO2, Inc.; http://wso2.com
 Member; Apache Software Foundation; http://www.apache.org/
 *email: **az...@wso2.com*
 * cell: +94 77 3320919 blog: *
 *http://blog.afkham.org*
 *twitter: **http://twitter.com/afkham_azeez*
 * linked-in: **http://lk.linkedin.com/in/afkhamazeez*

 *Lean . Enterprise . Middleware*




 --
 Sagara Gunathunga

 Senior Technical Lead; WSO2, Inc.;  http://wso2.com
 V.P Apache Web Services;http://ws.apache.org/
 Linkedin; http://www.linkedin.com/in/ssagara
 Blog ;  http://ssagara.blogspot.com




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware

Re: [Architecture] Machine Learning using Apache Mahout for WSO2 BAM

2014-06-30 Thread Anjana Fernando
Hi,

I was simply thinking that the UDF could be directly mapped to some basic
Mahout operation it implements, and the input/output should be given as
parameters to the UDF. So probably, we can publish some input data
beforehand to Cassandra etc., give the location of that data to the UDF,
and the UDF will, as it is called, create the map/reduce jobs and execute
them.
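
A minimal sketch of the idea, assuming the standard Hive UDF API;
runMahoutJob() is a placeholder for whichever Mahout driver ends up being
wrapped:

    import org.apache.hadoop.hive.ql.exec.UDF;

    /**
     * Sketch of a Hive UDF that kicks off a Mahout map/reduce job over
     * data published beforehand (e.g. to Cassandra/HDFS), and returns the
     * location of the results so a Hive query can pick them up.
     */
    public class MahoutJobUDF extends UDF {

        public String evaluate(String inputLocation, String outputLocation) {
            // Placeholder: invoke the relevant Mahout driver (e.g. k-means)
            // with the given input/output locations.
            runMahoutJob(inputLocation, outputLocation);
            return outputLocation;
        }

        private void runMahoutJob(String input, String output) {
            // Hypothetical helper; the actual Mahout driver call goes here.
        }
    }

A Hive query could then trigger the operation with something like:
select run_mahout('cassandra://input', 'hdfs://output');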

Cheers,
Anjana.


On Tue, Jul 1, 2014 at 9:18 AM, Srinath Perera srin...@wso2.com wrote:

 +1 we wanted to explore that more.

 However, it is not a simple UDF, as this is a stateful op where we feed a
 lot of data and start a separate map/reduce process. Anjana, do you have
 any thoughts on how it can be done?


 On Tue, Jul 1, 2014 at 5:37 AM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

 I'm just wondering if we have any way to integrate this into Hive itself
 (UDF?), to get the results of an ML algorithm run as a result there. A
 similar scenario is possible with the Shark/MLlib integration.

 Cheers,
 Anjana.


 On Mon, Jun 30, 2014 at 12:28 PM, Supun Sethunga sup...@wso2.com wrote:

 Hi,

 I'm working on the $subject, and the objective is to apply Machine
 Learning algorithms on the data stored by WSO2 BAM. Apache Mahout will be
 used as the ML tool for this purpose.

 As per the discussion I had With Srinath, the procedure for $subject
 would be:

- Test a Machine Learning algorithm using Mahout libraries within
Java.
- Implement a RESTful service which provides the above functionality.
- Since Mahout also uses Hadoop, the above service can send Map
Reduce Jobs to the Hadoop built inside the BAM.
- Deploy the service as a Carbon Component on WSO2 BAM.

 The first step is completed for now.
 Any feedback is highly appreciated.

 Thanks,
 Supun

 --
 *Supun Sethunga*
 Software Engineer
 WSO2, Inc.
 lean | enterprise | middleware
 Mobile : +94 716546324




 --
 *Anjana Fernando*
 Senior Technical Lead
 WSO2 Inc. | http://wso2.com
 lean . enterprise . middleware




 --
 
 Director, Research, WSO2 Inc.
 Visiting Faculty, University of Moratuwa
 Member, Apache Software Foundation
 Research Scientist, Lanka Software Foundation
 Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
 Site: http://people.apache.org/~hemapani/
 Photos: http://www.flickr.com/photos/hemapani/
 Phone: 0772360902




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Machine Learning using Apache Mahout for WSO2 BAM

2014-06-30 Thread Anjana Fernando
I see, sure. I was thinking of doing all the operations, including the
training operations, using a UDF. Will come and meet you.

Cheers,
Anjana.


On Tue, Jul 1, 2014 at 9:56 AM, Srinath Perera srin...@wso2.com wrote:

 No, we need to get the data, preprocess them using hive, and send all the
 data (not 1-2 values, rather say 10 millions values) to training phase.
 Lets chat f2f a bit.

 --Srinath


 On Tue, Jul 1, 2014 at 6:24 AM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

 I was simply thinking that the UDF could be directly mapped to some basic
 Mahout operation it implements, and the input/output should be given as
 parameters to the UDF. So probably, we can publish some input data
 beforehand to Cassandra etc., give the location of that data to the UDF,
 and the UDF will, as it is called, create the map/reduce jobs and execute
 them.

 Cheers,
 Anjana.


 On Tue, Jul 1, 2014 at 9:18 AM, Srinath Perera srin...@wso2.com wrote:

 +1 we wanted to explore that more.

 However, it is not a simple UDF, as this is a stateful op where we feed a
 lot of data and start a separate map/reduce process. Anjana, do you have
 any thoughts on how it can be done?


 On Tue, Jul 1, 2014 at 5:37 AM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

 I'm just wondering if we have any way to integrate this into Hive itself
 (UDF?), to get the results of an ML algorithm run as a result there. A
 similar scenario is possible with the Shark/MLlib integration.

 Cheers,
 Anjana.


 On Mon, Jun 30, 2014 at 12:28 PM, Supun Sethunga sup...@wso2.com
 wrote:

 Hi,

 I'm working on the $subject, and the objective is to apply Machine
 Learning algorithms on the data stored by WSO2 BAM. Apache Mahout will be
 used as the ML tool for this purpose.

 As per the discussion I had With Srinath, the procedure for $subject
 would be:

- Test a Machine Learning algorithm using Mahout libraries within
Java.
- Implement a RESTful service which provides the above
functionality.
- Since Mahout also uses Hadoop, the above service can send Map
Reduce Jobs to the Hadoop built inside the BAM.
- Deploy the service as a Carbon Component on WSO2 BAM.

 The first step is completed for now.
 Any feedback is highly appreciated.

 Thanks,
 Supun

 --
 *Supun Sethunga*
 Software Engineer
 WSO2, Inc.
 lean | enterprise | middleware
 Mobile : +94 716546324




 --
 *Anjana Fernando*
 Senior Technical Lead
 WSO2 Inc. | http://wso2.com
 lean . enterprise . middleware




 --
 
 Director, Research, WSO2 Inc.
 Visiting Faculty, University of Moratuwa
 Member, Apache Software Foundation
 Research Scientist, Lanka Software Foundation
 Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
 Site: http://people.apache.org/~hemapani/
 Photos: http://www.flickr.com/photos/hemapani/
 Phone: 0772360902




 --
 *Anjana Fernando*
 Senior Technical Lead
 WSO2 Inc. | http://wso2.com
 lean . enterprise . middleware




 --
 
 Director, Research, WSO2 Inc.
 Visiting Faculty, University of Moratuwa
 Member, Apache Software Foundation
 Research Scientist, Lanka Software Foundation
 Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
 Site: http://people.apache.org/~hemapani/
 Photos: http://www.flickr.com/photos/hemapani/
 Phone: 0772360902




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Implementing Datasource Deployer

2014-06-13 Thread Anjana Fernando
/chintana
 linkedin: http://www.linkedin.com/in/engwar
 twitter: twitter.com/std_err

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --
 Sagara Gunathunga

 Senior Technical Lead; WSO2, Inc.;  http://wso2.com
 V.P Apache Web Services;http://ws.apache.org/
 Linkedin; http://www.linkedin.com/in/ssagara
 Blog ;  http://ssagara.blogspot.com


 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --

 Thanks  regards,
 Nirmal

 Senior Software Engineer- Platform Technologies Team, WSO2 Inc.
 Mobile: +94715779733
 Blog: http://nirmalfdo.blogspot.com/



 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --
 Rajith Vitharana

 Software Engineer,
 WSO2 Inc. : wso2.com
 Mobile : +94715883223
 Blog : http://lankavitharana.blogspot.com/




 --
 S.Uthaiyashankar
 VP Engineering
 WSO2 Inc.
 http://wso2.com/ - lean . enterprise . middleware

 Phone: +94 714897591

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Data Storage Architecture Change in BAM

2014-06-09 Thread Anjana Fernando
Hi Srinath,

On Mon, Jun 9, 2014 at 10:31 AM, Srinath Perera srin...@wso2.com wrote:

 Hi Anjana,


 * No support for other data stores for storing events

 Yes we need to support RDBMS and Hive

 * Toolboxes being bound to certain types of data sources
 Need to fix this.

 * Transports
 IMO, for this we can depend on ESB to support other transports for the
 near future. Need to make sure the ESB thrift mediator is working smoothly.

 Please note the above should go in the release following the toolboxes,
 not the immediate release, which will add the toolboxes. We should only
 allocate people AFTER the toolboxes are out.


Product toolboxes are a matter of coordinating with the product teams,
which we are starting to do, and the product teams will be owning their
respective toolboxes. But the BAM team itself can be allocated to do
product features. So anyway, we can start building the product toolboxes
with the Cassandra storage handler etc., and migrate to the new
architecture when it is finalized.

Cheers,
Anjana.



 --Srinath





 On Fri, Jun 6, 2014 at 3:31 PM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

 The BAM team has been looking into some ways of improving the current
 approach to handling the operations in the data layer. So I will explain
 here the issues we have because of the current BAM architecture, and
 propose a solution to remedy them.

 Issues
 =

 * No support for other data stores for storing events

 At the moment, we are strictly limited to storing events in Cassandra,
 but there has been strong interest in using other types of data stores
 such as MongoDB, RDBMS etc., especially because of the ease of use for
 some users in reusing their existing databases and so on. Also, in order
 for BAM functionality to be embeddable in other products, this support is
 critical; for example, as a light-weight analytics solution, people should
 be able to use an RDBMS based solution.

 * Toolboxes being bound to certain types of data sources

 This is the case where we assume we always retrieve data from Cassandra
 and write to some specific RDBMS. This approach does not scale, especially
 for the WSO2 product related toolboxes we have / are going to have,
 because then the toolboxes are limited to a certain specific combination
 of databases, and we will then need to support different versions of the
 toolboxes for each database combination, which is not practical to
 maintain; also, a huge effort will be spent on testing these each time.

 * Multi-tenancy limitations

 At the moment, we use our own MT Cassandra to store the events
 tenant-wise, and because of this, we cannot use any other Cassandra
 distribution that is out there to implement MT features. So effectively,
 anyone who uses their own Cassandra installation cannot use MT features,
 which makes the BAM product inconsistent in its features. Ideally, we
 should support anyone having their own Cassandra, or actually any type of
 supported database, without any special modifications for MT.

 * Transports

 CEP introduced a new architecture for defining transports/data formats in
 the system, and there are many transports such as HTTP/JMS etc., with data
 types such as XML/Text/JSON, available to get events in. But BAM is
 limited to using the Thrift transport, because we explicitly need
 authentication support from the transport, since that is how we
 authenticate to the Cassandra data store. So we cannot use any other
 transport, because we cannot authenticate to our data store. Ideally, what
 we need is a way to have a default system user for a tenant, where, by
 only figuring out the tenant a request belongs to, we should be able to
 write the events to the data store. For example, we can use a JMS queue,
 where we can use the data from that to write to the super-tenant's space.

 Also, in toolboxes, the stream definitions need to contain a
 username/password pair to create streams and their respective
 representation in the data store; ideally, a toolbox should just identify
 the tenant it should be deployed to, and do the data operations that are
 needed internally.

 Solution
 ==

 So the proposed solution is to create a clear data abstraction layer for
 BAM. Rather than just having Cassandra and some other RDBMS for storage of
 events and analyzed data, we propose having a single interface called
 AnalyticsDataStore to keep all the required data and its metadata. This
 would be the store used to store all the events coming into BAM and also
 the place to put summarized data. So basically, AnalyticsDataStore will
 have several implementations, with backing data stores such as Cassandra,
 MongoDB and RDBMS. The data bridge connector for BAM will be implemented
 to simply write data to AnalyticsDataStore, and we will also be having a
 Hive storage handler called AnalyticsDataStoreStorageHandler which reads
 and writes data to our common data store. So basically, users will have no
 idea about

[Architecture] Data Storage Architecture Change in BAM

2014-06-06 Thread Anjana Fernando
 store for summarized data as well, it won't go to
the usual RDBMS based tables; the earlier point there was that many tools
can already be used to visualize data from RDBMS tables. But this
requirement will be reduced, since in BAM itself we are going to provide
rich visualization support with UES. The AnalyticsDataStore functionalities
will also be exposed through a well defined REST API and a Java API, so
external tools can also access this data if needed. Functionalities such as
data archival will also use this interface, rather than directly going to
the back-end data store. And also, because of this centralized API based
data access, multi-tenancy aspects can be implemented as an implementation
detail, where we are free to store the data in any structure we want
internally; for example, for Cassandra, we can keep a single admin user in
a configuration file, and store all the tenant based data in a single
space.
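
To make the shape of this concrete, a rough sketch of the proposed
interface (the method signatures and the Record holder here are
hypothetical; the real API would carry proper record/metadata types and
security context):

    import java.util.List;
    import java.util.Map;

    /** Hypothetical event/row holder. */
    class Record {
        long timestamp;
        Map<String, Object> values;
    }

    /**
     * Sketch of the proposed data abstraction layer: events and summarized
     * data both go through this interface, so the backing store (Cassandra,
     * MongoDB, RDBMS, ...) becomes an implementation detail, and MT can be
     * handled internally.
     */
    interface AnalyticsDataStore {

        /** Creates the table/column family backing a stream, per tenant. */
        void createTable(int tenantId, String tableName) throws Exception;

        /** Writes a batch of events/records for the given tenant. */
        void put(int tenantId, String tableName, List<Record> records) throws Exception;

        /** Reads records back, e.g. for Hive storage handler scans. */
        List<Record> get(int tenantId, String tableName, long timeFrom, long timeTo) throws Exception;
    }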

Also, now users will not directly go to the backend data store to browse
for data and so on; they will simply use the API with the proper user
credentials to retrieve/update data. So then, we should also remove data
store specific tools such as the Cassandra explorer from BAM, because
browsing the raw data there may not make sense to the users. And anyway, we
should not keep any data store specific tools, since we will be supporting
many. So in the end, the aim is to possibly solve all the issues mentioned
earlier with the suggested layered approach, to ultimately create a much
more stable and functional BAM. Any comments on this idea are appreciated.

Cheers,
Anjana.
-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Grouping options for BAM Message Tracer

2014-04-21 Thread Anjana Fernando
Hi Manoj,

Yeah, that's a good idea. Basically, I guess there are two parts. The first
is how we propagate the correlation property / activity id (grouping
value), that is, where in the message between nodes we set it: at the
moment, for HTTP, we have a specific HTTP header, and for HL7, we use a
specific attribute in the HL7 message. So probably, we can make it
customizable; for example, for XML messages, the user could give an XPath
expression to extract a specific element value as the activity id. So
possibly we can put that as a feature in the message tracer agent we have.
The other part is in the stream definition, where we can let the user
select which property is to be used as the activity id, rather than using a
single well-known property id called activityId. So in the end, the stream
events can have a meaningful property name for the correlation id, such as
orderId and so on.
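
As a minimal sketch of the customizable extraction idea, using the standard
JDK XPath API (the expression and payload below are made up for
illustration):

    import java.io.ByteArrayInputStream;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;

    public class ActivityIdExtractorSketch {

        /**
         * Evaluates a user-configured XPath expression against the message
         * payload and returns the matched value as the activity id.
         */
        public static String extractActivityId(String payload, String expression)
                throws Exception {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(payload.getBytes("UTF-8")));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        }

        public static void main(String[] args) throws Exception {
            String msg = "<order><orderId>A-1001</orderId></order>";
            // e.g. "/order/orderId" configured as the grouping expression
            System.out.println(extractActivityId(msg, "/order/orderId")); // A-1001
        }
    }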

I've created JIRAs [1] and [2] to track these.

[1] https://wso2.org/jira/browse/BAM-1575
[2] https://wso2.org/jira/browse/BAM-1576

Cheers,
Anjana.


On Sat, Apr 19, 2014 at 1:22 PM, Manoj Fernando man...@wso2.com wrote:

 Folks,

 Following a chat I had with Anjana and Shankar, some thoughts for
 improving the BAM Message Tracer.

 As of now, a message is traced by a unique ID that is assigned by the
 tracer handler (if not already set). This unique ID is what the BAM
 Activity Dashboard uses for grouping all correlated messages (in/out).
 However, there can be situations where we might need to use a more
 business related parameter for grouping (take a mobile device ID + session
 ID, for example), which is likely to exist in the header or the message
 body.

 To support this, we can use two options.

 1. Provide a feature on Message Tracer to let users specify expressions to
 extract this uniqueID or a combination of them.
 2. Let users specify which parameters are 'group-able' (either from header
 or payload), and use them in the stream definition so that on the Dashboard
 we can support multiple grouping options.

 We created an Axis2 module to do something similar (closer to Option 1)
 for a PoC, so we can reuse some of that stuff.

 Regards,
 Manoj

 --
 Manoj Fernando
 Director - Solutions Architecture

 Contact:
 LK -  +94 112 145345
 Mob: +94 773 759340
 www.wso2.com




-- 
*Anjana Fernando*
Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Integrating ntask component into ESB

2014-04-13 Thread Anjana Fernando
Obviously, check if that class is available and where it is referred from
in the code. As I remember, there isn't a package called ntaskint, so
check where this is coming from.

Cheers,
Anjana.


On Sat, Apr 12, 2014 at 6:46 AM, Ishan Jayawardena is...@wso2.com wrote:

 We developed the quartz task manager and we are currently working on the
 ntask task manager. While developing the task handling component that uses
 ntask, we observed that we cannot schedule a task in it due to a class not
 found error. See the below error message. The ntask component (which is
 used by the component that we are currently writing) cannot load the actual
 task implementation. Does anyone know how to get rid of this?

 java.lang.ClassNotFoundException: class org.wso2.carbon.ntaskint.core.Task
     at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:501)
     at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:421)
     at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:412)
     at org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.loadClass(DefaultClassLoader.java:107)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
     at org.wso2.carbon.ntask.core.impl.TaskQuartzJobAdapter.execute(TaskQuartzJobAdapter.java:58)
     at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
     at java.lang.Thread.run(Thread.java:662)
 Thanks,
 Ishan.



 On Mon, Apr 7, 2014 at 9:11 AM, Anjana Fernando anj...@wso2.com wrote:

 Hi Paul,

 Task Server is actually another server itself. The ntask component is the
 task scheduling component we put into all our Carbon servers when we need
 distributed task scheduling functionality. That component supports
 scheduling tasks in a standalone manner (in a single server), in a
 clustered mode for the distributed nature (it does the coordination using
 Hazelcast), or else in a remote mode where it can interface with an
 external Task Server.

 So basically the full required functionality of distributed tasks can be
 achieved with the ntask component working in the clustered mode, where it
 identifies all the participating servers in the cluster and does the
 proper fail-over/load-balanced scheduling of scheduled tasks. The servers
 then schedule the tasks themselves using their internal Quartz
 functionality. With TS, all the task triggering is offloaded to TS, where
 it will be sending HTTP messages to each server telling it to execute the
 tasks. This should happen through the LB, as I explained in the earlier
 mail.

 So basically Task Server = ntask component + remote tasks component. What
 any other Carbon server will need is just the ntask component for full task
 scheduling functionality.

 Cheers,
 Anjana.


 On Sat, Apr 5, 2014 at 1:43 PM, Paul Fremantle p...@wso2.com wrote:

 Can someone clarify? I'm lost but I really don't understand why we are
 creating any other approach than task server. It is the only approach that
 scales clearly. Is our task server code too heavyweight?

 Paul


 On 5 April 2014 08:47, Chanaka Fernando chana...@wso2.com wrote:

 Hi Kasun/Anjana,

 I think what Anjana mentioned and what Ishan mentioned somewhat converge
 to the same idea (even though they look different).

 What we have discussed and agreed was that we are developing a separate
 carbon-component which is used for executing the ntask component. Since we
 need a common interface to support both the existing quartz based
 synapse-tasks implementation and the ntask component, we have defined the
 TaskManager interface.

 When ESB is loading the synapse configuration, it will create an object
 of type TaskManager according to the Task provider mentioned in the
 configuration. This task manager object will delegate the scheduling and
 other task related stuff to the respective implementation of the
 TaskManager (which can be either QuartzTaskManager or NTaskManager).
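
 A minimal sketch of the kind of common interface being described (the
 names and methods here are illustrative, not the final Synapse API):

     /**
      * Hypothetical common task abstraction: Synapse talks only to this
      * interface, and the configured provider decides whether Quartz or
      * ntask does the actual scheduling.
      */
     interface TaskManager {

         /** Schedules the given task with the underlying provider. */
         boolean schedule(TaskDescription description);

         /** Removes a previously scheduled task. */
         boolean delete(String taskName);

         /** True once the underlying scheduler has been initialized. */
         boolean isInitialized();
     }

     /** Hypothetical holder for a task's name, implementation and trigger. */
     class TaskDescription {
         String name;
         String taskClassName;
         String cronExpression;
     }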

 @Kasun/Anjana: are we missing something here?


 Thanks,
 Chanaka


 On Sat, Apr 5, 2014 at 9:32 AM, Kasun Indrasiri ka...@wso2.com wrote:




 On Sat, Apr 5, 2014 at 9:22 AM, Anjana Fernando anj...@wso2.com wrote:

 Hi Ishan,

 On Sat, Apr 5, 2014 at 7:33 AM, Ishan Jayawardena is...@wso2.com wrote:

 Currently, we have developed the following design and started to work on
 it.

 Synapse will define the TaskManager and Task interfaces, whose
 implementations will provide the concrete tasks and the management of
 those tasks, depending on the scheduler (i.e. quartz or ntask).
 For instance, for inbuilt quartz based task scheduling, we
 will refactor and develop a quartz task manager, and a task type while

Re: [Architecture] Shipping Streaming JSON Builder/Formatter by Default in Kernel axis2.xml

2014-03-21 Thread Anjana Fernando
Hi Manoj,

Attached the new patch in the issue and also sent a pull request for GitHub.

Cheers,
Anjana.


On Fri, Mar 21, 2014 at 2:49 PM, Manoj Kumara ma...@wso2.com wrote:

 Hi Anjana,

 Yes tenant-axis2.xml file. Sorry for that.

 Thanks,
 Manoj


 *Manoj Kumara*
 Software Engineer
 WSO2 Inc. http://wso2.com/
 *lean.enterprise.middleware*
 Mobile: +94713448188


 On Fri, Mar 21, 2014 at 2:36 PM, Anjana Fernando anj...@wso2.com wrote:

 Hi Manoj,

 Sure will do, and I'm guessing you mean tenant-axis2.xml, since we are
 not doing this change to axis2_client.xml.

 Cheers,
 Anjana.


 On Fri, Mar 21, 2014 at 2:20 PM, Manoj Kumara ma...@wso2.com wrote:

 Hi Anjana,

 Can you please add the diff relevant to axis2_client.xml as well.
 Please send a pull request to wso2-dev repo on [1] as well.

 [1] https://github.com/wso2-dev/carbon4-kernel

 Thanks,
 Manoj


 *Manoj Kumara*
 Software Engineer
 WSO2 Inc. http://wso2.com/
 *lean.enterprise.middleware*
 Mobile: +94713448188


 On Wed, Mar 19, 2014 at 10:56 AM, Anjana Fernando anj...@wso2.com wrote:

 Hi Manoj,

 Not the axis2_client.xml, since ESB is using it, and other servers like
 DSS and AS, which are what this is mainly aimed at, will not be using it;
 so let's not change that now. As for tenant-axis2.xml, what does that do?
 Is it the same as the standard axis2.xml for tenants or something?

 Cheers,
 Anjana.


 On Wed, Mar 19, 2014 at 10:50 AM, Manoj Kumara ma...@wso2.com wrote:

 Hi Anjana,

 I committed the fix relevant to axis2.xml to patch0006 with r198653.

 Should we apply this change to the axis2_client.xml and tenant-axis2.xml
 configuration files as well?

 Thanks,
 Manoj


 *Manoj Kumara*
 Software Engineer
 WSO2 Inc. http://wso2.com/
 *lean.enterprise.middleware*
 Mobile: +94713448188


 On Tue, Mar 18, 2014 at 7:46 PM, Anjana Fernando anj...@wso2.com wrote:

 Hi Sameera / Carbon Team,

 Can you please apply the patch at [1] to patch0006 in the Turing branch,
 Carbon 4.3.0, and the trunk.

 [1] https://wso2.org/jira/browse/CARBON-14738

 Cheers,
 Anjana.


 On Tue, Mar 18, 2014 at 7:15 PM, Sagara Gunathunga 
 sag...@wso2.com wrote:




 On Tue, Mar 18, 2014 at 7:06 PM, Anjana Fernando anj...@wso2.comwrote:

 Hi,

 In an offline chat with Sameera, Shameera and Sagara, we decided we
 will put it in the kernel's axis2.xml, since many products can benefit 
 from
 the new message builder/receiver, and for ESB, for the moment, they 
 will
 retain the older settings with their own axis2.xml and later possibly 
 come
 with a solution for both scenarios to work.


  Proposed new JSON Builder/Formatter are much effective if the
 underline server is the final destination but for ESB this is not the 
 case
 hence we don't need to apply this change to ESB.

 Thanks !



 Cheers,
 Anjana.


 On Tue, Mar 18, 2014 at 6:07 PM, Anjana Fernando 
 anj...@wso2.comwrote:

 Hi,

 OK, so for now, I will put the changes for DSS product, Sagara,
 shall we put the same changes for AS as well, I guess AS 
 functionality will
 not be affected by the new builder/formatter. As for ESB having data
 services features, there is no straightforward way to make it work 
 now, so
 we can say, if proper JSON mapping is needed for data services, 
 either DSS
 or AS have to be used and it wont be possible to embed this in the 
 ESB.

 Cheers,
 Anjana.


 On Tue, Mar 18, 2014 at 5:06 PM, Ishan Jayawardena is...@wso2.com
  wrote:

 Several basic ESB mediators depend on the message built by ESB's
 existing JSON message builder (implemented in Synapse), so switching 
 to
 this new message builder will break them.
 If we need to make DSS features work in ESB, we have to rebuild
 the message for DSS after it has been first built by ESB's builder.
 Similarly, we have to handle the formatter flow.

 Thanks,
 Ishan.



 On Tue, Mar 18, 2014 at 3:58 PM, Anjana Fernando anj...@wso2.com
  wrote:

 Hi,

 Yeah, but in the ESB case, it will be a bit tricky, where the
 WSDL they create by default for proxy services actually create a 
 mediate
 operation and all, so unless the incoming message actually have a 
 mediate
 wrapper in the message, the message builder will fail. So maybe we 
 should
 have like a axis2.xml parameter to say, for these type of axis2 
 services,
 ignore the schema definition, but then again, the streaming message 
 builder
 actually fully depends on the schema to actually do the streaming 
 and to
 build the message, so not sure how feasible this would be. Maybe, 
 in the
 new message builder, it can revert back to the older message 
 builder's
 implementation, if he can see that the service  dispatching has 
 already
 happened earlier, probably through the URL based dispatcher, and if 
 it can
 find out that, for this service/service-type, it is not suppose to 
 use the
 schema based parsing of the message.

 Cheers,
 Anjana.


 On Tue, Mar 18, 2014 at 3:46 PM, Sameera Jayasoma 
 same...@wso2.com wrote:

 Hi Anjana/Shameera,

 Great stuff. Now we have a proper JSON support

[Architecture] Shipping Streaming JSON Builder/Formatter by Default in Kernel axis2.xml

2014-03-18 Thread Anjana Fernando
Hi,

We've added JSON mapping support for DSS, which is mentioned in the mail
with subject JSON Mapping Support for Data Services. For this, I've used
the GSON based streaming JSON message builder/formatter, where this was
needed for a correct JSON message generation by looking at the service
schema. There were some fixes done by Shameera lately, and this is working
properly now for all of the scenarios I've tested. So shall we ship this
message builder/formatter by default from the axis2.xml in the kernel, so
all the products, including AS and DSS will get this feature. It will be
specifically required by AS, as it still contains the data services
features.

And for ESB, I'm not sure how the new message builder/formatter would work,
since they will not always have correct service schemas in proxy services
etc., so I guess those scenarios may fail; maybe Shameera can give some more
insight on this. Anyway, the ESB has its own axis2.xml, so it will not be
affected.

So shall we go ahead in updating the kernel's axis2.xml to contain the
following sections? ..

<messageFormatter contentType="application/json"
                  class="org.apache.axis2.json.gson.JsonFormatter"/>

<messageBuilder contentType="application/json"
                class="org.apache.axis2.json.gson.JsonBuilder"/>

Cheers,
Anjana
-- 
*Anjana Fernando*
Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] Shipping Streaming JSON Builder/Formatter by Default in Kernel axis2.xml

2014-03-18 Thread Anjana Fernando
Hi,

Yeah, but in the ESB case, it will be a bit tricky: the WSDL they create by
default for proxy services actually creates a "mediate" operation and all,
so unless the incoming message actually has a "mediate" wrapper in the
message, the message builder will fail. So maybe we should have an
axis2.xml parameter to say, for these types of axis2 services, ignore the
schema definition; but then again, the streaming message builder fully
depends on the schema to actually do the streaming and build the message,
so I'm not sure how feasible this would be. Maybe the new message builder
can revert back to the older message builder's implementation if it can see
that the service dispatching has already happened earlier, probably through
the URL based dispatcher, and if it can find out that, for this
service/service-type, it is not supposed to use the schema based parsing of
the message.

Cheers,
Anjana.


On Tue, Mar 18, 2014 at 3:46 PM, Sameera Jayasoma same...@wso2.com wrote:

 Hi Anjana/Shameera,

 Great stuff. Now we have a proper JSON support in Axis2.

 But we need to think carefully before adding this formatter and the
 builder as the default builder/formatter for the application/json content
 type. I think we need to fix this JSON support to work in ESB as well;
 otherwise users will not be able to deploy data services features in ESB.

 If we improve this JSON support to handle xsd:any type then we should be
 able to support proxy services case.

 Let's fix this to work in ESB as well, and then we can commit it to the
 Kernel.

 Thanks,
 Sameera.




 On Tue, Mar 18, 2014 at 2:28 PM, Shameera Rathnayaka shame...@wso2.com wrote:

 Hi Anjana et al,

 The above new JSON implementation has been introduced to handle XML <-> JSON
 lossless transformation, and this implementation depends highly on the
 schema definitions, from which it generates the message structure. In
 short, for the XML stream based JSON implementation to work, we need to
 have proper schema definitions for the in and out messages; otherwise it
 won't work.

 In addition to the above entries, we need to make the following changes to
 the axis2.xml file in order to integrate the above implementation.

 Remove the RequestURIOperationDispatcher handler from the Dispatch phase
 and place it as the last handler in the Transport phase. IMO it is OK to
 move RequestURIOperationDispatcher to the Transport phase as we are
 dealing with the URI.

 Now add the new JSONMessageHandler after the RequestURIOperationDispatcher.
 Finally, the Transport phase would look like the following:

 <phaseOrder type="InFlow">
     <!--  System predefined phases  -->
     <phase name="Transport">
         ...
         <handler name="RequestURIOperationDispatcher"
                  class="org.apache.axis2.dispatchers.RequestURIOperationDispatcher"/>
         <handler name="JSONMessageHandler"
                  class="org.apache.axis2.json.gson.JSONMessageHandler"/>
     </phase>
     ...
 </phaseOrder>

 Thanks,
 Shameera.



 On Tue, Mar 18, 2014 at 1:40 PM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

 We've added JSON mapping support for DSS, which is mentioned in the mail
 with subject JSON Mapping Support for Data Services. For this, I've used
 the GSON based streaming JSON message builder/formatter, where this was
 needed for a correct JSON message generation by looking at the service
 schema. There were some fixes done by Shameera lately, and this is working
 properly now for all of the scenarios I've tested. So shall we ship this
 message builder/formatter by default from the axis2.xml in the kernel, so
 all the products, including AS and DSS will get this feature. It will be
 specifically required by AS, as it still contains the data services
 features.

 And for ESB, I'm not sure how the new message builder/formatter would
 work, since they will not always have correct service schemas in proxy
 services etc.. so I guess those scenarios may fail, maybe Shameera can give
 some insight on this more. Anyways, the ESB has their own axis2.xml, so
 they will not be affected.

 So shall we go ahead in updating the kernel's axis2.xml to contain the
 following sections? ..

 <messageFormatter contentType="application/json"
                   class="org.apache.axis2.json.gson.JsonFormatter"/>

 <messageBuilder contentType="application/json"
                 class="org.apache.axis2.json.gson.JsonBuilder"/>

 Cheers,
 Anjana
 --
 *Anjana Fernando*
 Technical Lead
 WSO2 Inc. | http://wso2.com
 lean . enterprise . middleware




 --
 *Software Engineer - WSO2 Inc.*
 *email: shameera AT wso2.com shame...@wso2.com , shameera AT apache.org
 shame...@apache.org*
 *phone: +9471 922 1454*

 *Linked in : *http://lk.linkedin.com/pub/shameera-rathnayaka/1a/661/561
 *Twitter : *https://twitter.com/Shameera_R




 --
 Sameera Jayasoma,
 Software Architect,

 WSO2, Inc. (http://wso2.com)
 email: same...@wso2.com
 blog: http://sameera.adahas.org
 twitter

Re: [Architecture] Shipping Streaming JSON Builder/Formatter by Default in Kernel axis2.xml

2014-03-18 Thread Anjana Fernando
Hi,

OK, so for now, I will put the changes in the DSS product. Sagara, shall we
put the same changes in AS as well? I guess AS functionality will not be
affected by the new builder/formatter. As for ESB having data services
features, there is no straightforward way to make it work now, so we can
say, if proper JSON mapping is needed for data services, either DSS or AS
has to be used, and it won't be possible to embed this in the ESB.

Cheers,
Anjana.


On Tue, Mar 18, 2014 at 5:06 PM, Ishan Jayawardena is...@wso2.com wrote:

 Several basic ESB mediators depend on the message built by ESB's existing
 JSON message builder (implemented in Synapse), so switching to this new
 message builder will break them.
 If we need to make DSS features work in ESB, we have to rebuild the
 message for DSS after it has been first built by ESB's builder. Similarly,
 we have to handle the formatter flow.

 Thanks,
 Ishan.



 On Tue, Mar 18, 2014 at 3:58 PM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

 Yeah, but in the ESB case, it will be a bit tricky, where the WSDL they
 create by default for proxy services actually create a mediate operation
 and all, so unless the incoming message actually have a mediate wrapper
 in the message, the message builder will fail. So maybe we should have like
 a axis2.xml parameter to say, for these type of axis2 services, ignore the
 schema definition, but then again, the streaming message builder actually
 fully depends on the schema to actually do the streaming and to build the
 message, so not sure how feasible this would be. Maybe, in the new message
 builder, it can revert back to the older message builder's implementation,
 if he can see that the service  dispatching has already happened earlier,
 probably through the URL based dispatcher, and if it can find out that, for
 this service/service-type, it is not suppose to use the schema based
 parsing of the message.

 Cheers,
 Anjana.


 On Tue, Mar 18, 2014 at 3:46 PM, Sameera Jayasoma same...@wso2.com wrote:

 Hi Anjana/Shameera,

 Great stuff. Now we have a proper JSON support in Axis2.

 But we need to think carefully before adding this formatter and the
 builder as the default builder/formatter for the application/json content
 type. I think we need to fix this JSON support to work in ESB as well.
 Otherwise users will not be able to deploy data services features ESB.

 If we improve this JSON support to handle xsd:any type then we should be
 able to support proxy services case.

 Lets fix this to work in ESB as well and then we can commit it to the
 Kernel.

 Thanks,
 Sameera.




 On Tue, Mar 18, 2014 at 2:28 PM, Shameera Rathnayaka 
 shame...@wso2.com wrote:

 Hi Anjana et al,

 The above new JSON implementation has been introduced to handle XML <-> JSON
 lossless transformation, and this implementation depends highly on the
 schema definitions, from which it generates the message structure. In
 short, for the XML stream based JSON implementation to work, we need to
 have proper schema definitions for the in and out messages; otherwise it
 won't work.

 In addition to the above entries, we need to make the following changes to
 the axis2.xml file in order to integrate the above implementation.

 Remove the RequestURIOperationDispatcher handler from the Dispatch phase
 and place it as the last handler in the Transport phase. IMO it is OK to
 move RequestURIOperationDispatcher to the Transport phase as we are
 dealing with the URI.

 Now add the new JSONMessageHandler after the RequestURIOperationDispatcher.
 Finally, the Transport phase would look like the following:

 <phaseOrder type="InFlow">
     <!--  System predefined phases  -->
     <phase name="Transport">
         ...
         <handler name="RequestURIOperationDispatcher"
                  class="org.apache.axis2.dispatchers.RequestURIOperationDispatcher"/>
         <handler name="JSONMessageHandler"
                  class="org.apache.axis2.json.gson.JSONMessageHandler"/>
     </phase>
     ...
 </phaseOrder>

 Thanks,
 Shameera.



 On Tue, Mar 18, 2014 at 1:40 PM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

 We've added JSON mapping support for DSS, which is mentioned in the
 mail with subject JSON Mapping Support for Data Services. For this, I've
 used the GSON based streaming JSON message builder/formatter, where this
 was needed for a correct JSON message generation by looking at the service
 schema. There were some fixes done by Shameera lately, and this is working
 properly now for all of the scenarios I've tested. So shall we ship this
 message builder/formatter by default from the axis2.xml in the kernel, so
 all the products, including AS and DSS will get this feature. It will be
 specifically required by AS, as it still contains the data services
 features.

 And for ESB, I'm not sure how the new message builder/formatter would
 work, since they will not always have correct service schemas in proxy
 services etc.. so I guess those scenarios may fail

Re: [Architecture] Shipping Streaming JSON Builder/Formatter by Default in Kernel axis2.xml

2014-03-18 Thread Anjana Fernando
Hi,

In an offline chat with Sameera, Shameera and Sagara, we decided we will
put it in the kernel's axis2.xml, since many products can benefit from the
new message builder/receiver; for ESB, for the moment, they will retain the
older settings with their own axis2.xml and later possibly come up with a
solution for both scenarios to work.

Cheers,
Anjana.


On Tue, Mar 18, 2014 at 6:07 PM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

 OK, so for now, I will put the changes for DSS product, Sagara, shall we
 put the same changes for AS as well, I guess AS functionality will not be
 affected by the new builder/formatter. As for ESB having data services
 features, there is no straightforward way to make it work now, so we can
 say, if proper JSON mapping is needed for data services, either DSS or AS
 have to be used and it wont be possible to embed this in the ESB.

 Cheers,
 Anjana.


 On Tue, Mar 18, 2014 at 5:06 PM, Ishan Jayawardena is...@wso2.com wrote:

 Several basic ESB mediators depend on the message built by ESB's existing
 JSON message builder (implemented in Synapse), so switching to this new
 message builder will break them.
 If we need to make DSS features work in ESB, we have to rebuild the
 message for DSS after it has been first built by ESB's builder. Similarly,
 we have to handle the formatter flow.

 Thanks,
 Ishan.



 On Tue, Mar 18, 2014 at 3:58 PM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

 Yeah, but in the ESB case, it will be a bit tricky, where the WSDL they
 create by default for proxy services actually create a mediate operation
 and all, so unless the incoming message actually have a mediate wrapper
 in the message, the message builder will fail. So maybe we should have like
 a axis2.xml parameter to say, for these type of axis2 services, ignore the
 schema definition, but then again, the streaming message builder actually
 fully depends on the schema to actually do the streaming and to build the
 message, so not sure how feasible this would be. Maybe, in the new message
 builder, it can revert back to the older message builder's implementation,
 if he can see that the service  dispatching has already happened earlier,
 probably through the URL based dispatcher, and if it can find out that, for
 this service/service-type, it is not suppose to use the schema based
 parsing of the message.

 Cheers,
 Anjana.


 On Tue, Mar 18, 2014 at 3:46 PM, Sameera Jayasoma same...@wso2.com wrote:

 Hi Anjana/Shameera,

 Great stuff. Now we have a proper JSON support in Axis2.

 But we need to think carefully before adding this formatter and the
 builder as the default builder/formatter for the application/json content
 type. I think we need to fix this JSON support to work in ESB as well.
 Otherwise users will not be able to deploy data services features ESB.

 If we improve this JSON support to handle xsd:any type then we should
 be able to support proxy services case.

 Lets fix this to work in ESB as well and then we can commit it to the
 Kernel.

 Thanks,
 Sameera.




 On Tue, Mar 18, 2014 at 2:28 PM, Shameera Rathnayaka shame...@wso2.com
  wrote:

 Hi Anjana et al,

 The above new JSON implementation has been introduced to handle XML <-> JSON
 lossless transformation, and this implementation depends highly on the
 schema definitions, from which it generates the message structure. In
 short, for the XML stream based JSON implementation to work, we need to
 have proper schema definitions for the in and out messages; otherwise it
 won't work.

 In addition to the above entries, we need to make the following changes to
 the axis2.xml file in order to integrate the above implementation.

 Remove the RequestURIOperationDispatcher handler from the Dispatch phase
 and place it as the last handler in the Transport phase. IMO it is OK to
 move RequestURIOperationDispatcher to the Transport phase as we are
 dealing with the URI.

 Now add the new JSONMessageHandler after the RequestURIOperationDispatcher.
 Finally, the Transport phase would look like the following:

 <phaseOrder type="InFlow">
     <!--  System predefined phases  -->
     <phase name="Transport">
         ...
         <handler name="RequestURIOperationDispatcher"
                  class="org.apache.axis2.dispatchers.RequestURIOperationDispatcher"/>
         <handler name="JSONMessageHandler"
                  class="org.apache.axis2.json.gson.JSONMessageHandler"/>
     </phase>
     ...
 </phaseOrder>

 Thanks,
 Shameera.



 On Tue, Mar 18, 2014 at 1:40 PM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

 We've added JSON mapping support for DSS, which is mentioned in the
 mail with subject JSON Mapping Support for Data Services. For this, 
 I've
 used the GSON based streaming JSON message builder/formatter, where this
 was needed for a correct JSON message generation by looking at the 
 service
 schema. There were some fixes done by Shameera lately, and this is 
 working
 properly now for all of the scenarios I've tested

Re: [Architecture] Shipping Streaming JSON Builder/Formatter by Default in Kernel axis2.xml

2014-03-18 Thread Anjana Fernando
Hi Sameera / Carbon Team,

Can you please apply the patch at [1] to patch0006 in the Turing branch,
Carbon 4.3.0, and the trunk.

[1] https://wso2.org/jira/browse/CARBON-14738

Cheers,
Anjana.


On Tue, Mar 18, 2014 at 7:15 PM, Sagara Gunathunga sag...@wso2.com wrote:




 On Tue, Mar 18, 2014 at 7:06 PM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

 In an offline chat with Sameera, Shameera and Sagara, we decided we will
 put it in the kernel's axis2.xml, since many products can benefit from the
 new message builder/receiver, and for ESB, for the moment, they will retain
 the older settings with their own axis2.xml and later possibly come with a
 solution for both scenarios to work.


  The proposed new JSON Builder/Formatter is most effective if the
 underlying server is the final destination, but for ESB this is not the
 case; hence we don't need to apply this change to ESB.

 Thanks !



 Cheers,
 Anjana.


 On Tue, Mar 18, 2014 at 6:07 PM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

 OK, so for now, I will put the changes for DSS product, Sagara, shall we
 put the same changes for AS as well, I guess AS functionality will not be
 affected by the new builder/formatter. As for ESB having data services
 features, there is no straightforward way to make it work now, so we can
 say, if proper JSON mapping is needed for data services, either DSS or AS
 have to be used and it wont be possible to embed this in the ESB.

 Cheers,
 Anjana.


 On Tue, Mar 18, 2014 at 5:06 PM, Ishan Jayawardena is...@wso2.com wrote:

 Several basic ESB mediators depend on the message built by ESB's
 existing JSON message builder (implemented in Synapse), so switching to
 this new message builder will break them.
 If we need to make DSS features work in ESB, we have to rebuild the
 message for DSS after it has been first built by ESB's builder. Similarly,
 we have to handle the formatter flow.

 Thanks,
 Ishan.



 On Tue, Mar 18, 2014 at 3:58 PM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

 Yeah, but in the ESB case, it will be a bit tricky, where the WSDL
 they create by default for proxy services actually create a mediate
 operation and all, so unless the incoming message actually have a 
 mediate
 wrapper in the message, the message builder will fail. So maybe we should
 have like a axis2.xml parameter to say, for these type of axis2 services,
 ignore the schema definition, but then again, the streaming message 
 builder
 actually fully depends on the schema to actually do the streaming and to
 build the message, so not sure how feasible this would be. Maybe, in the
 new message builder, it can revert back to the older message builder's
 implementation, if he can see that the service  dispatching has already
 happened earlier, probably through the URL based dispatcher, and if it can
 find out that, for this service/service-type, it is not suppose to use the
 schema based parsing of the message.

 Cheers,
 Anjana.


 On Tue, Mar 18, 2014 at 3:46 PM, Sameera Jayasoma same...@wso2.com wrote:

 Hi Anjana/Shameera,

 Great stuff. Now we have a proper JSON support in Axis2.

 But we need to think carefully before adding this formatter and the
 builder as the default builder/formatter for the application/json content
 type. I think we need to fix this JSON support to work in ESB as well.
 Otherwise users will not be able to deploy data services features ESB.

 If we improve this JSON support to handle xsd:any type then we should
 be able to support proxy services case.

 Let's fix this to work in ESB as well, and then we can commit it to the
 Kernel.

 Thanks,
 Sameera.




 On Tue, Mar 18, 2014 at 2:28 PM, Shameera Rathnayaka 
 shame...@wso2.com wrote:

 Hi Anjana et al,

 The new JSON implementation above has been introduced to handle lossless
 XML <-> JSON transformation, and this implementation depends heavily on the
 schema definitions, where it generates the message structure by reading these
 schemas. In short, for the XML stream based JSON implementation to work, we
 need proper schema definitions for the in and out messages; otherwise it won't
 work.

 In addition to the above entries, we need to make the following changes to
 the axis2.xml file in order to integrate the above implementation.

 Remove the RequestURIOperationDispatcher handler from the Dispatch phase and
 place it as the last handler in the Transport phase. IMO it is OK to move
 RequestURIOperationDispatcher to the Transport phase, as we are dealing with
 the URI.

 Now add the new JSONMessageHandler after the
 RequestURIOperationDispatcher. Finally, the Transport phase would look like
 the following:

 <phaseOrder type="InFlow">
     <!-- System predefined phases -->
     <phase name="Transport">
         ...
         <handler name="RequestURIOperationDispatcher"
                  class="org.apache.axis2.dispatchers.RequestURIOperationDispatcher"/>
         <handler name="JSONMessageHandler"
                  class="org.apache.axis2.json.gson.JSONMessageHandler"/>
     </phase>
[Architecture] JSON Mapping Support for Data Services

2014-02-27 Thread Anjana Fernando
Hi,

I've implemented JSON mapping support for data services: basically,
rather than defining the XML elements in the result, we can now
give a JSON message template that defines how the JSON representation of
the result should look.

The following is a sample of a data services result element for JSON
mapping:-

<query id="customersInBostonSQL" useConfig="default">
  <sql>select * from Customers where city = 'Boston' and country = 'USA'</sql>
  <result outputType="json">
    {
      "customers": {
        "customer": [
          {
            "phone": "$phone",
            "city": "$city",
            "contact": {
              "customer-name": "$customerName",
              "contact-last-name": "$contactLastName",
              "contact-first-name": "$contactFirstName"
            }
          }
        ]
      }
    }
  </result>
</query>

So here, the result element's outputType value is set to "json" (we had
"xml" and "rdf", defaulting to "xml"). To refer to the result set's
columns / query-params, we have used the convention of prefixing the name of
the parameter with "$", which basically signals that we are looking up a value.

Also, since this is a template based approach, other special properties,
like output fields' data types and required roles (used for content filtering),
need to be specified in a special way, encoded in the value of a field.
This is done in the following way:

{ "age" : "$age(type:integer;requiredRoles:r1,r2)" } ..

So the first part is the looked up variable (column), and the section
enclosed by "(" and ")" specifies the extended attributes.
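
Purely for illustration, here's a small sketch of how such a field value
could be parsed; this is not the actual DSS implementation, and all class
and method names here are hypothetical:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldSpecParser {

    // Matches "$name" with an optional "(key:value;key:value)" section
    private static final Pattern SPEC =
            Pattern.compile("\\$(\\w+)(?:\\(([^)]*)\\))?");

    public static void main(String[] args) {
        Matcher m = SPEC.matcher("$age(type:integer;requiredRoles:r1,r2)");
        if (m.matches()) {
            System.out.println("column: " + m.group(1)); // the looked up variable
            if (m.group(2) != null) {
                for (String attr : m.group(2).split(";")) {
                    String[] kv = attr.split(":", 2);
                    System.out.println(kv[0] + " = " + kv[1]); // extended attribute
                }
            }
        }
    }
}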

Also, another feature to support here is nested queries; that means, from
the JSON mapping, we should be able to specify a query to be executed and
have its result replaced at the place it was invoked. For this, I've at the
moment implemented it in the following way:

{ "phone": "$phone",
"@employeesInOfficeSQL": "$officeCode->officeCode,$param2->param2" } ..

So basically here, calling a query is symbolized by prefixing the field
name with an "@", where the name of the target query follows, and the
value of the field contains the parameter mappings; that is, our column
values map to the target query's params, connected with a "->"
operator. I personally prefer a short symbol like "@" to denote the nested
query option, rather than having a keyword like "operation", since this is
more compact.

So this mapping works fully when used with the GSON based streaming JSON
implementation [1]; that is, if we say the records are in a JSON array,
it will always return an array, even if the result just gives out a
single object. This is done by the JSON message formatter looking at the
XML schema created. But this does not work with nested queries, whereas the
default JSON message formatter works properly there. I've verified that the
XML schema generated does conform to the message returned by the service
calls, so this seems like a bug in the new JSON message formatter. I'm
attaching here the data service I'm using, and also its WSDL. Below are some
sample requests you can run against the data service.

(run all with the HTTP header "Accept: application/json", and HTTP GET)

* http://10.100.1.45:9763/services/JSONSample/boston_customers
* http://10.100.1.45:9763/services/JSONSample/employee/1002
* http://10.100.1.45:9763/services/JSONSample/offices (nested query request)
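
For illustration, a minimal Java client sketch for the first endpoint above;
the host/port and path are the sample values from this mail, and the class
name is just illustrative:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class JSONSampleClient {
    public static void main(String[] args) throws Exception {
        URL url = new URL(
                "http://10.100.1.45:9763/services/JSONSample/boston_customers");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        // The Accept header selects the JSON formatter on the server side
        conn.setRequestProperty("Accept", "application/json");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON shaped by the result template
            }
        } finally {
            conn.disconnect();
        }
    }
}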

A DSS build with the JSON mapping features can be found at [2]; download
it and configure the streaming JSON message formatter as explained at [1].
@Shameera, can you please try this out and figure out what the issue might
be with the complex result outputs; and if it's a bug in the JSON message
formatter, I'd appreciate it if you can provide a fix for it.

[1]
https://builds.apache.org/job/Axis2/javadoc/docs/json_gson_user_guide.html
[2]
https://svn.wso2.org/repos/wso2/people/anjana/tmp/wso2dss-3.2.0-20140227.zip

Cheers,
Anjana.
-- 
*Anjana Fernando*
Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware




Re: [Architecture] Dynamically load data sources defined on master-datasources.xml on startup

2014-02-20 Thread Anjana Fernando
Hi Manoj,

Just having a dependency in the pom.xml does not guarantee the ordering; it
is a build-time dependency. Maybe you're using some other OSGi service
which in turn transitively uses the ndatasource OSGi service somewhere, or
else, most probably, it is by chance that your bundle is getting activated
after ndatasource.

Cheers,
Anjana.


On Fri, Feb 21, 2014 at 8:57 AM, Manoj Fernando man...@wso2.com wrote:

 I didn't really face that issue, but thanks for pointing it out. Since I used
 the dependency config from the previous throttle component in my pom.xml,
 the activation order would have worked correctly.

 Regards,
 Manoj


  On Fri, Feb 21, 2014 at 8:45 AM, Amila Maha Arachchi ami...@wso2.com wrote:



 On Friday, February 21, 2014, Kasun Gajasinghe kas...@wso2.com wrote:


  I believe the issue here is that the JNDI context won't be available for
  a carbon component/bundle until the org.wso2.carbon.ndatasource.core bundle
  is activated during server startup. Any bundle that gets activated before
  this bundle won't see the JNDI contexts. I think Manoj is facing that issue
  here.


  AFAIK, there isn't a way to specify the bundle order. So, one option is
  to write an o.w.c.core.ServerStartupHandler which will get invoked after the
  server starts up successfully. By that time, the JNDI contexts etc. will be
  available. Is there any other option?


  You can delay the bundle activation by using an OSGi service dependency on
  ndatasource core (if it has registered one). Then the bundle won't get
  activated until ndatasource core is active.
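
  For illustration, a minimal sketch of that option, assuming the ndatasource
  component does register its org.wso2.carbon.ndatasource.core.DataSourceService
  as an OSGi service (as per the "if it has registered one" above); the
  component and method names are illustrative, using the old SCR javadoc-tag
  style:

  /**
   * @scr.component name="sample.datasource.consumer" immediate="true"
   * @scr.reference name="datasource.service"
   *                interface="org.wso2.carbon.ndatasource.core.DataSourceService"
   *                cardinality="1..1" policy="dynamic"
   *                bind="setDataSourceService" unbind="unsetDataSourceService"
   */
  public class SampleDataSourceConsumer {

      protected void setDataSourceService(
              org.wso2.carbon.ndatasource.core.DataSourceService service) {
          // Activation is delayed until ndatasource.core is active, so the
          // JNDI data sources are available from this point onwards.
      }

      protected void unsetDataSourceService(
              org.wso2.carbon.ndatasource.core.DataSourceService service) {
      }
  }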


 Regards,
 KasunG



  On Fri, Feb 21, 2014 at 1:19 AM, Anjana Fernando anj...@wso2.com wrote:

 Hi guys,

 Yeah, the existing data source component does exactly that. When you
 mention a data source in a *-datasources.xml file, you can make it
 available as a JNDI resource; that is what the following section of a data
 source configuration does:

 <jndiConfig>
     <name>{RES_NAME}</name>
     <!-- optional properties -->
     <environment>
         <property name="java.naming.factory.initial">{ICS}</property>
         <property name="java.naming.provider.url">{PROVIDER_URL}</property>
     </environment>
 </jndiConfig>

 And as Senaka mentioned, this is how the registry and user-manager look up
 their data sources when the server is starting up. Hope this is what Manoj is
 looking for.
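
 For illustration, a minimal lookup sketch; the JNDI name "jdbc/SampleDS" is
 illustrative and corresponds to the <name> element of the jndiConfig above:

 import javax.naming.InitialContext;
 import javax.sql.DataSource;

 public class DataSourceLookup {
     public static DataSource lookup() throws Exception {
         // Resolves the data source registered by the ndatasource component
         InitialContext ctx = new InitialContext();
         return (DataSource) ctx.lookup("jdbc/SampleDS");
     }
 }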

 Cheers,
 Anjana.


  On Fri, Feb 21, 2014 at 1:00 AM, Senaka Fernando sen...@wso2.com wrote:

 Hi Manoj,

 Please find the responses inline.

 On Thu, Feb 20, 2014 at 8:25 PM, Manoj Fernando man...@wso2.com wrote:

 Hi Senaka,

 What I meant was the scenario of me as an outside developer wanting to
 add a new datasource for my own carbon component. Right now, just adding
 the datasource into the XML doesn't make it available as a JNDI resource.
 You need to do that extra step of reading the XML and attaching it onto
 the InitialContext (AFAIK). It would be much nicer IMO to have those
 datasources added into the InitialContext during bootstrap, so that whatever
 component needs to use any of them can simply use the JNDI key to
 reference it.


 IINM, you should not be doing this. The JNDI referencing should work
 from any component, webapp, etc. We have done nothing special @ the registry
 kernel, for instance, and if the JNDI referencing works there, it should
 work elsewhere too. Copied Anjana to get this clarified.


 The convenience on system properties would be similar. We can have a
 config file under repository/conf that will get automatically loaded as
 system properties for any component that might need them. Yes, I know we
 can pass them as startup parameters, but it was basically a suggestion for
 sysadmin/developer convenience. Nothing major... just for convenience.


 IMHO, it can be a convenience with regards to some use-cases and an
 inconvenience with regards to some others. I think we need to consider the
 pros and cons. There are things like clustering, environment separation,
 which overrides what (i.e. JAVA_OPTS vs this file), etc. that we need to
 think about. Will add some points later.

 Thanks,
 Senaka.


 Regards,
 Manoj


  On Thu, Feb 20, 2014 at 11:14 PM, Senaka Fernando sen...@wso2.com wrote:

 Hi Manoj,

 Datasources can be referenced by JNDI key even now. This is how it works
 in Registry Kernel and UM. Is it done in some other way in carbon
 components?

 And, for system properties, you can pass these through the
 wso2server.sh/bat. I see no benefit of having a separate component to
 do just that. Am I missing something here?

 Thanks,
 Senaka.


 On Thu, Feb 20, 2014 at 6:37 PM, Manoj Fernando

 Kasun Gajasinghe
 Software Engineer;
 WSO2 Inc.; http://wso2.com
 email: kasung AT spamfree wso2.com | cell: +94 (77) 678-0813
 linked-in: http://lk.linkedin.com/in/gajasinghe
 blog: http://kasunbg.org
 twitter: http://twitter.com/kasunbg



 --
 Amila

Re: [Architecture] CEP UI re-factoring and adding much more functionality

2014-01-30 Thread Anjana Fernando
Noted.

Cheers,
Anjana.


On Thu, Jan 30, 2014 at 3:26 PM, Srinath Perera srin...@wso2.com wrote:

 Mohan, for listing and editing Streams, could you look at integrating WSO2
 Store?

 This component MUST be used by both BAM and CEP to list and edit streams.
 Anjana, please note also.

 --Srinath


 On Wed, Jan 22, 2014 at 11:32 AM, Sriskandarajah Suhothayan s...@wso2.com
  wrote:



  On Wed, Jan 22, 2014 at 11:18 AM, Lasantha Fernando lasan...@wso2.com wrote:

 Hi Mohan,

 +1 for the design. IMO, the in-flow and out-flow UI will be very useful
 to get an idea about how the events are flowing, which is currently a bit
 lacking in CEP, I think. Great addition!

 Will the user be able to sample events generated in the stream UI to
 test a flow, or will that part come under a separate component?


 Based on the current plan, the Try-it for streams will become a separate
 component. In future, when we have this, we can integrate it with the
 sample event generation UI.

 Currently the use of the sample event generation UI is allowing users to
 create sample events, edit them, and finally copy and send them via
 curl, JMS, etc.

 Suho


 Thanks,
 Lasantha



 On 21 January 2014 19:43, Mohanadarshan Vivekanandalingam 
 mo...@wso2.com wrote:


 Hi All,

 As you already know, we have done major improvements and changes in
 CEP 3.0.0 (which is a complete re-write), especially in the UI aspect. But we
 found there are some gaps that we can fix to improve the usability
 experience further. These changes are targeted for the next CEP release,
 which is version 3.1.0. The below UI improvements are also targeted at the
 CEP tooling aspect.

 Please see the below figures, which are mock-up design flows of the event
 stream UI and execution plan UI. Based on the below design, we are trying to
 achieve the default-event concept, while also giving the opportunity for
 advanced event configurations. We'd appreciate any ideas and suggestions on
 this...

 Thanks & Regards,
 Mohan


 --
  V. Mohanadarshan
  Software Engineer,
  Data Technologies Team,
  WSO2, Inc. http://wso2.com
  lean.enterprise.middleware

  email: mo...@wso2.com
  phone: (+94) 771117673




 --
 *Lasantha Fernando*
 Software Engineer - Data Technologies Team
 WSO2 Inc. http://wso2.com

 email: lasan...@wso2.com
 mobile: (+94) 71 5247551




 --

  S. Suhothayan
  Associate Technical Lead,
  WSO2 Inc. http://wso2.com
  lean . enterprise . middleware

  cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/
  twitter: http://twitter.com/suhothayan | linked-in: http://lk.linkedin.com/in/suhothayan




 --
 
 Srinath Perera, Ph.D.
   Director, Research, WSO2 Inc.
   Visiting Faculty, University of Moratuwa
   Member, Apache Software Foundation
   Research Scientist, Lanka Software Foundation
   Blog: http://srinathsview.blogspot.com/
   Photos: http://www.flickr.com/photos/hemapani/
Phone: 0772360902




-- 
*Anjana Fernando*
Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] [C5] Clustering API

2014-01-17 Thread Anjana Fernando
Hi Azeez,

Well I used the word 'could' loosely there .. I gave the reasons for the
group functionality :) .. I just think it would be a useful functionality
for many use cases ..

Cheers,
Anjana.


On Fri, Jan 17, 2014 at 9:30 AM, Afkham Azeez az...@wso2.com wrote:




 On Fri, Jan 17, 2014 at 10:42 PM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

 Yeah, most probably, the task related functionality should not be part of
 this API, but the group functionality I mentioned *could* be useful, as
 I explained.


 The golden rule about API design is when in doubt, leave it out (
 http://www.infoq.com/articles/API-Design-Joshua-Bloch)



 Cheers,
 Anjana.


 On Fri, Jan 17, 2014 at 4:08 AM, Kishanthan Thangarajah 
 kishant...@wso2.com wrote:

 IMO, I think task related APIs are not part of the kernel or the clustering APIs
 provided by the kernel. Since this is more of a use-case on functions
 provided by Hazelcast, we can expose the underlying Hazelcast instance
 as an OSGi service, which can then be used for the above purpose.


 On Fri, Jan 17, 2014 at 12:42 PM, Sriskandarajah Suhothayan 
 s...@wso2.com wrote:

 I'm OK with having a separate API to handle the task stuff, but in that
 case will it have access to Hazelcast or other internal stuff?
 And should it be a part of the kernel?

 I'm not sure what bits and pieces we need from Hazelcast to
 create this API, and exposing all of them will make the Caching API ugly :)

 Regards,
 Suho




  On Fri, Jan 17, 2014 at 11:44 AM, Supun Malinga sup...@wso2.com wrote:

 Hi,

 Also in here we should consider the use cases of OC as well IMO..

  thanks,


 On Fri, Jan 17, 2014 at 11:24 AM, Afkham Azeez az...@wso2.com wrote:

 I think this is making clustering more specific to running tasks.
 Handling tasks should be implemented at a layer above clustering.


 On Fri, Jan 17, 2014 at 11:06 AM, Sriskandarajah Suhothayan 
 s...@wso2.com wrote:

  Based on Anjana's suggestions, to support different products having
  different ways of coordination, my suggestion is as follows:

  //This has to be a *one time thing*; I'm not sure how we should have an
  //API for this!
  //ID is the Task or Group ID
  //Algorithm-class can be a class or a name registered in Carbon (TBD)
  void performElection(ID, Algorithm-class);

  //Register the current node to do/join the Task denoted by the ID
  void registerAsTaskWorker(ID);

  //Check if the current node is the coordinator
  boolean isCoordinator(ID);

  //Get the coordinator for the ID
  NodeID getCoordinator(ID);

  We also need a Listener for the Coordinator:

  CoordinatorListener

    void coordinatorChanged(ID, NodeID);

  WDYT?

 Suho


  On Thu, Jan 16, 2014 at 8:32 PM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

 On Thu, Jan 16, 2014 at 5:10 AM, Sriskandarajah Suhothayan 
 s...@wso2.com wrote:

  We also need an election API.

  E.g., for certain tasks only one/a few nodes can be responsible, and if
  that node dies someone else needs to take over that task.

  Here the user should be able to give the Task Key and should be able
  to get to know whether they are responsible for the task.

  It is also important that the election logic is pluggable based on the
  task.


 The task scenarios are similar to what we do in our scheduled tasks
 component. I'm not sure if that type of functionality should be included in
 this API; or did you mean you need the election API to build on top of it?
 ..

 Also, another requirement we have is creating groups within a
 cluster. That is, when we work on the cluster, sometimes we need a node in a
 specific group/groups. And each group will have its own coordinator. So
 then, there wouldn't be a single coordinator for the full physical cluster.
 I know we can build this functionality on a higher layer than this API, but
 then, effectively, the isCoordinator for the full cluster will not be used,
 and also, each component that uses similar group functionality will roll
 their own implementation of this. So I'm thinking, if we build some robust
 group features into this API itself, it will be very convenient for its
 consumers.

 So what I suggest is, while a member joins the full
 cluster automatically, can we have another API method like
 joinGroup(groupId); then later, when we register a membership listener, we
 can give the groupId as an optional parameter to register a membership
 listener for a specific group. And as for the isCoordinator functionality,
 we can also overload that method to take a groupId; or else, in the
 membership listener itself, we can have an additional method like
 coordinatorChanged(String memberId), or, maybe more suitably,
 assumedCoordinatorRole() or something like that, to simply say "you just
 became the coordinator of this full cluster/group".
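
 For illustration, a hedged sketch of those group-aware additions as a Java
 interface; all names are illustrative, with joinGroup(groupId) and the
 coordinator callback taken from the suggestion above:

 public interface GroupAwareClustering {

     // Join a named group within the physical cluster (membership in the
     // full cluster is automatic).
     void joinGroup(String groupId);

     // isCoordinator overloaded to be scoped to one group.
     boolean isCoordinator(String groupId);

     // Register a listener, with the groupId as an optional scope
     // (null meaning the full cluster).
     void addMembershipListener(MembershipListener listener, String groupId);

     interface MembershipListener {
         void memberAdded(String memberId);
         void memberRemoved(String memberId);
         // Called on the node that just became the coordinator of the
         // cluster/group, per the assumedCoordinatorRole() idea above.
         void assumedCoordinatorRole();
     }
 }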

 Cheers,
 Anjana.



 Regards
 Suho


  On Thu, Jan 16, 2014 at 4:56 PM, Afkham Azeez az...@wso2.com wrote:




 On Thu, Jan 16, 2014 at 4:55 PM, Kishanthan Thangarajah 
 kishant...@wso2.com wrote:

 Adding more.

 Since we will follow



Re: [Architecture] BAM Notifications From Hive

2013-10-24 Thread Anjana Fernando
Hi Sanjiva,

The deleting works by giving the row key of the record, so it will not
conflict with anything else. The scenario is where, after a batch job is
run, let's say some of the processed result needs to be sent as a
notification to someone. For example, after the mediation stats are
processed, we can check if a specific service is overloaded with requests
or anything like that, and we send that information out to a stream using
this feature. And we can define a flow where we get events from that
stream, and send out an email/sms notification.

The coordination across the Hadoop cluster works in that the result will
only be written by a single operation, similar to records being written to
a database. And the task that processes this data also runs in a
fail-over aware manner, where if the task goes down, it will be started
on another node.

Cheers,
Anjana.


On Thu, Oct 24, 2013 at 1:56 PM, Sanjiva Weerawarana sanj...@wso2.com wrote:

 Anjana given that Cassandra is not transactional how does deleting work
 when someone else may be writing at the same time?

 I'm a bit unclear why it's critical to send events out of Hive itself. Can
 you elaborate the scenario please? How do you coordinate that across a
 potentially large Hadoop cluster?

 Sanjiva.


 On Tue, Oct 22, 2013 at 3:00 PM, Anjana Fernando anj...@wso2.com wrote:

 Hi Srinath,

  Yeah, the data is always cleaned up when the task is run. Basically, the
  task reads all the data in the column family, sends each event to the target
  stream, and at the same time deletes it from the data store.

  The notification is totally customizable by the user; what this simply
  does is send any arbitrary data to a stream, at the point when possibly
  some insert statement is executed from the Hive script, which can be at
  the end of the script or anywhere. After the event comes to a stream, the
  user can do anything with it: either run a CEP query against it, or
  directly pass it through to some transport like email or sms.

 Cheers,
 Anjana.


 On Tue, Oct 22, 2013 at 2:30 PM, Srinath Perera srin...@wso2.com wrote:

 Hi Anjana,

  Basically, we are polling the Cassandra location. I think it is OK. But
  we need to make sure we clean up these tasks when we detect that the job has
  finished.

  What does the notification say? Does it say the job has finished, or can
  the user give a condition for when to send the notification? We eventually
  need that.

 --Srinath


  On Tue, Oct 22, 2013 at 2:03 PM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

  For BAM notifications, the approach we have at the moment is using CEP,
  which we do ship by default with BAM now. But there is another limitation,
  where we cannot trigger any notifications from Hive scripts, which is what
  is used mostly.

  So the requirement is that, somehow, we should be able to send messages from
  Hive to a stream to send out notifications; that is, when messages come to
  a stream, we can use (CEP's) message builders/formatters to send out
  email/sms etc. So I've implemented a simple mechanism to do this, where,
  when Hive wants to send out a message to a stream, it will write a data row
  to a pre-defined Cassandra CF (bam_notification_messages), which will
  have a column with the name "streamId", and other columns (which map to the
  payload section of a stream). And then, in the BAM server, there is a
  scheduled task running, where it polls the data in that CF (at 5 second
  intervals) to get the existing rows, and reads the streamId and other
  columns to generate an event to be sent to the target stream; processed
  rows are then deleted. So with this approach, effectively, we can now send
  events to a specific stream from Hive.

 I've tested this feature in BAM. And hope this approach is fine for the
 requirement.

 Cheers,
 Anjana.

 --
 *Anjana Fernando*
 Technical Lead
 WSO2 Inc. | http://wso2.com
 lean . enterprise . middleware




 --
 
 Srinath Perera, Ph.D.
   Director, Research, WSO2 Inc.
   Visiting Faculty, University of Moratuwa
   Member, Apache Software Foundation
   Research Scientist, Lanka Software Foundation
   Blog: http://srinathsview.blogspot.com/
   Photos: http://www.flickr.com/photos/hemapani/
Phone: 0772360902




 --
 *Anjana Fernando*
 Technical Lead
 WSO2 Inc. | http://wso2.com
 lean . enterprise . middleware

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --
 Sanjiva Weerawarana, Ph.D.
 Founder, Chairman  CEO; WSO2, Inc.;  http://wso2.com/
 email: sanj...@wso2.com; phone: +94 11 763 9614; cell: +94 77 787 6880 | +1
 650 265 8311
 blog: http://sanjiva.weerawarana.org/

 Lean . Enterprise . Middleware

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




-- 
*Anjana Fernando*
Technical Lead
WSO2 Inc. | http://wso2.com
lean

[Architecture] BAM Notifications From Hive

2013-10-22 Thread Anjana Fernando
Hi,

For BAM notifications, the approach we have at the moment is using CEP,
which we do ship by default with BAM now. But there is another limitation,
where we cannot trigger any notifications from Hive scripts, which is what
is used mostly.

So the requirement is that, somehow, we should be able to send messages from
Hive to a stream to send out notifications; that is, when messages come to
a stream, we can use (CEP's) message builders/formatters to send out
email/sms etc. So I've implemented a simple mechanism to do this, where,
when Hive wants to send out a message to a stream, it will write a data row
to a pre-defined Cassandra CF (bam_notification_messages), which will
have a column with the name "streamId", and other columns (which map to the
payload section of a stream). And then, in the BAM server, there is a
scheduled task running, where it polls the data in that CF (at 5 second
intervals) to get the existing rows, and reads the streamId and other
columns to generate an event to be sent to the target stream; processed
rows are then deleted. So with this approach, effectively, we can now send
events to a specific stream from Hive.

I've tested this feature in BAM, and I hope this approach is fine for the
requirement.
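
For illustration, a hedged sketch (using the Hector client API) of what the
mechanism above boils down to at the storage level: writing one row into the
pre-defined bam_notification_messages CF, which the scheduled task polls every
5 seconds. The keyspace name, payload column names, stream ID value, and class
name are all illustrative; only the CF name and the "streamId" column come
from this mail.

import java.util.UUID;

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class NotificationRowWriter {

    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("bam-cluster", "localhost:9160");
        Keyspace keyspace = HFactory.createKeyspace("EVENT_KS", cluster); // assumed keyspace
        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());

        String rowKey = UUID.randomUUID().toString();
        // The polling task reads "streamId" to route the event to the target stream
        mutator.addInsertion(rowKey, "bam_notification_messages",
                HFactory.createStringColumn("streamId", "service.load.alerts:1.0.0"));
        // The remaining columns map to the payload section of the stream definition
        mutator.addInsertion(rowKey, "bam_notification_messages",
                HFactory.createStringColumn("serviceName", "OrderService"));
        mutator.addInsertion(rowKey, "bam_notification_messages",
                HFactory.createStringColumn("message", "request overload detected"));
        mutator.execute();

        // The BAM-side task later deletes processed rows by this same row key,
        // e.g. via mutator.addDeletion(rowKey, "bam_notification_messages")
        cluster.getConnectionManager().shutdown();
    }
}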

Cheers,
Anjana.

-- 
*Anjana Fernando*
Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] BAM Notifications From Hive

2013-10-22 Thread Anjana Fernando
Hi Srinath,

Yeah, the data is always cleaned up when the task is run. Basically, the
task reads all the data in the column family, sends each event to the target
stream, and at the same time deletes it from the data store.

The notification is totally customizable by the user; what this simply does
is send any arbitrary data to a stream, at the point when possibly some
insert statement is executed from the Hive script, which can be at the
end of the script or anywhere. After the event comes to a stream, the user
can do anything with it: either run a CEP query against it, or directly
pass it through to some transport like email or sms.

Cheers,
Anjana.


On Tue, Oct 22, 2013 at 2:30 PM, Srinath Perera srin...@wso2.com wrote:

 Hi Anjana,

 Basically, we are polling the Cassandra location. I think it is OK. But we
 need to make sure we clean up these tasks when we detect that the job has
 finished.

 What does the notification say? Does it say the job has finished, or can the
 user give a condition for when to send the notification? We eventually need that.

 --Srinath


 On Tue, Oct 22, 2013 at 2:03 PM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

 For BAM notifications, the approach we have at the moment is using CEP,
 which we do ship by default with BAM now. But there is another limitation,
 where we cannot trigger any notifications from Hive scripts, which is what
 is used mostly.

 So the requirement is that, somehow, we should be able to send messages from
 Hive to a stream to send out notifications; that is, when messages come to
 a stream, we can use (CEP's) message builders/formatters to send out
 email/sms etc. So I've implemented a simple mechanism to do this, where,
 when Hive wants to send out a message to a stream, it will write a data row
 to a pre-defined Cassandra CF (bam_notification_messages), which will
 have a column with the name "streamId", and other columns (which map to the
 payload section of a stream). And then, in the BAM server, there is a
 scheduled task running, where it polls the data in that CF (at 5 second
 intervals) to get the existing rows, and reads the streamId and other
 columns to generate an event to be sent to the target stream; processed
 rows are then deleted. So with this approach, effectively, we can now send
 events to a specific stream from Hive.

 I've tested this feature in BAM. And hope this approach is fine for the
 requirement.

 Cheers,
 Anjana.

 --
 *Anjana Fernando*
 Technical Lead
 WSO2 Inc. | http://wso2.com
 lean . enterprise . middleware




 --
 
 Srinath Perera, Ph.D.
   Director, Research, WSO2 Inc.
   Visiting Faculty, University of Moratuwa
   Member, Apache Software Foundation
   Research Scientist, Lanka Software Foundation
   Blog: http://srinathsview.blogspot.com/
   Photos: http://www.flickr.com/photos/hemapani/
Phone: 0772360902




-- 
*Anjana Fernando*
Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] BAM Data Archival Feature improvements

2013-09-04 Thread Anjana Fernando
Hi Dipesh,

Thank you for the ideas. Actually, yeah, we can also support archiving a
general CF without filtering the records on fields like stream version and
so on. It should be straightforward functionality, without changing much
in the backend.

As for using Hive, you've got a point there. We actually considered it
earlier, but decided not to go with that approach, thinking it has some
limitations, where we can't address the data when the column names are not
known in Cassandra. But by looking into it more now, we identified that it
can actually be done. So yeah, we will now look again into using Hive to do
the processing. And with that, we can easily support archiving from/to
several data sources, such as RDBMS, Cassandra, and HDFS.

Also, for the indexing concerns, we were going to use a custom index
based approach. Now, most probably, if Hive is used, we are going
to straight away use the functionality given by incremental
processing, which already contains the indexing features for timestamps.
So with these features tied in, hopefully it will be a solid
implementation.

Cheers,
Anjana.

On Wed, Sep 4, 2013 at 11:44 AM, Dipesh Chheda wrote:

 Hi Malith,

 The current (hive-based) solution (and it seems the proposed solution) only
 handles Column Families (CFs) created/maintained by BAM (based on the
 stream-def). A couple of improvements would really help:
  - Currently, the archiving configuration is per 'CF+stream-def-version'.
 Is it possible to have just one archive configuration that takes care of a
 given CF irrespective of the stream-def-version?
  - The archiving feature should support 'any CF' existing in a given Cassandra
 cluster. We are currently using Cassandra (instead of an RDBMS like MySQL) to
 store analyzed data. Of course, the configuration would need to have the name
 of the 'timestamp' column for each CF, based on which the data would be
 filtered for archiving.

 For a Hector-based implementation, I would imagine that 'non-secondary'
 indexing on the 'timestamp column' would be required to efficiently filter and
 archive the data. If you agree, how do you folks plan to handle this? If not
 required, how would the solution scale/perform better without indexing?

 Also, in addition to archiving data from Cassandra (ActiveStore) to
 Cassandra (ArchiveStore), shouldn't it support archiving to
 traditional SAN-like storage options, HDFS, etc.?
 I think these other options could easily/naturally be supported by Hive
 itself, where the hive-result could be streamed as key-values to these types
 of archive-stores.

 Regards,
 Dipesh


 Malith Dhanushka wrote
  Hi folks,
 
  We (BAM team, Sumedha) had a discussion about the $subject, and the following
  are the suggested improvements for the Cassandra data archival feature in
  BAM.

  - Remove Hive script based archiving and use the Hector API to directly issue
  archive queries to Cassandra (the current implementation is based
  on Hive, where it generates a Hive script and the archiving process uses
  map-reduce jobs to achieve the task; it has a limitation of discarding
  custom key-value pairs in the column family)

  - Use the Task component for scheduling purposes

  - Archive data to an external Cassandra ring

  - Major UI improvements
    - List the current archiving tasks
    - Edit, remove and schedule archiving tasks
    - Add new archiving tasks

  If there are any additional requirements please raise them.
 
  Thanks,
  Malith
  --
  Malith Dhanushka
 
  Engineer - Data Technologies
  *WSO2, Inc. : wso2.com*
 
  *Mobile*  : +94 716 506 693
 
  ___
  Architecture mailing list

  Architecture@

  https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture





 --
 View this message in context:
 http://wso2-oxygen-tank.10903.n7.nabble.com/BAM-Data-Archival-Feature-improvements-tp85315p85330.html
 Sent from the WSO2 Architecture mailing list archive at Nabble.com.
 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




-- 
*Anjana Fernando*
Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] NTask component updated to use Hazelcast instead of ZooKeeper

2013-08-19 Thread Anjana Fernando
Hi Amila,

For BAM, as in BAM required ZK for ntask: since ntask doesn't need ZK
anymore, BAM will not need ZK anymore. And yeah, TS can be used; the idea
of TS is to be used in large deployments where features like tenant
partitioning are used. Otherwise, the usual clustered mode is enough.

Cheers,
Anjana.


On Mon, Aug 19, 2013 at 11:16 AM, Amila Maha Arachchi ami...@wso2.com wrote:

 Cloud deployment was planning to use ZooKeeper for the BAM setup. We will
 use TS instead.


 On Mon, Aug 19, 2013 at 11:01 AM, Anjana Fernando anj...@wso2.com wrote:

 Hi,

  I had a chat with Dimuthu, and she said they are not using ZooKeeper in
  AF, it seems.

 Cheers,
 Anjana.


  On Mon, Aug 19, 2013 at 10:49 AM, Anjana Fernando anj...@wso2.com wrote:

 Hi Sanjiva,

 Yeah, sure, will schedule a review, and will talk to the app-factory
 guys.

 Cheers,
 Anjana.


 On Mon, Aug 19, 2013 at 6:31 AM, Sanjiva Weerawarana 
  sanj...@wso2.com wrote:

 Excellent! Can we do a review too before this is final?

 Ref AF use of ZK - please help them to undo it ASAP .. we need to
 totally drop ZooKeeper.

 Sanjiva.


  On Sun, Aug 18, 2013 at 2:46 AM, Anjana Fernando anj...@wso2.com wrote:

 Hi everyone,

  I've changed the ntask component to use Hazelcast for the coordination
  / group communication activities. This is because the earlier ZooKeeper
  based coordination component was too troublesome to use, where it takes a
  whole separate ZooKeeper cluster to be set up to properly cluster a Carbon
  server which has scheduled tasks. And also, ZooKeeper has little support
  for proper error handling, and it's hard or not possible to prevent some
  edge cases.

  So with the Hazelcast integration, you will not have to install a
  different server, since it just works in a peer-to-peer fashion inside the
  Carbon server itself. And since it's also used in Axis2 clustering,
  the integration is seamless.

  Scheduled tasks have three main modes they can work in: STANDALONE,
  CLUSTERED and REMOTE. I've introduced a new setting called AUTO, set in
  tasks-config.xml as the default, where it automatically
  checks if clustering is enabled in the system, and switches to CLUSTERED
  mode if so, or falls back to the STANDALONE mode. So in the typical
  setup, no additional settings need to be changed for distributed
  tasks to work properly (other than the startup task server count, which is
  set to 2 by default).

  With this change, I've removed the coordination (ZK based) components
  from products which use it for ntask. The following are the changes I did
  per product in branch/trunk; I built the ones that could be built.

  DSS:- Branch/Trunk
  AS:- Branch/Trunk, cannot build branch because of a Jaggery version
  problem
  ELB:- Trunk, coordination-server also removed
  GREG:- Branch/Trunk, cannot build branch - Jaggery version problem
 Manager:- Trunk
 AppFactory:- Trunk
 BAM:- Trunk
 BPS:- Trunk

 SS also uses the coordination-core feature, which they seem to use for
 other purposes, not for scheduled tasks. I'd recommend, if possible, to
 re-write that part of the code to use Hazelcast instead.

 Cheers,
 Anjana.

 --
 *Anjana Fernando*
 Technical Lead
 WSO2 Inc. | http://wso2.com
 lean . enterprise . middleware

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --
 Sanjiva Weerawarana, Ph.D.
 Founder, Chairman  CEO; WSO2, Inc.;  http://wso2.com/
  email: sanj...@wso2.com; phone: +94 11 763 9614; cell: +94 77 787 6880 | +1
 650 265 8311
 blog: http://sanjiva.weerawarana.org/

 Lean . Enterprise . Middleware

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --
 *Anjana Fernando*
 Technical Lead
 WSO2 Inc. | http://wso2.com
 lean . enterprise . middleware




 --
 *Anjana Fernando*
 Technical Lead
 WSO2 Inc. | http://wso2.com
 lean . enterprise . middleware

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --
 *Amila Maharachchi*
 Senior Technical Lead
 WSO2, Inc.; http://wso2.com

 Blog: http://maharachchi.blogspot.com
 Mobile: +94719371446


 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




-- 
*Anjana Fernando*
Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture


Re: [Architecture] NTask component updated to use Hazelcast instead of ZooKeeper

2013-08-18 Thread Anjana Fernando
Hi,

I had a chat with Dimuthu, and she said they are not using ZooKeeper in AF,
it seems.

Cheers,
Anjana.


On Mon, Aug 19, 2013 at 10:49 AM, Anjana Fernando anj...@wso2.com wrote:

 Hi Sanjiva,

 Yeah, sure, will schedule a review, and will talk to the app-factory guys.

 Cheers,
 Anjana.


 On Mon, Aug 19, 2013 at 6:31 AM, Sanjiva Weerawarana sanj...@wso2.com wrote:

 Excellent! Can we do a review too before this is final?

 Ref AF use of ZK - please help them to undo it ASAP .. we need to totally
 drop ZooKeeper.

 Sanjiva.


 On Sun, Aug 18, 2013 at 2:46 AM, Anjana Fernando anj...@wso2.com wrote:

 Hi everyone,

 I've changed the ntask component to use Hazelcast for the coordination /
 group communication activities. This is because the earlier ZooKeeper
 based coordination component was too troublesome to use, where it takes a
 whole separate ZooKeeper cluster to be set up to properly cluster a Carbon
 server which has scheduled tasks. And also, ZooKeeper has little support
 for proper error handling, and it's hard or not possible to prevent some
 edge cases.

 So with the Hazelcast integration, you will not have to install a
 different server, since it just works in a peer-to-peer fashion inside the
 Carbon server itself. And since it's also used in Axis2 clustering,
 the integration is seamless.

 Scheduled tasks have three main modes they can work in: STANDALONE,
 CLUSTERED and REMOTE. I've introduced a new setting called AUTO, set in
 tasks-config.xml as the default, where it automatically
 checks if clustering is enabled in the system, and switches to CLUSTERED
 mode if so, or falls back to the STANDALONE mode. So in the typical
 setup, no additional settings need to be changed for distributed
 tasks to work properly (other than the startup task server count, which is
 set to 2 by default).
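
 For illustration, a hedged sketch of how the AUTO mode resolves the
 effective mode, per the description above (the enum and method names are
 illustrative, not the actual ntask API):

 public enum TaskServerMode {
     STANDALONE, CLUSTERED, REMOTE, AUTO;

     public static TaskServerMode resolve(TaskServerMode configured,
                                          boolean clusteringEnabled) {
         if (configured != AUTO) {
             // An explicit setting in tasks-config.xml wins
             return configured;
         }
         // AUTO: switch to CLUSTERED if clustering is enabled in the system,
         // otherwise fall back to STANDALONE
         return clusteringEnabled ? CLUSTERED : STANDALONE;
     }
 }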

 With this change, I've removed the coordination (ZK based) components
 from products which use it for ntask. The following are the changes I did
 per product in branch/trunk; I built the ones that could be built.

 DSS:- Branch/Trunk
 AS:- Branch/Trunk, cannot build branch because of a Jaggery version
 problem
 ELB:- Trunk, coordination-server also removed
 GREG:- Branch/Trunk, cannot build branch - Jaggery version problem
 Manager:- Trunk
 AppFactory:- Trunk
 BAM:- Trunk
 BPS:- Trunk

 SS also uses the coordination-core feature, which they seem to use for
 other purposes, not for scheduled tasks. I'd recommend, if possible, to
 re-write that part of the code to use Hazelcast instead.

 Cheers,
 Anjana.

 --
 *Anjana Fernando*
 Technical Lead
 WSO2 Inc. | http://wso2.com
 lean . enterprise . middleware

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --
 Sanjiva Weerawarana, Ph.D.
 Founder, Chairman  CEO; WSO2, Inc.;  http://wso2.com/
 email: sanj...@wso2.com; phone: +94 11 763 9614; cell: +94 77 787 6880 | +1
 650 265 8311
 blog: http://sanjiva.weerawarana.org/

 Lean . Enterprise . Middleware

 ___
 Architecture mailing list
 Architecture@wso2.org
 https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture




 --
 *Anjana Fernando*
 Technical Lead
 WSO2 Inc. | http://wso2.com
 lean . enterprise . middleware




-- 
*Anjana Fernando*
Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
___
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture





