Re: [Architecture] C5 Carbon Kernel : Secure Vault Implementation

2016-09-12 Thread Jayanga Dissanayake
@Indika,
With the new C5 implementation, Secure Vault doesn't care about the
location where the alias is used, so there is no file similar to
"cipher-tool.properties".
Instead, when the product is built, all the secrets (passwords) that are
needed should be specified in the secrets.properties file. To avoid
conflicts, each alias should have the component name as the first part of
the alias.

eg:
datasources.jdbc.mysql.password (this pattern is not yet finalized)

There is another component, "ConfigResolver", which will be introduced in
5.2.0 and allows components to update the configuration with
environment-specific values. This component uses a file called
"deployment.properties", which specifies the file name and the element to
update, e.g. [carbon.yaml]/tenant=new_tenant.
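The deployment.properties override format described above could be parsed
along these lines (an illustrative sketch only; the real ConfigResolver
parsing rules and class names may differ):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: parse deployment.properties entries of the form
// [carbon.yaml]/tenant=new_tenant into (file -> (element path -> new value)).
// Not the actual C5 ConfigResolver code.
public class DeploymentOverrides {
    public static Map<String, Map<String, String>> parse(String... lines) {
        Map<String, Map<String, String>> overrides = new LinkedHashMap<>();
        for (String line : lines) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;   // skip blanks/comments
            int close = line.indexOf(']');
            int eq = line.indexOf('=', close);
            String file = line.substring(1, close);                 // e.g. carbon.yaml
            String path = line.substring(close + 2, eq);            // e.g. tenant (skips "]/")
            String value = line.substring(eq + 1);                  // e.g. new_tenant
            overrides.computeIfAbsent(file, f -> new LinkedHashMap<>()).put(path, value);
        }
        return overrides;
    }
}
```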

All the components must get their configurations parsed via this
"ConfigResolver" service. It will properly resolve all the placeholders:
i. ${env:component1.db.password} - will be replaced with the value of the
environment variable called "component1.db.password"
ii. ${sys:component1.db.password} - will be replaced with the value of the
system property called "component1.db.password"
iii. ${sec:component1.db.password} - ConfigResolver will internally call
SecureVault and get the alias "component1.db.password" resolved for you.
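The three placeholder types above could be resolved roughly as follows (a
hypothetical sketch, not the actual ConfigResolver code; the SecureVault
lookup is stood in for by a simple alias-to-secret function):

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of ${env:...}, ${sys:...} and ${sec:...} placeholder expansion.
public class PlaceholderResolver {
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{(env|sys|sec):([^}]+)\\}");

    // 'secureVault' stands in for the SecureVault service (alias -> secret).
    public static String resolve(String config, Function<String, String> secureVault) {
        Matcher m = PLACEHOLDER.matcher(config);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String type = m.group(1);
            String key = m.group(2);
            String value;
            switch (type) {
                case "env": value = System.getenv(key); break;       // environment variable
                case "sys": value = System.getProperty(key); break;  // system property
                default:    value = secureVault.apply(key); break;   // SecureVault alias
            }
            // Leave the placeholder untouched if no value could be resolved.
            m.appendReplacement(sb,
                    Matcher.quoteReplacement(value == null ? m.group(0) : value));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```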

The design and functionality of the "ConfigResolver" will be described in
a separate thread.

@Sanjeewa,
Thanks for the suggestion.
The current implementation doesn't support secrets (passwords) in
deployable artifacts.
We will look into this further.

Thanks,
Jayanga.



*Jayanga Dissanayake*
Associate Technical Lead
WSO2 Inc. - http://wso2.com/
lean . enterprise . middleware
email: jaya...@wso2.com
mobile: +94772207259


On Sun, Sep 11, 2016 at 12:21 PM, Sanjeewa Malalgoda 
wrote:

> @jayanga, how can we extend this for hot deployed artifacts(synapse
> config, user stores etc)? When we implement stuff on top of kernel this
> will be helpful IMO.
>
> Thanks
> sanjeewa.
>
> Sent from my phone.
>
> On Sep 10, 2016 8:54 AM, "Indika Sampath"  wrote:
>
>> Hi Jayanga,
>>
>> The Carbon 4 secure vault implementation had cipher-text.properties, which
>> contains key-value pairs and is equivalent to secrets.properties in the
>> proposed implementation. But there was another file, cipher-tool.properties,
>> which maps each key to its configuration path. I would like to know how we
>> are going to identify the path of the config to update in the proposed
>> implementation?
>>
>> Cheers!
>>
>> On Sat, Sep 10, 2016 at 7:53 AM, Jayanga Dissanayake 
>> wrote:
>>
>>> Hi All,
>>>
>>> C5 Carbon Kernel (5.2.0) will include a newly designed Secure Vault
>>> implementation, redesigned with extensibility in mind.
>>> The default implementation will be based on JKS, but it will provide
>>> the flexibility to integrate with external vaults and key managers.
>>>
>>> Secure Vault enables component/product developers to keep the secrets
>>> (passwords) in one file, "secrets.properties" (where the encrypted passwords
>>> are persisted), with an alias associated with each secret. Configuration
>>> files refer to the secrets via these aliases, and at runtime the secrets
>>> are resolved to their original (decrypted) values.
>>>
>>> There are two major components in the Secure Vault implementation:
>>> 1. *CipherTool* - Encrypts the secrets given in the
>>> "secrets.properties" file.
>>> 2. *SecureVault Component* - Decrypts the secrets at runtime.
>>>
>>> *SecureVault Component*
>>>
>>> Below is a high-level illustration of the Secure Vault Component
>>> (diagram omitted).
>>>
>>> There are three main sub-components in the Secure Vault Component:
>>> *SecretRepository*, *MasterKeyReader* and the *SecureVault OSGi service*.
>>>
>>> *1. SecretRepository* - The Secret Repository is responsible for reading
>>> the secrets.properties file, providing the decrypted secrets, and providing
>>> encryption/decryption capability depending on the underlying cipher
>>> mechanism.
>>>
>>> The default implementation,
>>> 'org.wso2.carbon.kernel.securevault.repository.DefaultSecretRepository',
>>> exposes the secrets given in the secrets.properties file and provides
>>> encryption/decryption based on javax.crypto.Cipher and the JKS configured
>>> in the secure-vault.yaml file.
>>>
>>> If a custom SecretRepository is needed, one can implement
>>> 'org.wso2.carbon.kernel.securevault.SecretRepository', register it
>>> as an OSGi service and update the secure-vault.yaml file.
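A custom repository along those lines might look roughly like this. The
interface and method names below are illustrative stand-ins for
org.wso2.carbon.kernel.securevault.SecretRepository (whose actual
signatures may differ), Base64 stands in for a real cipher, and the OSGi
service registration is omitted:

```java
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the SecretRepository contract.
interface SecretRepositoryContract {
    char[] resolve(String alias);   // return the decrypted secret for an alias
}

// In-memory sketch of a custom secret repository; Base64 decoding stands in
// for the real decryption step a production repository would perform.
public class InMemorySecretRepository implements SecretRepositoryContract {
    // alias -> "encrypted" (here: Base64-encoded) secret
    private final Map<String, String> secrets = new HashMap<>();

    public void load(String alias, String encoded) {
        secrets.put(alias, encoded);
    }

    @Override
    public char[] resolve(String alias) {
        String encoded = secrets.get(alias);
        if (encoded == null) {
            throw new IllegalArgumentException("Unknown alias: " + alias);
        }
        return new String(Base64.getDecoder().decode(encoded)).toCharArray();
    }
}
```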
>>>
>>> *2. MasterKeyReader* - The Master Key Reader is responsible for reading
>>> the passwords/keys/etc. that are needed to initialize the SecretRepository.
>>>
>>> The default implementation,
>>> 'org.wso2.carbon.kernel.securevault.reader.DefaultMasterKeyReader',
>>> reads the master keys in several ways. Following is the order in which
>>> the DefaultMasterKeyReader reads 

Re: [Architecture] Creating several versions of same API on APIM REST APIs

2016-09-12 Thread Malintha Amarasinghe
Hi,

On Mon, Sep 12, 2016 at 9:46 AM, Thilini Cooray  wrote:

> Hi Kaveesha,
>
> Shouldn't we use apiProvider.checkIfAPIExists(apiId) as the first check
> of the POST /apis endpoint?
>
> IMO it is good to start the duplicate check from the API Identifier, as it
> is the unique entry for all the APIs in APIM.
>
> Then we can go for apiProvider.isDuplicateContextTemplate(body.getContext()).
>
+1.
From a functionality point of view it won't be much different, but it's
better to check the name and version first, as they are effectively the ID
of the API.

Apart from that, I think we need to be a bit careful when doing this with a
user with Admin permission who is capable of creating APIs on behalf of
other users.
Let's say there is an API with name api1, version 1.0.0 and context /api1,
created by an API creator called John. Now, as an Admin user, you try to
create an API with name api1, version *2.0.0* and context /api1, but
accidentally use "provider": "admin" in the payload. Now APIM will try to
create the API as a different user than the original API's creator.
As I checked from the UI, if we create a new API version using a different
creator than that of the original API, the new API version's creator is the
original API's creator, not the current user who creates the new version.
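The check ordering discussed above could be sketched like this (a
hypothetical helper, not the actual APIM provider API): first reject an
exact name+version match, then reject a context already used by a
*different* API name, and otherwise allow creation (which covers the
new-version case):

```java
import java.util.List;

// Illustrative sketch of the duplicate checks for API creation.
public class ApiCreationValidator {
    public static class Api {
        final String provider, name, version, context;
        public Api(String provider, String name, String version, String context) {
            this.provider = provider; this.name = name;
            this.version = version; this.context = context;
        }
    }

    /** Returns null if creation is allowed, or a conflict reason otherwise. */
    public static String checkConflict(Api candidate, List<Api> existing) {
        // 1. Same name + version already exists -> 409.
        for (Api api : existing) {
            if (api.name.equals(candidate.name) && api.version.equals(candidate.version)) {
                return "API with the same name and version already exists";
            }
        }
        // 2. Context already used by a *different* API name -> 409.
        for (Api api : existing) {
            if (api.context.equals(candidate.context) && !api.name.equals(candidate.name)) {
                return "Context already used by a different API";
            }
        }
        return null; // same context, same name, new version: a valid new version
    }
}
```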

Thanks!
Malintha


> WDYT?
>
> Thanks.
>
> On Mon, Sep 12, 2016 at 9:15 AM, Kaveesha Perera 
> wrote:
>
>> Hi,
>>
>> Currently I'm working on a client side tool to perform import and export
>> of APIs in APIM.
>>
>> I encountered conflict issues (error 409) in the following instances when
>> using the REST APIs to create a new API:
>> 1. when trying to re-import an already published API
>> 2. when trying to import a new version of an already published API
>>
>> This is because on an API POST, it initially checks the context via
>> [1] and throws a conflict exception if the context template of the payload
>> matches that of an existing API.
>>
>> Several versions of the same API have the same *context template*, and this
>> should be an exceptional scenario for the above-mentioned procedure. To
>> handle this, I hope to make the following modifications to the apisPost
>> REST API.
>>
>> On an API POST, initially do check [1]. If it returns true, get the
>> corresponding API name from the database and check whether the API name in
>> the payload and that of the published API are the same. If they are, then
>> retrieve all the published versions of that API from the database and check
>> them against the version stated in the payload. The method should throw a
>> conflict exception only if the payload holds an already published version
>> of the API, or if the API name in the payload differs from the API name
>> that the database maps to the same context template. Otherwise it will
>> allow the normal process of creating a new API. A summary of the proposed
>> changes is shown in Fig. 1.0.
>>
>>
>> *Fig. 1.0* (diagram omitted)
>>
>>
>>
>> [1] apiProvider.isDuplicateContextTemplate(body.getContext())
>>
>> If any feedback please do reply.
>>
>> Regards,
>> Kaveesha
>>
>> --
>> Kaveesha Perera
>> Intern - Software Engineering
>>
>> mobile: 0716130471
>>
>> ___
>> Architecture mailing list
>> Architecture@wso2.org
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>>
>
>
> --
> Best Regards,
>
> *Thilini Cooray*
> Software Engineer
> Mobile : +94 (0) 774 570 112
> E-mail : thili...@wso2.com
>
> WSO2 Inc. www.wso2.com
> lean.enterprise.middleware
>
>
>


-- 
Malintha Amarasinghe
Software Engineer
*WSO2, Inc. - lean | enterprise | middleware*
http://wso2.com/

Mobile : +94 712383306


Re: [Architecture] Creating several versions of same API on APIM REST APIs

2016-09-12 Thread Lahiru Cooray
Hi,
As per my understanding, what you are trying to achieve here is to avoid
API duplication when creating new APIs.
So I would suggest you implement a validation method and keep it as a
util, so that it can be reused in several operations (e.g. copy API,
update API, etc., with slight changes).

The validation method should consist of the following constraints:
1) Check if the API provider/name/version is unique.
If true,
2) Check that the context is duplicated only within APIs of the same
api-name/provider. (You could do this easily with a simple SQL DML
statement.)
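One possible shape for that SQL-backed check, with the table and column
names assumed purely for illustration (they are not taken from the actual
APIM schema), plus a pure-Java equivalent of the same predicate:

```java
// Sketch of the duplicate-context check: the context may only repeat among
// rows with the same API name and provider.
public class ContextUniquenessCheck {
    // Hypothetical SQL: a non-zero count means the context is already taken
    // by a *different* name/provider. Table/column names are assumptions.
    static final String SQL =
        "SELECT COUNT(*) FROM AM_API " +
        "WHERE CONTEXT_TEMPLATE = ? AND (API_NAME <> ? OR API_PROVIDER <> ?)";

    // Pure-Java equivalent of the query above, for illustration:
    // each row is {context, name, provider}.
    public static boolean conflicts(String[][] rows,
                                    String context, String name, String provider) {
        for (String[] r : rows) {
            if (r[0].equals(context) && (!r[1].equals(name) || !r[2].equals(provider))) {
                return true;
            }
        }
        return false;
    }
}
```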



On Mon, Sep 12, 2016 at 9:15 AM, Kaveesha Perera  wrote:



-- 
*Lahiru Cooray*
Software Engineer
WSO2, Inc.;http://wso2.com/
lean.enterprise.middleware

Mobile: +94 715 654154


[Architecture] Dynamically Detection Of Broker Nodes In Cluster And Load Balancing

2016-09-12 Thread Sidath Weerasinghe
Hi all,

This is about the "dynamic detection of broker nodes in a cluster" project;
the implementation has been changed from C5 to C4.

On the server side:
I get the AMQP transport address that is mentioned in broker.xml. Sometimes
it is specified as 0.0.0.0, which causes a problem when resolving that
address. To handle this, I'm going to get the IP addresses bound to all the
interfaces (LAN, WiFi) and store them in the database when the node joins
the cluster. When a node shuts down or goes down, I will remove its
addresses from the database. A flag is maintained in the database to record
the last issued IP address; according to that, I will rotate through the IP
addresses.
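The "last issued IP" rotation could be sketched as below. An in-memory
counter is used purely for illustration; in the design above the cursor
would live in the database so that all nodes share it:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hand out cluster member addresses round-robin so client connections
// spread across the broker nodes.
public class BrokerAddressRotator {
    private final List<String> members;
    private final AtomicInteger lastIssued = new AtomicInteger(-1);

    public BrokerAddressRotator(List<String> members) {
        this.members = members;
    }

    public String next() {
        // floorMod keeps the index valid even if the counter wraps around.
        int idx = Math.floorMod(lastIssued.incrementAndGet(), members.size());
        return members.get(idx);
    }
}
```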


On the client side, I modified the Andes client and implemented a new
InitialContextFactory. Inside it, I call a web service to get the IP and
port and resolve them according to the client network. After that, I get
the actual IP and port.

Refer to the mail thread "Dynamic Detection of Broker Nodes in Cluster and
Load Balancing".
Any feedback is welcome.

-- 
Thank You,
Best Regards,

Sidath Weerasinghe


*Intern*

*WSO2, Inc. *

*lean . enterprise . middleware *


*Mobile: +94719802550*

*Email: *sid...@wso2.com

Blog: https://medium.com/@sidath

Linkedin: https://lk.linkedin.com/in/sidathweerasinghe


[Architecture] [Dev] WSO2 Message Broker 3.2.0-M3 Released !

2016-09-12 Thread Indika Sampath
Hi All,

The WSO2 Message Broker team is pleased to announce the 3rd milestone
release of WSO2 Message Broker (MB) 3.2.0.

Source & binary distribution files of WSO2 Message Broker:

 Runtime : https://github.com/wso2/product-mb/releases/tag/v3.2.0-M3
 Analytics :
https://github.com/wso2/analytics-mb/releases/tag/v3.2.0-M3

All the known issues identified are listed in JIRA [1][2].

[1] https://wso2.org/jira/browse/MB
[2] https://wso2.org/jira/browse/ANLYMB

Regards,
~MB Team~

-- 
Indika Sampath
Senior Software Engineer
WSO2 Inc.
http://wso2.com

Phone: +94 716 424 744
Blog: http://indikasampath.blogspot.com/


Re: [Architecture] Multidimensional Space Search with Lucene 6 - Possible Scenarios and the API

2016-09-12 Thread Srinath Perera
ah loot=look

On Tue, Sep 13, 2016 at 10:37 AM, Srinath Perera  wrote:

> Hi Suho,
>
> Please loot this, and some of the scenarios should be handy for IoT
>
> --Srinath
>

Re: [Architecture] RFC: Video and Image Processing Support in WSO2 Platform

2016-09-12 Thread Srinath Perera
Anusha, we should try AdaBoost as Geesara mentioned (when we are done with
what we are doing).

--Srinath

On Sun, Sep 11, 2016 at 10:52 AM, Anusha Jayasundara 
wrote:

> Hi Sumedha,
>
> I just detect the face. I went through a few articles about face
> recognition, and I have a sample code as well, but it is not very
> accurate.
>
> Thanks,
>
>
> On Fri, Sep 9, 2016 at 11:26 AM, Sumedha Rubasinghe 
> wrote:
>
>> On Fri, Sep 9, 2016 at 11:24 AM, Anusha Jayasundara 
>> wrote:
>>
>>> Hi Geesara,
>>>
>>> I used the Haar full-body cascade and the HoG pedestrian-detection
>>> cascade. In the Haar full-body cascade they have mentioned that upper-body
>>> detection, lower-body detection and full-body detection are all included.
>>> Even so, I once tried to use a separate upper-body detection cascade
>>> together with the full-body detection cascade, but when implemented the
>>> system took a long time to process even a simple video with two people.
>>> I'll upload my code to a GitHub repo.
>>> I still haven't worked with real-time CCTV videos, but I was able to build
>>> a real-time face-detection system using the webcam of my laptop, and it had
>>> processing issues as the machine couldn't handle it.
>>>
>>
>> Anusha,
>> Did you just detect the face or associated that with a name as well?
>>
>>
>>
>>> We thought of doing the video processing outside of the CEP and sending
>>> the processed data into the CEP (i.e. human count, time_stamp, frame rate,
>>> etc.). For now I send those data into the CEP as a JSON POST request.
>>>
>>>
>>> Thank You,
>>>
>>>
>>>
>>>
>>> On Wed, Sep 7, 2016 at 11:57 PM, Geesara Prathap 
>>> wrote:
>>>
 Hi Anusha,

 A few suggestions to improve your implementation.
 Haar and HoG are used to get visual descriptors, which can be used to
 describe an image. Then both of them use a boosting classifier such as
 AdaBoost to tune up their performance. When you are using the haar-like
 feature-extraction method you need to use more than one model in order to
 make the final decision. Let's say you are using a full-body classifier for
 human detection: this classifier alone can't detect the upper body
 properly. So when haar-like feature extraction is used you may have to use
 more than one classifier, and the final decision will be taken by
 aggregation or composition of those results. The next important thing is
 pre-processing. It may be composed of color balancing, gamma correction,
 changing the color space, and other factors unique to the environment
 you're trying out. The processing model is also important since this is to
 be done in real time. If you can explain your algorithm we will be able to
 provide some guidance to improve it and get a better result.

 Since the main intention of this project is to facilitate support for
 image processing in the WSO2 platform, I am just curious to know how you
 process the video stream in real time with the help of the CEP. Since you
 are using CCTV feeds, which might be using RTSP or RTMP, where do you
 process the incoming video stream? Are you going to develop RTSP or RTMP
 input adapters so as to get the input stream into the CEP?

 Thanks,
 Geesara

 On Wed, Aug 31, 2016 at 8:16 PM, Anusha Jayasundara 
 wrote:

> Hi,
>
> The Progress of the video processing project is described in the
> attached pdf.
>
> On Wed, Aug 31, 2016 at 11:39 AM, Srinath Perera 
> wrote:
>
>> Anusha has the people counting from video working through the CEP and
>> has a dashboard. (Anusha, can you send an update with screenshots?) We
>> will also set up a meeting.
>>
>> Also, it seems new cameras automatically do human detection etc. and add
>> object codes to videos; if we can extract those, we can do some analysis
>> without heavy processing as well. Will explore this too.
>>
>> Also, Facebook open-sourced their object-detection code, called FaceMask:
>> https://code.facebook.com/posts/561187904071636. Another one to look at.
>>
>> --Srinath
>>
>>
>>
>> On Mon, Aug 15, 2016 at 4:14 PM, Sanjiva Weerawarana <
>> sanj...@wso2.com> wrote:
>>
>>> Looks good!
>>>
>>> In terms of test data we can take the video cameras in the LK Palm
>>> Grove lobby as an input source to play around with people analysis. For
>>> vehicles we can plop a camera pointing to Duplication Road and get 
>>> plenty
>>> of data :-).
>>>
>>> I guess we should do some small experiments to see how things work.
>>>
>>> Sanjiva.
>>>
>>> On Wed, Aug 10, 2016 at 3:02 PM, Srinath Perera 
>>> wrote:
>>>
 The attached document lists some of the initial ideas about the topic.
 Anusha is exploring some of the 

Re: [Architecture] Log Analyzer performance analysis

2016-09-12 Thread Malith Jayasinghe
Hi Tishan,
Interesting results indeed. Just wondering whether you made any effort to
do query-level optimization? For example, there are ways to analyze the
query logs (e.g. the MySQL slow-query log). Such analysis/profiling can
give us information about the queries (execution time, lock time, index of
dispersion, query-execution-time distribution, etc.) which we can use to
further optimize the performance of the DB queries.

For example, you can get the most time-consuming queries (say the top 10),
and these queries can then be optimized individually (i.e. by adding
indexes etc.).

regards

Malith


On Mon, Sep 12, 2016 at 11:36 AM, Ruwan Abeykoon  wrote:

> Hi Tishan,
> Yes, in production there is a smaller number of logs in a healthy system.
> However, we have to think of the worst-case scenario too. We have to make
> sure that APIM does not become unstable even in the worst external service
> outage.
>
> Cheers,
> Ruwan
>
> On Mon, Sep 12, 2016 at 10:56 AM, Tishan Dahanayakage 
> wrote:
>
>> Hi Ruwan,
>>
>> Thanks for the comments and suggestions.
>>
>> On Mon, Sep 12, 2016 at 8:01 AM, Ruwan Abeykoon  wrote:
>>
>>> Hi All,
>>> Thanks Tishan for the performance test.
>>>
>>> The test machines look to have an average configuration for the purpose,
>>> hence I think the hardware is realistic.
>>>
>>> What we notice here is that 10,000 events per second is the
>>> sustainable throughput, since it seems that at this rate the CPU is loaded
>>> to 60% and plenty of GC is happening.
>>>
>>> Let's assume that on average an API invocation prints four (4) log lines.
>>> This means we can sustain only 2,500 API invocations on API Manager when
>>> the Log Analyzer is turned on. This looks like a pretty low and
>>> unacceptable figure IMO.
>>>
>> AFAIK in a production environment we don't print any log per API
>> invocation. A scenario where I can see logs flooding is BE unavailability,
>> which will lead to timeout logs. But in that case also we will get 1-2 log
>> lines per API. Another situation is where the user has enabled debug
>> logs. Given this, what is the throughput level that you see as reasonable?
>>
>>>
>>> We have to investigate where the bottlenecks are, and optimize them so
>>> that the log-publisher performance closely matches the advertised
>>> APIM/ESB throughput.
>>>
>> The bottleneck is at the persistence layer. We are writing each and every
>> message to the DB (DAS of course writes this in batches) and then running
>> scripts on that. So we are bound by that limitation in this case. These
>> numbers are after we tuned the DB parameters; earlier it was in the range
>> of 700-1000, and after tuning at the DB level we were able to increase
>> throughput to this level.
>> I can certainly look into the code level and see whether we can optimize
>> this more. Or else we need to re-think the 'persist everything' strategy
>> and look into the option of persisting only the summary. But is that
>> effort reasonable given our future roadmap?
>>
>> Thanks
>> /Tishan
>>
>>>
>>> Cheers,
>>> Ruwan
>>>
>>> On Fri, Sep 9, 2016 at 6:09 PM, Tishan Dahanayakage 
>>> wrote:
>>>
 Hi all,

 Please find below the document which contains the Log Analyzer performance
 analysis. Initially we could not achieve the desired performance numbers.
 After analyzing the JVM recording we figured out that the limitation was
 with database access. After tuning DB access we were able to get better
 throughput. The analysis correlates the throughput results with system
 resource utilization.

 Thanks,
 /Tishan

 --
 Tishan Dahanayakage
 Senior Software Engineer
 WSO2, Inc.
 Mobile:+94 716481328

 Disclaimer: This communication may contain privileged or other
 confidential information and is intended exclusively for the addressee/s.
 If you are not the intended recipient/s, or believe that you may have
 received this communication in error, please reply to the sender indicating
 that fact and delete the copy you received and in addition, you should not
 print, copy, re-transmit, disseminate, or otherwise use the information
 contained in this communication. Internet communications cannot be
 guaranteed to be timely, secure, error or virus-free. The sender does not
 accept liability for any errors or omissions.

>>>
>>>
>>>
>>> --
>>>
>>> *Ruwan Abeykoon*
>>> *Associate Director/Architect**,*
>>> *WSO2, Inc. http://wso2.com  *
>>> *lean.enterprise.middleware.*
>>>
>>>
>>
>>
>> --
>> Tishan Dahanayakage
>> Senior Software Engineer
>> WSO2, Inc.
>> Mobile:+94 716481328
>>

Re: [Architecture] Multidimensional Space Search with Lucene 6 - Possible Scenarios and the API

2016-09-12 Thread Srinath Perera
Hi Suho,

Please loot this, and some of the scenarios should be handy for IoT

--Srinath

On Mon, Sep 12, 2016 at 9:52 AM, Janaka Thilakarathna 
wrote:

> Hi all,
>
> I am currently working on the project "Multidimensional Space Search with
> Lucene 6" for DAS. Here is a list of possible functionalities (and some
> example scenarios) that we can provide using Lucene 6.1's
> multidimensional space search.
> (This is a brief description of the collected details; to see more info and
> the test code, please visit my git repo.)
>
> Multidimensional space points in Lucene 6 can be categorized into two
> types, depending on their query types and the distribution of
> points in the space:
>
>1. General Multidimensional space points.
>2. Locations on the planet surface.
>
>
> *1. General Multidimensional Space Points*
> This is the generic type of multidimensional space point. These are
> actually vector spaces. A space has a dimension K, and in a K-dimensional
> space a point is represented by K numeric values.
>
> For example, a 3D point will be represented by 3 float/double values
> (because distances are measured as floating-point numbers).
>
> *Possible Queries for API*
>
>    - Search for points with exactly the given values.
>    - Search for points which have one of the values from a given set of
>    values.
>    - Search for points within a given range.
>    - Get the number of points that match an exact point.
>    - Get the number of points within a given range. (Ranges are
>    multidimensional ranges; in 3D, they are boxes.)
>    - Divide points into range-buckets and get the count in each bucket.
>    (A range bucket is a range which has a label on it.)
>
> *Scenarios*
>
> Since this is a more general space definition, it could have many general
> applications. These dimensions can be used to represent any numeric field.
> They can be used to represent locations in a 3D space if we use
> 3-dimensional points, and we can represent both space and time if we use 4
> dimensions (X, Y, Z, time are all numeric fields; double can be used).
>
> *Independent Parameters*
>
> They can be used to represent completely independent parameters. Suppose
> there is a set of employees: a multidimensional space can be used to
> represent their different parameters.
>
> *Example:* Age, salary, height, average number of leaves per month. These
> are 4 numeric fields which are completely independent, and they can be
> represented as a 4-dimensional space. Each person will be represented as a
> point in this space. Then the user can use Lucene to query these
> people.
>
>    - What is the number of people whose age is 25-30 years, height is
>    160cm to 180cm, salary is 50,000 to 75,000, and who take 1-5 leaves per
>    month on average?
>    - Or the user can divide people into different buckets and count them
>    depending on the ranges for each parameter.
>
> (Of course this can be done by indexing those parameters separately and
> querying using the 'AND' keyword, but indexing them together as a
> multidimensional space will make searching more efficient.)
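A brute-force illustration of the multidimensional range count on the
employee example above. Lucene would serve this from an index (e.g. via
DoublePoint range queries); the linear scan here just makes the query
semantics explicit:

```java
// Count points p (all of dimension k) with low[i] <= p[i] <= high[i]
// for every dimension i.
public class RangeCount {
    public static int countInRange(double[][] points, double[] low, double[] high) {
        int count = 0;
        for (double[] p : points) {
            boolean inside = true;
            for (int i = 0; i < p.length; i++) {
                if (p[i] < low[i] || p[i] > high[i]) { inside = false; break; }
            }
            if (inside) count++;
        }
        return count;
    }
}
```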
>
> *2. Locations on the planet surface (Latitude, Longitude)*
> Here the points represent locations on the planet surface. This is a
> more specific type of search provided by Lucene to index and search
> geographical locations.
>
> These points are created using only the latitude and longitude values of
> locations.
> *Note that altitude is not yet supported by Lucene.*
>
> Since this is specifically designed for location search, it has more
> useful queries than the general multidimensional points.
>
> *Possible Queries for the API*
>
>    - Search for the K nearest points to a given location (return the
>    given number of points).
>    - Search for the points within a given radius from a given point.
>    - Sort points by their distance from a given location.
>    - Find points inside a polygon. (Polygons are geometric shapes on the
>    surface of the planet, e.g. the map of a country.)
>    - Get the number of points inside a polygon.*
>    - Get the number of points in each bucket, where buckets are specified
>    as polygons.
>    - Get the number of points in each bucket, where buckets are specified
>    by the distance from a given location.
>
> * Composite polygons are possible.
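For reference, the distance math behind a radius query, done brute force
with the haversine formula. Lucene's LatLonPoint answers this from the
index, but the great-circle semantics are the same:

```java
// Radius filtering over (lat, lon) points using the haversine formula.
public class GeoRadius {
    static final double EARTH_RADIUS_M = 6_371_000.0; // mean Earth radius

    // Great-circle distance in meters between two points given in degrees.
    public static double haversineMeters(double lat1, double lon1,
                                         double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    public static boolean withinRadius(double lat, double lon,
                                       double centerLat, double centerLon,
                                       double radiusM) {
        return haversineMeters(lat, lon, centerLat, centerLon) <= radiusM;
    }
}
```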
> *Scenarios*
>
> *Airport Scenario*
> If we index the set of airports in the world as GeoPoints, the following
> queries are possible examples. (Here is the test code I implemented as an
> example.)
> 
>
>    - Find the closest set of airports to a given town.
>    - Find the set of airports within a given radius from a particular
>    town.
>    - Find the set of airports inside a country. (The country can be given
>    as a polygon.)
>    - Find the set of airports within a given range of latitudes and
>    longitudes. It