Re: [Proposal] Thread monitoring mechanism

2018-02-23 Thread Anilkumar Gingade
Good idea; we need better tools (ecosystems) to manage and monitor Geode
resources.
In Geode, work is often handed off to other low-level threads (messaging)
or to new threads/runnables; it would be nice to have a mechanism to
associate the main work thread with the low-level thread, which would give
a better indication of who is waiting on whom.

-Anil






Re: [Proposal] Thread monitoring mechanism

2018-02-23 Thread Barry Oglesby
A lot of the Geode thread pools are defined in ClusterDistributionManager.
Most of these use custom ThreadPoolExecutors like:

SerialQueuedExecutorWithDMStats
PooledExecutorWithDMStats
FunctionExecutionPooledExecutor

These classes all extend ThreadPoolExecutor and override beforeExecute and
afterExecute. These methods are currently used by helper classes to update
the stats before and after a thread executes. Potentially these same
methods could be used to add and remove a thread from a monitor. For
example, there could be a FunctionExecutionThreadMonitor that is created as
part of the FunctionExecutionPooledExecutor whose job it would be to
monitor FunctionExecution threads. The beforeExecute method would add the
thread to the monitor; the afterExecute would remove the thread from the
monitor.

I would be mindful about the performance impact of adding these monitors,
though.
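
As a rough, hypothetical sketch of that idea (class and method names below are mine, not Geode's), the beforeExecute/afterExecute hooks could feed a registration map that a periodic scanner inspects, along the lines Gregory described:

```java
import java.util.Map;
import java.util.concurrent.*;

// Hypothetical monitor: records when each pooled thread picked up a task
// and counts threads that have been running longer than a threshold.
class ThreadMonitor {
    private final Map<Thread, Long> started = new ConcurrentHashMap<>();
    private final long thresholdMillis;

    ThreadMonitor(long thresholdMillis) { this.thresholdMillis = thresholdMillis; }

    void register(Thread t)   { started.put(t, System.currentTimeMillis()); }
    void unregister(Thread t) { started.remove(t); }

    // Would be called every N seconds by a scheduled scanner thread.
    int countStuck() {
        long now = System.currentTimeMillis();
        return (int) started.values().stream()
                .filter(start -> now - start > thresholdMillis)
                .count();
    }
}

// Pool that registers/unregisters around each task, mirroring how the
// *WithDMStats executors already override these hooks for statistics.
class MonitoredThreadPool extends ThreadPoolExecutor {
    private final ThreadMonitor monitor;

    MonitoredThreadPool(int size, ThreadMonitor monitor) {
        super(size, size, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        this.monitor = monitor;
    }

    @Override protected void beforeExecute(Thread t, Runnable r) {
        monitor.register(t);            // runs on the worker thread
        super.beforeExecute(t, r);
    }

    @Override protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        monitor.unregister(Thread.currentThread());
    }
}
```

The registration map only ever holds currently executing threads, so the scanner's cost is proportional to pool size, which speaks to the performance concern above.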


Thanks,
Barry Oglesby


On Wed, Feb 21, 2018 at 11:41 AM, Gregory Vortman <
gregory.vort...@amdocs.com> wrote:

> That's exactly the point: to have a single, very thin, generic mechanism
> that covers all threads/thread pools. Nothing in this solution is specific.
> Regards
>
>
> -Original Message-
> From: Jason Huynh [jhu...@pivotal.io]
> Received: Wednesday, 21 Feb 2018, 20:54
> To: d...@geode.apache.org [d...@geode.apache.org]
> CC: user@geode.apache.org [user@geode.apache.org]
> Subject: Re: [Proposal] Thread monitoring mechanism
>
> I am assuming this would be for all threads/thread pools and not specific
> to Function threads.  I wonder what the impact would be for put/get
> operations, or are we going to target specific operations?
>
>
>
> On Tue, Feb 20, 2018 at 1:04 AM Gregory Vortman <
> gregory.vort...@amdocs.com> wrote:
> Hello team,
> One of the most severe issues hitting our real-time application is threads
> getting stuck, for multiple reasons: long-lasting locks, deadlocks, threads
> that wait forever for a reply after a packet drop, etc.
> Such stuck threads fly under the radar of the existing system health-check
> methods.
> In mission-critical applications, this results in an immediate outage.
>
> As a short-term measure, we are implementing an internal watchdog mechanism
> for stuck-thread detection:
>    There is a registration object.
>    The function executor has start/end hooks to register/unregister the
> thread via the registration object.
> A custom monitoring thread is spawned on startup. It wakes up every N
> seconds, scans the registration map, and detects threads that have remained
> registered for too long (configurable).
> Once such a thread is detected, a process stack dump is taken and a
> thread-stack statistic metric is provided.
>
> This helps us monitor, detect, and quickly decide on the action to take;
> usually it is a member-bounce decision (a consistency issue is possible,
> but in our case that is better than a denial of service).
> The above solution does not touch Geode core code; it is implemented
> entirely within our custom code.
>
> I would like to propose a long-term, generic thread-monitoring mechanism
> to detect threads that are stuck for any reason: a monitoring object with
> start/end methods to be invoked similarly to
> FunctionStats.startFunctionExecution and FunctionStats.endFunctionExecution.
>
> Your feedback would be appreciated.
>
> Thank you for your cooperation.
> Best regards!
>
> Gregory Vortman
>
> This message and the information contained herein is proprietary and
> confidential and subject to the Amdocs policy statement, which you may
> review at https://www.amdocs.com/about/email-disclaimer
>


[SECURITY] CVE-2017-15693 Apache Geode unsafe deserialization of application objects

2018-02-23 Thread Anthony Baker
CVE-2017-15693 Apache Geode unsafe deserialization of application objects

Severity:  Important

Vendor: The Apache Software Foundation

Versions Affected:  Apache Geode 1.0.0 through 1.3.0

Description:
The Geode server stores application objects in serialized form.
Certain cluster operations and API invocations cause these objects to
be deserialized.  A user with DATA:WRITE access to the cluster may be
able to cause remote code execution if certain classes are present on
the classpath.

Mitigation:
Users of the affected versions should upgrade to Apache Geode 1.4.0 or
later.  In addition, users should set the flags
validate-serializable-objects and serializable-object-filter.
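
For example (a sketch only; the filter pattern below is a placeholder for the classes your application actually serializes), these can be set in gemfire.properties:

```properties
# Reject deserialization of classes not matching the filter
validate-serializable-objects=true
# ObjectInputFilter-style pattern listing allowed application classes
serializable-object-filter=com.example.app.**
```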

Credit:
This issue was reported responsibly to the Apache Geode Security Team
by Man Yue Mo from Semmle.

References:
[1] https://issues.apache.org/jira/browse/GEODE-3923
[2] 
https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-SecurityVulnerabilities


[SECURITY] CVE-2017-15692 Apache Geode unsafe deserialization in TcpServer

2018-02-23 Thread Anthony Baker
CVE-2017-15692 Apache Geode unsafe deserialization in TcpServer

Severity:  Important

Vendor: The Apache Software Foundation

Versions Affected:  Apache Geode 1.0.0 through 1.3.0

Description:
The TcpServer within the Geode locator opens a network port that
deserializes data.  If an unprivileged user gains access to the Geode
locator, they may be able to cause remote code execution if certain
classes are present on the classpath.

A malicious user can send a network message to the Geode locator and
execute code if certain classes are present on the classpath.

Mitigation:
Users of the affected versions should upgrade to Apache Geode 1.4.0 or
later.  In addition, users should set the flag
validate-serializable-objects.

Credit:
This issue was reported responsibly to the Apache Geode Security Team
by Man Yue Mo from Semmle.

References:
[1] https://issues.apache.org/jira/browse/GEODE-3923
[2] 
https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-SecurityVulnerabilities


Re: LinkedList with OQL not working

2018-02-23 Thread Jason Huynh
Hi Dharam,

I tried something similar but instead of IS_DEFINED I just used a string
equality comparison.
I have a top level object and a linkedList in the top level object.  The
linkedList contains two Position objects with a field named secId. One
position has a secId of OBJECTA and other OBJECTB.
I created 2 of these top level objects, so the region only has 2 objects.

When running "select p.ID, p.linkedList from /region p, p.linkedList empd"
(no filter), I get 4 structs returned as expected; the structs are tuples
of the from clause, so all combinations of ID and empd.  However, we have
projected the entire linkedList into the result based on our query.
struct(ID:1,linkedList:[Position [secId=OBJECTA], Position [secId=OBJECTB]])
struct(ID:1,linkedList:[Position [secId=OBJECTA], Position [secId=OBJECTB]])
struct(ID:3,linkedList:[Position [secId=OBJECTA], Position [secId=OBJECTB]])
struct(ID:3,linkedList:[Position [secId=OBJECTA], Position [secId=OBJECTB]])

When running with a filter  "select p.ID, p.linkedList from /region p,
p.linkedList empd where empd.secId='OBJECTA'", I receive two rows:
struct(ID:1,linkedList:[Position [secId=OBJECTA], Position [secId=OBJECTB]])
struct(ID:3,linkedList:[Position [secId=OBJECTA], Position [secId=OBJECTB]])

This time we only returned the tuples where empd matched the criteria.  We
still projected the entire linkedList into the result (as expected).

Is this what you were hoping would be filtered out? Are you expecting OBJECTB
to be filtered out from the projection, i.e.:
struct(ID:1,linkedList:[Position [secId=OBJECTA]])
struct(ID:3,linkedList:[Position [secId=OBJECTA]])
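
To make the join-vs-projection distinction concrete, here is a plain-Java analogy of the two semantics (hypothetical names, not Geode API). The first method mirrors what the OQL above does: the WHERE clause filters the (p, empd) tuples, but projecting p.linkedList still emits the whole list. The second filters elements inside the projected collection, which is what the question seems to expect and would have to happen in application code:

```java
import java.util.List;
import java.util.stream.Collectors;

class OqlJoinSketch {
    // Stand-ins for the region value and its nested collection.
    static class Position {
        final String secId;
        Position(String secId) { this.secId = secId; }
    }
    static class TopLevel {
        final int id;
        final List<Position> linkedList;
        TopLevel(int id, List<Position> list) { this.id = id; this.linkedList = list; }
    }

    // Analogy of: from /region p, p.linkedList empd where empd.secId='OBJECTA'
    // with projection p.linkedList: one row per matching tuple, full list each time.
    static List<List<Position>> tupleFilter(List<TopLevel> region) {
        return region.stream()
                .flatMap(p -> p.linkedList.stream()
                        .filter(e -> e.secId.equals("OBJECTA"))
                        .map(e -> p.linkedList))   // project the *entire* list
                .collect(Collectors.toList());
    }

    // Element-level filtering inside the projected collection: a different
    // query shape that OQL does not express here.
    static List<List<Position>> projectionFilter(List<TopLevel> region) {
        return region.stream()
                .map(p -> p.linkedList.stream()
                        .filter(e -> e.secId.equals("OBJECTA"))
                        .collect(Collectors.toList()))
                .collect(Collectors.toList());
    }

    static List<TopLevel> sampleRegion() {
        List<Position> list = List.of(new Position("OBJECTA"), new Position("OBJECTB"));
        return List.of(new TopLevel(1, list), new TopLevel(3, list));
    }
}
```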

Sorry if I have misunderstood the expectation/problem.

-Jason




RE: LinkedList with OQL not working

2018-02-23 Thread Thacker, Dharam
Hello Jason/Team,

Did you get a chance to try out option 1 with the filter? I double-checked
today, but it's not filtering inside the LinkedList.

For my region, the value is JSON with a nested array, which is converted to a
PdxInstance using the JSONFormatter API.

By default, a JSON array is always converted to a LinkedList by the
JSONFormatter API. Not sure why?

I tried with my other regions having LinkedLists as well, but they're not
getting filtered. As they are JSON-mapped PdxInstances, it's quite possible
that some of the JSON documents do not have the "something" field which I am
trying to check using IS_DEFINED(empd.something)

Thanks,
Dharam

Sent with BlackBerry Work (www.blackberry.com)

From: "Thacker, Dharam" 
Sent: Feb 22, 2018 08:46
To: user@geode.apache.org
Subject: RE: LinkedList with OQL not working

Thanks Jason/Anil for the reply!

Jason,
It's exactly the same output I am also getting for option 1 and option 2.

Now with your option 1, if you try to apply a filter like my
IS_DEFINED(empd.something), it won't get filtered.

In your case your linked list has "ObjectX". Assume that ObjectX is an
instance of a Java class "Customer" having a field "country". Now, as the
customer object is within the LinkedList, option 1 would fail with the filter
IS_DEFINED(objectX.country).

Option 2 works, but it changes the layout of the output, and in the real
world the output is sent back as XML/JSON to a business service.

Could you try that as well for option 1?

Anil,

In both options, my filter and outcome should be the same as per my queries,
but the layout is different.

In one case I am directly asking for "service.dependencies" and in the other
case I am asking for an "alias" for "service.dependencies".

But the issue is that option 1 does not give the right outcome, as the filter
condition is not being applied.

In the real world, the actual object stored in Geode is itself a denormalized
document; in my example, "dependencies grouped by service name".

Now, if I again have to write code to group them (Map), then it defeats the
purpose of having such a prepared denormalized model for efficiency.

I hope that makes sense.

Dharam

Sent with BlackBerry Work (www.blackberry.com)

From: Jason Huynh 
Sent: Feb 22, 2018 00:20
To: user@geode.apache.org
Subject: Re: LinkedList with OQL not working

Correction, empd not empty, not sure how autocorrect came into play.

On Wed, Feb 21, 2018 at 10:48 AM Jason Huynh 
> wrote:
I just tried similar queries and get the following output
Option1:
struct(ID:1,linkedList:[ObjectA, ObjectB])
struct(ID:1,linkedList:[ObjectA, ObjectB])
struct(ID:3,linkedList:[ObjectC, ObjectD])
struct(ID:3,linkedList:[ObjectC, ObjectD])

If I add a distinct keyword then I get only 2 rows:
struct(ID:1,linkedList:[ObjectA, ObjectB])
struct(ID:3,linkedList:[ObjectC, ObjectD])

Option2:
struct(ID:3,empd:ObjectD)
struct(ID:1,empd:ObjectA)
struct(ID:1,empty:ObjectB)
struct(ID:3,empd:ObjectC)

Adding an order by I can get them ordered by ID and empty
struct(ID:1,empd:ObjectA)
struct(ID:1,empty:ObjectB)
struct(ID:3,empty:ObjectC)
struct(ID:3,empd:ObjectD)


On Wed, Feb 21, 2018 at 10:30 AM Jason Huynh 
> wrote:
In option 1, are you receiving the linked list, or is it not returning the
values at all?
Is the problem in option 1 just a display issue?

For option 2, you might be able to do a distinct with an order by, but that
will force uniqueness in the tupling, which you may not be looking for.

On Wed, Feb 21, 2018 at 6:10 AM Thacker, Dharam 
> wrote:
Hello Team,

I am unable to apply any filter conditions using OQL if the collection is of
type LinkedList. The query below does not work as expected: it gives me
dependencies grouped at the service-name level, with an array of dependencies
under it.

Option1:
select 
service.name, 
service.dependencies from /Service service,service.dependancies empd where 
IS_DEFINED(empd.something)

Output:
Each row = serviceName -> {LinkedList}

Option2:
If I change the query as below, it gives a filtered result, but I don’t get
grouping by service name; every result comes as an individual element.

select 
service.name, empd 
from /Service service,service.dependancies empd where IS_DEFINED(empd.something)

Output:
Each row >>
serviceName -> empd1
serviceName -> empd2
serviceName -> empd3

Is there any such limitation?
Anything we can do to achieve this?

Thanks & Regards,
Dharam


This message is confidential and subject to terms at: 
http://www.jpmorgan.com/emaildisclaimer
 including on confidentiality, legal privilege, viruses and monitoring of 
electronic messages. If you are not the intended recipient, 

RE: Geode: Deadlock situation upon startup

2018-02-23 Thread Daniel Kojic
Thanks for the quick responses!

“Is it possible to share the DiskStoreIDs getting printed across the nodes. 
This will help us to see the dependencies between the nodes/diskstores.”
== Node A:
Region /Configuration has potentially stale data. It is waiting for another 
member to recover the latest data.
My persistent id:

  DiskStore ID: a7ba00dd-b21e-4ee7-a06b-10f93f593180
  Name: XXX
  Location: XXX/Configuration

Members with potentially new data:
[
  DiskStore ID: b4b3f9e3-3abc-4ad1-8871-6c6d61df027f
  Name:
  Location: XXX/Configuration
]
== Node B:
Region /PdxTypes has potentially stale data. It is waiting for another member 
to recover the latest data.
My persistent id:

  DiskStore ID: ca383b42-b7aa-44c1-986e-1af2433fc3c2
  Name:
  Location: XXX/PDX

Members with potentially new data:
[
  DiskStore ID: e4e53819-35a1-4f14-8731-79a2f6ff
  Name: XXX
  Location: XXX/PDX
]

The nodes wait for different diskstores. Does the order in which the diskstores 
are defined in the application play a role?

“Also, you don't need to start a locator for gfsh, you could connect to the 
running locator (part of the cluster); unless the JMX-manager on that locator 
is turned-off.”
You cannot use that JMX manager if the Spring Shell library is not on the
app's classpath (please correct me if I'm wrong), and we're running an OSGi
application into which we cannot integrate that dependency easily.

“As long as you start each member of your cluster, in parallel, they should 
work out amongst themselves who has the latest copy of the data.”
That’s what we thought, too. However, this does not seem to be the case if
you abruptly power off the VM. In all other cases, where we stopped the
process gracefully or even kill -9ed it, this never happened. The deadlock
also occurs when starting all nodes at once.

Best
Daniel

From: Anilkumar Gingade [mailto:aging...@pivotal.io]
Sent: Donnerstag, 22. Februar 2018 20:47
To: user@geode.apache.org
Subject: Re: Geode: Deadlock situation upon startup

Is it possible to share the DiskStoreIDs getting printed across the nodes. This 
will help us to see the dependencies between the nodes/diskstores.

Also, you don't need to start a locator for gfsh, you could connect to the 
running locator (part of the cluster); unless the JMX-manager on that locator 
is turned-off.

-Anil.







On Thu, Feb 22, 2018 at 11:14 AM, Darrel Schneider 
> wrote:
As long as you start each member of your cluster, in parallel, they should work 
out amongst themselves who has the latest copy of the data.
You should not need to revoke disk-stores that you still have. Since you are 
only using replicates your temporary solution is safe as long as you do pick 
the last one to write data as the winner.
If you had partitioned regions then it would not be safe to get rid of all disk 
stores except one.

This issue may have been fixed. You are using 1.0.0-incubating. Have you 
considered upgrading to 1.4?


On Thu, Feb 22, 2018 at 2:08 AM, Daniel Kojic 
> wrote:
Hi there

Our setup:
We have a multi-node clustered Java application running in an ESXi environment. 
Each cluster node has Geode embedded via. Spring Data for Apache Geode and has 
its own locator. Multiple replicated regions are shared among the nodes where 
each node has its own disk store.
* Java runtime version: 1.8.0_151
* Geode version: 1.0.0-incubating
* Spring Data Geode version: 1.0.0.INCUBATING-RELEASE
* Spring Data version: 1.12.1.RELEASE

Our problem:
We had situations that caused our Geode processes to quit abruptly, e.g.:
* the VM being abruptly powered off (no guest shutdown) or...
* ...CPU freezes caused by IO degradation.
After restarting the cluster nodes (one after another or all at once), geode 
logs on all nodes show the following:
Region /XXX has potentially stale data. It is waiting for another 
member to recover the latest data.
My persistent id:
  DiskStore ID: XXX
  Name:
  Location: /XXX
Members with potentially new data:
[
  DiskStore ID: XXX
  Name: XXX
  Location: /XXX
]
The problem, however, is that each node is waiting for the other nodes to
join although they are already started. Any combination of starting/stopping
the nodes that are shown as "missing" doesn't seem to do anything.

Our temporary solution:
We managed to "recover" from such a deadlock using gfsh:
* Revoke all missing disk stores except the one "chosen" (preferably the
last-running) node.
* Delete those disk stores.
* Restart the nodes.
As of today we're not able to add the "Spring Shell" dependency to our
application easily, which is why we have to run gfsh with its own locator.
This requires us to define such a "gfsh locator" on all of our cluster nodes.
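
The gfsh side of this recovery looks roughly as follows (a sketch; the locator address and disk-store ID are placeholders to fill in from the logs):

```shell
gfsh> connect --locator=localhost[10334]
gfsh> show missing-disk-stores
gfsh> revoke missing-disk-store --id=<DiskStore ID>
```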

What we're looking for:
Our temporary solution comes with some flaws: we're dependent of the gfsh 
tooling with its own locator and