Re: Indexing fields of non-POJO cache values

2017-10-13 Thread Andrey Kornev
[Crossposting to the dev list]

Alexey,

Yes, something like that, where the "reference"/"alias" is expressed as a piece
of Java code (perhaps as part of the QueryEntity definition) that Ignite
invokes when it indexes a cache entry.

My point is that, rather than limiting indexable fields to predefined POJO
attributes (or BinaryObject fields), Ignite could adopt a more general approach
and let users designate an arbitrary piece of code (a lambda/closure) as the
index value extractor. The current functionality (extracting index values from
POJO attributes) then becomes just a special case that Ignite supports out of
the box.
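
To make the idea concrete, here is a minimal sketch in plain Java of what an
extractor-driven index could look like. Nothing in it is Ignite API: the class
ExtractorIndex and its methods onPut, onRemove and range are hypothetical
names, and an in-memory TreeMap stands in for the real index structure.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;

/**
 * Hypothetical sketch, not Ignite API: a sorted secondary index whose values
 * come from a user-supplied extractor instead of a predefined field.
 */
final class ExtractorIndex<K, V, I extends Comparable<I>> {
    private final Function<V, I> extractor;                 // user code run at indexing time
    private final NavigableMap<I, Set<K>> index = new TreeMap<>();
    private final Map<K, I> indexedValueByKey = new HashMap<>();

    ExtractorIndex(Function<V, I> extractor) {
        this.extractor = extractor;
    }

    /** Called whenever a cache entry is written: re-extracts and re-indexes it. */
    void onPut(K key, V value) {
        onRemove(key);                                      // drop any stale index entry
        I indexed = extractor.apply(value);
        if (indexed == null) {
            return;                                         // nothing to index for this entry
        }
        index.computeIfAbsent(indexed, k -> new HashSet<>()).add(key);
        indexedValueByKey.put(key, indexed);
    }

    /** Called whenever a cache entry is removed. */
    void onRemove(K key) {
        I old = indexedValueByKey.remove(key);
        if (old == null) {
            return;
        }
        Set<K> keys = index.get(old);
        keys.remove(key);
        if (keys.isEmpty()) {
            index.remove(old);
        }
    }

    /** Keys whose extracted value lies in [from, to], both bounds inclusive. */
    Set<K> range(I from, I to) {
        Set<K> result = new HashSet<>();
        index.subMap(from, true, to, true).values().forEach(result::addAll);
        return result;
    }
}
```

With an extractor such as tree -> (Integer) tree.get("age"), the indexed value
is computed from the cache value on every write, so it never has to be stored
as a separate field of that value.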

This would really help in cases (like mine) where the cache values are non-POJO 
entities.

Thanks
Andrey


Re: Indexing fields of non-POJO cache values

2017-10-12 Thread Alexey Kuznetsov
Just an idea.

What if we could declare a kind of "reference" or "alias" for fields in such
cases? That would help us avoid duplicating data.

For example, in JavaScript I can declare getters and setters (almost on the
fly) that act as aliases for my data.



-- 
Alexey Kuznetsov


Re: Indexing fields of non-POJO cache values

2017-10-12 Thread Andrey Kornev
Hey Andrey,

Thanks for your reply!

We've been using a slightly different approach: we extract the values of the
indexable leaf nodes and store them as individual fields of the binary object,
alongside the serialized tree itself. We then declare those fields in the
cache's QueryEntity configuration. It works fine, and this way we avoid joins
in our queries.
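
The extraction step might look roughly like the sketch below. It rests on
assumptions made only for illustration: the tree is modelled as nested Maps,
leaf paths are dot-separated strings, and LeafFlattener, leafAt, toRecord and
the "tree" field name are all made-up names rather than anything from Ignite.
The resulting flat map corresponds to the fields set on the binary object.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not Ignite API: flattens selected leaves of a nested-map tree. */
final class LeafFlattener {
    private LeafFlattener() {
    }

    /** Follows a dot-separated path such as "address.zip" down to a leaf, or returns null. */
    static Object leafAt(Map<String, Object> tree, String path) {
        Object node = tree;
        for (String segment : path.split("\\.")) {
            if (!(node instanceof Map)) {
                return null;                      // path is missing or runs past a leaf
            }
            node = ((Map<?, ?>) node).get(segment);
        }
        return node;
    }

    /**
     * Builds the flat record to store: one top-level field per indexable path,
     * plus the serialized tree under the assumed field name "tree".
     */
    static Map<String, Object> toRecord(Map<String, Object> tree,
                                        byte[] serializedTree,
                                        List<String> indexablePaths) {
        Map<String, Object> record = new LinkedHashMap<>();
        for (String path : indexablePaths) {
            // Underscores keep the field name usable as a plain column name.
            record.put(path.replace('.', '_'), leafAt(tree, path));
        }
        record.put("tree", serializedTree);
        return record;
    }
}
```

Every path listed in indexablePaths becomes one extra stored field, which is
exactly where the second copy of each value comes from.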

However, an obvious drawback of this approach is data duplication. We end up
with three copies of each indexed value:

1) the leaf node of the tree,
2) the field of the binary object, and
3) the Ignite index entry.

I was hoping there might be a better way to achieve this. In particular, I'd
like to avoid storing the value as a field of the binary object (copy #2).

One possible (and elegant) solution would be to let a QueryEntity specify a
method (or a closure), in addition to the currently supported BinaryObject
fields and POJO attributes.

Regards
Andrey


From: Andrey Mashenkov <andrey.mashen...@gmail.com>
Sent: Thursday, October 12, 2017 6:25 AM
To: user@ignite.apache.org
Subject: Re: Indexing fields of non-POJO cache values

Hi,

Another option is to build your own query engine by implementing the
IndexingSpi interface, but that looks much more complicated.


Re: Indexing fields of non-POJO cache values

2017-10-12 Thread Andrey Mashenkov
Hi,

There is no way to index such data as is. To index data, you need an
entry_field<->column mapping configured.
As a workaround, the leaves can be stored as cache values in their own right.

For example, you can keep a separate cache for the leaf nodes, whose entries
have two fields: an "original tree key" field and an indexed "leaf node value"
field.
You can then query the serialized tree-like structures with an SQL query that
has a JOIN condition on "original tree key" and a WHERE condition on "leaf node
value".
Obviously, you will need to implement additional logic to keep the data in
both caches consistent.
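
That extra logic could be as simple as routing every write through one
component that updates both caches together. The sketch below only
illustrates the bookkeeping: two HashMaps stand in for the tree cache and the
leaf cache, TreeAndLeafStore and its methods are hypothetical names, and the
lookup imitates the JOIN plus WHERE with a plain scan instead of real SQL.
Making the two writes atomic is not addressed here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/**
 * Hypothetical sketch, not Ignite API: two in-memory maps stand in for the
 * tree cache and the leaf cache, and every write goes through this class.
 */
final class TreeAndLeafStore {
    /** Leaf-cache entry holding the two fields named in the workaround. */
    static final class LeafEntry {
        final String originalTreeKey;
        final Object leafNodeValue;

        LeafEntry(String originalTreeKey, Object leafNodeValue) {
            this.originalTreeKey = originalTreeKey;
            this.leafNodeValue = leafNodeValue;
        }
    }

    private final Map<String, byte[]> treeCache = new HashMap<>();
    private final Map<String, LeafEntry> leafCache = new HashMap<>();

    /** Stores a tree and replaces all of its leaf entries in one step. */
    void putTree(String treeKey, byte[] serializedTree, Map<String, Object> indexableLeaves) {
        removeLeaves(treeKey);                    // drop leaves of the previous version
        treeCache.put(treeKey, serializedTree);
        for (Map.Entry<String, Object> leaf : indexableLeaves.entrySet()) {
            // Leaf key = tree key + leaf path, so each leaf gets its own entry.
            leafCache.put(treeKey + "/" + leaf.getKey(),
                          new LeafEntry(treeKey, leaf.getValue()));
        }
    }

    /** Removes a tree together with all of its leaf entries. */
    void removeTree(String treeKey) {
        treeCache.remove(treeKey);
        removeLeaves(treeKey);
    }

    /** Stand-in for the JOIN + WHERE query: keys of trees having a leaf equal to value. */
    Set<String> treeKeysWithLeafValue(Object value) {
        Set<String> keys = new HashSet<>();
        for (LeafEntry e : leafCache.values()) {
            if (Objects.equals(e.leafNodeValue, value) && treeCache.containsKey(e.originalTreeKey)) {
                keys.add(e.originalTreeKey);
            }
        }
        return keys;
    }

    private void removeLeaves(String treeKey) {
        leafCache.values().removeIf(e -> e.originalTreeKey.equals(treeKey));
    }
}
```

Because putTree first removes the leaves of the previous version, an updated
tree never leaves stale leaf entries behind for a query to match.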



-- 
Best regards,
Andrey V. Mashenkov


Indexing fields of non-POJO cache values

2017-10-11 Thread Andrey Kornev
Hello,

Consider the following use case: my cache values are serialized tree-like
structures (as opposed to POJOs). The leaf nodes of each tree are Java
primitives. Some of the leaf nodes are used in queries and should be indexed.

What are my options for indexing such data?

Thanks
Andrey