Re: path query optimization

2016-11-01 Thread Marko Rodriguez
Hi,

So the one thing I just copied into TINKERPOP-1372 was your 
ImmutablePath.currentLabels = currentLabels (vs. copying the set). You are 
right, step labels are immutable and thus, there is no need to copy the set 
over — a direct reference is sufficient. Regarding the 
AbstractStep.traverserStepIdAndLabelsSetByChild changes — this isn’t so 
important as this is only used in OLAP  and thus, should be negligible relative 
to what else is going on in OLAP. However, to be good, I added the following:
@Override
public Path extend(final Set labels) {
if (labels.isEmpty())
return this;
else {
final Set newLabels = new LinkedHashSet<>();
newLabels.addAll(this.currentLabels);
newLabels.addAll(labels);
return new ImmutablePath(this.previousPath, this.currentObject, 
newLabels);
}
}
Thus, if the labels are empty, don’t clone.

This work along with the other recursion work in TINKERPOP-1372 provided a 
solid speed up. I’ll present benchmark results in the subsequent PR.

Thanks Pieter,
Marko.

http://markorodriguez.com



> On Nov 1, 2016, at 7:43 AM, pieter-gmail  wrote:
> 
> The branch is TINKERPOP-1404
> https://github.com/apache/tinkerpop/commit/c1556fe82c58527dc4425d23d1d69ce324e62cfa
> 
> Cheers
> Pieter
> 
> On 01/11/2016 15:23, Marko Rodriguez wrote:
>> What branch are you in? Perhaps give me URLs (to GitHub) to the files 
>> touched? (if its not too many)
>> 
>> Marko.
>> 
>> http://markorodriguez.com
>> 
>> 
>> 
>>> On Nov 1, 2016, at 7:19 AM, pieter-gmail  wrote:
>>> 
>>> Hi,
>>> 
>>> Yes I am but afraid I do not have time at present to concentrate on it.
>>> I just noticed your ImmutablePath ticket which will overlap with some of
>>> what I have done.
>>> 
>>> I'd suggest to pull my branch and look at what I did there. It was very
>>> little, but dangerous code, which is why I was reluctant to submit a PR
>>> at first. If you don't continue with it, I should in 2 or 3 weeks be
>>> able to look at it again.
>>> 
>>> Thanks
>>> Pieter
>>> 
>>> 
>>> On 01/11/2016 15:11, Marko Rodriguez wrote:
 Hi Pieter,
 
 I’m still really interested in your work in this area. Are you still doing 
 this?
 
 Marko.
 
 http://markorodriguez.com
 
 
 
> On Aug 7, 2016, at 9:12 AM, Pieter Martin  wrote:
> 
> To avoid the collection logic alltogether. For most steps there is no 
> need to
> check the labels as it is known that they are added, immutable and 
> correct.
> 
> Also with the current strategy the `ImmutablePath.currentLabels` is 
> exactly the
> same collection as that of the step. It is not a copy.
> 
> `Traverser.addLabels()` is only called for the steps previously 
> mentioned. They
> too can be optimized as there is no need to create a new `ImmutablePath` 
> just to
> set the labels. There could be a `Traverer.split()` method that creates 
> the
> initial `ImmutablePath` with the correct labels to start with. As things 
> stand
> now the `ImmutablePath` is created just to be replaced 1 or 2 millis 
> later. I
> have however not benchmarked any of those queries so am not touching that 
> at
> the moment.
> 
> In some ways its a matter of style. The label logic in `AbstractStep` is 
> not
> relevant to all of its subclasses and should be higher in the inheritance
> hierarchy.
> 
> Cheers
> Pieter
> 
> 
> Excerpts from Marko Rodriguez's message of August 7, 2016 3:54 :
>> Why not just check to see if the labels to be added already exist, if 
>> they do, don’t addLabels() and thus, don’t create a new collection.
>> Marko.
>> http://markorodriguez.com
>>> On Aug 7, 2016, at 6:07 AM, Pieter Martin  
>>> wrote:
>>> Here is what I have come up with so far.
>>> The idea is that `Traverser.split(r, step)` already copies the labels 
>>> to the
>>> traverser so there is no need to call `Traverser.addLabels(labels)` 
>>> again.
>>> I removed the `Traverser.addLabels(labels)` call from `AbstractStep`.
>>> For the traversers that do not call `Traverer.split(r, step)` I 
>>> manually added
>>> the `traverser.addLabels(labels)` call in `processNextStart()`. This 
>>> was done
>>> by fixing test failures rather than searching and investigating all 
>>> calls to
>>> `Traverser.split()`.
>>> The following steps needed the labels to be added manually.
>>> `AggregateStep`
>>> `CollectingBarrierStep`
>>> `FilterStep`
>>> `SideEffectStep`
>>> `StartStep`
>>> `NoOpBarrierStep`
>>> Further seeing as `Step.getLabels()` already returns a unmodifiable 
>>> collection and
>>> `ImmutablePath` is well immutable there is no need for it to have its 
>>> own copy
>>> of the labels. I set the labels directly on the path as oppose to 
>>> making a copy.
>>> `TinkerGraphProcessComputerT

Re: path query optimization

2016-11-01 Thread pieter-gmail
The branch is TINKERPOP-1404
https://github.com/apache/tinkerpop/commit/c1556fe82c58527dc4425d23d1d69ce324e62cfa

Cheers
Pieter

On 01/11/2016 15:23, Marko Rodriguez wrote:
> What branch are you in? Perhaps give me URLs (to GitHub) to the files 
> touched? (if its not too many)
>
> Marko.
>
> http://markorodriguez.com
>
>
>
>> On Nov 1, 2016, at 7:19 AM, pieter-gmail  wrote:
>>
>> Hi,
>>
>> Yes I am but afraid I do not have time at present to concentrate on it.
>> I just noticed your ImmutablePath ticket which will overlap with some of
>> what I have done.
>>
>> I'd suggest to pull my branch and look at what I did there. It was very
>> little, but dangerous code, which is why I was reluctant to submit a PR
>> at first. If you don't continue with it, I should in 2 or 3 weeks be
>> able to look at it again.
>>
>> Thanks
>> Pieter
>>
>>
>> On 01/11/2016 15:11, Marko Rodriguez wrote:
>>> Hi Pieter,
>>>
>>> I’m still really interested in your work in this area. Are you still doing 
>>> this?
>>>
>>> Marko.
>>>
>>> http://markorodriguez.com
>>>
>>>
>>>
 On Aug 7, 2016, at 9:12 AM, Pieter Martin  wrote:

 To avoid the collection logic alltogether. For most steps there is no need 
 to
 check the labels as it is known that they are added, immutable and correct.

 Also with the current strategy the `ImmutablePath.currentLabels` is 
 exactly the
 same collection as that of the step. It is not a copy.

 `Traverser.addLabels()` is only called for the steps previously mentioned. 
 They
 too can be optimized as there is no need to create a new `ImmutablePath` 
 just to
 set the labels. There could be a `Traverer.split()` method that creates the
 initial `ImmutablePath` with the correct labels to start with. As things 
 stand
 now the `ImmutablePath` is created just to be replaced 1 or 2 millis 
 later. I
 have however not benchmarked any of those queries so am not touching that 
 at
 the moment.

 In some ways its a matter of style. The label logic in `AbstractStep` is 
 not
 relevant to all of its subclasses and should be higher in the inheritance
 hierarchy.

 Cheers
 Pieter


 Excerpts from Marko Rodriguez's message of August 7, 2016 3:54 :
> Why not just check to see if the labels to be added already exist, if 
> they do, don’t addLabels() and thus, don’t create a new collection.
> Marko.
> http://markorodriguez.com
>> On Aug 7, 2016, at 6:07 AM, Pieter Martin  
>> wrote:
>> Here is what I have come up with so far.
>> The idea is that `Traverser.split(r, step)` already copies the labels to 
>> the
>> traverser so there is no need to call `Traverser.addLabels(labels)` 
>> again.
>> I removed the `Traverser.addLabels(labels)` call from `AbstractStep`.
>> For the traversers that do not call `Traverer.split(r, step)` I manually 
>> added
>> the `traverser.addLabels(labels)` call in `processNextStart()`. This was 
>> done
>> by fixing test failures rather than searching and investigating all 
>> calls to
>> `Traverser.split()`.
>> The following steps needed the labels to be added manually.
>> `AggregateStep`
>> `CollectingBarrierStep`
>> `FilterStep`
>> `SideEffectStep`
>> `StartStep`
>> `NoOpBarrierStep`
>> Further seeing as `Step.getLabels()` already returns a unmodifiable 
>> collection and
>> `ImmutablePath` is well immutable there is no need for it to have its 
>> own copy
>> of the labels. I set the labels directly on the path as oppose to making 
>> a copy.
>> `TinkerGraphProcessComputerTest` passes.
>> `TinkerGraphProcessStandardTest` passes.
>> `TinkerGraphStructureStandardTest` has one failure.
>> `SerializationTest$GryoTest.shouldSerializePathAsDetached` fails with
>> `java.lang.IllegalArgumentException: Class is not registered:
>> java.util.Collections$UnmodifiableSet`
>> Let me know what you think.
>> Thanks
>> Pieter
>> Excerpts from Ted Wilmes's message of August 7, 2016 12:16 :
>>> Neat find Pieter.  With regards to the update to ImmutablePath, I think
>>> that defensive copying of inbound collections is generally a good idea 
>>> but
>>> if we can target specific areas where we can reap big gains from not
>>> creating new collections it may be worth it to relax that constraint,
>>> especially if we're only talking about internal APIs.  If we do relax 
>>> that
>>> constraint though, we'll need to careful during code review and unit
>>> testing because those bugs can be difficult to track down.  The other 
>>> nice
>>> thing that the defensive copy gets us, particularly in the case of the
>>> ImmutablePath, is that it isolates ImmutablePath from whatever the 
>>> subclass
>>> of set was that the caller passed in.  I think that's what is causing 
>>> the

Re: path query optimization

2016-11-01 Thread Marko Rodriguez
What branch are you in? Perhaps give me URLs (to GitHub) to the files touched? 
(if its not too many)

Marko.

http://markorodriguez.com



> On Nov 1, 2016, at 7:19 AM, pieter-gmail  wrote:
> 
> Hi,
> 
> Yes I am but afraid I do not have time at present to concentrate on it.
> I just noticed your ImmutablePath ticket which will overlap with some of
> what I have done.
> 
> I'd suggest to pull my branch and look at what I did there. It was very
> little, but dangerous code, which is why I was reluctant to submit a PR
> at first. If you don't continue with it, I should in 2 or 3 weeks be
> able to look at it again.
> 
> Thanks
> Pieter
> 
> 
> On 01/11/2016 15:11, Marko Rodriguez wrote:
>> Hi Pieter,
>> 
>> I’m still really interested in your work in this area. Are you still doing 
>> this?
>> 
>> Marko.
>> 
>> http://markorodriguez.com
>> 
>> 
>> 
>>> On Aug 7, 2016, at 9:12 AM, Pieter Martin  wrote:
>>> 
>>> To avoid the collection logic alltogether. For most steps there is no need 
>>> to
>>> check the labels as it is known that they are added, immutable and correct.
>>> 
>>> Also with the current strategy the `ImmutablePath.currentLabels` is exactly 
>>> the
>>> same collection as that of the step. It is not a copy.
>>> 
>>> `Traverser.addLabels()` is only called for the steps previously mentioned. 
>>> They
>>> too can be optimized as there is no need to create a new `ImmutablePath` 
>>> just to
>>> set the labels. There could be a `Traverer.split()` method that creates the
>>> initial `ImmutablePath` with the correct labels to start with. As things 
>>> stand
>>> now the `ImmutablePath` is created just to be replaced 1 or 2 millis later. 
>>> I
>>> have however not benchmarked any of those queries so am not touching that at
>>> the moment.
>>> 
>>> In some ways its a matter of style. The label logic in `AbstractStep` is not
>>> relevant to all of its subclasses and should be higher in the inheritance
>>> hierarchy.
>>> 
>>> Cheers
>>> Pieter
>>> 
>>> 
>>> Excerpts from Marko Rodriguez's message of August 7, 2016 3:54 :
 Why not just check to see if the labels to be added already exist, if they 
 do, don’t addLabels() and thus, don’t create a new collection.
 Marko.
 http://markorodriguez.com
> On Aug 7, 2016, at 6:07 AM, Pieter Martin  wrote:
> Here is what I have come up with so far.
> The idea is that `Traverser.split(r, step)` already copies the labels to 
> the
> traverser so there is no need to call `Traverser.addLabels(labels)` again.
> I removed the `Traverser.addLabels(labels)` call from `AbstractStep`.
> For the traversers that do not call `Traverer.split(r, step)` I manually 
> added
> the `traverser.addLabels(labels)` call in `processNextStart()`. This was 
> done
> by fixing test failures rather than searching and investigating all calls 
> to
> `Traverser.split()`.
> The following steps needed the labels to be added manually.
> `AggregateStep`
> `CollectingBarrierStep`
> `FilterStep`
> `SideEffectStep`
> `StartStep`
> `NoOpBarrierStep`
> Further seeing as `Step.getLabels()` already returns a unmodifiable 
> collection and
> `ImmutablePath` is well immutable there is no need for it to have its own 
> copy
> of the labels. I set the labels directly on the path as oppose to making 
> a copy.
> `TinkerGraphProcessComputerTest` passes.
> `TinkerGraphProcessStandardTest` passes.
> `TinkerGraphStructureStandardTest` has one failure.
> `SerializationTest$GryoTest.shouldSerializePathAsDetached` fails with
> `java.lang.IllegalArgumentException: Class is not registered:
> java.util.Collections$UnmodifiableSet`
> Let me know what you think.
> Thanks
> Pieter
> Excerpts from Ted Wilmes's message of August 7, 2016 12:16 :
>> Neat find Pieter.  With regards to the update to ImmutablePath, I think
>> that defensive copying of inbound collections is generally a good idea 
>> but
>> if we can target specific areas where we can reap big gains from not
>> creating new collections it may be worth it to relax that constraint,
>> especially if we're only talking about internal APIs.  If we do relax 
>> that
>> constraint though, we'll need to careful during code review and unit
>> testing because those bugs can be difficult to track down.  The other 
>> nice
>> thing that the defensive copy gets us, particularly in the case of the
>> ImmutablePath, is that it isolates ImmutablePath from whatever the 
>> subclass
>> of set was that the caller passed in.  I think that's what is causing the
>> serialization test failure in this case since the caller passed in an
>> unmodifiable set.
>> --Ted
>> On Fri, Aug 5, 2016, 2:31 PM Marko Rodriguez  
>> wrote:
>>> Hello,
>>> This is cool. Check out also ImmutablePath.extend(labels) as that is
>>> ultimately what Traverser.addLabel

Re: path query optimization

2016-11-01 Thread pieter-gmail
Hi,

Yes I am but afraid I do not have time at present to concentrate on it.
I just noticed your ImmutablePath ticket which will overlap with some of
what I have done.

I'd suggest to pull my branch and look at what I did there. It was very
little, but dangerous code, which is why I was reluctant to submit a PR
at first. If you don't continue with it, I should in 2 or 3 weeks be
able to look at it again.

Thanks
Pieter


On 01/11/2016 15:11, Marko Rodriguez wrote:
> Hi Pieter,
>
> I’m still really interested in your work in this area. Are you still doing 
> this?
>
> Marko.
>
> http://markorodriguez.com
>
>
>
>> On Aug 7, 2016, at 9:12 AM, Pieter Martin  wrote:
>>
>> To avoid the collection logic alltogether. For most steps there is no need to
>> check the labels as it is known that they are added, immutable and correct.
>>
>> Also with the current strategy the `ImmutablePath.currentLabels` is exactly 
>> the
>> same collection as that of the step. It is not a copy.
>>
>> `Traverser.addLabels()` is only called for the steps previously mentioned. 
>> They
>> too can be optimized as there is no need to create a new `ImmutablePath` 
>> just to
>> set the labels. There could be a `Traverer.split()` method that creates the
>> initial `ImmutablePath` with the correct labels to start with. As things 
>> stand
>> now the `ImmutablePath` is created just to be replaced 1 or 2 millis later. I
>> have however not benchmarked any of those queries so am not touching that at
>> the moment.
>>
>> In some ways its a matter of style. The label logic in `AbstractStep` is not
>> relevant to all of its subclasses and should be higher in the inheritance
>> hierarchy.
>>
>> Cheers
>> Pieter
>>
>>
>> Excerpts from Marko Rodriguez's message of August 7, 2016 3:54 :
>>> Why not just check to see if the labels to be added already exist, if they 
>>> do, don’t addLabels() and thus, don’t create a new collection.
>>> Marko.
>>> http://markorodriguez.com
 On Aug 7, 2016, at 6:07 AM, Pieter Martin  wrote:
 Here is what I have come up with so far.
 The idea is that `Traverser.split(r, step)` already copies the labels to 
 the
 traverser so there is no need to call `Traverser.addLabels(labels)` again.
 I removed the `Traverser.addLabels(labels)` call from `AbstractStep`.
 For the traversers that do not call `Traverer.split(r, step)` I manually 
 added
 the `traverser.addLabels(labels)` call in `processNextStart()`. This was 
 done
 by fixing test failures rather than searching and investigating all calls 
 to
 `Traverser.split()`.
 The following steps needed the labels to be added manually.
 `AggregateStep`
 `CollectingBarrierStep`
 `FilterStep`
 `SideEffectStep`
 `StartStep`
 `NoOpBarrierStep`
 Further seeing as `Step.getLabels()` already returns a unmodifiable 
 collection and
 `ImmutablePath` is well immutable there is no need for it to have its own 
 copy
 of the labels. I set the labels directly on the path as oppose to making a 
 copy.
 `TinkerGraphProcessComputerTest` passes.
 `TinkerGraphProcessStandardTest` passes.
 `TinkerGraphStructureStandardTest` has one failure.
 `SerializationTest$GryoTest.shouldSerializePathAsDetached` fails with
 `java.lang.IllegalArgumentException: Class is not registered:
 java.util.Collections$UnmodifiableSet`
 Let me know what you think.
 Thanks
 Pieter
 Excerpts from Ted Wilmes's message of August 7, 2016 12:16 :
> Neat find Pieter.  With regards to the update to ImmutablePath, I think
> that defensive copying of inbound collections is generally a good idea but
> if we can target specific areas where we can reap big gains from not
> creating new collections it may be worth it to relax that constraint,
> especially if we're only talking about internal APIs.  If we do relax that
> constraint though, we'll need to careful during code review and unit
> testing because those bugs can be difficult to track down.  The other nice
> thing that the defensive copy gets us, particularly in the case of the
> ImmutablePath, is that it isolates ImmutablePath from whatever the 
> subclass
> of set was that the caller passed in.  I think that's what is causing the
> serialization test failure in this case since the caller passed in an
> unmodifiable set.
> --Ted
> On Fri, Aug 5, 2016, 2:31 PM Marko Rodriguez  wrote:
>> Hello,
>> This is cool. Check out also ImmutablePath.extend(labels) as that is
>> ultimately what Traverser.addLabels() calls. We have a lot of set copying
>> and I don’t know if its needed (as you seem to be demonstrating). What I
>> don’t like about your solution is the explicit reference to the
>> B_L_P…Traverser in AbstractStep. See if you can work your solution 
>> without
>> it.
>> Good luck,
>> Marko.
>> http://markorodriguez.com
>>> On Aug 5, 2016

Re: path query optimization

2016-11-01 Thread Marko Rodriguez
Hi Pieter,

I’m still really interested in your work in this area. Are you still doing this?

Marko.

http://markorodriguez.com



> On Aug 7, 2016, at 9:12 AM, Pieter Martin  wrote:
> 
> To avoid the collection logic alltogether. For most steps there is no need to
> check the labels as it is known that they are added, immutable and correct.
> 
> Also with the current strategy the `ImmutablePath.currentLabels` is exactly 
> the
> same collection as that of the step. It is not a copy.
> 
> `Traverser.addLabels()` is only called for the steps previously mentioned. 
> They
> too can be optimized as there is no need to create a new `ImmutablePath` just 
> to
> set the labels. There could be a `Traverer.split()` method that creates the
> initial `ImmutablePath` with the correct labels to start with. As things stand
> now the `ImmutablePath` is created just to be replaced 1 or 2 millis later. I
> have however not benchmarked any of those queries so am not touching that at
> the moment.
> 
> In some ways its a matter of style. The label logic in `AbstractStep` is not
> relevant to all of its subclasses and should be higher in the inheritance
> hierarchy.
> 
> Cheers
> Pieter
> 
> 
> Excerpts from Marko Rodriguez's message of August 7, 2016 3:54 :
>> Why not just check to see if the labels to be added already exist, if they 
>> do, don’t addLabels() and thus, don’t create a new collection.
>> Marko.
>> http://markorodriguez.com
>>> On Aug 7, 2016, at 6:07 AM, Pieter Martin  wrote:
>>> Here is what I have come up with so far.
>>> The idea is that `Traverser.split(r, step)` already copies the labels to the
>>> traverser so there is no need to call `Traverser.addLabels(labels)` again.
>>> I removed the `Traverser.addLabels(labels)` call from `AbstractStep`.
>>> For the traversers that do not call `Traverer.split(r, step)` I manually 
>>> added
>>> the `traverser.addLabels(labels)` call in `processNextStart()`. This was 
>>> done
>>> by fixing test failures rather than searching and investigating all calls to
>>> `Traverser.split()`.
>>> The following steps needed the labels to be added manually.
>>> `AggregateStep`
>>> `CollectingBarrierStep`
>>> `FilterStep`
>>> `SideEffectStep`
>>> `StartStep`
>>> `NoOpBarrierStep`
>>> Further seeing as `Step.getLabels()` already returns a unmodifiable 
>>> collection and
>>> `ImmutablePath` is well immutable there is no need for it to have its own 
>>> copy
>>> of the labels. I set the labels directly on the path as oppose to making a 
>>> copy.
>>> `TinkerGraphProcessComputerTest` passes.
>>> `TinkerGraphProcessStandardTest` passes.
>>> `TinkerGraphStructureStandardTest` has one failure.
>>> `SerializationTest$GryoTest.shouldSerializePathAsDetached` fails with
>>> `java.lang.IllegalArgumentException: Class is not registered:
>>> java.util.Collections$UnmodifiableSet`
>>> Let me know what you think.
>>> Thanks
>>> Pieter
>>> Excerpts from Ted Wilmes's message of August 7, 2016 12:16 :
 Neat find Pieter.  With regards to the update to ImmutablePath, I think
 that defensive copying of inbound collections is generally a good idea but
 if we can target specific areas where we can reap big gains from not
 creating new collections it may be worth it to relax that constraint,
 especially if we're only talking about internal APIs.  If we do relax that
 constraint though, we'll need to careful during code review and unit
 testing because those bugs can be difficult to track down.  The other nice
 thing that the defensive copy gets us, particularly in the case of the
 ImmutablePath, is that it isolates ImmutablePath from whatever the subclass
 of set was that the caller passed in.  I think that's what is causing the
 serialization test failure in this case since the caller passed in an
 unmodifiable set.
 --Ted
 On Fri, Aug 5, 2016, 2:31 PM Marko Rodriguez  wrote:
> Hello,
> This is cool. Check out also ImmutablePath.extend(labels) as that is
> ultimately what Traverser.addLabels() calls. We have a lot of set copying
> and I don’t know if its needed (as you seem to be demonstrating). What I
> don’t like about your solution is the explicit reference to the
> B_L_P…Traverser in AbstractStep. See if you can work your solution without
> it.
> Good luck,
> Marko.
> http://markorodriguez.com
> > On Aug 5, 2016, at 12:44 PM, pieter-gmail 
> wrote:
> >
> > Sorry forgot to add a rather important part.
> >
> > I changed ImmutablePath's constructor to
> >
> >private ImmutablePath(final ImmutablePathImpl previousPath, final
> > Object currentObject, final Set currentLabels) {
> >this.previousPath = previousPath;
> >this.currentObject = currentObject;
> >this.currentLabels = currentLabels;
> > //this.currentLabels.addAll(currentLabels);
> >}
> >
> > Setting the collection directly as oppose to `addAll`

Re: path query optimization

2016-08-07 Thread Pieter Martin

To avoid the collection logic alltogether. For most steps there is no need to
check the labels as it is known that they are added, immutable and correct.

Also with the current strategy the `ImmutablePath.currentLabels` is exactly the
same collection as that of the step. It is not a copy.

`Traverser.addLabels()` is only called for the steps previously mentioned. They
too can be optimized as there is no need to create a new `ImmutablePath` just to
set the labels. There could be a `Traverer.split()` method that creates the
initial `ImmutablePath` with the correct labels to start with. As things stand
now the `ImmutablePath` is created just to be replaced 1 or 2 millis later. I
have however not benchmarked any of those queries so am not touching that at
the moment.

In some ways its a matter of style. The label logic in `AbstractStep` is not
relevant to all of its subclasses and should be higher in the inheritance
hierarchy.

Cheers
Pieter


Excerpts from Marko Rodriguez's message of August 7, 2016 3:54 :

Why not just check to see if the labels to be added already exist, if they do, 
don’t addLabels() and thus, don’t create a new collection.

Marko.

http://markorodriguez.com




On Aug 7, 2016, at 6:07 AM, Pieter Martin  wrote:

Here is what I have come up with so far.

The idea is that `Traverser.split(r, step)` already copies the labels to the
traverser so there is no need to call `Traverser.addLabels(labels)` again.

I removed the `Traverser.addLabels(labels)` call from `AbstractStep`.
For the traversers that do not call `Traverer.split(r, step)` I manually added
the `traverser.addLabels(labels)` call in `processNextStart()`. This was done
by fixing test failures rather than searching and investigating all calls to
`Traverser.split()`.

The following steps needed the labels to be added manually.

`AggregateStep`
`CollectingBarrierStep`
`FilterStep`
`SideEffectStep`
`StartStep`
`NoOpBarrierStep`

Further seeing as `Step.getLabels()` already returns a unmodifiable collection 
and
`ImmutablePath` is well immutable there is no need for it to have its own copy
of the labels. I set the labels directly on the path as oppose to making a copy.

`TinkerGraphProcessComputerTest` passes.
`TinkerGraphProcessStandardTest` passes.
`TinkerGraphStructureStandardTest` has one failure.
`SerializationTest$GryoTest.shouldSerializePathAsDetached` fails with
`java.lang.IllegalArgumentException: Class is not registered:
java.util.Collections$UnmodifiableSet`

Let me know what you think.

Thanks
Pieter


Excerpts from Ted Wilmes's message of August 7, 2016 12:16 :

Neat find Pieter.  With regards to the update to ImmutablePath, I think
that defensive copying of inbound collections is generally a good idea but
if we can target specific areas where we can reap big gains from not
creating new collections it may be worth it to relax that constraint,
especially if we're only talking about internal APIs.  If we do relax that
constraint though, we'll need to careful during code review and unit
testing because those bugs can be difficult to track down.  The other nice
thing that the defensive copy gets us, particularly in the case of the
ImmutablePath, is that it isolates ImmutablePath from whatever the subclass
of set was that the caller passed in.  I think that's what is causing the
serialization test failure in this case since the caller passed in an
unmodifiable set.
--Ted
On Fri, Aug 5, 2016, 2:31 PM Marko Rodriguez  wrote:

Hello,

This is cool. Check out also ImmutablePath.extend(labels) as that is
ultimately what Traverser.addLabels() calls. We have a lot of set copying
and I don’t know if its needed (as you seem to be demonstrating). What I
don’t like about your solution is the explicit reference to the
B_L_P…Traverser in AbstractStep. See if you can work your solution without
it.

Good luck,
Marko.

http://markorodriguez.com



> On Aug 5, 2016, at 12:44 PM, pieter-gmail 
wrote:
>
> Sorry forgot to add a rather important part.
>
> I changed ImmutablePath's constructor to
>
>private ImmutablePath(final ImmutablePathImpl previousPath, final
> Object currentObject, final Set currentLabels) {
>this.previousPath = previousPath;
>this.currentObject = currentObject;
>this.currentLabels = currentLabels;
> //this.currentLabels.addAll(currentLabels);
>}
>
> Setting the collection directly as oppose to `addAll`
>
> Thanks
> Pieter
>
>
> On 05/08/2016 20:40, pieter-gmail wrote:
>> Hi,
>>
>> I have been optimizing Sqlg of late and eventually arrived at TinkerPop
>> code.
>>
>> The gremlin in particular that I am interested is path queries.
>>
>> Here is the test that I am running in jmh.
>>
>>//@Setup
>>Vertex a = graph.addVertex(T.label, "A", "name", "a1");
>>for (int i = 1; i < 1_000_001; i++) {
>>Vertex b = graph.addVertex(T.label, "B", "name", "name_" +
i);
>>a.addEdge("outB", b);
>>for (int j = 0; j < 1; j++) {
>>Verte

Re: path query optimization

2016-08-07 Thread Marko Rodriguez
Why not just check to see if the labels to be added already exist, if they do, 
don’t addLabels() and thus, don’t create a new collection.

Marko.

http://markorodriguez.com



> On Aug 7, 2016, at 6:07 AM, Pieter Martin  wrote:
> 
> Here is what I have come up with so far.
> 
> The idea is that `Traverser.split(r, step)` already copies the labels to the
> traverser so there is no need to call `Traverser.addLabels(labels)` again.
> 
> I removed the `Traverser.addLabels(labels)` call from `AbstractStep`.
> For the traversers that do not call `Traverer.split(r, step)` I manually added
> the `traverser.addLabels(labels)` call in `processNextStart()`. This was done
> by fixing test failures rather than searching and investigating all calls to
> `Traverser.split()`.
> 
> The following steps needed the labels to be added manually.
> 
> `AggregateStep`
> `CollectingBarrierStep`
> `FilterStep`
> `SideEffectStep`
> `StartStep`
> `NoOpBarrierStep`
> 
> Further seeing as `Step.getLabels()` already returns a unmodifiable 
> collection and
> `ImmutablePath` is well immutable there is no need for it to have its own copy
> of the labels. I set the labels directly on the path as oppose to making a 
> copy.
> 
> `TinkerGraphProcessComputerTest` passes.
> `TinkerGraphProcessStandardTest` passes.
> `TinkerGraphStructureStandardTest` has one failure.
> `SerializationTest$GryoTest.shouldSerializePathAsDetached` fails with
> `java.lang.IllegalArgumentException: Class is not registered:
> java.util.Collections$UnmodifiableSet`
> 
> Let me know what you think.
> 
> Thanks
> Pieter
> 
> 
> Excerpts from Ted Wilmes's message of August 7, 2016 12:16 :
>> Neat find Pieter.  With regards to the update to ImmutablePath, I think
>> that defensive copying of inbound collections is generally a good idea but
>> if we can target specific areas where we can reap big gains from not
>> creating new collections it may be worth it to relax that constraint,
>> especially if we're only talking about internal APIs.  If we do relax that
>> constraint though, we'll need to careful during code review and unit
>> testing because those bugs can be difficult to track down.  The other nice
>> thing that the defensive copy gets us, particularly in the case of the
>> ImmutablePath, is that it isolates ImmutablePath from whatever the subclass
>> of set was that the caller passed in.  I think that's what is causing the
>> serialization test failure in this case since the caller passed in an
>> unmodifiable set.
>> --Ted
>> On Fri, Aug 5, 2016, 2:31 PM Marko Rodriguez  wrote:
>>> Hello,
>>> 
>>> This is cool. Check out also ImmutablePath.extend(labels) as that is
>>> ultimately what Traverser.addLabels() calls. We have a lot of set copying
>>> and I don’t know if its needed (as you seem to be demonstrating). What I
>>> don’t like about your solution is the explicit reference to the
>>> B_L_P…Traverser in AbstractStep. See if you can work your solution without
>>> it.
>>> 
>>> Good luck,
>>> Marko.
>>> 
>>> http://markorodriguez.com
>>> 
>>> 
>>> 
>>> > On Aug 5, 2016, at 12:44 PM, pieter-gmail 
>>> wrote:
>>> >
>>> > Sorry forgot to add a rather important part.
>>> >
>>> > I changed ImmutablePath's constructor to
>>> >
>>> >private ImmutablePath(final ImmutablePathImpl previousPath, final
>>> > Object currentObject, final Set currentLabels) {
>>> >this.previousPath = previousPath;
>>> >this.currentObject = currentObject;
>>> >this.currentLabels = currentLabels;
>>> > //this.currentLabels.addAll(currentLabels);
>>> >}
>>> >
>>> > Setting the collection directly as oppose to `addAll`
>>> >
>>> > Thanks
>>> > Pieter
>>> >
>>> >
>>> > On 05/08/2016 20:40, pieter-gmail wrote:
>>> >> Hi,
>>> >>
>>> >> I have been optimizing Sqlg of late and eventually arrived at TinkerPop
>>> >> code.
>>> >>
>>> >> The gremlin in particular that I am interested is path queries.
>>> >>
>>> >> Here is the test that I am running in jmh.
>>> >>
>>> >>//@Setup
>>> >>Vertex a = graph.addVertex(T.label, "A", "name", "a1");
>>> >>for (int i = 1; i < 1_000_001; i++) {
>>> >>Vertex b = graph.addVertex(T.label, "B", "name", "name_" +
>>> i);
>>> >>a.addEdge("outB", b);
>>> >>for (int j = 0; j < 1; j++) {
>>> >>Vertex c = graph.addVertex(T.label, "C", "name", "name_"
>>> >> + i + " " + j);
>>> >>b.addEdge("outC", c);
>>> >>}
>>> >>}
>>> >>
>>> >> And the query being benchmarked is
>>> >>
>>> >>GraphTraversal traversal =
>>> >> g.V(a).as("a").out().as("b").out().as("c").path();
>>> >>while (traversal.hasNext()) {
>>> >>Path path = traversal.next();
>>> >>}
>>> >>
>>> >> Before the optimization, (as things are now)
>>> >>
>>> >> Benchmark Mode Cnt  Score Error
>>> Units
>>> >> GremlinPathBenchmark.g_path  avgt  100  1.086 ± 0.020   s/op
>>> >>
>>> >> The optimization I did is in A

Re: path query optimization

2016-08-07 Thread Pieter Martin

Here is what I have come up with so far.

The idea is that `Traverser.split(r, step)` already copies the labels to the
traverser so there is no need to call `Traverser.addLabels(labels)` again.

I removed the `Traverser.addLabels(labels)` call from `AbstractStep`.
For the traversers that do not call `Traverer.split(r, step)` I manually added
the `traverser.addLabels(labels)` call in `processNextStart()`. This was done
by fixing test failures rather than searching and investigating all calls to
`Traverser.split()`.

The following steps needed the labels to be added manually.

`AggregateStep`
`CollectingBarrierStep`
`FilterStep`
`SideEffectStep`
`StartStep`
`NoOpBarrierStep`

Further seeing as `Step.getLabels()` already returns a unmodifiable collection 
and
`ImmutablePath` is well immutable there is no need for it to have its own copy
of the labels. I set the labels directly on the path as oppose to making a copy.

`TinkerGraphProcessComputerTest` passes.
`TinkerGraphProcessStandardTest` passes.
`TinkerGraphStructureStandardTest` has one failure.
`SerializationTest$GryoTest.shouldSerializePathAsDetached` fails with
`java.lang.IllegalArgumentException: Class is not registered:
java.util.Collections$UnmodifiableSet`

Let me know what you think.

Thanks
Pieter


Excerpts from Ted Wilmes's message of August 7, 2016 12:16 :

Neat find Pieter.  With regards to the update to ImmutablePath, I think
that defensive copying of inbound collections is generally a good idea but
if we can target specific areas where we can reap big gains from not
creating new collections it may be worth it to relax that constraint,
especially if we're only talking about internal APIs.  If we do relax that
constraint though, we'll need to careful during code review and unit
testing because those bugs can be difficult to track down.  The other nice
thing that the defensive copy gets us, particularly in the case of the
ImmutablePath, is that it isolates ImmutablePath from whatever the subclass
of set was that the caller passed in.  I think that's what is causing the
serialization test failure in this case since the caller passed in an
unmodifiable set.

--Ted

On Fri, Aug 5, 2016, 2:31 PM Marko Rodriguez  wrote:


Hello,

This is cool. Check out also ImmutablePath.extend(labels) as that is
ultimately what Traverser.addLabels() calls. We have a lot of set copying
and I don’t know if its needed (as you seem to be demonstrating). What I
don’t like about your solution is the explicit reference to the
B_L_P…Traverser in AbstractStep. See if you can work your solution without
it.

Good luck,
Marko.

http://markorodriguez.com



> On Aug 5, 2016, at 12:44 PM, pieter-gmail 
wrote:
>
> Sorry forgot to add a rather important part.
>
> I changed ImmutablePath's constructor to
>
>private ImmutablePath(final ImmutablePathImpl previousPath, final
> Object currentObject, final Set currentLabels) {
>this.previousPath = previousPath;
>this.currentObject = currentObject;
>this.currentLabels = currentLabels;
> //this.currentLabels.addAll(currentLabels);
>}
>
> Setting the collection directly as oppose to `addAll`
>
> Thanks
> Pieter
>
>
> On 05/08/2016 20:40, pieter-gmail wrote:
>> Hi,
>>
>> I have been optimizing Sqlg of late and eventually arrived at TinkerPop
>> code.
>>
>> The gremlin in particular that I am interested is path queries.
>>
>> Here is the test that I am running in jmh.
>>
>>//@Setup
>>Vertex a = graph.addVertex(T.label, "A", "name", "a1");
>>for (int i = 1; i < 1_000_001; i++) {
>>Vertex b = graph.addVertex(T.label, "B", "name", "name_" +
i);
>>a.addEdge("outB", b);
>>for (int j = 0; j < 1; j++) {
>>Vertex c = graph.addVertex(T.label, "C", "name", "name_"
>> + i + " " + j);
>>b.addEdge("outC", c);
>>}
>>}
>>
>> And the query being benchmarked is
>>
>>GraphTraversal traversal =
>> g.V(a).as("a").out().as("b").out().as("c").path();
>>while (traversal.hasNext()) {
>>Path path = traversal.next();
>>}
>>
>> Before the optimization, (as things are now)
>>
>> Benchmark Mode Cnt  Score Error
 Units
>> GremlinPathBenchmark.g_path  avgt  100  1.086 ± 0.020   s/op
>>
>> The optimization I did is in AbstractStep.prepareTraversalForNextStep,
>> to not call addLabels() for path gremlins as the labels are known by the
>> step and do not change again so there is not need to keep adding them.
>>
>>private final Traverser.Admin prepareTraversalForNextStep(final
>> Traverser.Admin traverser) {
>>if (!this.traverserStepIdAndLabelsSetByChild) {
>>traverser.setStepId(this.nextStep.getId());
>>if (traverser instanceof B_LP_O_P_S_SE_SL_Traverser) {
>>} else {
>>traverser.addLabels(this.labels);
>>}
>>}
>>return traverser;
>>}
>>
>> After optimization,
>>

Re: path query optimization

2016-08-06 Thread Ted Wilmes
Neat find Pieter.  With regards to the update to ImmutablePath, I think
that defensive copying of inbound collections is generally a good idea but
if we can target specific areas where we can reap big gains from not
creating new collections it may be worth it to relax that constraint,
especially if we're only talking about internal APIs.  If we do relax that
constraint though, we'll need to careful during code review and unit
testing because those bugs can be difficult to track down.  The other nice
thing that the defensive copy gets us, particularly in the case of the
ImmutablePath, is that it isolates ImmutablePath from whatever the subclass
of set was that the caller passed in.  I think that's what is causing the
serialization test failure in this case since the caller passed in an
unmodifiable set.

--Ted

On Fri, Aug 5, 2016, 2:31 PM Marko Rodriguez  wrote:

> Hello,
>
> This is cool. Check out also ImmutablePath.extend(labels) as that is
> ultimately what Traverser.addLabels() calls. We have a lot of set copying
> and I don’t know if its needed (as you seem to be demonstrating). What I
> don’t like about your solution is the explicit reference to the
> B_L_P…Traverser in AbstractStep. See if you can work your solution without
> it.
>
> Good luck,
> Marko.
>
> http://markorodriguez.com
>
>
>
> > On Aug 5, 2016, at 12:44 PM, pieter-gmail 
> wrote:
> >
> > Sorry forgot to add a rather important part.
> >
> > I changed ImmutablePath's constructor to
> >
> >private ImmutablePath(final ImmutablePathImpl previousPath, final
> > Object currentObject, final Set currentLabels) {
> >this.previousPath = previousPath;
> >this.currentObject = currentObject;
> >this.currentLabels = currentLabels;
> > //this.currentLabels.addAll(currentLabels);
> >}
> >
> > Setting the collection directly as oppose to `addAll`
> >
> > Thanks
> > Pieter
> >
> >
> > On 05/08/2016 20:40, pieter-gmail wrote:
> >> Hi,
> >>
> >> I have been optimizing Sqlg of late and eventually arrived at TinkerPop
> >> code.
> >>
> >> The gremlin in particular that I am interested is path queries.
> >>
> >> Here is the test that I am running in jmh.
> >>
> >>//@Setup
> >>Vertex a = graph.addVertex(T.label, "A", "name", "a1");
> >>for (int i = 1; i < 1_000_001; i++) {
> >>Vertex b = graph.addVertex(T.label, "B", "name", "name_" +
> i);
> >>a.addEdge("outB", b);
> >>for (int j = 0; j < 1; j++) {
> >>Vertex c = graph.addVertex(T.label, "C", "name", "name_"
> >> + i + " " + j);
> >>b.addEdge("outC", c);
> >>}
> >>}
> >>
> >> And the query being benchmarked is
> >>
> >>GraphTraversal traversal =
> >> g.V(a).as("a").out().as("b").out().as("c").path();
> >>while (traversal.hasNext()) {
> >>Path path = traversal.next();
> >>}
> >>
> >> Before the optimization, (as things are now)
> >>
> >> Benchmark Mode Cnt  Score Error
>  Units
> >> GremlinPathBenchmark.g_path  avgt  100  1.086 ± 0.020   s/op
> >>
> >> The optimization I did is in AbstractStep.prepareTraversalForNextStep,
> >> to not call addLabels() for path gremlins as the labels are known by the
> >> step and do not change again so there is not need to keep adding them.
> >>
> >>private final Traverser.Admin prepareTraversalForNextStep(final
> >> Traverser.Admin traverser) {
> >>if (!this.traverserStepIdAndLabelsSetByChild) {
> >>traverser.setStepId(this.nextStep.getId());
> >>if (traverser instanceof B_LP_O_P_S_SE_SL_Traverser) {
> >>} else {
> >>traverser.addLabels(this.labels);
> >>}
> >>}
> >>return traverser;
> >>}
> >>
> >> After optimization,
> >>
> >> Benchmark Mode Cnt  Score Error
>  Units
> >> GremlinPathBenchmark.g_path  avgt  100  0.680 ± 0.004   s/op
> >>
> >> 1.086 vs 0.689 seconds for the traversal.
> >>
> >> I ran the Structured and Process test suites. 2 tests are failing with
> >> this optimization.
> >>
> >> InjectTest.g_VX1X_out_name_injectXdanielX_asXaX_mapXlengthX_path fails
> with
> >>
> >> "java.lang.IllegalArgumentException: The step with label a does not
> exist"
> >>
> >> and
> >>
> >> SerializationTest.shouldSerializePathAsDetached fails with
> >>
> >> "Caused by: java.lang.IllegalArgumentException: Class is not
> registered:
> >> java.util.Collections$UnmodifiableSet"
> >>
> >> Before investigating the failures is this optimization worth pursuing?
> >>
> >> Thanks
> >> Pieter
> >>
> >
>
>


Re: path query optimization

2016-08-05 Thread Marko Rodriguez
Hello,

This is cool. Check out also ImmutablePath.extend(labels) as that is ultimately 
what Traverser.addLabels() calls. We have a lot of set copying and I don’t know 
if its needed (as you seem to be demonstrating). What I don’t like about your 
solution is the explicit reference to the B_L_P…Traverser in AbstractStep. See 
if you can work your solution without it.

Good luck,
Marko.

http://markorodriguez.com



> On Aug 5, 2016, at 12:44 PM, pieter-gmail  wrote:
> 
> Sorry forgot to add a rather important part.
> 
> I changed ImmutablePath's constructor to
> 
>private ImmutablePath(final ImmutablePathImpl previousPath, final
> Object currentObject, final Set currentLabels) {
>this.previousPath = previousPath;
>this.currentObject = currentObject;
>this.currentLabels = currentLabels;
> //this.currentLabels.addAll(currentLabels);
>}
> 
> Setting the collection directly as oppose to `addAll`
> 
> Thanks
> Pieter
> 
> 
> On 05/08/2016 20:40, pieter-gmail wrote:
>> Hi,
>> 
>> I have been optimizing Sqlg of late and eventually arrived at TinkerPop
>> code.
>> 
>> The gremlin in particular that I am interested is path queries.
>> 
>> Here is the test that I am running in jmh.
>> 
>>//@Setup
>>Vertex a = graph.addVertex(T.label, "A", "name", "a1");
>>for (int i = 1; i < 1_000_001; i++) {
>>Vertex b = graph.addVertex(T.label, "B", "name", "name_" + i);
>>a.addEdge("outB", b);
>>for (int j = 0; j < 1; j++) {
>>Vertex c = graph.addVertex(T.label, "C", "name", "name_"
>> + i + " " + j);
>>b.addEdge("outC", c);
>>}  
>>}
>> 
>> And the query being benchmarked is
>> 
>>GraphTraversal traversal =
>> g.V(a).as("a").out().as("b").out().as("c").path();
>>while (traversal.hasNext()) {
>>Path path = traversal.next();
>>}
>> 
>> Before the optimization, (as things are now)
>> 
>> Benchmark Mode Cnt  Score Error   Units
>> GremlinPathBenchmark.g_path  avgt  100  1.086 ± 0.020   s/op
>> 
>> The optimization I did is in AbstractStep.prepareTraversalForNextStep,
>> to not call addLabels() for path gremlins as the labels are known by the
>> step and do not change again so there is not need to keep adding them.
>> 
>>private final Traverser.Admin prepareTraversalForNextStep(final
>> Traverser.Admin traverser) {
>>if (!this.traverserStepIdAndLabelsSetByChild) {
>>traverser.setStepId(this.nextStep.getId());
>>if (traverser instanceof B_LP_O_P_S_SE_SL_Traverser) {
>>} else {
>>traverser.addLabels(this.labels);
>>}  
>>}  
>>return traverser;
>>} 
>> 
>> After optimization,
>> 
>> Benchmark Mode Cnt  Score Error   Units
>> GremlinPathBenchmark.g_path  avgt  100  0.680 ± 0.004   s/op
>> 
>> 1.086 vs 0.689 seconds for the traversal.
>> 
>> I ran the Structured and Process test suites. 2 tests are failing with
>> this optimization.
>> 
>> InjectTest.g_VX1X_out_name_injectXdanielX_asXaX_mapXlengthX_path fails with
>> 
>> "java.lang.IllegalArgumentException: The step with label a does not exist"
>> 
>> and
>> 
>> SerializationTest.shouldSerializePathAsDetached fails with
>> 
>> "Caused by: java.lang.IllegalArgumentException: Class is not registered:
>> java.util.Collections$UnmodifiableSet"
>> 
>> Before investigating the failures is this optimization worth pursuing?
>> 
>> Thanks
>> Pieter
>> 
> 



Re: path query optimization

2016-08-05 Thread pieter-gmail
Sorry forgot to add a rather important part.

I changed ImmutablePath's constructor to

private ImmutablePath(final ImmutablePathImpl previousPath, final
Object currentObject, final Set currentLabels) {
this.previousPath = previousPath;
this.currentObject = currentObject;
this.currentLabels = currentLabels;
//this.currentLabels.addAll(currentLabels);
}

Setting the collection directly as oppose to `addAll`

Thanks
Pieter


On 05/08/2016 20:40, pieter-gmail wrote:
> Hi,
>
> I have been optimizing Sqlg of late and eventually arrived at TinkerPop
> code.
>
> The gremlin in particular that I am interested is path queries.
>
> Here is the test that I am running in jmh.
>
> //@Setup
> Vertex a = graph.addVertex(T.label, "A", "name", "a1");
> for (int i = 1; i < 1_000_001; i++) {
> Vertex b = graph.addVertex(T.label, "B", "name", "name_" + i);
> a.addEdge("outB", b);
> for (int j = 0; j < 1; j++) {
> Vertex c = graph.addVertex(T.label, "C", "name", "name_"
> + i + " " + j);
> b.addEdge("outC", c);
> }  
> }
>
> And the query being benchmarked is
>
> GraphTraversal traversal =
> g.V(a).as("a").out().as("b").out().as("c").path();
> while (traversal.hasNext()) {
> Path path = traversal.next();
> }
>
> Before the optimization, (as things are now)
>
> Benchmark Mode Cnt  Score Error   Units
> GremlinPathBenchmark.g_path  avgt  100  1.086 ± 0.020   s/op
>
> The optimization I did is in AbstractStep.prepareTraversalForNextStep,
> to not call addLabels() for path gremlins as the labels are known by the
> step and do not change again so there is not need to keep adding them.
>
> private final Traverser.Admin prepareTraversalForNextStep(final
> Traverser.Admin traverser) {
> if (!this.traverserStepIdAndLabelsSetByChild) {
> traverser.setStepId(this.nextStep.getId());
> if (traverser instanceof B_LP_O_P_S_SE_SL_Traverser) {
> } else {
> traverser.addLabels(this.labels);
> }  
> }  
> return traverser;
> } 
>
> After optimization,
>
> Benchmark Mode Cnt  Score Error   Units
> GremlinPathBenchmark.g_path  avgt  100  0.680 ± 0.004   s/op
>
> 1.086 vs 0.689 seconds for the traversal.
>
> I ran the Structured and Process test suites. 2 tests are failing with
> this optimization.
>
> InjectTest.g_VX1X_out_name_injectXdanielX_asXaX_mapXlengthX_path fails with
>
> "java.lang.IllegalArgumentException: The step with label a does not exist"
>
> and
>
> SerializationTest.shouldSerializePathAsDetached fails with
>
> "Caused by: java.lang.IllegalArgumentException: Class is not registered:
> java.util.Collections$UnmodifiableSet"
>
> Before investigating the failures is this optimization worth pursuing?
>
> Thanks
> Pieter
>



RE: path query optimization

2016-08-05 Thread pieter-gmail
Hi,

I have been optimizing Sqlg of late and eventually arrived at TinkerPop
code.

The gremlin in particular that I am interested is path queries.

Here is the test that I am running in jmh.

//@Setup
Vertex a = graph.addVertex(T.label, "A", "name", "a1");
for (int i = 1; i < 1_000_001; i++) {
Vertex b = graph.addVertex(T.label, "B", "name", "name_" + i);
a.addEdge("outB", b);
for (int j = 0; j < 1; j++) {
Vertex c = graph.addVertex(T.label, "C", "name", "name_"
+ i + " " + j);
b.addEdge("outC", c);
}  
}

And the query being benchmarked is

GraphTraversal traversal =
g.V(a).as("a").out().as("b").out().as("c").path();
while (traversal.hasNext()) {
Path path = traversal.next();
}

Before the optimization, (as things are now)

Benchmark Mode Cnt  Score Error   Units
GremlinPathBenchmark.g_path  avgt  100  1.086 ± 0.020   s/op

The optimization I did is in AbstractStep.prepareTraversalForNextStep,
to not call addLabels() for path gremlins as the labels are known by the
step and do not change again so there is not need to keep adding them.

private final Traverser.Admin prepareTraversalForNextStep(final
Traverser.Admin traverser) {
if (!this.traverserStepIdAndLabelsSetByChild) {
traverser.setStepId(this.nextStep.getId());
if (traverser instanceof B_LP_O_P_S_SE_SL_Traverser) {
} else {
traverser.addLabels(this.labels);
}  
}  
return traverser;
} 

After optimization,

Benchmark Mode Cnt  Score Error   Units
GremlinPathBenchmark.g_path  avgt  100  0.680 ± 0.004   s/op

1.086 vs 0.689 seconds for the traversal.

I ran the Structured and Process test suites. 2 tests are failing with
this optimization.

InjectTest.g_VX1X_out_name_injectXdanielX_asXaX_mapXlengthX_path fails with

"java.lang.IllegalArgumentException: The step with label a does not exist"

and

SerializationTest.shouldSerializePathAsDetached fails with

"Caused by: java.lang.IllegalArgumentException: Class is not registered:
java.util.Collections$UnmodifiableSet"

Before investigating the failures is this optimization worth pursuing?

Thanks
Pieter