Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/use-dataguide into lp:zorba
1. The following query returns the wrong result: jsoniq version "1.0"; declare variable $doc1 := " { \"foo\" : { \"name\" : \"moto\", \"price\" : 100 }, \"boo\" : { \"name\" : \"car\", \"price\" : 1000 } } "; ( let $v := exactly-one(jn:parse-json($doc1)) return if ($v.foo.name eq "moto") then $v else () ).boo 2. You have now lost the dataguide that used to be produced in test dataguide-28.jq -- https://code.launchpad.net/~zorba-coders/zorba/use-dataguide/+merge/176385 Your team Zorba Coders is subscribed to branch lp:zorba. -- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/use-dataguide into lp:zorba
Review: Approve -- https://code.launchpad.net/~zorba-coders/zorba/use-dataguide/+merge/176385 Your team Zorba Coders is subscribed to branch lp:zorba. -- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/use-dataguide into lp:zorba
Paul, I've also addressed your comments on the merge proposal. -- -- https://code.launchpad.net/~zorba-coders/zorba/use-dataguide/+merge/176385 Your team Zorba Coders is subscribed to branch lp:zorba. -- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/use-dataguide into lp:zorba
I have fixed issues 1 through 3. Regarding point 4: your approach, if I understood it correctly, will only build the dataflow information, but not the dataguide itself. To compute it, it would require an additional pass through the expression tree and an additional data structure. Regarding the copying in the current code -- first of all, the expr class has a dataguide_cb_t which is only an rchandle, so most of copying you see is just copying of pointers. In cases where data from json "sources" cannot reach expressions, just NULL pointers are propagated. The operation that is costly in the current code is the cloning of dataguides, which is done in a clone-on-modify fashion. That portion could be improved through saving only incremental information instead of cloning and modifying an entire dataguide. -- -- https://code.launchpad.net/~zorba-coders/zorba/use-dataguide/+merge/176385 Your team Zorba Coders is subscribed to branch lp:zorba. -- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/use-dataguide into lp:zorba
Review: Needs Fixing 1. The following query constructs the wrong dataguide (and returns the wrong result): jsoniq version "1.0"; declare variable $doc1 := " { \"foo\" : { \"name\" : \"moto\", \"price\" : 100 }, \"boo\" : { \"name\" : \"car\", \"price\" : 1000 } } "; let $v := exactly-one(jn:parse-json($doc1)) return if ($v.foo.name eq "moto") then $v else () 2. The following query does not construct any dataguide at all, although it should: jsoniq version "1.0"; import module namespace ddl = "http://www.zorba-xquery.com/modules/store/dynamic/collections/ddl";; import module namespace dml = "http://www.zorba-xquery.com/modules/store/dynamic/collections/dml";; declare variable $test external := 1; ddl:create(xs:QName("sales")); dml:insert-last(xs:QName("sales"), ( { "product" : { "name" : "broiler", "price" : 100 }, "category1" : { "category3" : { "category4" : "value4" } } } ) ); ( exactly-one ( if ($test) then dml:collection(xs:QName("sales"))[1] else dml:collection(xs:QName("sales"))[2] ) ).category1 3. Window variables are not taken into account in getClauseVar 4. The implementation of the JsonDataguide rule does too much copying and allocations of dataguides. I have the feeling that this is not necessary. For example, what about the following approach? - There is only one dataguide_cb allocated per JsonDataguide application. It contains one entry for each json "source" encountered during the application of the rule. - There is also a single map from expr* to std::vector. The map contains one entry for each expr that may receive return json items coming from a json "source". The value of the entry is the set of "sources" for the key expr. I may be missing something, but if this works, it will eliminate all allocations and copies. -- https://code.launchpad.net/~zorba-coders/zorba/use-dataguide/+merge/176385 Your team Zorba Coders is subscribed to branch lp:zorba. -- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/use-dataguide into lp:zorba
Review: Needs Fixing 1. The following query constructs the wrong dataguide (and returns the wrong result): jsoniq version "1.0"; declare variable $doc1 := " { \"foo\" : { \"name\" : \"moto\", \"price\" : 100 }, \"boo\" : { \"name\" : \"car\", \"price\" : 1000 } } "; let $v := exactly-one(jn:parse-json($doc1)) return if ($v.foo.name eq "moto") then $v else () 2. The following query does not construct any dataguide at all, although it should: jsoniq version "1.0"; import module namespace ddl = "http://www.zorba-xquery.com/modules/store/dynamic/collections/ddl";; import module namespace dml = "http://www.zorba-xquery.com/modules/store/dynamic/collections/dml";; declare variable $test external := 1; ddl:create(xs:QName("sales")); dml:insert-last(xs:QName("sales"), ( { "product" : { "name" : "broiler", "price" : 100 }, "category1" : { "category3" : { "category4" : "value4" } } } ) ); ( exactly-one ( if ($test) then dml:collection(xs:QName("sales"))[1] else dml:collection(xs:QName("sales"))[2] ) ).category1 3. Window variables are not taken into account in getClauseVar 4. The implementation of the JsonDataguide rule does too much copying and allocations of dataguides. I have the feeling that this is not necessary. For example, what about the following approach? - There is only one dataguide_cb allocated per JsonDataguide application. It contains one entry for each json "source" encountered during the application of the rule. - There is also a single map from expr* to std::vector. The map contains one entry for each expr that may receive return json items coming from a json "source". The value of the entry is the set of "sources" for the key expr. I may be missing something, but if this works, it will eliminate all allocations and copies. -- https://code.launchpad.net/~zorba-coders/zorba/use-dataguide/+merge/176385 Your team Zorba Coders is subscribed to branch lp:zorba. -- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/use-dataguide into lp:zorba
Review: Needs Fixing Please match the rest of the code style, e.g.: s/NULL/nullptr/ s/dataguide/dataguide_/ -- https://code.launchpad.net/~zorba-coders/zorba/use-dataguide/+merge/176385 Your team Zorba Coders is subscribed to branch lp:zorba. -- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/use-dataguide into lp:zorba
Review: Approve I approve the changes now. I think that this still needs a little work if we start pushing-down the dataguide into collections. -- https://code.launchpad.net/~zorba-coders/zorba/use-dataguide/+merge/176385 Your team Zorba Coders is subscribed to branch lp:zorba. -- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/use-dataguide into lp:zorba
> The way I see this is not a problem with the fn functions. The problem is > related to the fact how the objects are used. In this case, the objects are > used to construct a new object. The dataguide needs to handle that. Similar to > serialization, this means that the entire object is needed. Yes, this is being handled and the usage of objects is being tracked. The process() function in the dataguide computation has a parameter for that -- propagates_to_output. You've actually seen it in action -- it is pessimistic and assumes all fn: functions propagate their input to the output. That is why the parameter to count() was assumed to be used and you've seen no pruning. I've now made fn:count() a special case and it's input is no longer considered used for output and its parameter is allowed to be pruned. All the other functions are handled as if they propagate their input and skipping is allowed only where it's indeed possible. -- -- https://code.launchpad.net/~zorba-coders/zorba/use-dataguide/+merge/176385 Your team Zorba Coders is subscribed to branch lp:zorba. -- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/use-dataguide into lp:zorba
Review: Needs Information > > I don't understand why. In my example it wouldn't. The count functions could > > count > > the number of empty objects or objects that contain only the STREET field. > > I've looked into the test and here are the issues: > 1) The fn:count() does not have the %explores-json annotation. I've confused > it with the JSONiq array size() function. > > 2) Currently the dataguide does not distinguish between any of the fn: > functions. So for example if you modify the query: > > for $obj in parse-json(f:read-text(fn:resolve-uri("citylots-small.json"))) > group by $s := $obj.STREET > return { > street : $s, > count : count($obj), > seq : subsequence($obj, 1, 100) > } > > then if you prune the input to have only the .STREET field you'll get an > incorrect result. So currently the argument to subsequence() is "explored" and > so the dataguide is "*" in this case. The same happens with count(). > > 3) fn:count() is a special case as it only cares for the number of objects and > not their contents. I think another annotation could be added, which would be > the opposite of %explores-json and which would indicate that the function > ignores the details of the object. Or I could just add some code in the > dataguide collection to handle fn:count() separately. The way I see this is not a problem with the fn functions. The problem is related to the fact how the objects are used. In this case, the objects are used to construct a new object. The dataguide needs to handle that. Similar to serialization, this means that the entire object is needed. -- https://code.launchpad.net/~zorba-coders/zorba/use-dataguide/+merge/176385 Your team Zorba Coders is subscribed to branch lp:zorba. -- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/use-dataguide into lp:zorba
> I have no idea what problem this MP is supposed to solve; nor do I know what a > "data guide" is. Paul, I've made some changes to the JSON loader so that it skips creating nodes that are not in a given "template" (== dataguide). Since you've written the loader, could you please review only the changes to it and let me know if you find any problems, issues or anything else? -- -- https://code.launchpad.net/~zorba-coders/zorba/use-dataguide/+merge/176385 Your team Zorba Coders is subscribed to branch lp:zorba. -- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/use-dataguide into lp:zorba
I meant the "STREET" field. I've used "price" in the testcase. -- https://code.launchpad.net/~zorba-coders/zorba/use-dataguide/+merge/176385 Your team Zorba Coders is subscribed to branch lp:zorba. -- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/use-dataguide into lp:zorba
Ok, it wasn't too much work -- I've added a special handler for fn:count() in the dataguide code and now the example you gave prunes all the fields except "price". -- -- https://code.launchpad.net/~zorba-coders/zorba/use-dataguide/+merge/176385 Your team Zorba Coders is subscribed to branch lp:zorba. -- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/use-dataguide into lp:zorba
> I don't understand why. In my example it wouldn't. The count functions could > count > the number of empty objects or objects that contain only the STREET field. I've looked into the test and here are the issues: 1) The fn:count() does not have the %explores-json annotation. I've confused it with the JSONiq array size() function. 2) Currently the dataguide does not distinguish between any of the fn: functions. So for example if you modify the query: for $obj in parse-json(f:read-text(fn:resolve-uri("citylots-small.json"))) group by $s := $obj.STREET return { street : $s, count : count($obj), seq : subsequence($obj, 1, 100) } then if you prune the input to have only the .STREET field you'll get an incorrect result. So currently the argument to subsequence() is "explored" and so the dataguide is "*" in this case. The same happens with count(). 3) fn:count() is a special case as it only cares for the number of objects and not their contents. I think another annotation could be added, which would be the opposite of %explores-json and which would indicate that the function ignores the details of the object. Or I could just add some code in the dataguide collection to handle fn:count() separately. -- -- https://code.launchpad.net/~zorba-coders/zorba/use-dataguide/+merge/176385 Your team Zorba Coders is subscribed to branch lp:zorba. -- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/use-dataguide into lp:zorba
Review: Needs Information I have no idea what problem this MP is supposed to solve; nor do I know what a "data guide" is. -- https://code.launchpad.net/~zorba-coders/zorba/use-dataguide/+merge/176385 Your team Zorba Coders is subscribed to branch lp:zorba. -- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/use-dataguide into lp:zorba
Review: Needs Information > > - In the following query, no dataguide seems to be pushed into the parser. > > Why? > > The count() function is marked with the %explores-json annotation, because > pruning objects that reach the function will modify the returned result. That > is why the dataguide is empty. I don't understand why. In my example it wouldn't. The count functions could count the number of empty objects or objects that contain only the STREET field. -- https://code.launchpad.net/~zorba-coders/zorba/use-dataguide/+merge/176385 Your team Zorba Coders is subscribed to branch lp:zorba. -- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/use-dataguide into lp:zorba
> I have tried some basic queries and there is a huge performance improvement. > This is great. > However, I detected a memory leak and have one question. > > - memory leak in translator.cpp:4823 I've fixed it. > > - In the following query, no dataguide seems to be pushed into the parser. > Why? The count() function is marked with the %explores-json annotation, because pruning objects that reach the function will modify the returned result. That is why the dataguide is empty. -- -- https://code.launchpad.net/~zorba-coders/zorba/use-dataguide/+merge/176385 Your team Zorba Coders is subscribed to branch lp:zorba. -- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/use-dataguide into lp:zorba
Review: Needs Fixing I have tried some basic queries and there is a huge performance improvement. This is great. However, I detected a memory leak and have one question. - memory leak in translator.cpp:4823 ==20325== 88 (32 direct, 56 indirect) bytes in 1 blocks are definitely lost in loss record 4 of 4 ==20325==at 0x4C2B1C7: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==20325==by 0x55B8897: zorba::TranslatorImpl::begin_visit(zorba::AnnotationListParsenode const&) (translator.cpp:4823) ==20325==by 0x556447D: zorba::AnnotationListParsenode::accept(zorba::parsenode_visitor&) const (parsenodes.cpp:903) ==20325==by 0x55B18A8: zorba::TranslatorImpl::preprocessVFOList(zorba::VFO_DeclList const&) (translator.cpp:3904) ==20325==by 0x55A9951: zorba::TranslatorImpl::begin_visit(zorba::Prolog const&) (translator.cpp:2889) ==20325==by 0x5562201: zorba::Prolog::accept(zorba::parsenode_visitor&) const (parsenodes.cpp:262) ==20325==by 0x5561E7A: zorba::LibraryModule::accept(zorba::parsenode_visitor&) const (parsenodes.cpp:174) ==20325==by 0x559CCD8: zorba::translate_aux(zorba::TranslatorImpl*, zorba::parsenode const&, zorba::static_context*, unsigned long, zorba::ModulesInfo*, std::map, std::allocator > >, zorba::rstring, std::allocator > >, std::less, std::allocator > > >, std::allocator, std::allocator > > const, zorba::rstring, std::allocator > > > > > const&, bool, zorba::StaticContextConsts::xquery_version_t) (translator.cpp:16343) ==20325==by 0x55AF5C4: zorba::TranslatorImpl::end_visit(zorba::ModuleImport const&, void*) (translator.cpp:3589) ==20325==by 0x55635D3: zorba::ModuleImport::accept(zorba::parsenode_visitor&) const (parsenodes.cpp:667) ==20325==by 0x55A99BB: zorba::TranslatorImpl::begin_visit(zorba::Prolog const&) (translator.cpp:2899) - In the following query, no dataguide seems to be pushed into the parser. Why? import module namespace f = "http://expath.org/ns/file";; for $obj in parse-json(f:read-text(fn:resolve-uri("citylots-small.json"))) group by $s := $obj.STREET return { street : $s, count : count($obj) } -- https://code.launchpad.net/~zorba-coders/zorba/use-dataguide/+merge/176385 Your team Zorba Coders is subscribed to branch lp:zorba. -- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/use-dataguide into lp:zorba
Review: Approve -- https://code.launchpad.net/~zorba-coders/zorba/use-dataguide/+merge/176385 Your team Zorba Coders is subscribed to branch lp:zorba. -- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp