Re: some parent documents
Lucene tends to omit full scans as possible with leap-frogging on skiplists. It will enumerate all *matching* docs O(m) and rank every result with O(log(page size)). ie O(m log p). Early I remember that BJQ enumerated all matching children even most time it's enough find only one, potentially it's a room for improvement but it never be a problem. On Fri, Apr 6, 2018 at 4:15 PM, Arturas Mazeikawrote: > Hi Mikhail et al, > > I must say that this complexity question is still bugging me, and I wonder > if it is possible to get even partial answers in Big-O notation.. > > Say that we have N (for example 10^6) documents, each having 10 SKUs and > each in turn having 10 storage as well as every product having 10 vendors. > Consider then answer to be 1% large (there are 10 000 documents satisfying > the query). What would be the complexity of answering it? > > Cheers, > Arturas > > On Thu, Apr 5, 2018 at 11:47 AM, Arturas Mazeika > wrote: > > > Hi Mikhail et al, > > > > Thanks a lot for sharing the code snippet. I would not have been able to > > dig this Java file myself to investigate the complexity of the search > > query. Scanning the code I get a feeling that it is well structured and > > well thought of. There is a concept like advance (Parent Approximation) > as > > well as ParentPhaseTwo, matches, matchCost, BlockJoinScorer, Explanation, > > Query rewriting. Is there a documentation available how the architecture > > looks like and what school of thought/doctrine used here? > > > > W.r.t. to my complexity question, I expected to see an answer in the > Big-O > > notation (rather than as Java code). Typically one makes assumptions > there > > about the key parameters (e.g., number of Products to be N_P, number of > > SKUs to be N_Sk, number of storages to be N_St, number of vendors to be > > N_V, JOIN Selectivities (in terms of percentage) be p(P,SK), p(SK,ST), > > p(P,V) between the corresponding entities and computes a formula. > > > > What is the complexity of this query in big-O notation? > > > > Cheers, > > Arturas > > > > > > > > On Wed, Apr 4, 2018 at 6:16 PM, Mikhail Khludnev > wrote: > > > >> > > >> > What's happening under the hood of > >> > solr in answering query [1] from [2]? > >> > >> https://github.com/apache/lucene-solr/blob/master/lucene/ > >> join/src/java/org/apache/lucene/search/join/ToParentBlo > >> ckJoinQuery.java#L178 > >> > >> On Wed, Apr 4, 2018 at 3:39 PM, Arturas Mazeika > >> wrote: > >> > >> > Hi Mikhail et al, > >> > > >> > Thanks a lot for a very thorough answer. This is an impressive piece > of > >> > knowledge you just shared. > >> > > >> > Not surprisingly, I was caught unprepared by the 'v=...' part of the > >> > answer. This brought me to the links you posted (starts with http). > From > >> > those links I went to the more updated link (starts with https), which > >> > brought me to other very resourceful links. Combined with some > >> meditation > >> > session, it came into my mind that it is not possible to express block > >> > queries using mathematical logic only. The format of the input > document > >> is > >> > deeply built into the query expression and answering. Expressing these > >> > queries mathematically / logically may give an impression that solr is > >> > capable of answering (NP-?) hard problems. I have a feeling though > that > >> > solr answers to queries in polynomial (or even almost linear) times. > >> > > >> > Just to connect the remaining dots.. What's happening under the hood > of > >> > solr in answering query [1] from [2]? Is it really so that inverted > >> index > >> > is used to identify the vectors of ids, that are scanned linearly in a > >> hope > >> > to get matches on _root_ and other internal variables? > >> > > >> > [1] q=+{!parent which=type_s:product v=$skuq} +{!parent > >> > which=type_s:product v=$vendorq}=+COLOR_s:Blue +SIZE_s:XL > +{!parent > >> > which=type_s:sku v='+QTY_i:[10 TO *] +STATE_s:CA'}=+NAME_s: > Bob > >> > +PRICE_i:[20 TO 25] > >> > [2] > >> > https://blog.griddynamics.com/searching-grandchildren-and- > >> > siblings-with-solr-block-join/ > >> > > >> > Thanks! > >> > Arturas > >> > > >> > On Wed, Apr 4, 2018 at 12:36 PM, Mikhail Khludnev > >> wrote: > >> > > >> > > q=+{!parent which=ntype:p v='+msg:Hello +person:Arturas'} +{!parent > >> > which= > >> > > ntype:p v='+msg:ciao +person:Vai'} > >> > > > >> > > On Wed, Apr 4, 2018 at 12:19 PM, Arturas Mazeika > > >> > > wrote: > >> > > > >> > > > Hi Mikhail et al, > >> > > > > >> > > > It seems to me that the nested documents must include nodes that > >> encode > >> > > the > >> > > > level of nodes (within the document). Therefore, the minimal > example > >> > must > >> > > > include the node type. Is the following structure sufficient? > >> > > > > >> > > > { > >> > > > "id":1, > >> > > > "ntype":"p", > >> > > > "_childDocuments_": > >> > >
Re: some parent documents
Hi Mikhail et al, I must say that this complexity question is still bugging me, and I wonder if it is possible to get even partial answers in Big-O notation.. Say that we have N (for example 10^6) documents, each having 10 SKUs and each in turn having 10 storage as well as every product having 10 vendors. Consider then answer to be 1% large (there are 10 000 documents satisfying the query). What would be the complexity of answering it? Cheers, Arturas On Thu, Apr 5, 2018 at 11:47 AM, Arturas Mazeikawrote: > Hi Mikhail et al, > > Thanks a lot for sharing the code snippet. I would not have been able to > dig this Java file myself to investigate the complexity of the search > query. Scanning the code I get a feeling that it is well structured and > well thought of. There is a concept like advance (Parent Approximation) as > well as ParentPhaseTwo, matches, matchCost, BlockJoinScorer, Explanation, > Query rewriting. Is there a documentation available how the architecture > looks like and what school of thought/doctrine used here? > > W.r.t. to my complexity question, I expected to see an answer in the Big-O > notation (rather than as Java code). Typically one makes assumptions there > about the key parameters (e.g., number of Products to be N_P, number of > SKUs to be N_Sk, number of storages to be N_St, number of vendors to be > N_V, JOIN Selectivities (in terms of percentage) be p(P,SK), p(SK,ST), > p(P,V) between the corresponding entities and computes a formula. > > What is the complexity of this query in big-O notation? > > Cheers, > Arturas > > > > On Wed, Apr 4, 2018 at 6:16 PM, Mikhail Khludnev wrote: > >> > >> > What's happening under the hood of >> > solr in answering query [1] from [2]? >> >> https://github.com/apache/lucene-solr/blob/master/lucene/ >> join/src/java/org/apache/lucene/search/join/ToParentBlo >> ckJoinQuery.java#L178 >> >> On Wed, Apr 4, 2018 at 3:39 PM, Arturas Mazeika >> wrote: >> >> > Hi Mikhail et al, >> > >> > Thanks a lot for a very thorough answer. This is an impressive piece of >> > knowledge you just shared. >> > >> > Not surprisingly, I was caught unprepared by the 'v=...' part of the >> > answer. This brought me to the links you posted (starts with http). From >> > those links I went to the more updated link (starts with https), which >> > brought me to other very resourceful links. Combined with some >> meditation >> > session, it came into my mind that it is not possible to express block >> > queries using mathematical logic only. The format of the input document >> is >> > deeply built into the query expression and answering. Expressing these >> > queries mathematically / logically may give an impression that solr is >> > capable of answering (NP-?) hard problems. I have a feeling though that >> > solr answers to queries in polynomial (or even almost linear) times. >> > >> > Just to connect the remaining dots.. What's happening under the hood of >> > solr in answering query [1] from [2]? Is it really so that inverted >> index >> > is used to identify the vectors of ids, that are scanned linearly in a >> hope >> > to get matches on _root_ and other internal variables? >> > >> > [1] q=+{!parent which=type_s:product v=$skuq} +{!parent >> > which=type_s:product v=$vendorq}=+COLOR_s:Blue +SIZE_s:XL +{!parent >> > which=type_s:sku v='+QTY_i:[10 TO *] +STATE_s:CA'}=+NAME_s:Bob >> > +PRICE_i:[20 TO 25] >> > [2] >> > https://blog.griddynamics.com/searching-grandchildren-and- >> > siblings-with-solr-block-join/ >> > >> > Thanks! >> > Arturas >> > >> > On Wed, Apr 4, 2018 at 12:36 PM, Mikhail Khludnev >> wrote: >> > >> > > q=+{!parent which=ntype:p v='+msg:Hello +person:Arturas'} +{!parent >> > which= >> > > ntype:p v='+msg:ciao +person:Vai'} >> > > >> > > On Wed, Apr 4, 2018 at 12:19 PM, Arturas Mazeika >> > > wrote: >> > > >> > > > Hi Mikhail et al, >> > > > >> > > > It seems to me that the nested documents must include nodes that >> encode >> > > the >> > > > level of nodes (within the document). Therefore, the minimal example >> > must >> > > > include the node type. Is the following structure sufficient? >> > > > >> > > > { >> > > > "id":1, >> > > > "ntype":"p", >> > > > "_childDocuments_": >> > > > [ >> > > > {"id":"1_1", "ntype":"c", "person":"Vai", "time":"3:14", >> > > > "msg":"Hello"}, >> > > > {"id":"1_2", "ntype":"c", "person":"Arturas", "time":"3:14", >> > > > "msg":"Hello"}, >> > > > {"id":"1_3", "ntype":"c", "person":"Vai", "time":"3:15", >> > > > "msg":"Coz Mathias is working on another system- different >> screen."}, >> > > > {"id":"1_4", "ntype":"c", "person":"Vai", "time":"3:15", >> > > > "msg":"It can get annoying"}, >> > > > {"id":"1_5", "ntype":"c", "person":"Arturas", "time":"3:15", >> > > > "msg":"Thank you. this is very nice of you"}, >> > > > {"id":"1_6", "ntype":"c",
Re: some parent documents
Hi Mikhail et al, Thanks a lot for sharing the code snippet. I would not have been able to dig this Java file myself to investigate the complexity of the search query. Scanning the code I get a feeling that it is well structured and well thought of. There is a concept like advance (Parent Approximation) as well as ParentPhaseTwo, matches, matchCost, BlockJoinScorer, Explanation, Query rewriting. Is there a documentation available how the architecture looks like and what school of thought/doctrine used here? W.r.t. to my complexity question, I expected to see an answer in the Big-O notation (rather than as Java code). Typically one makes assumptions there about the key parameters (e.g., number of Products to be N_P, number of SKUs to be N_Sk, number of storages to be N_St, number of vendors to be N_V, JOIN Selectivities (in terms of percentage) be p(P,SK), p(SK,ST), p(P,V) between the corresponding entities and computes a formula. What is the complexity of this query in big-O notation? Cheers, Arturas On Wed, Apr 4, 2018 at 6:16 PM, Mikhail Khludnevwrote: > > > > What's happening under the hood of > > solr in answering query [1] from [2]? > > https://github.com/apache/lucene-solr/blob/master/ > lucene/join/src/java/org/apache/lucene/search/join/ > ToParentBlockJoinQuery.java#L178 > > On Wed, Apr 4, 2018 at 3:39 PM, Arturas Mazeika wrote: > > > Hi Mikhail et al, > > > > Thanks a lot for a very thorough answer. This is an impressive piece of > > knowledge you just shared. > > > > Not surprisingly, I was caught unprepared by the 'v=...' part of the > > answer. This brought me to the links you posted (starts with http). From > > those links I went to the more updated link (starts with https), which > > brought me to other very resourceful links. Combined with some meditation > > session, it came into my mind that it is not possible to express block > > queries using mathematical logic only. The format of the input document > is > > deeply built into the query expression and answering. Expressing these > > queries mathematically / logically may give an impression that solr is > > capable of answering (NP-?) hard problems. I have a feeling though that > > solr answers to queries in polynomial (or even almost linear) times. > > > > Just to connect the remaining dots.. What's happening under the hood of > > solr in answering query [1] from [2]? Is it really so that inverted index > > is used to identify the vectors of ids, that are scanned linearly in a > hope > > to get matches on _root_ and other internal variables? > > > > [1] q=+{!parent which=type_s:product v=$skuq} +{!parent > > which=type_s:product v=$vendorq}=+COLOR_s:Blue +SIZE_s:XL +{!parent > > which=type_s:sku v='+QTY_i:[10 TO *] +STATE_s:CA'}=+NAME_s:Bob > > +PRICE_i:[20 TO 25] > > [2] > > https://blog.griddynamics.com/searching-grandchildren-and- > > siblings-with-solr-block-join/ > > > > Thanks! > > Arturas > > > > On Wed, Apr 4, 2018 at 12:36 PM, Mikhail Khludnev > wrote: > > > > > q=+{!parent which=ntype:p v='+msg:Hello +person:Arturas'} +{!parent > > which= > > > ntype:p v='+msg:ciao +person:Vai'} > > > > > > On Wed, Apr 4, 2018 at 12:19 PM, Arturas Mazeika > > > wrote: > > > > > > > Hi Mikhail et al, > > > > > > > > It seems to me that the nested documents must include nodes that > encode > > > the > > > > level of nodes (within the document). Therefore, the minimal example > > must > > > > include the node type. Is the following structure sufficient? > > > > > > > > { > > > > "id":1, > > > > "ntype":"p", > > > > "_childDocuments_": > > > > [ > > > > {"id":"1_1", "ntype":"c", "person":"Vai", "time":"3:14", > > > > "msg":"Hello"}, > > > > {"id":"1_2", "ntype":"c", "person":"Arturas", "time":"3:14", > > > > "msg":"Hello"}, > > > > {"id":"1_3", "ntype":"c", "person":"Vai", "time":"3:15", > > > > "msg":"Coz Mathias is working on another system- different screen."}, > > > > {"id":"1_4", "ntype":"c", "person":"Vai", "time":"3:15", > > > > "msg":"It can get annoying"}, > > > > {"id":"1_5", "ntype":"c", "person":"Arturas", "time":"3:15", > > > > "msg":"Thank you. this is very nice of you"}, > > > > {"id":"1_6", "ntype":"c", "person":"Vai", "time":"3:16", > > > > "msg":"ciao"}, > > > > {"id":"1_7", "ntype":"c", "person":"Arturas", "time":"3:16", > > > > "msg":"ciao"} > > > > ] > > > > }, > > > > { > > > > "id":2, > > > > "ntype":"p", > > > > "_childDocuments_": > > > > [ > > > > {"id":"2_1", "ntype":"c", "person":"Vai", "time":"4:14", > > > > "msg":"Hi"}, > > > > {"id":"2_2", "ntype":"c", "person":"Arturas", "time":"4:14", > > > > "msg":"IBM Watson"}, > > > > {"id":"2_3", "ntype":"c", "person":"Vai", "time":"4:15", > > > > "msg":"need to retain content"}, > > > > {"id":"2_4", "ntype":"c", "person":"Vai", "time":"4:15", >
Re: some parent documents
> > What's happening under the hood of > solr in answering query [1] from [2]? https://github.com/apache/lucene-solr/blob/master/lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java#L178 On Wed, Apr 4, 2018 at 3:39 PM, Arturas Mazeikawrote: > Hi Mikhail et al, > > Thanks a lot for a very thorough answer. This is an impressive piece of > knowledge you just shared. > > Not surprisingly, I was caught unprepared by the 'v=...' part of the > answer. This brought me to the links you posted (starts with http). From > those links I went to the more updated link (starts with https), which > brought me to other very resourceful links. Combined with some meditation > session, it came into my mind that it is not possible to express block > queries using mathematical logic only. The format of the input document is > deeply built into the query expression and answering. Expressing these > queries mathematically / logically may give an impression that solr is > capable of answering (NP-?) hard problems. I have a feeling though that > solr answers to queries in polynomial (or even almost linear) times. > > Just to connect the remaining dots.. What's happening under the hood of > solr in answering query [1] from [2]? Is it really so that inverted index > is used to identify the vectors of ids, that are scanned linearly in a hope > to get matches on _root_ and other internal variables? > > [1] q=+{!parent which=type_s:product v=$skuq} +{!parent > which=type_s:product v=$vendorq}=+COLOR_s:Blue +SIZE_s:XL +{!parent > which=type_s:sku v='+QTY_i:[10 TO *] +STATE_s:CA'}=+NAME_s:Bob > +PRICE_i:[20 TO 25] > [2] > https://blog.griddynamics.com/searching-grandchildren-and- > siblings-with-solr-block-join/ > > Thanks! > Arturas > > On Wed, Apr 4, 2018 at 12:36 PM, Mikhail Khludnev wrote: > > > q=+{!parent which=ntype:p v='+msg:Hello +person:Arturas'} +{!parent > which= > > ntype:p v='+msg:ciao +person:Vai'} > > > > On Wed, Apr 4, 2018 at 12:19 PM, Arturas Mazeika > > wrote: > > > > > Hi Mikhail et al, > > > > > > It seems to me that the nested documents must include nodes that encode > > the > > > level of nodes (within the document). Therefore, the minimal example > must > > > include the node type. Is the following structure sufficient? > > > > > > { > > > "id":1, > > > "ntype":"p", > > > "_childDocuments_": > > > [ > > > {"id":"1_1", "ntype":"c", "person":"Vai", "time":"3:14", > > > "msg":"Hello"}, > > > {"id":"1_2", "ntype":"c", "person":"Arturas", "time":"3:14", > > > "msg":"Hello"}, > > > {"id":"1_3", "ntype":"c", "person":"Vai", "time":"3:15", > > > "msg":"Coz Mathias is working on another system- different screen."}, > > > {"id":"1_4", "ntype":"c", "person":"Vai", "time":"3:15", > > > "msg":"It can get annoying"}, > > > {"id":"1_5", "ntype":"c", "person":"Arturas", "time":"3:15", > > > "msg":"Thank you. this is very nice of you"}, > > > {"id":"1_6", "ntype":"c", "person":"Vai", "time":"3:16", > > > "msg":"ciao"}, > > > {"id":"1_7", "ntype":"c", "person":"Arturas", "time":"3:16", > > > "msg":"ciao"} > > > ] > > > }, > > > { > > > "id":2, > > > "ntype":"p", > > > "_childDocuments_": > > > [ > > > {"id":"2_1", "ntype":"c", "person":"Vai", "time":"4:14", > > > "msg":"Hi"}, > > > {"id":"2_2", "ntype":"c", "person":"Arturas", "time":"4:14", > > > "msg":"IBM Watson"}, > > > {"id":"2_3", "ntype":"c", "person":"Vai", "time":"4:15", > > > "msg":"need to retain content"}, > > > {"id":"2_4", "ntype":"c", "person":"Vai", "time":"4:15", > > > "msg":"It can get annoying"}, > > > {"id":"2_5", "ntype":"c", "person":"Arturas", "time":"4:15", > > > "msg":"You can make all your meetings more access"}, > > > {"id":"2_6", "ntype":"c", "person":"Vai", "time":"4:16", > > > "msg":"Make every meeting a Skype meeting"}, > > > {"id":"2_7", "ntype":"c", "person":"Arturas", "time":"4:16", > > > "msg":"ciao"} > > > ] > > > } > > > > > > How would a query look like that has a Hello from Person Arturas and > ciao > > > from Person Vai? > > > > > > Cheers, > > > Arturas > > > > > > > > > On Tue, Apr 3, 2018 at 5:21 PM, Arturas Mazeika > > wrote: > > > > > > > Hi Mikhail, > > > > > > > > Thanks a lot for the reply. > > > > > > > > You mentioned that > > > > > > > > q=+{!parent which.. v='+text:hello +person:A'} +{!parent > > > > which..v='+text:ciao +person:B'} > > > > > > > > is the way to go. How would it look like precisely for the following > > > > collection? > > > > > > > > { > > > > "id":1, > > > > "_childDocuments_": > > > > [ > > > > {"id":"1_1", "person":"Vai" , "time":"3:14", > > > > "msg":"Hello"}, > > > > {"id":"1_2", "person":"Arturas" , "time":"3:14", > > > > "msg":"Hello"}, > > > > {"id":"1_3", "person":"Vai"
Re: some parent documents
Hi Mikhail et al, Thanks a lot for a very thorough answer. This is an impressive piece of knowledge you just shared. Not surprisingly, I was caught unprepared by the 'v=...' part of the answer. This brought me to the links you posted (starts with http). From those links I went to the more updated link (starts with https), which brought me to other very resourceful links. Combined with some meditation session, it came into my mind that it is not possible to express block queries using mathematical logic only. The format of the input document is deeply built into the query expression and answering. Expressing these queries mathematically / logically may give an impression that solr is capable of answering (NP-?) hard problems. I have a feeling though that solr answers to queries in polynomial (or even almost linear) times. Just to connect the remaining dots.. What's happening under the hood of solr in answering query [1] from [2]? Is it really so that inverted index is used to identify the vectors of ids, that are scanned linearly in a hope to get matches on _root_ and other internal variables? [1] q=+{!parent which=type_s:product v=$skuq} +{!parent which=type_s:product v=$vendorq}=+COLOR_s:Blue +SIZE_s:XL +{!parent which=type_s:sku v='+QTY_i:[10 TO *] +STATE_s:CA'}=+NAME_s:Bob +PRICE_i:[20 TO 25] [2] https://blog.griddynamics.com/searching-grandchildren-and-siblings-with-solr-block-join/ Thanks! Arturas On Wed, Apr 4, 2018 at 12:36 PM, Mikhail Khludnevwrote: > q=+{!parent which=ntype:p v='+msg:Hello +person:Arturas'} +{!parent which= > ntype:p v='+msg:ciao +person:Vai'} > > On Wed, Apr 4, 2018 at 12:19 PM, Arturas Mazeika > wrote: > > > Hi Mikhail et al, > > > > It seems to me that the nested documents must include nodes that encode > the > > level of nodes (within the document). Therefore, the minimal example must > > include the node type. Is the following structure sufficient? > > > > { > > "id":1, > > "ntype":"p", > > "_childDocuments_": > > [ > > {"id":"1_1", "ntype":"c", "person":"Vai", "time":"3:14", > > "msg":"Hello"}, > > {"id":"1_2", "ntype":"c", "person":"Arturas", "time":"3:14", > > "msg":"Hello"}, > > {"id":"1_3", "ntype":"c", "person":"Vai", "time":"3:15", > > "msg":"Coz Mathias is working on another system- different screen."}, > > {"id":"1_4", "ntype":"c", "person":"Vai", "time":"3:15", > > "msg":"It can get annoying"}, > > {"id":"1_5", "ntype":"c", "person":"Arturas", "time":"3:15", > > "msg":"Thank you. this is very nice of you"}, > > {"id":"1_6", "ntype":"c", "person":"Vai", "time":"3:16", > > "msg":"ciao"}, > > {"id":"1_7", "ntype":"c", "person":"Arturas", "time":"3:16", > > "msg":"ciao"} > > ] > > }, > > { > > "id":2, > > "ntype":"p", > > "_childDocuments_": > > [ > > {"id":"2_1", "ntype":"c", "person":"Vai", "time":"4:14", > > "msg":"Hi"}, > > {"id":"2_2", "ntype":"c", "person":"Arturas", "time":"4:14", > > "msg":"IBM Watson"}, > > {"id":"2_3", "ntype":"c", "person":"Vai", "time":"4:15", > > "msg":"need to retain content"}, > > {"id":"2_4", "ntype":"c", "person":"Vai", "time":"4:15", > > "msg":"It can get annoying"}, > > {"id":"2_5", "ntype":"c", "person":"Arturas", "time":"4:15", > > "msg":"You can make all your meetings more access"}, > > {"id":"2_6", "ntype":"c", "person":"Vai", "time":"4:16", > > "msg":"Make every meeting a Skype meeting"}, > > {"id":"2_7", "ntype":"c", "person":"Arturas", "time":"4:16", > > "msg":"ciao"} > > ] > > } > > > > How would a query look like that has a Hello from Person Arturas and ciao > > from Person Vai? > > > > Cheers, > > Arturas > > > > > > On Tue, Apr 3, 2018 at 5:21 PM, Arturas Mazeika > wrote: > > > > > Hi Mikhail, > > > > > > Thanks a lot for the reply. > > > > > > You mentioned that > > > > > > q=+{!parent which.. v='+text:hello +person:A'} +{!parent > > > which..v='+text:ciao +person:B'} > > > > > > is the way to go. How would it look like precisely for the following > > > collection? > > > > > > { > > > "id":1, > > > "_childDocuments_": > > > [ > > > {"id":"1_1", "person":"Vai" , "time":"3:14", > > > "msg":"Hello"}, > > > {"id":"1_2", "person":"Arturas" , "time":"3:14", > > > "msg":"Hello"}, > > > {"id":"1_3", "person":"Vai" , "time":"3:15", "msg":"Coz > > > Mathias is working on another system- different screen."}, > > > {"id":"1_4", "person":"Vai" , "time":"3:15", "msg":"It > > can > > > get annoying"}, > > > {"id":"1_5", "person":"Arturas" , "time":"3:15", > "msg":"Thank > > > you. this is very nice of you"}, > > > {"id":"1_6", "person":"Vai" , "time":"3:16", > > "msg":"ciao"}, > > > {"id":"1_7", "person":"Arturas" , "time":"3:16", > > "msg":"ciao"} > > > ] > > > }, > > > { > >
Re: some parent documents
q=+{!parent which=ntype:p v='+msg:Hello +person:Arturas'} +{!parent which= ntype:p v='+msg:ciao +person:Vai'} On Wed, Apr 4, 2018 at 12:19 PM, Arturas Mazeikawrote: > Hi Mikhail et al, > > It seems to me that the nested documents must include nodes that encode the > level of nodes (within the document). Therefore, the minimal example must > include the node type. Is the following structure sufficient? > > { > "id":1, > "ntype":"p", > "_childDocuments_": > [ > {"id":"1_1", "ntype":"c", "person":"Vai", "time":"3:14", > "msg":"Hello"}, > {"id":"1_2", "ntype":"c", "person":"Arturas", "time":"3:14", > "msg":"Hello"}, > {"id":"1_3", "ntype":"c", "person":"Vai", "time":"3:15", > "msg":"Coz Mathias is working on another system- different screen."}, > {"id":"1_4", "ntype":"c", "person":"Vai", "time":"3:15", > "msg":"It can get annoying"}, > {"id":"1_5", "ntype":"c", "person":"Arturas", "time":"3:15", > "msg":"Thank you. this is very nice of you"}, > {"id":"1_6", "ntype":"c", "person":"Vai", "time":"3:16", > "msg":"ciao"}, > {"id":"1_7", "ntype":"c", "person":"Arturas", "time":"3:16", > "msg":"ciao"} > ] > }, > { > "id":2, > "ntype":"p", > "_childDocuments_": > [ > {"id":"2_1", "ntype":"c", "person":"Vai", "time":"4:14", > "msg":"Hi"}, > {"id":"2_2", "ntype":"c", "person":"Arturas", "time":"4:14", > "msg":"IBM Watson"}, > {"id":"2_3", "ntype":"c", "person":"Vai", "time":"4:15", > "msg":"need to retain content"}, > {"id":"2_4", "ntype":"c", "person":"Vai", "time":"4:15", > "msg":"It can get annoying"}, > {"id":"2_5", "ntype":"c", "person":"Arturas", "time":"4:15", > "msg":"You can make all your meetings more access"}, > {"id":"2_6", "ntype":"c", "person":"Vai", "time":"4:16", > "msg":"Make every meeting a Skype meeting"}, > {"id":"2_7", "ntype":"c", "person":"Arturas", "time":"4:16", > "msg":"ciao"} > ] > } > > How would a query look like that has a Hello from Person Arturas and ciao > from Person Vai? > > Cheers, > Arturas > > > On Tue, Apr 3, 2018 at 5:21 PM, Arturas Mazeika wrote: > > > Hi Mikhail, > > > > Thanks a lot for the reply. > > > > You mentioned that > > > > q=+{!parent which.. v='+text:hello +person:A'} +{!parent > > which..v='+text:ciao +person:B'} > > > > is the way to go. How would it look like precisely for the following > > collection? > > > > { > > "id":1, > > "_childDocuments_": > > [ > > {"id":"1_1", "person":"Vai" , "time":"3:14", > > "msg":"Hello"}, > > {"id":"1_2", "person":"Arturas" , "time":"3:14", > > "msg":"Hello"}, > > {"id":"1_3", "person":"Vai" , "time":"3:15", "msg":"Coz > > Mathias is working on another system- different screen."}, > > {"id":"1_4", "person":"Vai" , "time":"3:15", "msg":"It > can > > get annoying"}, > > {"id":"1_5", "person":"Arturas" , "time":"3:15", "msg":"Thank > > you. this is very nice of you"}, > > {"id":"1_6", "person":"Vai" , "time":"3:16", > "msg":"ciao"}, > > {"id":"1_7", "person":"Arturas" , "time":"3:16", > "msg":"ciao"} > > ] > > }, > > { > > "id":2, > > "_childDocuments_": > > [ > > {"id":"2_1", "person":"Vai" , "time":"4:14", > > "msg":"Hello"}, > > {"id":"2_2", "person":"Arturas" , "time":"4:14", "msg":"IBM > > Watson"}, > > {"id":"2_3", "person":"Vai" , "time":"4:15", "msg":"need > > to retain content"}, > > {"id":"2_4", "person":"Vai" , "time":"4:15", "msg":"It > can > > get annoying"}, > > {"id":"2_5", "person":"Arturas" , "time":"4:15", "msg":"You > > can make all your meetings more access"}, > > {"id":"2_6", "person":"Vai" , "time":"4:16", "msg":"Make > > every meeting a Skype meeting"}, > > {"id":"2_7", "person":"Arturas" , "time":"4:16", > "msg":"ciao"} > > ] > > } > > > > Cheers, > > Arturas > > > > > > On Tue, Apr 3, 2018 at 4:33 PM, Mikhail Khludnev > wrote: > > > >> Hello, Arturas. > >> > >> TLDR; Please find inline below. > >> > >> On Tue, Apr 3, 2018 at 5:14 PM, Arturas Mazeika > >> wrote: > >> > >> > Hi Solr Fans, > >> > > >> > I am trying to make sense of information retrieval using expressions > >> like > >> > "some parent", "*only parent*", " *all parent*". I am also trying to > >> > understand the syntax "!parent which" and "!child of". On the > technical > >> > level, I am reading the following documents: > >> > > >> > [1] > >> > https://lucene.apache.org/solr/guide/7_2/other-parsers. > >> > html#block-join-query-parsers > >> > [2] > >> > https://lucene.apache.org/solr/guide/7_2/uploading-data- > >> > with-index-handlers.html#nested-child-documents > >> > [3] http://yonik.com/solr-nested-objects/ > >> > > >> > and I am confused to read: > >> > >
Re: some parent documents
Hi Mikhail et al, It seems to me that the nested documents must include nodes that encode the level of nodes (within the document). Therefore, the minimal example must include the node type. Is the following structure sufficient? { "id":1, "ntype":"p", "_childDocuments_": [ {"id":"1_1", "ntype":"c", "person":"Vai", "time":"3:14", "msg":"Hello"}, {"id":"1_2", "ntype":"c", "person":"Arturas", "time":"3:14", "msg":"Hello"}, {"id":"1_3", "ntype":"c", "person":"Vai", "time":"3:15", "msg":"Coz Mathias is working on another system- different screen."}, {"id":"1_4", "ntype":"c", "person":"Vai", "time":"3:15", "msg":"It can get annoying"}, {"id":"1_5", "ntype":"c", "person":"Arturas", "time":"3:15", "msg":"Thank you. this is very nice of you"}, {"id":"1_6", "ntype":"c", "person":"Vai", "time":"3:16", "msg":"ciao"}, {"id":"1_7", "ntype":"c", "person":"Arturas", "time":"3:16", "msg":"ciao"} ] }, { "id":2, "ntype":"p", "_childDocuments_": [ {"id":"2_1", "ntype":"c", "person":"Vai", "time":"4:14", "msg":"Hi"}, {"id":"2_2", "ntype":"c", "person":"Arturas", "time":"4:14", "msg":"IBM Watson"}, {"id":"2_3", "ntype":"c", "person":"Vai", "time":"4:15", "msg":"need to retain content"}, {"id":"2_4", "ntype":"c", "person":"Vai", "time":"4:15", "msg":"It can get annoying"}, {"id":"2_5", "ntype":"c", "person":"Arturas", "time":"4:15", "msg":"You can make all your meetings more access"}, {"id":"2_6", "ntype":"c", "person":"Vai", "time":"4:16", "msg":"Make every meeting a Skype meeting"}, {"id":"2_7", "ntype":"c", "person":"Arturas", "time":"4:16", "msg":"ciao"} ] } How would a query look like that has a Hello from Person Arturas and ciao from Person Vai? Cheers, Arturas On Tue, Apr 3, 2018 at 5:21 PM, Arturas Mazeikawrote: > Hi Mikhail, > > Thanks a lot for the reply. > > You mentioned that > > q=+{!parent which.. v='+text:hello +person:A'} +{!parent > which..v='+text:ciao +person:B'} > > is the way to go. How would it look like precisely for the following > collection? > > { > "id":1, > "_childDocuments_": > [ > {"id":"1_1", "person":"Vai" , "time":"3:14", > "msg":"Hello"}, > {"id":"1_2", "person":"Arturas" , "time":"3:14", > "msg":"Hello"}, > {"id":"1_3", "person":"Vai" , "time":"3:15", "msg":"Coz > Mathias is working on another system- different screen."}, > {"id":"1_4", "person":"Vai" , "time":"3:15", "msg":"It can > get annoying"}, > {"id":"1_5", "person":"Arturas" , "time":"3:15", "msg":"Thank > you. this is very nice of you"}, > {"id":"1_6", "person":"Vai" , "time":"3:16", "msg":"ciao"}, > {"id":"1_7", "person":"Arturas" , "time":"3:16", "msg":"ciao"} > ] > }, > { > "id":2, > "_childDocuments_": > [ > {"id":"2_1", "person":"Vai" , "time":"4:14", > "msg":"Hello"}, > {"id":"2_2", "person":"Arturas" , "time":"4:14", "msg":"IBM > Watson"}, > {"id":"2_3", "person":"Vai" , "time":"4:15", "msg":"need > to retain content"}, > {"id":"2_4", "person":"Vai" , "time":"4:15", "msg":"It can > get annoying"}, > {"id":"2_5", "person":"Arturas" , "time":"4:15", "msg":"You > can make all your meetings more access"}, > {"id":"2_6", "person":"Vai" , "time":"4:16", "msg":"Make > every meeting a Skype meeting"}, > {"id":"2_7", "person":"Arturas" , "time":"4:16", "msg":"ciao"} > ] > } > > Cheers, > Arturas > > > On Tue, Apr 3, 2018 at 4:33 PM, Mikhail Khludnev wrote: > >> Hello, Arturas. >> >> TLDR; Please find inline below. >> >> On Tue, Apr 3, 2018 at 5:14 PM, Arturas Mazeika >> wrote: >> >> > Hi Solr Fans, >> > >> > I am trying to make sense of information retrieval using expressions >> like >> > "some parent", "*only parent*", " *all parent*". I am also trying to >> > understand the syntax "!parent which" and "!child of". On the technical >> > level, I am reading the following documents: >> > >> > [1] >> > https://lucene.apache.org/solr/guide/7_2/other-parsers. >> > html#block-join-query-parsers >> > [2] >> > https://lucene.apache.org/solr/guide/7_2/uploading-data- >> > with-index-handlers.html#nested-child-documents >> > [3] http://yonik.com/solr-nested-objects/ >> > >> > and I am confused to read: >> > >> > This parser takes a query that matches some parent documents and returns >> > their children. The syntax for this parser is: q={!child >> > of=}. The parameter allParents is a filter that >> > matches *only parent documents*; here you would define the field and >> value >> > that you used to identify *all parent documents*. The parameter >> someParents >> > identifies a query that will match some of the parent documents. The >> output >> > is the children. >> > >>
Re: some parent documents
Hi Mikhail, Thanks a lot for the reply. You mentioned that q=+{!parent which.. v='+text:hello +person:A'} +{!parent which..v='+text:ciao +person:B'} is the way to go. How would it look like precisely for the following collection? { "id":1, "_childDocuments_": [ {"id":"1_1", "person":"Vai" , "time":"3:14", "msg":"Hello"}, {"id":"1_2", "person":"Arturas" , "time":"3:14", "msg":"Hello"}, {"id":"1_3", "person":"Vai" , "time":"3:15", "msg":"Coz Mathias is working on another system- different screen."}, {"id":"1_4", "person":"Vai" , "time":"3:15", "msg":"It can get annoying"}, {"id":"1_5", "person":"Arturas" , "time":"3:15", "msg":"Thank you. this is very nice of you"}, {"id":"1_6", "person":"Vai" , "time":"3:16", "msg":"ciao"}, {"id":"1_7", "person":"Arturas" , "time":"3:16", "msg":"ciao"} ] }, { "id":2, "_childDocuments_": [ {"id":"2_1", "person":"Vai" , "time":"4:14", "msg":"Hello"}, {"id":"2_2", "person":"Arturas" , "time":"4:14", "msg":"IBM Watson"}, {"id":"2_3", "person":"Vai" , "time":"4:15", "msg":"need to retain content"}, {"id":"2_4", "person":"Vai" , "time":"4:15", "msg":"It can get annoying"}, {"id":"2_5", "person":"Arturas" , "time":"4:15", "msg":"You can make all your meetings more access"}, {"id":"2_6", "person":"Vai" , "time":"4:16", "msg":"Make every meeting a Skype meeting"}, {"id":"2_7", "person":"Arturas" , "time":"4:16", "msg":"ciao"} ] } Cheers, Arturas On Tue, Apr 3, 2018 at 4:33 PM, Mikhail Khludnevwrote: > Hello, Arturas. > > TLDR; Please find inline below. > > On Tue, Apr 3, 2018 at 5:14 PM, Arturas Mazeika wrote: > > > Hi Solr Fans, > > > > I am trying to make sense of information retrieval using expressions like > > "some parent", "*only parent*", " *all parent*". I am also trying to > > understand the syntax "!parent which" and "!child of". On the technical > > level, I am reading the following documents: > > > > [1] > > https://lucene.apache.org/solr/guide/7_2/other-parsers. > > html#block-join-query-parsers > > [2] > > https://lucene.apache.org/solr/guide/7_2/uploading-data- > > with-index-handlers.html#nested-child-documents > > [3] http://yonik.com/solr-nested-objects/ > > > > and I am confused to read: > > > > This parser takes a query that matches some parent documents and returns > > their children. The syntax for this parser is: q={!child > > of=}. The parameter allParents is a filter that > > matches *only parent documents*; here you would define the field and > value > > that you used to identify *all parent documents*. The parameter > someParents > > identifies a query that will match some of the parent documents. The > output > > is the children. > > > > The first sentence talks about "matching" but does not define what that > > means (and why it is only some parents matching?). The second sentence > > introduces a syntax of the parser, but blurs the understanding as "some" > > and "all" of parents are combined into one sentence. My understanding is > > that all documents are retrieve that satisfy a query. The query must > > express some constraints on the parent node and some on the child node. I > > have a feeling that "only parent documents" reads "criteria is formulated > > over the parent part of {parent document}->{child document} of entity. > > My simplified conceptual world of solr looks in the following way: > > > > 1. Every document has an ID. > > 2. Every document may have additional attributes > > 3. Text attributes is what's at stake in solr. Sure we can search for > > products that costs at most X, but this is the added functionality. For > > simplicity I am neglecting those here. > > 4. The user has an information need. She expresses it with (key)words and > > hopes to find matching documents. For simplicity, I am skipping all > issues > > related to the information presentation of the documents > > 5. Analysis chain (and inverse index) are the key technologies solr is > > based upon. Once the chain-processing is applied, mathematical logic > kicks > > in, retrieving the documents (that are a set of processed, normalized, > > enriched tokens) matching the query (processed, normalized and enriched > > tokens). Clearly, the logic function can be a fancy one (at least one of > > query token is in the document set of tokens, etc.), ranking is used to > > sort the results. > > 6. A nested document concept is introduced in solr. It needs to be > uploaded > > into the index structure using a specific handlers [2]. A nested > documents > > is a tree. A root may contain children documents, which may be parents of > > grandchildren documents. > > 7. Querying nested documents is supported in the following manner: > > 7.1 Child documents are return that satisfies {parent > >
Re: some parent documents
Hello, Arturas. TLDR; Please find inline below. On Tue, Apr 3, 2018 at 5:14 PM, Arturas Mazeikawrote: > Hi Solr Fans, > > I am trying to make sense of information retrieval using expressions like > "some parent", "*only parent*", " *all parent*". I am also trying to > understand the syntax "!parent which" and "!child of". On the technical > level, I am reading the following documents: > > [1] > https://lucene.apache.org/solr/guide/7_2/other-parsers. > html#block-join-query-parsers > [2] > https://lucene.apache.org/solr/guide/7_2/uploading-data- > with-index-handlers.html#nested-child-documents > [3] http://yonik.com/solr-nested-objects/ > > and I am confused to read: > > This parser takes a query that matches some parent documents and returns > their children. The syntax for this parser is: q={!child > of=}. The parameter allParents is a filter that > matches *only parent documents*; here you would define the field and value > that you used to identify *all parent documents*. The parameter someParents > identifies a query that will match some of the parent documents. The output > is the children. > > The first sentence talks about "matching" but does not define what that > means (and why it is only some parents matching?). The second sentence > introduces a syntax of the parser, but blurs the understanding as "some" > and "all" of parents are combined into one sentence. My understanding is > that all documents are retrieve that satisfy a query. The query must > express some constraints on the parent node and some on the child node. I > have a feeling that "only parent documents" reads "criteria is formulated > over the parent part of {parent document}->{child document} of entity. > My simplified conceptual world of solr looks in the following way: > > 1. Every document has an ID. > 2. Every document may have additional attributes > 3. Text attributes is what's at stake in solr. Sure we can search for > products that costs at most X, but this is the added functionality. For > simplicity I am neglecting those here. > 4. The user has an information need. She expresses it with (key)words and > hopes to find matching documents. For simplicity, I am skipping all issues > related to the information presentation of the documents > 5. Analysis chain (and inverse index) are the key technologies solr is > based upon. Once the chain-processing is applied, mathematical logic kicks > in, retrieving the documents (that are a set of processed, normalized, > enriched tokens) matching the query (processed, normalized and enriched > tokens). Clearly, the logic function can be a fancy one (at least one of > query token is in the document set of tokens, etc.), ranking is used to > sort the results. > 6. A nested document concept is introduced in solr. It needs to be uploaded > into the index structure using a specific handlers [2]. A nested documents > is a tree. A root may contain children documents, which may be parents of > grandchildren documents. > 7. Querying nested documents is supported in the following manner: > 7.1 Child documents are return that satisfies {parent > document}->{document} > 7.2 Parent documents are return that satisfy {document}->{child > document} > > Would I be very wrong to have this conceptual picture? > > From this point, the situation is a bit bury in my head. At the core, I do > not really understand what "a document" is anymore (since the complete json > or xml, so is a sub-json and sub-xml are documents, every document must > have an ID, does that meant the the subdocuments must have and ID too, or > sub-ids are also fine?), how to formulate mathematical expressions over > documents and what it means that the document satisfies my (key)word query? > Can we define a document to be the largest entity of information that does > not contain any other nested documents [4]? If this is defined and > communicated like this already where can I find it? There is a use of the > clarification, as the concept of the document means different things in > different contexts (e.g., you can update only the "complete document" in > the index vs. parent document, etc.). > > Is it possible to formulate what's going on using mathematical logic? Can > one express something like > > { give documents d : d is a document, d is parent of document c, d > satisfies logical criteria C1,,CN, c satisfies logical criteria > C1',...,CM'} > { give documents c : c is a document, d is parent of document c, d > satisfies logical criteria C1,,CN, c satisfies logical criteria > C1',...,CM'} > > here the meaning of document is as in definition [4] above. > > 1. Is it possible to retrieve all parent documents that have two children > c1 and c2? Consider a document that is a skype chat, and children are > individual lines of communication in the chat. I would be looking for the > (parent) documents that have "hello" said by person A and "ciao" said by > person B (as two different