[jira] [Commented] (SOLR-12638) Support atomic updates of nested/child documents for nested-enabled schema

2019-04-09 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814045#comment-16814045
 ] 

mosh commented on SOLR-12638:
-

After inspecting the test results, it seems the only failure was 
[org.apache.solr.update.processor.CategoryRoutedAliasUpdateProcessorTest.testSliceRouting|https://builds.apache.org/job/PreCommit-SOLR-Build/367/testReport/org.apache.solr.update.processor/CategoryRoutedAliasUpdateProcessorTest/testSliceRouting/]
 hitting the GC overhead limit:
{code:java}
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=39740, name=Connection evictor, state=RUNNABLE, 
group=TGRP-CategoryRoutedAliasUpdateProcessorTest]
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded{code}
This failure is probably unrelated to this patch, since SOLR-13370 was already 
opened to address it.
[~dsmiley],
WDYT?

> Support atomic updates of nested/child documents for nested-enabled schema
> --
>
> Key: SOLR-12638
> URL: https://issues.apache.org/jira/browse/SOLR-12638
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
> Attachments: SOLR-12638-delete-old-block-no-commit.patch, 
> SOLR-12638-nocommit.patch, SOLR-12638.patch, SOLR-12638.patch
>
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> I have been toying with the thought of using this transformer in conjunction 
> with NestedUpdateProcessor and AtomicUpdate to allow SOLR to completely 
> re-index the entire nested structure. This is just a thought, I am still 
> thinking about implementation details. Hopefully I will be able to post a 
> more concrete proposal soon.
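
For orientation, a hedged SolrJ-style sketch of the kind of update this issue aims to support. Whether a child document may appear inside an atomic-update map, and what the request should look like, is exactly what is being worked out here, so the field names and id scheme below are invented assumptions rather than the final API:
{code:java}
import java.util.Collections;
import org.apache.solr.common.SolrInputDocument;

public class NestedAtomicUpdateSketch {
  public static void main(String[] args) {
    // Hypothetical sketch only: field names ("title", "comments") and the
    // child id scheme are invented for illustration.
    SolrInputDocument parent = new SolrInputDocument();
    parent.setField("id", "parent_1");
    // atomic "set" on a regular field
    parent.setField("title", Collections.singletonMap("set", "updated title"));

    // hypothetical atomic "add" of a nested child document
    SolrInputDocument child = new SolrInputDocument();
    child.setField("id", "parent_1_comment_2");
    child.setField("text", "a new child comment");
    parent.setField("comments", Collections.singletonMap("add", child));
  }
}
{code}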






[jira] [Comment Edited] (SOLR-12638) Support atomic updates of nested/child documents for nested-enabled schema

2019-04-09 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814045#comment-16814045
 ] 

mosh edited comment on SOLR-12638 at 4/10/19 5:16 AM:
--

After inspecting the test results, it seems the only failure was 
[org.apache.solr.update.processor.CategoryRoutedAliasUpdateProcessorTest|https://builds.apache.org/job/PreCommit-SOLR-Build/367/testReport/org.apache.solr.update.processor/CategoryRoutedAliasUpdateProcessorTest/testSliceRouting/]
 hitting the GC overhead limit:
{code:java}
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=39740, name=Connection evictor, state=RUNNABLE, 
group=TGRP-CategoryRoutedAliasUpdateProcessorTest]
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded{code}
This failure is probably unrelated to this patch, since SOLR-13370 was already 
opened to address it.
[~dsmiley],
WDYT?


was (Author: moshebla):
After inspecting the test results, it seems the only failure was 
[org.apache.solr.update.processor.CategoryRoutedAliasUpdateProcessorTest.testSliceRouting|https://builds.apache.org/job/PreCommit-SOLR-Build/367/testReport/org.apache.solr.update.processor/CategoryRoutedAliasUpdateProcessorTest/testSliceRouting/]
 hitting the GC overhead limit:
{code:java}
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=39740, name=Connection evictor, state=RUNNABLE, 
group=TGRP-CategoryRoutedAliasUpdateProcessorTest]
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded{code}
This failure is probably unrelated to this patch, since SOLR-13370 was already 
opened to address it.
[~dsmiley],
WDYT?

> Support atomic updates of nested/child documents for nested-enabled schema
> --
>
> Key: SOLR-12638
> URL: https://issues.apache.org/jira/browse/SOLR-12638
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
> Attachments: SOLR-12638-delete-old-block-no-commit.patch, 
> SOLR-12638-nocommit.patch, SOLR-12638.patch, SOLR-12638.patch
>
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> I have been toying with the thought of using this transformer in conjunction 
> with NestedUpdateProcessor and AtomicUpdate to allow SOLR to completely 
> re-index the entire nested structure. This is just a thought, I am still 
> thinking about implementation details. Hopefully I will be able to post a 
> more concrete proposal soon.






[jira] [Comment Edited] (SOLR-12638) Support atomic updates of nested/child documents for nested-enabled schema

2019-04-09 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814045#comment-16814045
 ] 

mosh edited comment on SOLR-12638 at 4/10/19 5:17 AM:
--

After inspecting the test results, it seems the only failure was 
[org.apache.solr.update.processor.CategoryRoutedAliasUpdateProcessorTest|https://builds.apache.org/job/PreCommit-SOLR-Build/367/testReport/junit.framework/TestSuite/org_apache_solr_update_processor_CategoryRoutedAliasUpdateProcessorTest_2/]
 hitting the GC overhead limit:
{code:java}
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=39740, name=Connection evictor, state=RUNNABLE, 
group=TGRP-CategoryRoutedAliasUpdateProcessorTest]
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded{code}
This failure is probably unrelated to this patch, since SOLR-13370 was already 
opened to address it.
[~dsmiley],
WDYT?


was (Author: moshebla):
After inspecting the test results, it seems the only failure was 
[org.apache.solr.update.processor.CategoryRoutedAliasUpdateProcessorTest|https://builds.apache.org/job/PreCommit-SOLR-Build/367/testReport/org.apache.solr.update.processor/CategoryRoutedAliasUpdateProcessorTest/testSliceRouting/]
 hitting the GC overhead limit:
{code:java}
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=39740, name=Connection evictor, state=RUNNABLE, 
group=TGRP-CategoryRoutedAliasUpdateProcessorTest]
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded{code}
This failure is probably unrelated to this patch, since SOLR-13370 was already 
opened to address it.
[~dsmiley],
WDYT?

> Support atomic updates of nested/child documents for nested-enabled schema
> --
>
> Key: SOLR-12638
> URL: https://issues.apache.org/jira/browse/SOLR-12638
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
> Attachments: SOLR-12638-delete-old-block-no-commit.patch, 
> SOLR-12638-nocommit.patch, SOLR-12638.patch, SOLR-12638.patch
>
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> I have been toying with the thought of using this transformer in conjunction 
> with NestedUpdateProcessor and AtomicUpdate to allow SOLR to completely 
> re-index the entire nested structure. This is just a thought, I am still 
> thinking about implementation details. Hopefully I will be able to post a 
> more concrete proposal soon.






[jira] [Commented] (SOLR-12993) Split the state.json into 2. a small frequently modified data + a large unmodified data

2019-04-17 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16819976#comment-16819976
 ] 

mosh commented on SOLR-12993:
-

{quote}or alternately we can just add this data (status, leader) to the LIR 
term files. That way, we don't need to create any new files
{quote}
ZkShardTerms (the class that generates the LIR term files) resides in solr-core, 
while ZkStateReader is in SolrJ.
 Since this proposal is to split state.json, there would be no way for 
ZkStateReader to find out which replica is the leader, since this information 
would reside inside the LIR term files.

I propose two possible courses of action:
 # Move ZkShardTerms to SolrJ and combine the LIR terms with this data.
 # Create new files, as proposed by [~noble.paul], which would contain a small 
subset of the split information.

[~noble.paul], [~gus_heck],
 WDYT?
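
To make the first option concrete, here is a purely hypothetical shape for a combined per-shard file; the existing term files hold per-replica term numbers, and the {{state}}/{{leader}} fields below are the additional data being discussed, so treat this as an illustration rather than a proposed format:
{code}
{
  "core_node3": {"term": 12, "state": "active", "leader": true},
  "core_node5": {"term": 12, "state": "active"}
}
{code}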

> Split the state.json into 2. a small frequently modified data + a large 
> unmodified data
> ---
>
> Key: SOLR-12993
> URL: https://issues.apache.org/jira/browse/SOLR-12993
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Priority: Major
>
> This is just a proposal to minimize the ZK load and improve scalability of 
> very large clusters.
> Every time a small state change occurs for a collection/replica, the following 
> file needs to be updated and then read n times (where n = the number of replicas 
> for this collection). The proposal is to split the main file into 2.
> {code}
> {"gettingstarted":{
> "pullReplicas":"0",
> "replicationFactor":"2",
> "router":{"name":"compositeId"},
> "maxShardsPerNode":"-1",
> "autoAddReplicas":"false",
> "nrtReplicas":"2",
> "tlogReplicas":"0",
> "shards":{
>   "shard1":{
> "range":"8000-",
>   
> "replicas":{
>   "core_node3":{
> "core":"gettingstarted_shard1_replica_n1",
> "base_url":"http://10.0.0.80:8983/solr";,
> "node_name":"10.0.0.80:8983_solr",
> "state":"active",
> "type":"NRT",
> "force_set_state":"false",
> "leader":"true"},
>   "core_node5":{
> "core":"gettingstarted_shard1_replica_n2",
> "base_url":"http://10.0.0.80:7574/solr";,
> "node_name":"10.0.0.80:7574_solr",
>  
> "type":"NRT",
> "force_set_state":"false"}}},
>   "shard2":{
> "range":"0-7fff",
> "state":"active",
> "replicas":{
>   "core_node7":{
> "core":"gettingstarted_shard2_replica_n4",
> "base_url":"http://10.0.0.80:7574/solr";,
> "node_name":"10.0.0.80:7574_solr",
>
> "type":"NRT",
> "force_set_state":"false"},
>   "core_node8":{
> "core":"gettingstarted_shard2_replica_n6",
> "base_url":"http://10.0.0.80:8983/solr";,
> "node_name":"10.0.0.80:8983_solr",
>  
> "type":"NRT",
> "force_set_state":"false",
> "leader":"true"}}
> {code}
> The other file, {{status.json}}, is frequently updated and small:
> {code}
> {
> "shard1": {
>   "state": "ACTIVE",
>   "core_node3": {"state": "active", "leader" : true},
>   "core_node5": {"state": "active"}
> },
> "shard2": {
>   "state": "active",
>   "core_node7": {"state": "active"},
>   "core_node8": {"state": "active", "leader" : true}}
>   }
> {code}
> Here the size of the file is roughly one tenth of the other file. This leads 
> to a dramatic reduction in the amount of data written/read to/from ZK.






[jira] [Comment Edited] (SOLR-12993) Split the state.json into 2. a small frequently modified data + a large unmodified data

2019-04-17 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16819976#comment-16819976
 ] 

mosh edited comment on SOLR-12993 at 4/17/19 11:05 AM:
---

{quote}or alternately we can just add this data (status, leader) to the LIR 
term files. That way, we don't need to create any new files
{quote}
ZkShardTerms (the class that generates the LIR term files) resides in solr-core, 
while ZkStateReader is in SolrJ.
 Since this proposal is to split state.json, there would be no way for 
ZkStateReader to find out which replica is the leader, since this information 
would reside inside the LIR term files.

I propose two possible courses of action:
 # Move ZkShardTerms to SolrJ, combining the LIR terms with the shard state and 
leader flag.
 # Create new files, as proposed by [~noble.paul], which would contain a small 
subset of the split information.

[~noble.paul], [~gus_heck],
 WDYT?


was (Author: moshebla):
{quote}or alternately we can just add this data (status, leader) to the LIR 
term files. That way, we don't need to create any new files
{quote}
ZkShardTerms (the class that generates the LIR term files) resides in solr-core, 
while ZkStateReader is in SolrJ.
 Since this proposal is to split state.json, there would be no way for 
ZkStateReader to find out which replica is the leader, since this information 
would reside inside the LIR term files.

I propose two possible courses of action:
 # Move ZkShardTerms to SolrJ and combine the LIR terms with this data.
 # Create new files, as proposed by [~noble.paul], which would contain a small 
subset of the split information.

[~noble.paul], [~gus_heck],
 WDYT?

> Split the state.json into 2. a small frequently modified data + a large 
> unmodified data
> ---
>
> Key: SOLR-12993
> URL: https://issues.apache.org/jira/browse/SOLR-12993
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Priority: Major
>
> This is just a proposal to minimize the ZK load and improve scalability of 
> very large clusters.
> Every time a small state change occurs for a collection/replica, the following 
> file needs to be updated and then read n times (where n = the number of replicas 
> for this collection). The proposal is to split the main file into 2.
> {code}
> {"gettingstarted":{
> "pullReplicas":"0",
> "replicationFactor":"2",
> "router":{"name":"compositeId"},
> "maxShardsPerNode":"-1",
> "autoAddReplicas":"false",
> "nrtReplicas":"2",
> "tlogReplicas":"0",
> "shards":{
>   "shard1":{
> "range":"8000-",
>   
> "replicas":{
>   "core_node3":{
> "core":"gettingstarted_shard1_replica_n1",
> "base_url":"http://10.0.0.80:8983/solr";,
> "node_name":"10.0.0.80:8983_solr",
> "state":"active",
> "type":"NRT",
> "force_set_state":"false",
> "leader":"true"},
>   "core_node5":{
> "core":"gettingstarted_shard1_replica_n2",
> "base_url":"http://10.0.0.80:7574/solr";,
> "node_name":"10.0.0.80:7574_solr",
>  
> "type":"NRT",
> "force_set_state":"false"}}},
>   "shard2":{
> "range":"0-7fff",
> "state":"active",
> "replicas":{
>   "core_node7":{
> "core":"gettingstarted_shard2_replica_n4",
> "base_url":"http://10.0.0.80:7574/solr";,
> "node_name":"10.0.0.80:7574_solr",
>
> "type":"NRT",
> "force_set_state":"false"},
>   "core_node8":{
> "core":"gettingstarted_shard2_replica_n6",
> "base_url":"http://10.0.0.80:8983/solr";,
> "node_name":"10.0.0.80:8983_solr",
>  
> "type":"NRT",
> "force_set_state":"false",
> "leader":"true"}}
> {code}
> The other file, {{status.json}}, is frequently updated and small:
> {code}
> {
> "shard1": {
>   "state": "ACTIVE",
>   "core_node3": {"state": "active", "leader" : true},
>   "core_node5": {"state": "active"}
> },
> "shard2": {
>   "state": "active",
>   "core_node7": {"state": "active"},
>   "core_node8": {"state": "active", "leader" : true}}
>   }
> {code}
> Here the size of the file is roughly one tenth of the other file. This leads 
> to a dramatic reduction in the amount of data written/read to/from ZK.






[jira] [Created] (SOLR-12951) Should Child Doc Ids be unique?

2018-11-01 Thread mosh (JIRA)
mosh created SOLR-12951:
---

 Summary: Should Child Doc Ids be unique?
 Key: SOLR-12951
 URL: https://issues.apache.org/jira/browse/SOLR-12951
 Project: Solr
  Issue Type: Sub-task
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: mosh









[jira] [Updated] (SOLR-12951) Should Child Doc Ids be unique?

2018-11-01 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-12951:

Description: 
Currently there is no constraint on child document ids, which can repeat 
across different documents.

[~dsmiley] has previously brought up the topic of child doc IDs:
{quote}make it mandatory that all child doc IDs start with a root doc ID then 
an exclamation then whatever.{quote}
This has made me think: should we enforce that child document IDs be unique?
Perhaps prefixing each child id with the root document id (which is unique) could help?
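
For illustration, the convention quoted above would look something like this (the ids are invented):
{code}
root doc id:    book_42
child doc ids:  book_42!chapter_1
                book_42!chapter_1!footnote_3
{code}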

> Should Child Doc Ids be unique?
> ---
>
> Key: SOLR-12951
> URL: https://issues.apache.org/jira/browse/SOLR-12951
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>
> Currently there is no constraint on child document ids, which can 
> repeat across different documents.
> [~dsmiley] has previously brought up the topic of child doc IDs:
> {quote}make it mandatory that all child doc IDs start with a root doc ID then 
> an exclamation then whatever.{quote}
> This has made me think: should we enforce that child document IDs be unique?
> Perhaps prefixing each child id with the root document id (which is unique) could help?






[jira] [Updated] (SOLR-12951) Should Child Doc Ids be unique?

2018-11-01 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-12951:

Description: 
Currently there is no constraint on child document ids, which can repeat 
across different documents.

[~dsmiley] has previously brought up the topic of child doc IDs:
{quote}make it mandatory that all child doc IDs start with a root doc ID then 
an exclamation then whatever.{quote}
This has made me think: should we enforce that child document IDs be unique?
Perhaps prefixing each child id with the root document id (which is unique) could help?

  was:
Currently there is no constraint on child document ids, which can repeat 
across different documents.

[~dsmiley] has previously brought up the topic of child doc IDs:
{quote}make it mandatory that all child doc IDs start with a root doc ID then 
an exclamation then whatever.{quote}
This has made me think: should we enforce that child document IDs be unique?
Perhaps prefixing each child id with the root document id (which is unique) could help?



> Should Child Doc Ids be unique?
> ---
>
> Key: SOLR-12951
> URL: https://issues.apache.org/jira/browse/SOLR-12951
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>
> Currently there is no constraint on child document ids, which can 
> repeat across different documents.
> [~dsmiley] has previously brought up the topic of child doc IDs:
> {quote}make it mandatory that all child doc IDs start with a root doc ID then 
> an exclamation then whatever.{quote}
> This has made me think: should we enforce that child document IDs be unique?
> Perhaps prefixing each child id with the root document id (which is unique) could help?






[jira] [Created] (SOLR-12952) TFIDF scorer uses max docs instead of num docs when using Edismax

2018-11-01 Thread mosh (JIRA)
mosh created SOLR-12952:
---

 Summary: TFIDF scorer uses max docs instead of num docs when using 
Edismax
 Key: SOLR-12952
 URL: https://issues.apache.org/jira/browse/SOLR-12952
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: mosh


I have recently noticed some odd behavior while using the edismax query parser.
The scores returned by documents seem to be affected by deleted documents, 
which have yet to be merged and completely removed from the index.
This causes different shards to return different scores for the same query.
Is this a bug, or am I missing something?
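
A toy illustration of the suspected cause (this is not Lucene's exact formula, just the general idf shape): if one replica's statistics count deleted-but-unmerged documents while another sees fewer, the idf-style weight, and therefore the score, differs between them.
{code:java}
// Toy sketch only -- not Lucene's actual implementation.
public class IdfSketch {
  // classic idf shape: the weight depends on the document count used
  static double idf(long docCount, long docFreq) {
    return Math.log((double) docCount / docFreq);
  }

  public static void main(String[] args) {
    long docFreq = 10;
    System.out.println(idf(1000, docFreq)); // replica A: 1000 docs, no pending deletes
    System.out.println(idf(1200, docFreq)); // replica B: same live docs plus 200 unmerged deletes
  }
}
{code}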






[jira] [Updated] (SOLR-12952) TFIDF scorer uses max docs instead of num docs when using Edismax

2018-11-01 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-12952:

Description: 
I have recently noticed some odd behavior while using the edismax query parser.
 The scores returned by documents seem to be affected by deleted documents, 
which have yet to be merged and completely removed from the index.
 This causes different replicas to return different scores for the same query.
 Is this a bug, or am I missing something?

  was:
I have recently noticed some odd behavior while using the edismax query parser.
The scores returned by documents seem to be affected by deleted documents, 
which have yet to be merged and completely removed from the index.
This causes different shards to return different scores for the same query.
Is this a bug, or am I missing something?


> TFIDF scorer uses max docs instead of num docs when using Edismax
> -
>
> Key: SOLR-12952
> URL: https://issues.apache.org/jira/browse/SOLR-12952
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>
> I have recently noticed some odd behavior while using the edismax query 
> parser.
>  The scores returned by documents seem to be affected by deleted documents, 
> which have yet to be merged and completely removed from the index.
>  This causes different replicas to return different scores for the same query.
>  Is this a bug, or am I missing something?






[jira] [Created] (SOLR-12958) Statistical Phrase Identifier should return phrases in single field

2018-11-04 Thread mosh (JIRA)
mosh created SOLR-12958:
---

 Summary: Statistical Phrase Identifier should return phrases in 
single field
 Key: SOLR-12958
 URL: https://issues.apache.org/jira/browse/SOLR-12958
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: mosh


It has come to my attention that the phrase identifier introduced in SOLR-9418 
does not return phrases that are found only in one of the fields specified in 
phrases.fields.
This has proved troublesome for our use case.
The offending line seems to be
{code:java}
final List validScoringPhrasesSorted = contextData.allPhrases.stream()
  .filter(p -> 0.0D < p.getTotalScore())
  .sorted(Comparator.comparing((p -> p.getTotalScore()), 
Collections.reverseOrder()))
  .collect(Collectors.toList());{code}
Since fields where the phrase is not present return -1.0, and fields that 
contain the phrase return a score in the range 0.0 <= score <= 1.0, the 
total score turns out negative, and the phrase gets filtered out.
I separated the filter into 2 distinct cases:
# Filter out single word phrases (*phrases.singleWordPhrases* is set to false)
# Include single word phrases (*phrases.singleWordPhrases* is set to true)

This can be addressed by the following change to the component's logic:
{code:java}
if(!rb.req.getParams().getBool(PHRASE_MATCH_SINGLE_WORD, false)) {
  // filter single word phrases
  phraseStream = contextData.allPhrases.stream()
  .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> 
fieldScore > 0.0D));
} else {
  // include single word phrases, which return a constant score of 0.0
  phraseStream = contextData.allPhrases.stream()
  .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> 
fieldScore >= 0.0D));
}{code}
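
As a side note, the same behaviour could be expressed with a single filter. A minimal sketch, reusing the names from the snippet above (so it only compiles in the context of that component code); single-word phrases carry a constant field score of 0.0, so the threshold is exclusive unless *phrases.singleWordPhrases* is set:
{code:java}
// Sketch of an equivalent single-filter form of the proposed change.
boolean includeSingleWord = rb.req.getParams().getBool(PHRASE_MATCH_SINGLE_WORD, false);
phraseStream = contextData.allPhrases.stream()
    .filter(p -> p.fieldScores.values().stream()
        .anyMatch(fieldScore -> includeSingleWord ? fieldScore >= 0.0D
                                                  : fieldScore > 0.0D));
{code}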







[jira] [Updated] (SOLR-12958) Statistical Phrase Identifier should return phrases in single field

2018-11-04 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-12958:

Description: 
It has come to my attention that the phrase identifier introduced in SOLR-9418 
does not return phrases that are found in only one of the fields specified by 
phrases.fields.
 This has proved troublesome for our use case.
 The offending line seems to be
{code:java}
final List validScoringPhrasesSorted = contextData.allPhrases.stream()
  .filter(p -> 0.0D < p.getTotalScore())
  .sorted(Comparator.comparing((p -> p.getTotalScore()), 
Collections.reverseOrder()))
  .collect(Collectors.toList());{code}
Since fields where the phrase is not present return -1.0, and fields that 
contain the phrase return a score in the range 0.0 <= score <= 1.0, the 
total score turns out negative, and the phrase gets filtered out.
 I separated the filter into 2 distinct cases:
 # Filter out single word phrases (*phrases.singleWordPhrases* is set to false)
 # Include single word phrases (*phrases.singleWordPhrases* is set to true)

This can be addressed by the following change to the component's logic:
{code:java}
if(!rb.req.getParams().getBool(PHRASE_MATCH_SINGLE_WORD, false)) {
  // filter single word phrases
  phraseStream = contextData.allPhrases.stream()
  .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> 
fieldScore > 0.0D));
} else {
  // include single word phrases, which return a constant score of 0.0
  phraseStream = contextData.allPhrases.stream()
  .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> 
fieldScore >= 0.0D));
}{code}

  was:
It has come to my attention that the phrase identifier introduced in SOLR-9418 
does not return phrases that are found only in one of the fields specified in 
phrases.fields.
 This has proved troublesome for our use case.
 The offending line seems to be
{code:java}
final List validScoringPhrasesSorted = contextData.allPhrases.stream()
  .filter(p -> 0.0D < p.getTotalScore())
  .sorted(Comparator.comparing((p -> p.getTotalScore()), 
Collections.reverseOrder()))
  .collect(Collectors.toList());{code}
Since fields where the phrase is not present return -1.0, and fields that 
contain the phrase return a score in the range 0.0 <= score <= 1.0, the 
total score turns out negative, and the phrase gets filtered out.
 I separated the filter into 2 distinct cases:
 # Filter out single word phrases (*phrases.singleWordPhrases* is set to false)
 # Include single word phrases (*phrases.singleWordPhrases* is set to true)

This can be addressed by the following change to the component's logic:
{code:java}
if(!rb.req.getParams().getBool(PHRASE_MATCH_SINGLE_WORD, false)) {
  // filter single word phrases
  phraseStream = contextData.allPhrases.stream()
  .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> 
fieldScore > 0.0D));
} else {
  // include single word phrases, which return a constant score of 0.0
  phraseStream = contextData.allPhrases.stream()
  .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> 
fieldScore >= 0.0D));
}{code}


> Statistical Phrase Identifier should return phrases in single field
> ---
>
> Key: SOLR-12958
> URL: https://issues.apache.org/jira/browse/SOLR-12958
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>  Labels: phrase, phrasequery
>
> It has come to my attention that the phrase identifier introduced in 
> SOLR-9418 does not return phrases that are found in only one of the fields 
> specified by phrases.fields.
>  This has proved troublesome for our use case.
>  The offending line seems to be
> {code:java}
> final List validScoringPhrasesSorted = contextData.allPhrases.stream()
>   .filter(p -> 0.0D < p.getTotalScore())
>   .sorted(Comparator.comparing((p -> p.getTotalScore()), 
> Collections.reverseOrder()))
>   .collect(Collectors.toList());{code}
> Since fields where the phrase is not present return -1.0, and fields that 
> contain the phrase return a score in the range 0.0 <= score <= 1.0, the 
> total score turns out negative, and the phrase gets filtered out.
>  I separated the filter into 2 distinct cases:
>  # Filter out single word phrases (*phrases.singleWordPhrases* is set to 
> false)
>  # Include single word phrases (*phrases.singleWordPhrases* is set to true)
> This can be addressed by the following change to the component's logic:
> {code:java}
> if(!rb.req.getParams().getBool(PHRASE_MATCH_SINGLE_WORD, false)) {
>   // filter single word phrases
>   phraseStream = contextData.allPhrases.stream()
>   .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> 
> fieldScore > 0.0D));
> } else {
>   // include single word phrases, which return a constant score of 0.0
>   phraseStream = contextData.allPhrases.stream()
>   .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> 
> fieldScore >= 0.0D));
> }{code}

[jira] [Updated] (SOLR-12958) Statistical Phrase Identifier should return phrases in single field

2018-11-04 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-12958:

Description: 
It has come to my attention that the phrase identifier introduced in SOLR-9418 
does not return phrases that are found only in one of the fields specified in 
phrases.fields.
 This has proved troublesome for our use case.
 The offending line seems to be
{code:java}
final List validScoringPhrasesSorted = contextData.allPhrases.stream()
  .filter(p -> 0.0D < p.getTotalScore())
  .sorted(Comparator.comparing((p -> p.getTotalScore()), 
Collections.reverseOrder()))
  .collect(Collectors.toList());{code}
Since fields where the phrase is not present return -1.0, and fields that 
contain the phrase return a score in the range 0.0 <= score <= 1.0, the 
total score turns out negative, and the phrase gets filtered out.
 I separated the filter into 2 distinct cases:
 # Filter out single word phrases (*phrases.singleWordPhrases* is set to false)
 # Include single word phrases (*phrases.singleWordPhrases* is set to true)

This can be addressed by the following change to the component's logic:
{code:java}
if(!rb.req.getParams().getBool(PHRASE_MATCH_SINGLE_WORD, false)) {
  // filter single word phrases
  phraseStream = contextData.allPhrases.stream()
  .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> 
fieldScore > 0.0D));
} else {
  // include single word phrases, which return a constant score of 0.0
  phraseStream = contextData.allPhrases.stream()
  .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> 
fieldScore >= 0.0D));
}{code}

  was:
It has come to my attention that the phrase identifier introduced in SOLR-9418 
does not return phrases that are found only in one of the fields specified in 
phrases.fields.
This has proved troublesome for our use case.
The offending line seems to be
{code:java}
final List validScoringPhrasesSorted = contextData.allPhrases.stream()
  .filter(p -> 0.0D < p.getTotalScore())
  .sorted(Comparator.comparing((p -> p.getTotalScore()), 
Collections.reverseOrder()))
  .collect(Collectors.toList());{code}
Since fields where the phrase is not present return -1.0, and fields that 
contain the phrase return a score in the range 0.0 <= score <= 1.0, the 
total score turns out negative, and the phrase gets filtered out.
I separated the filter into 2 distinct cases:
# Filter out single word phrases (*phrases.singleWordPhrases* is set to false)
# Include single word phrases (*phrases.singleWordPhrases* is set to true)

This can be addressed by the following change to the component's logic:
{code:java}
if(!rb.req.getParams().getBool(PHRASE_MATCH_SINGLE_WORD, false)) {
  // filter single word phrases
  phraseStream = contextData.allPhrases.stream()
  .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> 
fieldScore > 0.0D));
} else {
  // include single word phrases, which return a constant score of 0.0
  phraseStream = contextData.allPhrases.stream()
  .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> 
fieldScore >= 0.0D));
}{code}



> Statistical Phrase Identifier should return phrases in single field
> ---
>
> Key: SOLR-12958
> URL: https://issues.apache.org/jira/browse/SOLR-12958
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>  Labels: phrase, phrasequery
>
> It has come to my attention that the phrase identifier introduced in 
> SOLR-9418 does not return phrases that are found only in one of the fields 
> specified in phrases.fields.
>  This has proved troublesome for our use case.
>  The offending line seems to be
> {code:java}
> final List validScoringPhrasesSorted = contextData.allPhrases.stream()
>   .filter(p -> 0.0D < p.getTotalScore())
>   .sorted(Comparator.comparing((p -> p.getTotalScore()), 
> Collections.reverseOrder()))
>   .collect(Collectors.toList());{code}
> Since fields where the phrase is not present return -1.0, and fields that 
> contain the phrase return a score in the range 0.0 <= score <= 1.0, the 
> total score turns out negative, and the phrase gets filtered out.
>  I separated the filter into 2 distinct cases:
>  # Filter out single word phrases (*phrases.singleWordPhrases* is set to 
> false)
>  # Include single word phrases (*phrases.singleWordPhrases* is set to true)
> This can be addressed by the following change to the component's logic:
> {code:java}
> if(!rb.req.getParams().getBool(PHRASE_MATCH_SINGLE_WORD, false)) {
>   // filter single word phrases
>   phraseStream = contextData.allPhrases.stream()
>   .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> 
> fieldScore > 0.0D));
> } else {
>   // include single word phrases, which return a constant score of 0.0
>   phraseStream = contextData.allPhrases.stream()
>   .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> 
> fieldScore >= 0.0D));
> }{code}

[jira] [Updated] (SOLR-12958) Statistical Phrase Identifier should return phrases in single field

2018-11-04 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-12958:

Affects Version/s: master (8.0)
   7.5

> Statistical Phrase Identifier should return phrases in single field
> ---
>
> Key: SOLR-12958
> URL: https://issues.apache.org/jira/browse/SOLR-12958
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.5, master (8.0)
>Reporter: mosh
>Priority: Major
>  Labels: phrase, phrasequery
> Attachments: SOLR-12958.patch
>
>
> It has come to my attention that the phrase identifier introduced in 
> SOLR-9418 does not return phrases that are found in only one of the fields 
> specified by phrases.fields.
>  This has proved troublesome for our use case.
>  The offending line seems to be
> {code:java}
> final List validScoringPhrasesSorted = contextData.allPhrases.stream()
>   .filter(p -> 0.0D < p.getTotalScore())
>   .sorted(Comparator.comparing((p -> p.getTotalScore()), 
> Collections.reverseOrder()))
>   .collect(Collectors.toList());{code}
> Since fields where the phrase is not present return -1.0, and fields that 
> contain the phrase return a score in the range 0.0 <= score <= 1.0, the 
> total score turns out negative, and the phrase gets filtered out.
>  I separated the filter into 2 distinct cases:
>  # Filter out single word phrases (*phrases.singleWordPhrases* is set to 
> false)
>  # Include single word phrases (*phrases.singleWordPhrases* is set to true)
> This can be addressed by the following change to the component's logic:
> {code:java}
> if(!rb.req.getParams().getBool(PHRASE_MATCH_SINGLE_WORD, false)) {
>   // filter single word phrases
>   phraseStream = contextData.allPhrases.stream()
>   .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> 
> fieldScore > 0.0D));
> } else {
>   // include single word phrases, which return a constant score of 0.0
>   phraseStream = contextData.allPhrases.stream()
>   .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> 
> fieldScore >= 0.0D));
> }{code}






[jira] [Commented] (SOLR-12952) TFIDF scorer uses max docs instead of num docs when using Edismax

2018-12-11 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16717253#comment-16717253
 ] 

mosh commented on SOLR-12952:
-

[~elyograg],
 {quote}If you use TLOG/PULL replica types (new in 7.x) then all replicas will 
be absolutely identical, and this can't happen.{quote}
Thanks a lot. We have changed our replica type to TLOG and have noticed more 
consistent behaviour, as expected.
Oddly enough, around once a week we notice a small difference in the maxDoc 
values between our 2 replicas (replication factor 2), which we manually deal 
with via an index optimize.
Is this expected or abnormal behaviour? If it is abnormal, should we file a bug?

Thanks in advance.

> TFIDF scorer uses max docs instead of num docs when using Edismax
> -
>
> Key: SOLR-12952
> URL: https://issues.apache.org/jira/browse/SOLR-12952
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>
> I have recently noticed some odd behavior while using the edismax query 
> parser.
>  The scores returned by documents seem to be affected by deleted documents, 
> which have yet to be merged and completely removed from the index.
>  This causes different replicas to return different scores for the same query.
>  Is this a bug, or am I missing something?






[jira] [Commented] (SOLR-5211) updating parent as childless makes old children orphans

2018-11-13 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685263#comment-16685263
 ] 

mosh commented on SOLR-5211:


{quote}The absence of _root_ on a non-nested doc has been a problem while 
working on SOLR-12638.
{quote}
Yes, this has been a major setback.
 A rename could be done; what did you have in mind, though?
 Is there any scenario where differentiating between the new and old schema 
might be beneficial?
 Nonetheless, as things stand, this ticket looks like a blocker for SOLR-12638.

> updating parent as childless makes old children orphans
> ---
>
> Key: SOLR-5211
> URL: https://issues.apache.org/jira/browse/SOLR-5211
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.5, 6.0
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Major
> Attachments: SOLR-5211.patch, SOLR-5211.patch, SOLR-5211.patch
>
>
> If I have a parent with children in the index, I can send an update omitting the 
> children; as a result, the old children become orphaned. 
> I suppose the separate \_root_ field causes much of the trouble. I propose to extend 
> the notion of uniqueKey and let it span across blocks, which makes updates 
> unambiguous.  
> WDYT? Would you like to see a test that proves this issue?






[jira] [Commented] (SOLR-5211) updating parent as childless makes old children orphans

2018-11-15 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16688241#comment-16688241
 ] 

mosh commented on SOLR-5211:


[~osavrasov],
Are you planning on continuing your work on this feature?

> updating parent as childless makes old children orphans
> ---
>
> Key: SOLR-5211
> URL: https://issues.apache.org/jira/browse/SOLR-5211
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.5, 6.0
>Reporter: Mikhail Khludnev
>Assignee: David Smiley
>Priority: Blocker
> Fix For: master (8.0)
>
> Attachments: SOLR-5211.patch, SOLR-5211.patch, SOLR-5211.patch
>
>
> If I have a parent with children in the index, I can send an update omitting the 
> children; as a result, the old children become orphaned. 
> I suppose the separate \_root_ field causes much of the trouble. I propose to extend 
> the notion of uniqueKey and let it span across blocks, which makes updates 
> unambiguous.  
> WDYT? Would you like to see a test that proves this issue?






[jira] [Comment Edited] (SOLR-12519) Support Deeply Nested Docs In Child Documents Transformer

2018-07-28 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561031#comment-16561031
 ] 

mosh edited comment on SOLR-12519 at 7/29/18 6:21 AM:
--

It has dawned upon me that we do not get matching children's descendant 
documents when there is a childFilter for a specific field.
 I was thinking perhaps we should use the paths in the filter, rewind to the 
document following the previous parent, and check whether each document's path 
matches the filter. That way we can add the matching child documents' 
descendants.
 WDYT?
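
For reference, a hedged example of the kind of request being discussed; {{parentFilter}}, {{childFilter}} and {{limit}} are existing parameters of the [child] doc transformer, while making the transformer also return the descendants of the matching children is what is proposed here. The field values follow the JSON example in the issue description quoted below:
{code}
q=a:b&fl=id,[child parentFilter=a:b childFilter=e:f limit=100]
{code}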


was (Author: moshebla):
It has dawned upon me that we do not get matching children's descendant 
documents when there is a childFilter for a specific field.
I was thinking perhaps we should use the paths in the filter, rewind to the 
document following the previous parent, and check whether each document's path 
matches the filter. That way we can add the matching child documents' 
descendants.
WDYT?

> Support Deeply Nested Docs In Child Documents Transformer
> -
>
> Key: SOLR-12519
> URL: https://issues.apache.org/jira/browse/SOLR-12519
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12519-no-commit.patch
>
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> As discussed in SOLR-12298, to make use of the meta-data fields in 
> SOLR-12441, there needs to be a smarter child document transformer, which 
> provides the ability to rebuild the original nested documents' structure.
>  In addition, I propose that the transformer also have the ability to 
> return only some of the original hierarchy, to prevent unnecessary block join 
> queries, e.g.
> {code}  {"a": "b", "c": [ {"e": "f"}, {"e": "g"} , {"h": "i"} ]} {code}
>  In case my query is for all the children of "a:b" which contain the key "e" 
> in them, the query will be broken into two parts:
>  1. The parent query "a:b"
>  2. The child query "e:*".
> If the only-children flag is on, the transformer will return the following 
> documents:
>  {code}[ {"e": "f"}, {"e": "g"} ]{code}
> In case the flag was not turned on (perhaps the default state), the whole 
> document hierarchy will be returned, containing only the matching children:
> {code}{"a": "b", "c": [ {"e": "f"}, {"e": "g"} ]}{code}






[jira] [Commented] (SOLR-12519) Support Deeply Nested Docs In Child Documents Transformer

2018-07-28 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561031#comment-16561031
 ] 

mosh commented on SOLR-12519:
-

It has dawned upon me that we do not get matching children's descendant 
documents when there is a childFilter for a specific field.
I was thinking perhaps we should use the paths in the filter, rewind to the 
document following the previous parent, and check whether each document's path 
matches the filter. That way we can add the matching child documents' 
descendants.
WDYT?

> Support Deeply Nested Docs In Child Documents Transformer
> -
>
> Key: SOLR-12519
> URL: https://issues.apache.org/jira/browse/SOLR-12519
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12519-no-commit.patch
>
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> As discussed in SOLR-12298, to make use of the meta-data fields in 
> SOLR-12441, there needs to be a smarter child document transformer, which 
> provides the ability to rebuild the original nested documents' structure.
>  In addition, I propose that the transformer also have the ability to 
> return only some of the original hierarchy, to prevent unnecessary block join 
> queries, e.g.
> {code}  {"a": "b", "c": [ {"e": "f"}, {"e": "g"} , {"h": "i"} ]} {code}
>  In case my query is for all the children of "a:b" which contain the key "e" 
> in them, the query will be broken into two parts:
>  1. The parent query "a:b"
>  2. The child query "e:*".
> If the only-children flag is on, the transformer will return the following 
> documents:
>  {code}[ {"e": "f"}, {"e": "g"} ]{code}
> In case the flag was not turned on (perhaps the default state), the whole 
> document hierarchy will be returned, containing only the matching children:
> {code}{"a": "b", "c": [ {"e": "f"}, {"e": "g"} ]}{code}






[jira] [Commented] (SOLR-12519) Support Deeply Nested Docs In Child Documents Transformer

2018-08-01 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565139#comment-16565139
 ] 

mosh commented on SOLR-12519:
-

I'm not sure whether this belongs here, but I have been toying with the thought 
of using this transformer in conjunction with NestedUpdateProcessor and 
AtomicUpdate to allow Solr to completely re-index the entire nested structure. 
Would this be acceptable, or would the performance costs be too high? This is 
just a thought; I am still thinking about implementation details. Hopefully I 
will be able to post a more concrete proposal soon.

> Support Deeply Nested Docs In Child Documents Transformer
> -
>
> Key: SOLR-12519
> URL: https://issues.apache.org/jira/browse/SOLR-12519
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12519-no-commit.patch
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> As discussed in SOLR-12298, to make use of the meta-data fields in 
> SOLR-12441, there needs to be a smarter child document transformer, which 
> provides the ability to rebuild the original nested documents' structure.
>  In addition, I propose that the transformer also have the ability to 
> return only some of the original hierarchy, to prevent unnecessary block join 
> queries, e.g.
> {code}  {"a": "b", "c": [ {"e": "f"}, {"e": "g"} , {"h": "i"} ]} {code}
>  In case my query is for all the children of "a:b" which contain the key "e" 
> in them, the query will be broken into two parts:
>  1. The parent query "a:b"
>  2. The child query "e:*".
> If the only-children flag is on, the transformer will return the following 
> documents:
>  {code}[ {"e": "f"}, {"e": "g"} ]{code}
> In case the flag was not turned on (perhaps the default state), the whole 
> document hierarchy will be returned, containing only the matching children:
> {code}{"a": "b", "c": [ {"e": "f"}, {"e": "g"} ]}{code}






[jira] [Commented] (SOLR-12586) Replace use of Joda Time with Java 8 java.time

2018-08-01 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565705#comment-16565705
 ] 

mosh commented on SOLR-12586:
-

Hey,
I just opened a pull request; fingers crossed that I changed all the needed 
config files.
Since I'm quite new to Solr, I hope I did not make too much of a mess.

> Replace use of Joda Time with Java 8 java.time
> --
>
> Key: SOLR-12586
> URL: https://issues.apache.org/jira/browse/SOLR-12586
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We're using Joda Time, a dependency in a couple places.  Now that we are on 
> Java 8, we ought to drop the dependency, using the equivalent java.time 
> package instead.  As I understand it, Joda-Time was more or less incorporated 
> into Java as java.time, with some fairly minor differences.
> Usages:
>  * ConfigSetService
>  * ParseDateFieldUpdateProcessorFactory
> And some related tests.
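
As a rough illustration of the kind of replacement involved (this is not taken from the actual patch):
{code:java}
import java.time.Instant;
import java.time.format.DateTimeFormatter;

public class JodaToJavaTimeSketch {
  public static void main(String[] args) {
    // Joda-Time style (to be removed):
    //   new org.joda.time.DateTime(org.joda.time.DateTimeZone.UTC).toString()
    // java.time equivalent: an ISO-8601 UTC timestamp
    Instant now = Instant.now();
    System.out.println(DateTimeFormatter.ISO_INSTANT.format(now));
  }
}
{code}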






[jira] [Issue Comment Deleted] (SOLR-12586) Replace use of Joda Time with Java 8 java.time

2018-08-01 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-12586:

Comment: was deleted

(was: Hey,
I just opened a pull request; fingers crossed that I changed all the needed 
config files.
Since I'm quite new to Solr, I hope I did not make too much of a mess.)

> Replace use of Joda Time with Java 8 java.time
> --
>
> Key: SOLR-12586
> URL: https://issues.apache.org/jira/browse/SOLR-12586
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We're using Joda Time, a dependency in a couple places.  Now that we are on 
> Java 8, we ought to drop the dependency, using the equivalent java.time 
> package instead.  As I understand it, Joda-Time was more or less incorporated 
> into Java as java.time, with some fairly minor differences.
> Usages:
>  * ConfigSetService
>  * ParseDateFieldUpdateProcessorFactory
> And some related tests.






[jira] [Commented] (SOLR-12485) Xml loader should save the relationship of children

2018-08-02 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16566999#comment-16566999
 ] 

mosh commented on SOLR-12485:
-

Hey,
I just opened a pull request with a basic test that adds labelled children in an 
XML document.

> Xml loader should save the relationship of children
> ---
>
> Key: SOLR-12485
> URL: https://issues.apache.org/jira/browse/SOLR-12485
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Once SolrInputDocument supports labeled child documents, XmlLoader should add 
> the child document to the map while saving its key name, to maintain the 
> child's relationship to its parent.






[jira] [Commented] (SOLR-12209) add Paging Streaming Expression

2018-08-05 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569485#comment-16569485
 ] 

mosh commented on SOLR-12209:
-

Hey [~joel.bernstein],
I was wondering whether you could have another peek at the patch.
Thanks in advance.

> add Paging Streaming Expression
> ---
>
> Key: SOLR-12209
> URL: https://issues.apache.org/jira/browse/SOLR-12209
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12209.patch, SOLR-12209.patch, SOLR-12209.patch
>
>
> Currently the closest streaming expression that allows some sort of 
> pagination is top.
> I propose we add a new streaming expression, based on the 
> RankedStream class, to add an offset to the stream. Currently this can only be 
> done in code, by reading the stream until the desired offset is reached.
> The new expression will be used as such:
> {{paging(rows=3, search(collection1, q="*:*", qt="/export", 
> fl="id,a_s,a_i,a_f", sort="a_f desc, a_i desc"), sort="a_f asc, a_i asc", 
> start=100)}}
> {{This will offset the returned stream by 100 documents.}}
>  
> [~joel.bernstein], what do you think?
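
For context, a hedged sketch of the client-side workaround mentioned in the description (reading and discarding tuples until the offset is reached), which the proposed {{paging}} expression would make unnecessary:
{code:java}
import java.io.IOException;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.TupleStream;

public class OffsetSkipSketch {
  // Read and discard 'start' tuples; subsequent reads return the requested page.
  static void skipTo(TupleStream stream, int start) throws IOException {
    for (int i = 0; i < start; i++) {
      Tuple tuple = stream.read();
      if (tuple.EOF) {
        break; // the stream has fewer tuples than the requested offset
      }
    }
  }
}
{code}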






[jira] [Comment Edited] (SOLR-12361) Change _childDocuments to Map

2018-06-09 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507255#comment-16507255
 ] 

mosh edited comment on SOLR-12361 at 6/10/18 5:36 AM:
--

{quote}What do you think mosh? How should I refer to you in CHANGES.txt?
{quote}
I like it, this patch looks ready.
 You can simply refer to me as Moshe Bla.


was (Author: moshebla):
{quote}What do you think mosh? How should I refer to you in CHANGES.txt?
{quote}
I like it, this patch looks ready.
 You can simply refer to me as MosheBla.

> Change _childDocuments to Map
> -
>
> Key: SOLR-12361
> URL: https://issues.apache.org/jira/browse/SOLR-12361
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12361.patch, SOLR-12361.patch, SOLR-12361.patch, 
> SOLR-12361.patch
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> During the discussion on SOLR-12298, there was a proposal to change 
> _childDocuments in SolrDocumentBase to a Map, to incorporate the relationship 
> between the parent and its child documents.
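
A minimal sketch of the proposed shape; the exact type and method names were still being decided in the patch, so this is illustrative only:
{code:java}
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.solr.common.SolrInputDocument;

public class ChildDocMapSketch {
  public static void main(String[] args) {
    // child documents keyed by their relationship ("label") to the parent
    Map<String, Collection<SolrInputDocument>> childDocuments = new LinkedHashMap<>();
    SolrInputDocument comment = new SolrInputDocument();
    comment.setField("id", "parent_1_comment_1");
    childDocuments.computeIfAbsent("comments", k -> new ArrayList<>()).add(comment);
  }
}
{code}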






[jira] [Commented] (SOLR-12361) Change _childDocuments to Map

2018-06-09 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507255#comment-16507255
 ] 

mosh commented on SOLR-12361:
-

{quote}What do you think mosh? How should I refer to you in CHANGES.txt?
{quote}
I like it, this patch looks ready.
 You can simply refer to me as MosheBla.

> Change _childDocuments to Map
> -
>
> Key: SOLR-12361
> URL: https://issues.apache.org/jira/browse/SOLR-12361
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12361.patch, SOLR-12361.patch, SOLR-12361.patch, 
> SOLR-12361.patch
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> During the discussion on SOLR-12298, there was a proposal to change 
> _childDocuments in SolrDocumentBase to a Map, to incorporate the relationship 
> between the parent and its child documents.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12209) add Paging Streaming Expression

2018-06-12 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509484#comment-16509484
 ] 

mosh commented on SOLR-12209:
-

Sorry for the nag,
but is there anything new to report [~joel.bernstein]?

> add Paging Streaming Expression
> ---
>
> Key: SOLR-12209
> URL: https://issues.apache.org/jira/browse/SOLR-12209
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12209.patch, SOLR-12209.patch, SOLR-12209.patch
>
>
> Currently the closest streaming expression that allows some sort of 
> pagination is top.
> I propose we add a new streaming expression, based on the 
> RankedStream class, that adds an offset to the stream. Currently this can only 
> be done in code, by reading the stream until the desired offset is reached.
> The new expression will be used as such:
> {{paging(rows=3, search(collection1, q="*:*", qt="/export", 
> fl="id,a_s,a_i,a_f", sort="a_f desc, a_i desc"), sort="a_f asc, a_i asc", 
> start=100)}}
> {{This will offset the returned stream by 100 documents}}
> 
> [~joel.bernstein] what do you think?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12362) JSON loader should save the relationship of children

2018-06-13 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-12362:

Summary: JSON loader should save the relationship of children  (was: JSON 
and XML loaders should save the relationship of children)

> JSON loader should save the relationship of children
> 
>
> Key: SOLR-12362
> URL: https://issues.apache.org/jira/browse/SOLR-12362
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Once _childDocuments in SolrInputDocument is changed to a Map, the JsonLoader 
> and XmlLoader should add the child document to the map while saving its key 
> name, to maintain the child's relationship to its parent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12362) JSON loader should save the relationship of children

2018-06-13 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-12362:

Description: Once _childDocuments in SolrInputDocument is changed to a Map, 
the JsonLoader should add the child document to the map while saving its key 
name, to maintain the child's relationship to its parent.  (was: Once 
_childDocuments in SolrInputDocument is changed to a Map, the JsonLoader and 
XmlLoader should add the child document to the map while saving its key name, 
to maintain the child's relationship to its parent.)

> JSON loader should save the relationship of children
> 
>
> Key: SOLR-12362
> URL: https://issues.apache.org/jira/browse/SOLR-12362
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Once _childDocuments in SolrInputDocument is changed to a Map, the JsonLoader 
> should add the child document to the map while saving its key name, to 
> maintain the child's relationship to its parent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12362) JSON loader should save the relationship of children

2018-06-13 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511968#comment-16511968
 ] 

mosh commented on SOLR-12362:
-

I think it's a better idea to separate the two. I'll open a new sub-task.

> JSON loader should save the relationship of children
> 
>
> Key: SOLR-12362
> URL: https://issues.apache.org/jira/browse/SOLR-12362
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Once _childDocuments in SolrInputDocument is changed to a Map, the JsonLoader 
> should add the child document to the map while saving its key name, to 
> maintain the child's relationship to its parent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-12485) Xml loader should save the relationship of children

2018-06-13 Thread mosh (JIRA)
mosh created SOLR-12485:
---

 Summary: Xml loader should save the relationship of children
 Key: SOLR-12485
 URL: https://issues.apache.org/jira/browse/SOLR-12485
 Project: Solr
  Issue Type: Sub-task
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: mosh


Once _childDocuments in SolrInputDocument is changed to a Map, XmlLoader should 
add the child document to the map while saving its key name, to maintain the 
child's relationship to its parent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12362) JSON loader should save the relationship of children

2018-06-13 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-12362:

Description: Once _childDocuments in SolrInputDocument is changed to a Map, 
JsonLoader should add the child document to the map while saving its key name, 
to maintain the child's relationship to its parent.  (was: Once _childDocuments 
in SolrInputDocument is changed to a Map, the JsonLoader should add the child 
document to the map while saving its key name, to maintain the child's 
relationship to its parent.)

> JSON loader should save the relationship of children
> 
>
> Key: SOLR-12362
> URL: https://issues.apache.org/jira/browse/SOLR-12362
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Once _childDocuments in SolrInputDocument is changed to a Map, JsonLoader 
> should add the child document to the map while saving its key name, to 
> maintain the child's relationship to its parent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12362) JSON loader should save the relationship of children

2018-06-13 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511968#comment-16511968
 ] 

mosh edited comment on SOLR-12362 at 6/14/18 4:20 AM:
--

I think it's a better idea to separate the two. I'll open a new sub-task.
It's easier to prioritize smaller packages.


was (Author: moshebla):
I think it's a better idea to separate the two. I'll open a new sub-task.

> JSON loader should save the relationship of children
> 
>
> Key: SOLR-12362
> URL: https://issues.apache.org/jira/browse/SOLR-12362
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Once _childDocuments in SolrInputDocument is changed to a Map, JsonLoader 
> should add the child document to the map while saving its key name, to 
> maintain the child's relationship to its parent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12485) Xml loader should save the relationship of children

2018-06-13 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-12485:

Description: Once SolrInputDocument supports labeled child documents, 
XmlLoader should add the child document to the map while saving its key name, 
to maintain the child's relationship to its parent.  (was: Once _childDocuments 
in SolrInputDocument is changed to a Map, XmlLoader should add the child 
document to the map while saving its key name, to maintain the child's 
relationship to its parent.)

> Xml loader should save the relationship of children
> ---
>
> Key: SOLR-12485
> URL: https://issues.apache.org/jira/browse/SOLR-12485
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>
> Once SolrInputDocument supports labeled child documents, XmlLoader should add 
> the child document to the map while saving its key name, to maintain the 
> child's relationship to its parent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5211) updating parent as childless makes old children orphans

2018-11-21 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695587#comment-16695587
 ] 

mosh commented on SOLR-5211:


{quote}[junit4] - org.apache.solr.schema.MultiTermTest.testQueryCopiedToMulti
 [junit4] - org.apache.solr.schema.MultiTermTest.testDefaultCopiedToMulti
{quote}
I could not reproduce the failures in these tests.
 Were the exceptions reproducible using a particular seed?
 I ran the test suite using this command:
{code}
ant clean test -Dtestcase=MultiTermTest{code}
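(For reference, pinning the randomized-testing seed would look something like the line below; the seed value is only a placeholder, not one taken from the failing build.)
{code}
ant clean test -Dtestcase=MultiTermTest -Dtests.seed=DEADBEEFCAFEBABE
{code}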

> updating parent as childless makes old children orphans
> ---
>
> Key: SOLR-5211
> URL: https://issues.apache.org/jira/browse/SOLR-5211
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.5, 6.0
>Reporter: Mikhail Khludnev
>Assignee: David Smiley
>Priority: Blocker
> Fix For: master (8.0)
>
> Attachments: SOLR-5211.patch, SOLR-5211.patch, SOLR-5211.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> If I have a parent with children in the index, I can send an update omitting 
> the children; as a result the old children become orphaned. 
> I suppose the separate \_root_ field makes much trouble. I propose to extend 
> the notion of uniqueKey and let it span across blocks, which makes updates 
> unambiguous.  
> WDYT? Would you like to see a test that proves this issue?
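To make the failure mode in the description concrete, a minimal SolrJ sketch of the scenario might look like the following (the collection name, URL, and ids are illustrative):
{code:java}
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class OrphanedChildrenSketch {
  public static void main(String[] args) throws Exception {
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
      // 1) Index a parent block with one child.
      SolrInputDocument parent = new SolrInputDocument();
      parent.addField("id", "1");
      SolrInputDocument child = new SolrInputDocument();
      child.addField("id", "1.1");
      parent.addChildDocument(child);
      client.add(parent);
      client.commit();

      // 2) Re-send the parent without its child.  The new parent replaces the old
      //    one, but the old child document is left behind in the index -- an orphan.
      SolrInputDocument childlessParent = new SolrInputDocument();
      childlessParent.addField("id", "1");
      client.add(childlessParent);
      client.commit();
    }
  }
}
{code}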



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9379) Need cloud & RTG testing of [child] transformer

2018-11-21 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695593#comment-16695593
 ] 

mosh commented on SOLR-9379:


SOLR-12638 uses RTG and ChildDocTransformer in its tests, including its 
cloud-based tests.

> Need cloud & RTG testing of [child] transformer
> ---
>
> Key: SOLR-9379
> URL: https://issues.apache.org/jira/browse/SOLR-9379
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
>Priority: Major
>
> Spun off of SOLR-9314...
> We don't seem to have any cloud based tests of the {{\[child\]}} 
> DocTransformer, or any tests of using it with RTG (on either committed or 
> uncommited docs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5211) updating parent as childless makes old children orphans

2018-11-24 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-5211:
---
Attachment: SOLR-5211.patch

> updating parent as childless makes old children orphans
> ---
>
> Key: SOLR-5211
> URL: https://issues.apache.org/jira/browse/SOLR-5211
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.5, 6.0
>Reporter: Mikhail Khludnev
>Assignee: David Smiley
>Priority: Blocker
> Fix For: master (8.0)
>
> Attachments: SOLR-5211.patch, SOLR-5211.patch, SOLR-5211.patch, 
> SOLR-5211.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> If I have a parent with children in the index, I can send an update omitting 
> the children; as a result the old children become orphaned. 
> I suppose the separate \_root_ field makes much trouble. I propose to extend 
> the notion of uniqueKey and let it span across blocks, which makes updates 
> unambiguous.  
> WDYT? Would you like to see a test that proves this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5211) updating parent as childless makes old children orphans

2018-11-24 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-5211:
---
Attachment: (was: SOLR-5211.patch)

> updating parent as childless makes old children orphans
> ---
>
> Key: SOLR-5211
> URL: https://issues.apache.org/jira/browse/SOLR-5211
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.5, 6.0
>Reporter: Mikhail Khludnev
>Assignee: David Smiley
>Priority: Blocker
> Fix For: master (8.0)
>
> Attachments: SOLR-5211.patch, SOLR-5211.patch, SOLR-5211.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> If I have a parent with children in the index, I can send an update omitting 
> the children; as a result the old children become orphaned. 
> I suppose the separate \_root_ field makes much trouble. I propose to extend 
> the notion of uniqueKey and let it span across blocks, which makes updates 
> unambiguous.  
> WDYT? Would you like to see a test that proves this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5211) updating parent as childless makes old children orphans

2018-11-25 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-5211:
---
Attachment: SOLR-5211.patch

> updating parent as childless makes old children orphans
> ---
>
> Key: SOLR-5211
> URL: https://issues.apache.org/jira/browse/SOLR-5211
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.5, 6.0
>Reporter: Mikhail Khludnev
>Assignee: David Smiley
>Priority: Blocker
> Fix For: master (8.0)
>
> Attachments: SOLR-5211.patch, SOLR-5211.patch, SOLR-5211.patch, 
> SOLR-5211.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> If I have a parent with children in the index, I can send an update omitting 
> the children; as a result the old children become orphaned. 
> I suppose the separate \_root_ field makes much trouble. I propose to extend 
> the notion of uniqueKey and let it span across blocks, which makes updates 
> unambiguous.  
> WDYT? Would you like to see a test that proves this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5211) updating parent as childless makes old children orphans

2018-11-25 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698186#comment-16698186
 ] 

mosh commented on SOLR-5211:


Changed assertions in the following test suites within the newly uploaded patch 
file (also pushed to the PR):
# TestRandomFlRTGCloud
# AtomicUpdatesTest
# RootFieldTest

> updating parent as childless makes old children orphans
> ---
>
> Key: SOLR-5211
> URL: https://issues.apache.org/jira/browse/SOLR-5211
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.5, 6.0
>Reporter: Mikhail Khludnev
>Assignee: David Smiley
>Priority: Blocker
> Fix For: master (8.0)
>
> Attachments: SOLR-5211.patch, SOLR-5211.patch, SOLR-5211.patch, 
> SOLR-5211.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> If I have a parent with children in the index, I can send an update omitting 
> the children; as a result the old children become orphaned. 
> I suppose the separate \_root_ field makes much trouble. I propose to extend 
> the notion of uniqueKey and let it span across blocks, which makes updates 
> unambiguous.  
> WDYT? Would you like to see a test that proves this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5211) updating parent as childless makes old children orphans

2018-11-25 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698466#comment-16698466
 ] 

mosh commented on SOLR-5211:


I uploaded another patch in which RootFieldTest uses schema.xml and 
schema15.xml (instead of schema11.xml), 
since the "name" field is not defined in schema11.xml.

> updating parent as childless makes old children orphans
> ---
>
> Key: SOLR-5211
> URL: https://issues.apache.org/jira/browse/SOLR-5211
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.5, 6.0
>Reporter: Mikhail Khludnev
>Assignee: David Smiley
>Priority: Blocker
> Fix For: master (8.0)
>
> Attachments: SOLR-5211.patch, SOLR-5211.patch, SOLR-5211.patch, 
> SOLR-5211.patch, SOLR-5211.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> If I have a parent with children in the index, I can send an update omitting 
> the children; as a result the old children become orphaned. 
> I suppose the separate \_root_ field makes much trouble. I propose to extend 
> the notion of uniqueKey and let it span across blocks, which makes updates 
> unambiguous.  
> WDYT? Would you like to see a test that proves this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5211) updating parent as childless makes old children orphans

2018-11-25 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-5211:
---
Attachment: SOLR-5211.patch

> updating parent as childless makes old children orphans
> ---
>
> Key: SOLR-5211
> URL: https://issues.apache.org/jira/browse/SOLR-5211
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.5, 6.0
>Reporter: Mikhail Khludnev
>Assignee: David Smiley
>Priority: Blocker
> Fix For: master (8.0)
>
> Attachments: SOLR-5211.patch, SOLR-5211.patch, SOLR-5211.patch, 
> SOLR-5211.patch, SOLR-5211.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> If I have a parent with children in the index, I can send an update omitting 
> the children; as a result the old children become orphaned. 
> I suppose the separate \_root_ field makes much trouble. I propose to extend 
> the notion of uniqueKey and let it span across blocks, which makes updates 
> unambiguous.  
> WDYT? Would you like to see a test that proves this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5211) updating parent as childless makes old children orphans

2018-11-25 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-5211:
---
Attachment: SOLR-5211.patch

> updating parent as childless makes old children orphans
> ---
>
> Key: SOLR-5211
> URL: https://issues.apache.org/jira/browse/SOLR-5211
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.5, 6.0
>Reporter: Mikhail Khludnev
>Assignee: David Smiley
>Priority: Blocker
> Fix For: master (8.0)
>
> Attachments: SOLR-5211.patch, SOLR-5211.patch, SOLR-5211.patch, 
> SOLR-5211.patch, SOLR-5211.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> If I have a parent with children in the index, I can send an update omitting 
> the children; as a result the old children become orphaned. 
> I suppose the separate \_root_ field makes much trouble. I propose to extend 
> the notion of uniqueKey and let it span across blocks, which makes updates 
> unambiguous.  
> WDYT? Would you like to see a test that proves this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5211) updating parent as childless makes old children orphans

2018-11-25 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-5211:
---
Attachment: (was: SOLR-5211.patch)

> updating parent as childless makes old children orphans
> ---
>
> Key: SOLR-5211
> URL: https://issues.apache.org/jira/browse/SOLR-5211
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.5, 6.0
>Reporter: Mikhail Khludnev
>Assignee: David Smiley
>Priority: Blocker
> Fix For: master (8.0)
>
> Attachments: SOLR-5211.patch, SOLR-5211.patch, SOLR-5211.patch, 
> SOLR-5211.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> If I have a parent with children in the index, I can send an update omitting 
> the children; as a result the old children become orphaned. 
> I suppose the separate \_root_ field makes much trouble. I propose to extend 
> the notion of uniqueKey and let it span across blocks, which makes updates 
> unambiguous.  
> WDYT? Would you like to see a test that proves this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5211) updating parent as childless makes old children orphans

2018-11-25 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698466#comment-16698466
 ] 

mosh edited comment on SOLR-5211 at 11/26/18 5:31 AM:
--

I uploaded another patch in which RootFieldTest uses schema.xml and 
schema15.xml (instead of schema11.xml), 
since the "name" field, which is used throughout this particular test suite, is 
not defined in schema11.xml.


was (Author: moshebla):
I uploaded another patch in which RootFieldTest uses schema.xml and 
schema15.xml(instead of schema11),

since the "name" field is not defined in schema11.xml.

> updating parent as childless makes old children orphans
> ---
>
> Key: SOLR-5211
> URL: https://issues.apache.org/jira/browse/SOLR-5211
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.5, 6.0
>Reporter: Mikhail Khludnev
>Assignee: David Smiley
>Priority: Blocker
> Fix For: master (8.0)
>
> Attachments: SOLR-5211.patch, SOLR-5211.patch, SOLR-5211.patch, 
> SOLR-5211.patch, SOLR-5211.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> If I have a parent with children in the index, I can send an update omitting 
> the children; as a result the old children become orphaned. 
> I suppose the separate \_root_ field makes much trouble. I propose to extend 
> the notion of uniqueKey and let it span across blocks, which makes updates 
> unambiguous.  
> WDYT? Would you like to see a test that proves this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5211) updating parent as childless makes old children orphans

2018-11-25 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-5211:
---
Attachment: SOLR-5211.patch

> updating parent as childless makes old children orphans
> ---
>
> Key: SOLR-5211
> URL: https://issues.apache.org/jira/browse/SOLR-5211
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.5, 6.0
>Reporter: Mikhail Khludnev
>Assignee: David Smiley
>Priority: Blocker
> Fix For: master (8.0)
>
> Attachments: SOLR-5211.patch, SOLR-5211.patch, SOLR-5211.patch, 
> SOLR-5211.patch, SOLR-5211.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> If I have a parent with children in the index, I can send an update omitting 
> the children; as a result the old children become orphaned. 
> I suppose the separate \_root_ field makes much trouble. I propose to extend 
> the notion of uniqueKey and let it span across blocks, which makes updates 
> unambiguous.  
> WDYT? Would you like to see a test that proves this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5211) updating parent as childless makes old children orphans

2018-11-25 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-5211:
---
Attachment: (was: SOLR-5211.patch)

> updating parent as childless makes old children orphans
> ---
>
> Key: SOLR-5211
> URL: https://issues.apache.org/jira/browse/SOLR-5211
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.5, 6.0
>Reporter: Mikhail Khludnev
>Assignee: David Smiley
>Priority: Blocker
> Fix For: master (8.0)
>
> Attachments: SOLR-5211.patch, SOLR-5211.patch, SOLR-5211.patch, 
> SOLR-5211.patch, SOLR-5211.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> If I have a parent with children in the index, I can send an update omitting 
> the children; as a result the old children become orphaned. 
> I suppose the separate \_root_ field makes much trouble. I propose to extend 
> the notion of uniqueKey and let it span across blocks, which makes updates 
> unambiguous.  
> WDYT? Would you like to see a test that proves this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12209) add Paging Streaming Expression

2018-11-27 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700338#comment-16700338
 ] 

mosh commented on SOLR-12209:
-

ping [~joel.bernstein]?

> add Paging Streaming Expression
> ---
>
> Key: SOLR-12209
> URL: https://issues.apache.org/jira/browse/SOLR-12209
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12209.patch, SOLR-12209.patch, SOLR-12209.patch
>
>
> Currently the closest streaming expression that allows some sort of 
> pagination is top.
> I propose we add a new streaming expression, based on the 
> RankedStream class, that adds an offset to the stream. Currently this can only 
> be done in code, by reading the stream until the desired offset is reached.
> The new expression will be used as such:
> {{paging(rows=3, search(collection1, q="*:*", qt="/export", 
> fl="id,a_s,a_i,a_f", sort="a_f desc, a_i desc"), sort="a_f asc, a_i asc", 
> start=100)}}
> {{This will offset the returned stream by 100 documents}}
> 
> [~joel.bernstein] what do you think?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5211) updating parent as childless makes old children orphans

2018-11-27 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-5211:
---
Attachment: (was: SOLR-5211.patch)

> updating parent as childless makes old children orphans
> ---
>
> Key: SOLR-5211
> URL: https://issues.apache.org/jira/browse/SOLR-5211
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.5, 6.0
>Reporter: Mikhail Khludnev
>Assignee: David Smiley
>Priority: Blocker
> Fix For: master (8.0)
>
> Attachments: SOLR-5211.patch, SOLR-5211.patch, SOLR-5211.patch, 
> SOLR-5211.patch, SOLR-5211.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> If I have a parent with children in the index, I can send an update omitting 
> the children; as a result the old children become orphaned. 
> I suppose the separate \_root_ field makes much trouble. I propose to extend 
> the notion of uniqueKey and let it span across blocks, which makes updates 
> unambiguous.  
> WDYT? Would you like to see a test that proves this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5211) updating parent as childless makes old children orphans

2018-11-27 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700438#comment-16700438
 ] 

mosh commented on SOLR-5211:


[~dsmiley],

I fixed RootFieldTest, and double-checked that it passes with useRootSchema set 
to both true and false.
I did not notice, when I pulled the latest master, that some of the schema names 
had changed.
I uploaded a new patch.

> updating parent as childless makes old children orphans
> ---
>
> Key: SOLR-5211
> URL: https://issues.apache.org/jira/browse/SOLR-5211
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.5, 6.0
>Reporter: Mikhail Khludnev
>Assignee: David Smiley
>Priority: Blocker
> Fix For: master (8.0)
>
> Attachments: SOLR-5211.patch, SOLR-5211.patch, SOLR-5211.patch, 
> SOLR-5211.patch, SOLR-5211.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> If I have a parent with children in the index, I can send an update omitting 
> the children; as a result the old children become orphaned. 
> I suppose the separate \_root_ field makes much trouble. I propose to extend 
> the notion of uniqueKey and let it span across blocks, which makes updates 
> unambiguous.  
> WDYT? Would you like to see a test that proves this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5211) updating parent as childless makes old children orphans

2018-11-27 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-5211:
---
Attachment: SOLR-5211.patch

> updating parent as childless makes old children orphans
> ---
>
> Key: SOLR-5211
> URL: https://issues.apache.org/jira/browse/SOLR-5211
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.5, 6.0
>Reporter: Mikhail Khludnev
>Assignee: David Smiley
>Priority: Blocker
> Fix For: master (8.0)
>
> Attachments: SOLR-5211.patch, SOLR-5211.patch, SOLR-5211.patch, 
> SOLR-5211.patch, SOLR-5211.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> If I have a parent with children in the index, I can send an update omitting 
> the children; as a result the old children become orphaned. 
> I suppose the separate \_root_ field makes much trouble. I propose to extend 
> the notion of uniqueKey and let it span across blocks, which makes updates 
> unambiguous.  
> WDYT? Would you like to see a test that proves this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12209) add Paging Streaming Expression

2018-11-29 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16703115#comment-16703115
 ] 

mosh commented on SOLR-12209:
-

Uploaded a new patch file for the latest master.

> add Paging Streaming Expression
> ---
>
> Key: SOLR-12209
> URL: https://issues.apache.org/jira/browse/SOLR-12209
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12209.patch, SOLR-12209.patch, SOLR-12209.patch, 
> SOLR-12209.patch
>
>
> Currently the closest streaming expression that allows some sort of 
> pagination is top.
> I propose we add a new streaming expression, based on the 
> RankedStream class, that adds an offset to the stream. Currently this can only 
> be done in code, by reading the stream until the desired offset is reached.
> The new expression will be used as such:
> {{paging(rows=3, search(collection1, q="*:*", qt="/export", 
> fl="id,a_s,a_i,a_f", sort="a_f desc, a_i desc"), sort="a_f asc, a_i asc", 
> start=100)}}
> {{This will offset the returned stream by 100 documents}}
> 
> [~joel.bernstein] what do you think?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12209) add Paging Streaming Expression

2018-11-29 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-12209:

Attachment: SOLR-12209.patch

> add Paging Streaming Expression
> ---
>
> Key: SOLR-12209
> URL: https://issues.apache.org/jira/browse/SOLR-12209
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12209.patch, SOLR-12209.patch, SOLR-12209.patch, 
> SOLR-12209.patch
>
>
> Currently the closest streaming expression that allows some sort of 
> pagination is top.
> I propose we add a new streaming expression, based on the 
> RankedStream class, that adds an offset to the stream. Currently this can only 
> be done in code, by reading the stream until the desired offset is reached.
> The new expression will be used as such:
> {{paging(rows=3, search(collection1, q="*:*", qt="/export", 
> fl="id,a_s,a_i,a_f", sort="a_f desc, a_i desc"), sort="a_f asc, a_i asc", 
> start=100)}}
> {{This will offset the returned stream by 100 documents}}
> 
> [~joel.bernstein] what do you think?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12638) Support atomic updates of nested/child documents for nested-enabled schema

2018-10-20 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658100#comment-16658100
 ] 

mosh commented on SOLR-12638:
-

We have been testing this feature in-house, and have come across a problem 
regarding sharding when a document that is being updated is indexed inside a 
block,
and the collection being used has more than a single shard.
Right now when updating a document, an Id for the document has to be provided, 
in addition to the field which is being updated.
When the document that is being updated is inside a block, the update can be 
routed to the wrong shard, since the shard in which it is indexed was 
calculated according to the root document's id. For example, when this document:
{code:javascript} {"id": "1", "children": [{"id": "20", "string_s": "ex"}]} 
{code}
Is being updated:
{code:javascript}{"id": "20", "grand_children": {"add": [{"id": "21", 
"string_s": "ex"}]}}{code}
The update can be routed to another shard, where the block does not exist, 
causing the update to be indexed to a different shard,
splitting our block in two pieces, existing in two separate shards.

Skimming through DistributedUpdateProcessor, I have suggestions for three 
different solutions.

# If the schema is nested, the routing method (in 
DistributedUpdateProcessor) can check whether the document exists in any 
shard (lookup by id),
find out whether it is inside a block (_root_), and route the update using the 
hash of _root_
# Very similar to the previous method, only the _root_ lookup is done if the 
document which is being updated is not found in the shard it was routed to, 
asking other shards if the document exists inside a block, re-routing the 
update command.
# The user provides the _root_, which is not the ideal case when it comes to 
user friendliness.

IMO the third option should be a last resort, since it is the least user 
friendly of the three options.
My only concern regarding the first two options is the performance hit they 
might cause.

WDYT [~dsmiley], [~caomanhdat]?
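As a point of reference for option 3, a client-side sketch could look roughly like the following. It assumes a routing hint such as the existing {{_route_}} parameter could be honored for the atomic update, which is precisely the open question above; the URL, collection, ids, and field names are illustrative.
{code:java}
import java.util.Collections;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class RoutedChildAtomicUpdateSketch {
  public static void main(String[] args) throws Exception {
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
      // Atomic "add" of a grandchild under the child document with id=20.
      SolrInputDocument grandChild = new SolrInputDocument();
      grandChild.addField("id", "21");
      grandChild.addField("string_s", "ex");

      SolrInputDocument child = new SolrInputDocument();
      child.addField("id", "20");
      child.addField("grand_children", Collections.singletonMap("add", grandChild));

      UpdateRequest req = new UpdateRequest();
      req.add(child);
      // Route by the root document's id ("1") so the update reaches the shard
      // that already holds the block.
      req.setParam("_route_", "1");
      req.process(client);
    }
  }
}
{code}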

> Support atomic updates of nested/child documents for nested-enabled schema
> --
>
> Key: SOLR-12638
> URL: https://issues.apache.org/jira/browse/SOLR-12638
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12638-delete-old-block-no-commit.patch, 
> SOLR-12638-nocommit.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> I have been toying with the thought of using this transformer in conjunction 
> with NestedUpdateProcessor and AtomicUpdate to allow SOLR to completely 
> re-index the entire nested structure. This is just a thought, I am still 
> thinking about implementation details. Hopefully I will be able to post a 
> more concrete proposal soon.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12638) Support atomic updates of nested/child documents for nested-enabled schema

2018-10-20 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658100#comment-16658100
 ] 

mosh edited comment on SOLR-12638 at 10/21/18 6:56 AM:
---

We have been testing this feature in-house, and have come across a problem 
regarding sharding when a document that is being updated is indexed inside a 
block,
 and the collection being used has more than a single shard.
 Right now when updating a document, an Id for the document has to be provided, 
in addition to the field which is being updated.
 When the document that is being updated is inside a block, the update can be 
routed to the wrong shard, since the shard in which it is indexed was 
calculated according to the root document's id. For example, when this document:
{code:javascript}
 {"id": "1", "children": [{"id": "20", "string_s": "ex"}]} {code}
Is being updated:
{code:javascript}
{"id": "20", "grand_children": {"add": [{"id": "21", "string_s": "ex"}]}}{code}
The update can be routed to another shard, where the block does not exist, 
causing the update to be indexed to a different shard,
 splitting our block in two pieces, existing in two separate shards.

Skimming through DistributedUpdateProcessor, I have suggestions for three 
different solutions.
 # If the schema is nested, the routing method (in 
DistributedUpdateProcessor) can check whether the document exists in any 
shard (lookup by id),
 find out whether it is inside a block (_root_), and route the update using the 
hash of _root_
 # Very similar to the previous method, only the _root_ lookup is done if the 
document which is being updated is not found in the shard it was routed to, 
asking other shards if the document exists inside a block, re-routing the 
update command.
 # The user provides the _root_, which is not the ideal case when it comes to 
user friendliness.

IMO the third option should be a last resort, since it is the least user 
friendly of the three options.
 My only concern regarding the first two options is the performance hit they 
might cause.

Another concern, which David has discussed, is the impact on the update log.
Would ensuring DistributedUpdateProcessor is run before RunUpdateProcessor be 
of any help?
I must admit I am not very familiar with these features of Solr.

WDYT [~dsmiley], [~caomanhdat]?


was (Author: moshebla):
We have been testing this feature in-house, and have come across a problem 
regarding sharding when a document that is being updated is indexed inside a 
block,
and the collection being used has more than a single shard.
Right now when updating a document, an Id for the document has to be provided, 
in addition to the field which is being updated.
When the document that is being updated is inside a block, the update can be 
routed to the wrong shard, since the shard in which it is indexed was 
calculated according to the root document's Id. ex.
When this document:
{code:javascript} {"id": "1", "children": [{"id": "20", {"string_s": "ex"}]} 
{code}
Is being updated:
{code:javascript}{"id": "20", "grand_children": {"add": [{"id": "21", 
"string_s": "ex"}]}}{code}
The update can be routed to another shard, where the block does not exist, 
causing the update to be indexed to a different shard,
splitting our block in two pieces, existing in two separate shards.

Skimming through DistributedUpdateProcessor, I have suggestions for three 
different solutions.

# If the schema is nested, the the routing method(in 
DistributedUpdateProcessor) can check if the document exists in any 
shards(lookup by id),
find out whether it is inside a block(_root_) and route the update using the 
hash of _root_
# Very similar to the previous method, only the _root_ lookup is done if the 
document which is being updated is not found in the shard it was routed to, 
asking other shards if the document exists inside a block, re-routing the 
update command.
# The user provides the _root_, which is not the ideal case when it comes to 
user friendliness.

IMO the third option should be the last result, since it is the least user 
friendly out of the three options.
My only concern regarding the first two options are the performance hit it 
might cause.

WDYT [~dsmiley], [~caomanhdat]?

> Support atomic updates of nested/child documents for nested-enabled schema
> --
>
> Key: SOLR-12638
> URL: https://issues.apache.org/jira/browse/SOLR-12638
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12638-delete-old-block-no-commit.patch, 
> SOLR-12638-nocommit.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> I have been toying with the thought of using this transf

[jira] [Created] (SOLR-12890) Vector Search in Solr (Umbrella Issue)

2018-10-21 Thread mosh (JIRA)
mosh created SOLR-12890:
---

 Summary: Vector Search in Solr (Umbrella Issue)
 Key: SOLR-12890
 URL: https://issues.apache.org/jira/browse/SOLR-12890
 Project: Solr
  Issue Type: New Feature
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: mosh


We have recently come across a need to index documents containing vectors using 
Solr, and have even worked on a small POC. We used a URP to calculate the 
LSH (we chose the superbit algorithm, but the code is designed so that the 
algorithm can be easily changed), and stored the vector in either 
sparse or dense form, in a binary field.
Perhaps an LSH URP, in conjunction with a query parser that uses 
the same properties to calculate the LSH (or maybe ktree, or some other 
algorithm altogether), should be considered as a Solr feature?
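A very rough sketch of the kind of URP described above (this is not the POC code: the factory wiring is omitted, plain random hyperplanes stand in for superbit, and the field names, dimensionality, and hyperplane count are all illustrative):
{code:java}
import java.io.IOException;
import java.util.Random;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class LshSignatureProcessor extends UpdateRequestProcessor {
  private static final int NUM_HYPERPLANES = 16;
  private static final int DIMS = 128;                  // assumed vector dimensionality
  private final float[][] hyperplanes = new float[NUM_HYPERPLANES][DIMS];

  public LshSignatureProcessor(UpdateRequestProcessor next) {
    super(next);
    Random random = new Random(42);                      // fixed seed: same planes on every node
    for (float[] plane : hyperplanes) {
      for (int i = 0; i < DIMS; i++) {
        plane[i] = (float) random.nextGaussian();
      }
    }
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    Object raw = doc.getFieldValue("vector_str");        // e.g. "0.12,0.87,..."
    if (raw != null) {
      doc.setField("vector_lsh_s", signature(parse(raw.toString())));
    }
    super.processAdd(cmd);                               // continue the URP chain
  }

  /** One sign bit per hyperplane: similar vectors tend to share signatures. */
  private String signature(float[] v) {
    StringBuilder bits = new StringBuilder(NUM_HYPERPLANES);
    for (float[] plane : hyperplanes) {
      double dot = 0;
      for (int i = 0; i < Math.min(v.length, plane.length); i++) {
        dot += v[i] * plane[i];
      }
      bits.append(dot >= 0 ? '1' : '0');
    }
    return bits.toString();
  }

  private static float[] parse(String csv) {
    String[] parts = csv.split(",");
    float[] v = new float[parts.length];
    for (int i = 0; i < parts.length; i++) {
      v[i] = Float.parseFloat(parts[i].trim());
    }
    return v;
  }
}
{code}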



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12890) Vector Search in Solr (Umbrella Issue)

2018-10-21 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658118#comment-16658118
 ] 

mosh commented on SOLR-12890:
-

We have been experimenting with this new use case to query vectors inside Solr,
having run using this [POC|https://github.com/moshebla/solr-vector-scoring].
This has worked for us since we have an algorithm that runs outside of Solr, 
which generates vectors for different inputs in our data pipeline,
and sends the enriched documents to Solr for indexing.
The LSH hash is then calculated at index time, and the vector data is encoded 
to binary format in either sparse or dense form (this is configurable).
Perhaps this use case could be extended and optimized, and ultimately be 
included in Solr.

> Vector Search in Solr (Umbrella Issue)
> --
>
> Key: SOLR-12890
> URL: https://issues.apache.org/jira/browse/SOLR-12890
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>
> We have recently come across a need to index documents containing vectors 
> using Solr, and have even worked on a small POC. We used a URP to calculate 
> the LSH (we chose the superbit algorithm, but the code is designed so that the 
> algorithm can be easily changed), and stored the vector in 
> either sparse or dense form, in a binary field.
> Perhaps an LSH URP, in conjunction with a query parser that 
> uses the same properties to calculate the LSH (or maybe ktree, or some other 
> algorithm altogether), should be considered as a Solr feature?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12890) Vector Search in Solr (Umbrella Issue)

2018-10-21 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658118#comment-16658118
 ] 

mosh edited comment on SOLR-12890 at 10/21/18 7:31 AM:
---

We have been experimenting with this new use case to query vectors inside Solr,
having run using this [POC|https://github.com/moshebla/solr-vector-scoring].
This has worked for us since we have an algorithm that runs outside of Solr, 
which generates vectors for different inputs in our data pipeline,
and sends the enriched documents to Solr for indexing.
The LSH hash is then calculated at index time, and the vector data is encoded 
to binary format in either sparse or dense form (this is configurable).

The query parser is passed a vector; the LSH hash for the provided vector is 
then calculated, and documents that contain a similar vector are queried. The 
user can then choose to run a full cosine similarity (or any other measure, 
provided we add different scorers) on the top N docs, to get more precise 
scores for the results.

Hopefully this use case could be extended, optimized, and ultimately be 
included in Solr.


was (Author: moshebla):
We have been experimenting with this new use case to query vectors inside Solr,
having run using this [POC|https://github.com/moshebla/solr-vector-scoring].
This has worked for us since we have an algorithm that runs outside of Solr, 
which generates vectors for different inputs in our data pipeline,
and sends the enriched documents to Solr for indexing.
The LSH hash is then calculated in index time, and the vector data is encoded 
to binary format in either sparse or dense form(this is configurable).
Perhaps this use case could be extended and optimized, and ultimately be 
included in Solr.

> Vector Search in Solr (Umbrella Issue)
> --
>
> Key: SOLR-12890
> URL: https://issues.apache.org/jira/browse/SOLR-12890
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>
> We have recently come across a need to index documents containing vectors 
> using Solr, and have even worked on a small POC. We used a URP to calculate 
> the LSH (we chose the superbit algorithm, but the code is designed so that the 
> algorithm can be easily changed), and stored the vector in 
> either sparse or dense form, in a binary field.
> Perhaps an LSH URP, in conjunction with a query parser that 
> uses the same properties to calculate the LSH (or maybe ktree, or some other 
> algorithm altogether), should be considered as a Solr feature?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12890) Vector Search in Solr (Umbrella Issue)

2018-10-21 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658118#comment-16658118
 ] 

mosh edited comment on SOLR-12890 at 10/21/18 8:09 AM:
---

We have been experimenting with this new use case to query vectors inside Solr,
using this [POC|https://github.com/moshebla/solr-vector-scoring].
This has worked for us since we have an algorithm that runs outside of Solr, 
which generates vectors for different inputs in our data pipeline,
and sends the enriched documents to Solr for indexing.
The LSH hash is then calculated at index time, and the vector data is encoded 
to binary format in either sparse or dense form (this is configurable).

The query parser is passed a vector; the LSH hash for the provided vector is 
then calculated, and documents that contain a similar vector are queried. The 
user can then choose to run a full cosine similarity (or any other measure, 
provided we add different scorers) on the top N docs, to get more precise 
scores for the results.

Hopefully this use case could be extended, optimized, and ultimately be 
included in Solr.


was (Author: moshebla):
We have been experimenting with this new use case to query vectors inside Solr,
having run using this [POC|https://github.com/moshebla/solr-vector-scoring].
This has worked for us since we have an algorithm that runs outside of Solr, 
which generates vectors for different inputs in our data pipeline,
and sends the enriched documents to Solr for indexing.
The LSH hash is then calculated in index time, and the vector data is encoded 
to binary format in either sparse or dense form(this is configurable).

The query parser is passed a certain vector, and the LSH hash for the provided  
vector is then calculated and documents which contain a similar vector are 
queried. The user can then choose to run on the topNDocs a full cosine 
similarity(Or any other, provided we add different scorers), to get more 
precise scores for the results.

Hopefully this use case could be extended, optimized, and ultimately be 
included in Solr.

> Vector Search in Solr (Umbrella Issue)
> --
>
> Key: SOLR-12890
> URL: https://issues.apache.org/jira/browse/SOLR-12890
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>
> We have recently come across a need to index documents containing vectors 
> using Solr, and have even worked on a small POC. We used a URP to calculate 
> the LSH (we chose the superbit algorithm, but the code is designed so that the 
> algorithm can be easily changed), and stored the vector in 
> either sparse or dense form, in a binary field.
> Perhaps an LSH URP, in conjunction with a query parser that 
> uses the same properties to calculate the LSH (or maybe ktree, or some other 
> algorithm altogether), should be considered as a Solr feature?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12638) Support atomic updates of nested/child documents for nested-enabled schema

2018-10-21 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658100#comment-16658100
 ] 

mosh edited comment on SOLR-12638 at 10/21/18 10:09 AM:


We have been testing this feature in-house, and have come across a problem 
regarding sharding when a document that is being updated is indexed inside a 
block,
 and the collection being used has more than a single shard.
 Right now when updating a document, an Id for the document has to be provided, 
in addition to the field which is being updated.
 When the document that is being updated is inside a block, the update can be 
routed to the wrong shard, since the shard in which it is indexed was 
calculated according to the root document's id. For example, when this document:
{code:javascript}
 {"id": "1", "children": [{"id": "20", "string_s": "ex"}]} {code}
Is being updated:
{code:javascript}
{"id": "20", "grand_children": {"add": [{"id": "21", "string_s": "ex"}]}}{code}
The update can be routed to another shard, where the block does not exist, 
causing the update to be indexed to a different shard,
 splitting our block in two pieces, existing in two separate shards.

Skimming through DistributedUpdateProcessor, I have suggestions for three 
different solutions.
 # If the schema is nested, the routing method (in 
DistributedUpdateProcessor) can check whether the document exists in any 
shard (lookup by id),
 find out whether it is inside a block (_root_), and route the update using the 
hash of _root_
 # Very similar to the previous method, only the _root_ lookup is done if the 
document which is being updated is not found in the shard it was routed to, 
asking other shards if the document exists inside a block, re-routing the 
update command.
 # The user provides the _root_, which is not the ideal case when it comes to 
user friendliness. This approach is very similar to 
[Elasticsearch|https://www.elastic.co/guide/en/elasticsearch/guide/current/grandparents.html#CO285-1],
 which uses the *routing* parameter to route all children to the same shard.

IMO the third option is inferior to the first two, since it is the least user 
friendly out of the three options.
 My only concern regarding the first two options is the performance hit they 
might cause.

Another concern, which David has discussed, is the impact on the update log.
 Would ensuring DistributedUpdateProcessor is run before RunUpdateProcessor be 
of any help?
 I must admit I am not very familiar with these features of Solr.

WDYT [~dsmiley], [~caomanhdat]?
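
To illustrate the failure mode, here is a toy sketch (plain Java, not Solr's 
actual CompositeIdRouter or MurmurHash) showing how routing an atomic update by 
the child id instead of by _root_ can pick a different shard than the one 
holding the block; the shard count and hash function are made up for the example.
{code:java}
public class RoutingSplitDemo {
    // Toy stand-in for a document router: hash the routing key and pick a shard.
    // This is NOT Solr's CompositeIdRouter, just an illustration of the problem.
    static int shardFor(String routeKey, int numShards) {
        return Math.floorMod(routeKey.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int numShards = 2;
        // The block was indexed on the shard chosen for the root id "1".
        int rootShard = shardFor("1", numShards);
        // An atomic update addressed only by the child id "20" may hash elsewhere.
        int childShard = shardFor("20", numShards);
        System.out.println("root  '1'  -> shard " + rootShard);
        System.out.println("child '20' -> shard " + childShard);
        if (rootShard != childShard) {
            System.out.println("The update lands on a different shard, splitting the block.");
        }
    }
}
{code}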


was (Author: moshebla):
We have been testing this feature in-house, and have come across a problem 
regarding sharding when a document that is being updated is indexed inside a 
block,
 and the collection being used has more than a single shard.
 Right now when updating a document, an Id for the document has to be provided, 
in addition to the field which is being updated.
 When the document that is being updated is inside a block, the update can be 
routed to the wrong shard, since the shard in which it is indexed was 
calculated according to the root document's Id. ex.
 When this document:
{code:javascript}
 {"id": "1", "children": [{"id": "20", {"string_s": "ex"}]} {code}
Is being updated:
{code:javascript}
{"id": "20", "grand_children": {"add": [{"id": "21", "string_s": "ex"}]}}{code}
The update can be routed to another shard, where the block does not exist, 
causing the update to be indexed on a different shard and splitting our block 
into two pieces that live in two separate shards.

Skimming through DistributedUpdateProcessor, I have suggestions for three 
different solutions.
 # If the schema is nested, the routing method (in 
DistributedUpdateProcessor) can check if the document exists in any 
shard (lookup by id),
 find out whether it is inside a block (_root_) and route the update using the 
hash of _root_
 # Very similar to the previous method, only the _root_ lookup is done if the 
document which is being updated is not found in the shard it was routed to, 
asking other shards if the document exists inside a block, re-routing the 
update command.
 # The user provides the _root_, which is not the ideal case when it comes to 
user friendliness.

IMO the third option should be the last resort, since it is the least user 
friendly of the three options.
 My only concern regarding the first two options is the performance hit they 
might cause.

Another concern which David has discussed is the implications on the update log.
Would ensuring DistributedUpdateProcessor is run before RunUpdateProcessor be 
of any help?
I must admit I am not very familiar with these features of Solr.

WDYT [~dsmiley], [~caomanhdat]?

> Support atomic updates of nested/child documents for nested-enabled schema
> --
>
> Key: SOLR-12638
>   

[jira] [Comment Edited] (SOLR-12638) Support atomic updates of nested/child documents for nested-enabled schema

2018-10-21 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658100#comment-16658100
 ] 

mosh edited comment on SOLR-12638 at 10/21/18 10:35 AM:


We have been testing this feature in-house, and have come across a problem 
regarding sharding when a document that is being updated is indexed inside a 
block,
 and the collection being used has more than a single shard.
 Right now when updating a document, an Id for the document has to be provided, 
in addition to the field which is being updated.
 When the document that is being updated is inside a block, the update can be 
routed to the wrong shard, since the shard in which it is indexed was 
calculated according to the root document's Id. ex.
 When this document:
{code:javascript}
 {"id": "1", "children": [{"id": "20", {"string_s": "ex"}]} {code}
Is being updated:
{code:javascript}
{"id": "20", "grand_children": {"add": [{"id": "21", "string_s": "ex"}]}}{code}
The update can be routed to another shard, where the block does not exist, 
causing the update to be indexed on a different shard and splitting our block 
into two pieces that live in two separate shards.

Skimming through DistributedUpdateProcessor, I have suggestions for three 
different solutions.
 # If the schema is nested, the routing method (in 
DistributedUpdateProcessor#setupRequest) can check if the document exists in 
any shard (lookup by id),
 find out whether it is inside a block (_root_) and route the update using the 
hash of _root_
 # Very similar to the previous method, only the _root_ lookup is done if the 
document which is being updated is not found in the shard it was routed to, 
asking other shards if the document exists inside a block, re-routing the 
update command.
 # The user provides the _root_, which is not the ideal case when it comes to 
user friendliness. This approach is very similar to [Elastic 
Search|https://www.elastic.co/guide/en/elasticsearch/guide/current/grandparents.html#CO285-1],
 which uses the *routing* parameter to route all children to the same shard.

IMO the third option is inferior to the first two, since it is the least user 
friendly of the three options.
 My only concern regarding the first two options is the performance hit they 
might cause.

Another concern which David has discussed is the implications on the update log.
 Would ensuring DistributedUpdateProcessor is run before RunUpdateProcessor be 
of any help?
 I must admit I am not very familiar with these features of Solr.

WDYT [~dsmiley], [~caomanhdat]?


was (Author: moshebla):
We have been testing this feature in-house, and have come across a problem 
regarding sharding when a document that is being updated is indexed inside a 
block,
 and the collection being used has more than a single shard.
 Right now when updating a document, an Id for the document has to be provided, 
in addition to the field which is being updated.
 When the document that is being updated is inside a block, the update can be 
routed to the wrong shard, since the shard in which it is indexed was 
calculated according to the root document's Id. ex.
 When this document:
{code:javascript}
 {"id": "1", "children": [{"id": "20", {"string_s": "ex"}]} {code}
Is being updated:
{code:javascript}
{"id": "20", "grand_children": {"add": [{"id": "21", "string_s": "ex"}]}}{code}
The update can be routed to another shard, where the block does not exist, 
causing the update to be indexed on a different shard and splitting our block 
into two pieces that live in two separate shards.

Skimming through DistributedUpdateProcessor, I have suggestions for three 
different solutions.
 # If the schema is nested, the routing method (in 
DistributedUpdateProcessor) can check if the document exists in any 
shard (lookup by id),
 find out whether it is inside a block (_root_) and route the update using the 
hash of _root_
 # Very similar to the previous method, only the _root_ lookup is done if the 
document which is being updated is not found in the shard it was routed to, 
asking other shards if the document exists inside a block, re-routing the 
update command.
 # The user provides the _root_, which is not the ideal case when it comes to 
user friendliness. This approach is very similar to [Elastic 
Search|https://www.elastic.co/guide/en/elasticsearch/guide/current/grandparents.html#CO285-1],
 which uses the *routing* parameter to route all children to the same shard.

IMO the third option is inferior to the first two, since it is the least user 
friendly of the three options.
 My only concern regarding the first two options is the performance hit they 
might cause.

Another concern which David has discussed is the implications on the update log.
 Would ensuring DistributedUpdateProcessor is run before RunUpdateProcessor be 
of any help?
 I must admit I am not very familiar with these features of Solr.

[jira] [Commented] (SOLR-12890) Vector Search in Solr (Umbrella Issue)

2018-10-21 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658512#comment-16658512
 ] 

mosh commented on SOLR-12890:
-

{quote}Is this different from just committed SOLR-12879?{quote}
The main difference is that MinHash can only be calculated for strings, while 
this use case calls for other kinds of hashes.
This POC is for indexing vectors, while SOLR-12879 is for comparing strings by 
analysing their vector values.
The URP in this POC takes a vector string (either dense or sparse), e.g. 
0.11,0.22,0.5,0.72,4.66 ...
and calculates its LSH hash at index time (using SuperBit for now, but this can 
be extended in the future).
Perhaps the query parsers could be joined, or some kind of factory could check 
the field type to pick the right query,
but I do not think the URP can be replaced at this time.
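
For illustration only, here is a minimal sign-random-projection sketch of the 
idea (a simplified relative of SuperBit, not the POC's actual URP code); the 
bit count, seed and sample vector are arbitrary:
{code:java}
import java.util.Random;

public class SimpleLshSketch {
    // Build random hyperplanes once; each hyperplane contributes one signature bit.
    static double[][] randomHyperplanes(int bits, int dim, long seed) {
        Random rnd = new Random(seed);
        double[][] planes = new double[bits][dim];
        for (int b = 0; b < bits; b++)
            for (int d = 0; d < dim; d++)
                planes[b][d] = rnd.nextGaussian();
        return planes;
    }

    // Signature bit b is the sign of the dot product with hyperplane b.
    static long signature(double[] vector, double[][] planes) {
        long sig = 0L;
        for (int b = 0; b < planes.length; b++) {
            double dot = 0;
            for (int d = 0; d < vector.length; d++) dot += vector[d] * planes[b][d];
            if (dot >= 0) sig |= 1L << b;
        }
        return sig;
    }

    public static void main(String[] args) {
        // Dense vector string as described above, parsed at "index time".
        double[] v = java.util.Arrays.stream("0.11,0.22,0.5,0.72,4.66".split(","))
                .mapToDouble(Double::parseDouble).toArray();
        double[][] planes = randomHyperplanes(16, v.length, 42L);
        System.out.println("LSH signature: " + Long.toBinaryString(signature(v, planes)));
    }
}
{code}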

> Vector Search in Solr (Umbrella Issue)
> --
>
> Key: SOLR-12890
> URL: https://issues.apache.org/jira/browse/SOLR-12890
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>
> We have recently come across a need to index documents containing vectors 
> using Solr, and have even worked on a small POC. We used an URP to calculate 
> the LSH (we chose to use the SuperBit algorithm, but the code is designed in a 
> way that the chosen algorithm can be easily changed), and stored the vector in 
> either sparse or dense form, in a binary field.
> Perhaps an addition of an LSH URP in conjunction with a query parser that 
> uses the same properties to calculate LSH (or maybe ktree, or some other 
> algorithm altogether) should be considered as a Solr feature?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12890) Vector Search in Solr (Umbrella Issue)

2018-10-21 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658512#comment-16658512
 ] 

mosh edited comment on SOLR-12890 at 10/22/18 3:35 AM:
---

{quote}Is this different from just committed SOLR-12879?{quote}
The main difference is that MinHash can only be calculated for strings, while 
this use case calls for other kinds of hashes.
This POC is for indexing vectors, while SOLR-12879 is for comparing strings by 
analysing their vector values.
The URP in this POC takes a vector string (either dense or sparse), e.g. 
0.11,0.22,0.5,0.72,4.66 ...
and calculates its LSH hash at index time (using SuperBit for now, but this can 
be extended in the future).
Perhaps the query parsers could be joined, or some kind of factory could check 
the field type to pick the right query,
but I do not think the URP provided in this POC can be replaced at this time.
The cosine similarity algorithm run by the custom scorer in this POC is also 
unique to this approach.
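
As a point of reference, plain cosine similarity over two dense vectors looks 
like the sketch below; this is just the textbook formula, not the POC's custom 
scorer, and the vectors are made up:
{code:java}
public class CosineSimilarity {
    // cos(a, b) = (a . b) / (||a|| * ||b||)
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] query = {0.11, 0.22, 0.5, 0.72, 4.66};
        double[] doc   = {0.10, 0.25, 0.4, 0.70, 4.50};
        System.out.println("cosine = " + cosine(query, doc)); // close to 1.0 for similar vectors
    }
}
{code}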


was (Author: moshebla):
{quote}Is this different from just committed SOLR-12879?{quote}
The main difference is that MinHash can only be calculated for strings, while 
this use case calls for other kinds of hashes.
This POC is for indexing vectors, while SOLR-12879 is for comparing strings by 
analysing their vector values.
The URP in this POC takes a vector string (either dense or sparse), e.g. 
0.11,0.22,0.5,0.72,4.66 ...
and calculates its LSH hash at index time (using SuperBit for now, but this can 
be extended in the future).
Perhaps the query parsers could be joined, or some kind of factory could check 
the field type to pick the right query,
but I do not think the URP can be replaced at this time.

> Vector Search in Solr (Umbrella Issue)
> --
>
> Key: SOLR-12890
> URL: https://issues.apache.org/jira/browse/SOLR-12890
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>
> We have recently come across a need to index documents containing vectors 
> using Solr, and have even worked on a small POC. We used an URP to calculate 
> the LSH (we chose to use the SuperBit algorithm, but the code is designed in a 
> way that the chosen algorithm can be easily changed), and stored the vector in 
> either sparse or dense form, in a binary field.
> Perhaps an addition of an LSH URP in conjunction with a query parser that 
> uses the same properties to calculate LSH (or maybe ktree, or some other 
> algorithm altogether) should be considered as a Solr feature?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12638) Support atomic updates of nested/child documents for nested-enabled schema

2018-10-22 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658983#comment-16658983
 ] 

mosh commented on SOLR-12638:
-

{quote}Or, perhaps we insist the user send a _route_ parameter to /update which 
is otherwise only used in searches?{quote}
I like this option a lot better, since it makes the updated doc look cleaner.
Adding another field to the update command can seem a little confusing IMO,
since that field is not used to update the document in any way.
I just pushed a commit implementing this.
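
For anyone following along, a rough SolrJ sketch of what such a _route_-assisted 
atomic update could look like is below; the collection name, field names and ids 
are invented, and the exact accepted syntax is whatever the patch finally supports.
{code:java}
import java.util.Collections;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class RoutedAtomicUpdate {
    public static void main(String[] args) throws Exception {
        SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build();

        // New grandchild to attach to the existing child doc "20".
        SolrInputDocument grandChild = new SolrInputDocument();
        grandChild.setField("id", "21");
        grandChild.setField("string_s", "ex");

        // Atomic update addressed at the child, using the "add" modifier.
        SolrInputDocument update = new SolrInputDocument();
        update.setField("id", "20");
        update.setField("grand_children", Collections.singletonMap("add", grandChild));

        UpdateRequest req = new UpdateRequest();
        req.add(update);
        // Route the update to the shard that holds the whole block (root id "1").
        req.setParam("_route_", "1");
        req.process(client, "my_nested_collection");
        client.commit("my_nested_collection");
        client.close();
    }
}
{code}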

> Support atomic updates of nested/child documents for nested-enabled schema
> --
>
> Key: SOLR-12638
> URL: https://issues.apache.org/jira/browse/SOLR-12638
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12638-delete-old-block-no-commit.patch, 
> SOLR-12638-nocommit.patch
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> I have been toying with the thought of using this transformer in conjunction 
> with NestedUpdateProcessor and AtomicUpdate to allow SOLR to completely 
> re-index the entire nested structure. This is just a thought, I am still 
> thinking about implementation details. Hopefully I will be able to post a 
> more concrete proposal soon.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-12937) ChildDocTransformer should have score and start param

2018-10-29 Thread mosh (JIRA)
mosh created SOLR-12937:
---

 Summary: ChildDocTransformer should have score and start param
 Key: SOLR-12937
 URL: https://issues.apache.org/jira/browse/SOLR-12937
 Project: Solr
  Issue Type: Sub-task
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: mosh


Users should be able to pass score and start params to ChildDocTransformer so 
they can use cursors for nested queries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12937) ChildDocTransformer should have sort and start param

2018-10-29 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-12937:

Summary: ChildDocTransformer should have sort and start param  (was: 
ChildDocTransformer should have score and start param)

> ChildDocTransformer should have sort and start param
> 
>
> Key: SOLR-12937
> URL: https://issues.apache.org/jira/browse/SOLR-12937
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>
> Users should be able to pass score and start params to ChildDocTransformer so 
> they can use cursors for nested queries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12937) ChildDocTransformer should have sort and start param

2018-10-29 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-12937:

Description: Users should be able to pass sort and start params to 
ChildDocTransformer so they can use cursors for nested queries.  (was: Users 
should be able to pass score and start params to ChildDocTransformer so they 
can use cursors for nested queries.)

> ChildDocTransformer should have sort and start param
> 
>
> Key: SOLR-12937
> URL: https://issues.apache.org/jira/browse/SOLR-12937
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>
> Users should be able to pass sort and start params to ChildDocTransformer so 
> they can use cursors for nested queries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12519) Support Deeply Nested Docs In Child Documents Transformer

2018-07-19 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549116#comment-16549116
 ] 

mosh commented on SOLR-12519:
-

 
{quote}See org.apache.solr.search.join.BlockJoinParentQParser#getCachedFilter 
for a clue.
{quote}

 That sounds like the next thing on my todo list, right after implementing the 
filtering using the Lucene iterations we have discussed previously.

 BTW,
I have just pushed to the WIP pull request; I have made some progress and it is 
starting to get there.

 

> Support Deeply Nested Docs In Child Documents Transformer
> -
>
> Key: SOLR-12519
> URL: https://issues.apache.org/jira/browse/SOLR-12519
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12519-no-commit.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As discussed in SOLR-12298, to make use of the meta-data fields in 
> SOLR-12441, there needs to be a smarter child document transformer, which 
> provides the ability to rebuild the original nested documents' structure.
>  In addition, I also propose the transformer will also have the ability to 
> bring only some of the original hierarchy, to prevent unnecessary block join 
> queries. e.g.
> {code}  {"a": "b", "c": [ {"e": "f"}, {"e": "g"} , {"h": "i"} ]} {code}
>  In case my query is for all the children of "a:b", which contain the key "e" 
> in them, the query will be broken into two parts:
>  1. The parent query "a:b"
>  2. The child query "e:*".
> If the only children flag is on, the transformer will return the following 
> documents:
>  {code}[ {"e": "f"}, {"e": "g"} ]{code}
> In case the flag was not turned on(perhaps the default state), the whole 
> document hierarchy will be returned, containing only the matching children:
> {code}{"a": "b", "c": [ {"e": "f"}, {"e": "g"} ]{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12519) Support Deeply Nested Docs In Child Documents Transformer

2018-07-19 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549116#comment-16549116
 ] 

mosh edited comment on SOLR-12519 at 7/19/18 11:36 AM:
---

 
{quote}See org.apache.solr.search.join.BlockJoinParentQParser#getCachedFilter 
for a clue.
{quote}
That sounds like the next thing on my todo list, right after implementing the 
filtering using the Lucene iterations we have discussed previously.

BTW,
 I have just pushed to the WIP pull request; I have made some progress and it 
is starting to get there.

 


was (Author: moshebla):
 
{quote}See org.apache.solr.search.join.BlockJoinParentQParser#getCachedFilter 
for a clue.
{quote}

 That sounds like the next thing on my todo list, right after implementing the 
filtering using the Lucene iterations we have discussed prior.

 BTW,
I have just pushed to the WIP pull request, I have made some progress and it is 
starting to get there.

 

> Support Deeply Nested Docs In Child Documents Transformer
> -
>
> Key: SOLR-12519
> URL: https://issues.apache.org/jira/browse/SOLR-12519
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12519-no-commit.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As discussed in SOLR-12298, to make use of the meta-data fields in 
> SOLR-12441, there needs to be a smarter child document transformer, which 
> provides the ability to rebuild the original nested documents' structure.
>  In addition, I also propose the transformer will also have the ability to 
> bring only some of the original hierarchy, to prevent unnecessary block join 
> queries. e.g.
> {code}  {"a": "b", "c": [ {"e": "f"}, {"e": "g"} , {"h": "i"} ]} {code}
>  In case my query is for all the children of "a:b", which contain the key "e" 
> in them, the query will be broken into two parts:
>  1. The parent query "a:b"
>  2. The child query "e:*".
> If the only children flag is on, the transformer will return the following 
> documents:
>  {code}[ {"e": "f"}, {"e": "g"} ]{code}
> In case the flag was not turned on(perhaps the default state), the whole 
> document hierarchy will be returned, containing only the matching children:
> {code}{"a": "b", "c": [ {"e": "f"}, {"e": "g"} ]{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12441) Add deeply nested documents URP

2018-07-04 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532718#comment-16532718
 ] 

mosh commented on SOLR-12441:
-

I have been thinking about a special case where a child document within an 
array has other children.
Perhaps we should keep inside the _NEST_PATH_ field the child documents' index 
in the hierarchy. e.g. {code:javascript}{"_NEST_PATH_": 
"book/pages[3]/footnote"}{code}
WDYT?
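
A toy sketch of the bracketed-index encoding floated above (pure string 
building; the separator and bracket choice are still open at this point, so this 
is illustrative only):
{code:java}
public class NestPathDemo {
    // Append one level to a nest path, recording the child's position in its array.
    // An index of -1 means the child was not part of an array.
    static String childPath(String parentPath, String key, int indexInArray) {
        String segment = indexInArray >= 0 ? key + "[" + indexInArray + "]" : key;
        return parentPath.isEmpty() ? segment : parentPath + "/" + segment;
    }

    public static void main(String[] args) {
        String book = childPath("", "book", -1);       // "book"
        String page = childPath(book, "pages", 3);     // "book/pages[3]"
        String note = childPath(page, "footnote", -1); // "book/pages[3]/footnote"
        System.out.println(note);
    }
}
{code}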

> Add deeply nested documents URP
> ---
>
> Key: SOLR-12441
> URL: https://issues.apache.org/jira/browse/SOLR-12441
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> As discussed in 
> [SOLR-12298|https://issues.apache.org/jira/browse/SOLR-12298], there ought to 
> be an URP to add metadata fields to childDocuments in order to allow a 
> transformer to rebuild the original document hierarchy.
> {quote}I propose we add the following fields:
>  # __nestParent__
>  # _nestLevel_
>  # __nestPath__
> __nestParent__: This field will store the document's parent docId, to be 
> used for building the whole hierarchy, using a new document transformer, as 
> suggested by Jan on the mailing list.
> _nestLevel_: This field will store the level of the specified field in the 
> document, using an int value. This field can be used for the parentFilter, 
> eliminating the need to provide a parentFilter, which will be set by default 
> as "_level_:queriedFieldLevel".
> __nestPath__: This field will contain the full path, separated by a specific 
> reserved char e.g., '.'
>  for example: "first.second.third".
>  This will enable users to search for a specific path, or provide a regular 
> expression to search for fields sharing the same name in different levels of 
> the document, filtering using the level key if needed.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12441) Add deeply nested documents URP

2018-07-04 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532718#comment-16532718
 ] 

mosh edited comment on SOLR-12441 at 7/4/18 1:20 PM:
-

I have been thinking about a special case where a child document within an 
array has other children.
 Perhaps we should keep inside the _NEST_PATH_ field the child documents' index 
in the hierarchy. e.g.
{code:javascript}
{"_NEST_PATH_": "book/pages[3]/footnote"}{code}
Or perhaps it should be:
{"_NEST_PATH_": "book[0]/pages[3]/footnote[0]"}{code}
WDYT?


was (Author: moshebla):
I have been thinking about a special case where a child document within an 
array has other children.
Perhaps we should keep inside the _NEST_PATH_ field the child documents' index 
in the hierarchy. e.g. {code:javascript}{"_NEST_PATH_": 
"book/pages[3]/footnote"}{code}
WDYT?

> Add deeply nested documents URP
> ---
>
> Key: SOLR-12441
> URL: https://issues.apache.org/jira/browse/SOLR-12441
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> As discussed in 
> [SOLR-12298|https://issues.apache.org/jira/browse/SOLR-12298], there ought to 
> be an URP to add metadata fields to childDocuments in order to allow a 
> transformer to rebuild the original document hierarchy.
> {quote}I propose we add the following fields:
>  # __nestParent__
>  # _nestLevel_
>  # __nestPath__
> __nestParent__: This field will store the document's parent docId, to be 
> used for building the whole hierarchy, using a new document transformer, as 
> suggested by Jan on the mailing list.
> _nestLevel_: This field will store the level of the specified field in the 
> document, using an int value. This field can be used for the parentFilter, 
> eliminating the need to provide a parentFilter, which will be set by default 
> as "_level_:queriedFieldLevel".
> __nestPath__: This field will contain the full path, separated by a specific 
> reserved char e.g., '.'
>  for example: "first.second.third".
>  This will enable users to search for a specific path, or provide a regular 
> expression to search for fields sharing the same name in different levels of 
> the document, filtering using the level key if needed.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12441) Add deeply nested documents URP

2018-07-04 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532718#comment-16532718
 ] 

mosh edited comment on SOLR-12441 at 7/4/18 1:20 PM:
-

I have been thinking about a special case where a child document within an 
array has other children.
 Perhaps we should keep inside the _NEST_PATH_ field the child documents' index 
in the hierarchy. e.g.
{code:javascript}
{"_NEST_PATH_": "book/pages[3]/footnote"}{code}
Or perhaps it should be:
{code}{"_NEST_PATH_": "book[0]/pages[3]/footnote[0]"}{code}
WDYT?


was (Author: moshebla):
I have been thinking about a special case where a child document within an 
array has other children.
 Perhaps we should keep inside the _NEST_PATH_ field the child documents' index 
in the hierarchy. e.g.
{code:javascript}
{"_NEST_PATH_": "book/pages[3]/footnote"}{code}
Or perhaps it should be:
{"_NEST_PATH_": "book[0]/pages[3]/footnote[0]"}{code}
WDYT?

> Add deeply nested documents URP
> ---
>
> Key: SOLR-12441
> URL: https://issues.apache.org/jira/browse/SOLR-12441
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> As discussed in 
> [SOLR-12298|https://issues.apache.org/jira/browse/SOLR-12298], there ought to 
> be an URP to add metadata fields to childDocuments in order to allow a 
> transformer to rebuild the original document hierarchy.
> {quote}I propose we add the following fields:
>  # __nestParent__
>  # _nestLevel_
>  # __nestPath__
> __nestParent__: This field will store the document's parent docId, to be 
> used for building the whole hierarchy, using a new document transformer, as 
> suggested by Jan on the mailing list.
> _nestLevel_: This field will store the level of the specified field in the 
> document, using an int value. This field can be used for the parentFilter, 
> eliminating the need to provide a parentFilter, which will be set by default 
> as "_level_:queriedFieldLevel".
> __nestPath__: This field will contain the full path, separated by a specific 
> reserved char e.g., '.'
>  for example: "first.second.third".
>  This will enable users to search for a specific path, or provide a regular 
> expression to search for fields sharing the same name in different levels of 
> the document, filtering using the level key if needed.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12441) Add deeply nested documents URP

2018-07-04 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532718#comment-16532718
 ] 

mosh edited comment on SOLR-12441 at 7/4/18 2:25 PM:
-

I have been thinking about a special case where a child document within an 
array has other children.
 Perhaps we should keep inside the _NEST_PATH_ field the child documents' index 
in the hierarchy. e.g.
{code:javascript}
{"_NEST_PATH_": "book/pages/3//footnote"}{code}
Or perhaps it should be:
{code:java}
{"_NEST_PATH_": "book/0//pages/3//footnote/0/"}{code}
WDYT?


was (Author: moshebla):
I have been thinking about a special case where a child document within an 
array has other children.
 Perhaps we should keep inside the _NEST_PATH_ field the child documents' index 
in the hierarchy. e.g.
{code:javascript}
{"_NEST_PATH_": "book/pages[3]/footnote"}{code}
Or perhaps it should be:
{code}{"_NEST_PATH_": "book[0]/pages[3]/footnote[0]"}{code}
WDYT?

> Add deeply nested documents URP
> ---
>
> Key: SOLR-12441
> URL: https://issues.apache.org/jira/browse/SOLR-12441
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> As discussed in 
> [SOLR-12298|https://issues.apache.org/jira/browse/SOLR-12298], there ought to 
> be an URP to add metadata fields to childDocuments in order to allow a 
> transformer to rebuild the original document hierarchy.
> {quote}I propose we add the following fields:
>  # __nestParent__
>  # _nestLevel_
>  # __nestPath__
> __nestParent__: This field will store the document's parent docId, to be 
> used for building the whole hierarchy, using a new document transformer, as 
> suggested by Jan on the mailing list.
> _nestLevel_: This field will store the level of the specified field in the 
> document, using an int value. This field can be used for the parentFilter, 
> eliminating the need to provide a parentFilter, which will be set by default 
> as "_level_:queriedFieldLevel".
> __nestPath__: This field will contain the full path, separated by a specific 
> reserved char e.g., '.'
>  for example: "first.second.third".
>  This will enable users to search for a specific path, or provide a regular 
> expression to search for fields sharing the same name in different levels of 
> the document, filtering using the level key if needed.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12441) Add deeply nested documents URP

2018-07-04 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533259#comment-16533259
 ] 

mosh commented on SOLR-12441:
-

I will have a look at using docID order.
 I was thinking this could be used to prevent the need for more block join 
queries, since the join can be sorted by the _NEST_PATH_ field, and then we 
could get each parent from the root SolrInputDocument by using a simple array 
get e.g. {code:java}rootSolrDocument.get(childNum){code}

> Add deeply nested documents URP
> ---
>
> Key: SOLR-12441
> URL: https://issues.apache.org/jira/browse/SOLR-12441
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> As discussed in 
> [SOLR-12298|https://issues.apache.org/jira/browse/SOLR-12298], there ought to 
> be an URP to add metadata fields to childDocuments in order to allow a 
> transformer to rebuild the original document hierarchy.
> {quote}I propose we add the following fields:
>  # __nestParent__
>  # _nestLevel_
>  # __nestPath__
> __nestParent__: This field will store the document's parent docId, to be 
> used for building the whole hierarchy, using a new document transformer, as 
> suggested by Jan on the mailing list.
> _nestLevel_: This field will store the level of the specified field in the 
> document, using an int value. This field can be used for the parentFilter, 
> eliminating the need to provide a parentFilter, which will be set by default 
> as "_level_:queriedFieldLevel".
> __nestPath__: This field will contain the full path, separated by a specific 
> reserved char e.g., '.'
>  for example: "first.second.third".
>  This will enable users to search for a specific path, or provide a regular 
> expression to search for fields sharing the same name in different levels of 
> the document, filtering using the level key if needed.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12441) Add deeply nested documents URP

2018-07-04 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533259#comment-16533259
 ] 

mosh edited comment on SOLR-12441 at 7/5/18 4:57 AM:
-

I will have a look at using docID order.
 I was thinking this could be used to prevent the need for more block join 
queries, since the join can be sorted by the _NEST_PATH_ field, and then we 
could get each parent from the root SolrInputDocument by using a simple array 
get e.g.
{code:java}
((List)rootSolrDocument.get("childrenArray")).get(childNum);{code}


was (Author: moshebla):
I will have a look at using docID order.
 I was thinking this could be used to prevent the need for more block join 
queries, since the join can be sorted by the _NEST_PATH_ field, and then we 
could get each parent from the root SolrInputDocument by using a simple array 
get e.g. {code:java}rootSolrDocument.get(childNum){code}

> Add deeply nested documents URP
> ---
>
> Key: SOLR-12441
> URL: https://issues.apache.org/jira/browse/SOLR-12441
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> As discussed in 
> [SOLR-12298|https://issues.apache.org/jira/browse/SOLR-12298], there ought to 
> be an URP to add metadata fields to childDocuments in order to allow a 
> transformer to rebuild the original document hierarchy.
> {quote}I propose we add the following fields:
>  # __nestParent__
>  # _nestLevel_
>  # __nestPath__
> __nestParent__: This field will store the document's parent docId, to be 
> used for building the whole hierarchy, using a new document transformer, as 
> suggested by Jan on the mailing list.
> _nestLevel_: This field will store the level of the specified field in the 
> document, using an int value. This field can be used for the parentFilter, 
> eliminating the need to provide a parentFilter, which will be set by default 
> as "_level_:queriedFieldLevel".
> __nestPath__: This field will contain the full path, separated by a specific 
> reserved char e.g., '.'
>  for example: "first.second.third".
>  This will enable users to search for a specific path, or provide a regular 
> expression to search for fields sharing the same name in different levels of 
> the document, filtering using the level key if needed.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12441) Add deeply nested documents URP

2018-07-04 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533263#comment-16533263
 ] 

mosh commented on SOLR-12441:
-

{quote}What did you think of my suggestion to use some other separator char 
(not PATH_SEP_CHAR) for generating an id for child documents? My first 
suggestion was a pound symbol. Though it wouldn't show when in a URL... maybe a 
comma. But we could stay with a '/'; I just thought it might be nice to make a 
distinction between the separator between parent/child and the separator to 
append the child sequence/num.
{quote}
I do not have a strong opinion on this dilemma.
 Currently it seems like the comma and the pound sign are the candidates for 
this separator, unless someone has another alternative in mind.

> Add deeply nested documents URP
> ---
>
> Key: SOLR-12441
> URL: https://issues.apache.org/jira/browse/SOLR-12441
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> As discussed in 
> [SOLR-12298|https://issues.apache.org/jira/browse/SOLR-12298], there ought to 
> be an URP to add metadata fields to childDocuments in order to allow a 
> transformer to rebuild the original document hierarchy.
> {quote}I propose we add the following fields:
>  # __nestParent__
>  # _nestLevel_
>  # __nestPath__
> __nestParent__: This field will store the document's parent docId, to be 
> used for building the whole hierarchy, using a new document transformer, as 
> suggested by Jan on the mailing list.
> _nestLevel_: This field will store the level of the specified field in the 
> document, using an int value. This field can be used for the parentFilter, 
> eliminating the need to provide a parentFilter, which will be set by default 
> as "_level_:queriedFieldLevel".
> __nestPath__: This field will contain the full path, separated by a specific 
> reserved char e.g., '.'
>  for example: "first.second.third".
>  This will enable users to search for a specific path, or provide a regular 
> expression to search for fields sharing the same name in different levels of 
> the document, filtering using the level key if needed.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12441) Add deeply nested documents URP

2018-07-05 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533427#comment-16533427
 ] 

mosh commented on SOLR-12441:
-

Another thought has popped into my mind.
{quote}{code:javascript}{ "from": { "name": "Peyton Manning", "id": "X18" } } 
{code}{quote}
When recreating the original JSON structure, there is no way to know whether a 
single child was inside an array or not.
We could also index single children such as the one above with this 
_NEST_PATH_: {code:javascript}from,s,{code}
we could have a special character to indicate this child was not inside an 
array when indexed.
I just picked the char 's' because it is short for single, perhaps someone 
might have a better idea.

> Add deeply nested documents URP
> ---
>
> Key: SOLR-12441
> URL: https://issues.apache.org/jira/browse/SOLR-12441
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> As discussed in 
> [SOLR-12298|https://issues.apache.org/jira/browse/SOLR-12298], there ought to 
> be an URP to add metadata fields to childDocuments in order to allow a 
> transformer to rebuild the original document hierarchy.
> {quote}I propose we add the following fields:
>  # __nestParent__
>  # _nestLevel_
>  # __nestPath__
> __nestParent__: This field will store the document's parent docId, to be 
> used for building the whole hierarchy, using a new document transformer, as 
> suggested by Jan on the mailing list.
> _nestLevel_: This field will store the level of the specified field in the 
> document, using an int value. This field can be used for the parentFilter, 
> eliminating the need to provide a parentFilter, which will be set by default 
> as "_level_:queriedFieldLevel".
> __nestPath__: This field will contain the full path, separated by a specific 
> reserved char e.g., '.'
>  for example: "first.second.third".
>  This will enable users to search for a specific path, or provide a regular 
> expression to search for fields sharing the same name in different levels of 
> the document, filtering using the level key if needed.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12441) Add deeply nested documents URP

2018-07-07 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535990#comment-16535990
 ] 

mosh edited comment on SOLR-12441 at 7/8/18 4:56 AM:
-

{quote}For what other purpose do we need the path – other than a DocTransformer 
to reconstitute the document? Does this need to be indexed for some purpose?
{quote}
The path field can be used to filter children when using the doc transformer.
{code:javascript}
 { "from": { "name": "Peyton Manning", "id": "X18" }, "to": "John" } {code}
e.g. if the user enters from/name:"Peyton*", the transformer can construct the 
following:
{code:java}
q="to:John", childFilter="from/name:Peyton*"{code}
The transformer could construct the parentFilter by checking whether _root_=id, 
and build the childFilter "__NEST_PATH__:from#*# AND name:Peyton".
 This would prevent keys with the same name in child documents at different 
levels from producing false-positive matches.

The reason I ended each nesting key with #num# is so we can query using the 
__NEST_PATH__ field regardless of the value type or array index using 
"keyName#*#".


was (Author: moshebla):
{quote}For what other purpose do we need the path – other than a DocTransformer 
to reconstitute the document? Does this need to be indexed for some purpose?
{quote}
The path field can be used to filter children when using the doc transformer.
{code:javascript}
 { "from": { "name": "Peyton Manning", "id": "X18" }, "to": John } {code}
e.g. if the user enters from/name:"Peyton*", the transformer can construct the 
following:
{code:java}
q="to:John", childFilter="from/name:Peyton*"{code}
The transformer could construct the parentFilter by checking whether _root_=id, 
and build the childFilter "__NEST_PATH__:from#*# AND name:Peyton".
 This would prevent keys with the same name in child documents at different 
levels from producing false-positive matches.

 The reason I ended each nesting key with #num# is so we can query using the 
__NEST_PATH__ field regardless of the value type or array index using 
"keyName#*#".

> Add deeply nested documents URP
> ---
>
> Key: SOLR-12441
> URL: https://issues.apache.org/jira/browse/SOLR-12441
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> As discussed in 
> [SOLR-12298|https://issues.apache.org/jira/browse/SOLR-12298], there ought to 
> be an URP to add metadata fields to childDocuments in order to allow a 
> transformer to rebuild the original document hierarchy.
> {quote}I propose we add the following fields:
>  # __nestParent__
>  # _nestLevel_
>  # __nestPath__
> __nestParent__: This field will store the document's parent docId, to be 
> used for building the whole hierarchy, using a new document transformer, as 
> suggested by Jan on the mailing list.
> _nestLevel_: This field will store the level of the specified field in the 
> document, using an int value. This field can be used for the parentFilter, 
> eliminating the need to provide a parentFilter, which will be set by default 
> as "_level_:queriedFieldLevel".
> __nestPath__: This field will contain the full path, separated by a specific 
> reserved char e.g., '.'
>  for example: "first.second.third".
>  This will enable users to search for a specific path, or provide a regular 
> expression to search for fields sharing the same name in different levels of 
> the document, filtering using the level key if needed.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12441) Add deeply nested documents URP

2018-07-07 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535990#comment-16535990
 ] 

mosh commented on SOLR-12441:
-

{quote}For what other purpose do we need the path – other than a DocTransformer 
to reconstitute the document? Does this need to be indexed for some purpose?
{quote}
The path field can be used to filter children when using the doc transformer.
{code:javascript}
 { "from": { "name": "Peyton Manning", "id": "X18" }, "to": John } {code}
e.g. if the user enters from/name:"Peyton*", the transformer can construct the 
following:
{code:java}
q="to:John", childFilter="from/name:Peyton*"{code}
The transformer could construct the parentFilter by checking whether _root_=id, 
and build the childFilter "__NEST_PATH__:from#*# AND name:Peyton".
 This would prevent keys with the same name in child documents at different 
levels from producing false-positive matches.

 The reason I ended each nesting key with #num# is so we can query using the 
__NEST_PATH__ field regardless of the value type or array index using 
"keyName#*#".

> Add deeply nested documents URP
> ---
>
> Key: SOLR-12441
> URL: https://issues.apache.org/jira/browse/SOLR-12441
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> As discussed in 
> [SOLR-12298|https://issues.apache.org/jira/browse/SOLR-12298], there ought to 
> be an URP to add metadata fields to childDocuments in order to allow a 
> transformer to rebuild the original document hierarchy.
> {quote}I propose we add the following fields:
>  # __nestParent__
>  # _nestLevel_
>  # __nestPath__
> __nestParent__: This field will store the document's parent docId, to be 
> used for building the whole hierarchy, using a new document transformer, as 
> suggested by Jan on the mailing list.
> _nestLevel_: This field will store the level of the specified field in the 
> document, using an int value. This field can be used for the parentFilter, 
> eliminating the need to provide a parentFilter, which will be set by default 
> as "_level_:queriedFieldLevel".
> __nestPath__: This field will contain the full path, separated by a specific 
> reserved char e.g., '.'
>  for example: "first.second.third".
>  This will enable users to search for a specific path, or provide a regular 
> expression to search for fields sharing the same name in different levels of 
> the document, filtering using the level key if needed.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12441) Add deeply nested documents URP

2018-07-08 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16536139#comment-16536139
 ] 

mosh commented on SOLR-12441:
-

{quote}The childFilter {{from/name:Peyton*}} is not a valid query syntax due to 
the slash. Right?{quote}
It will only work if it is broken down by the transformer into two separate 
sub-queries.
{quote}NEST_PATH:from#*# wildcards in the middle of a string can be problematic 
as it may match across 'from' to some other child label. Whether that's an issue 
here or not I'm not sure yet.{quote}
It will only match the ones which are children of the key nest, inside the 
parent document.
{quote}It's quite plausible its indexed form ought to be stripped of the 
sibling IDs with PatternReplaceCharFilterFactory and then processed with 
PathHierarchyTokenizerFactory{quote}
Do you mean to strip childNum off the _NEST_PATH field?

> Add deeply nested documents URP
> ---
>
> Key: SOLR-12441
> URL: https://issues.apache.org/jira/browse/SOLR-12441
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> As discussed in 
> [SOLR-12298|https://issues.apache.org/jira/browse/SOLR-12298], there ought to 
> be an URP to add metadata fields to childDocuments in order to allow a 
> transformer to rebuild the original document hierarchy.
> {quote}I propose we add the following fields:
>  # __nestParent__
>  # _nestLevel_
>  # __nestPath__
> __nestParent__: This field will store the document's parent docId, to be 
> used for building the whole hierarchy, using a new document transformer, as 
> suggested by Jan on the mailing list.
> _nestLevel_: This field will store the level of the specified field in the 
> document, using an int value. This field can be used for the parentFilter, 
> eliminating the need to provide a parentFilter, which will be set by default 
> as "_level_:queriedFieldLevel".
> __nestPath__: This field will contain the full path, separated by a specific 
> reserved char e.g., '.'
>  for example: "first.second.third".
>  This will enable users to search for a specific path, or provide a regular 
> expression to search for fields sharing the same name in different levels of 
> the document, filtering using the level key if needed.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12441) Add deeply nested documents URP

2018-07-09 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537051#comment-16537051
 ] 

mosh commented on SOLR-12441:
-

{quote}See PathHierarchyTokenizerFactoryTest and the descendents vs ancestors 
distinction as well via two differently indexed fields for use-cases involving 
descendents and ancestors if we need that. With some tricks we could use one 
field if we need all 3 (exact, descendants, ancestors).
{quote}
Oh, this is perfect; it makes it so much easier. I was contemplating how the 
transformer could check for all three options (exact, descendants, ancestors).
 Do you have any suggestions? I have been trying to use the "!field" 
transformer with boolean operators to no avail.
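
For reference, a small standalone sketch of what PathHierarchyTokenizer emits 
for a nest path (using lucene-analyzers-common, assuming the default '/' 
delimiter); each emitted token is an ancestor prefix, which is what makes the 
exact/descendants/ancestors distinction workable:
{code:java}
import java.io.StringReader;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.path.PathHierarchyTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class NestPathTokens {
    public static void main(String[] args) throws Exception {
        Tokenizer tok = new PathHierarchyTokenizer(); // default delimiter is '/'
        tok.setReader(new StringReader("book/pages/footnote"));
        CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
        tok.reset();
        // Prints: book, book/pages, book/pages/footnote
        while (tok.incrementToken()) {
            System.out.println(term.toString());
        }
        tok.end();
        tok.close();
    }
}
{code}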

> Add deeply nested documents URP
> ---
>
> Key: SOLR-12441
> URL: https://issues.apache.org/jira/browse/SOLR-12441
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> As discussed in 
> [SOLR-12298|https://issues.apache.org/jira/browse/SOLR-12298], there ought to 
> be an URP to add metadata fields to childDocuments in order to allow a 
> transformer to rebuild the original document hierarchy.
> {quote}I propose we add the following fields:
>  # __nestParent__
>  # _nestLevel_
>  # __nestPath__
> __nestParent__: This field will store the document's parent docId, to be 
> used for building the whole hierarchy, using a new document transformer, as 
> suggested by Jan on the mailing list.
> _nestLevel_: This field will store the level of the specified field in the 
> document, using an int value. This field can be used for the parentFilter, 
> eliminating the need to provide a parentFilter, which will be set by default 
> as "_level_:queriedFieldLevel".
> __nestPath__: This field will contain the full path, separated by a specific 
> reserved char e.g., '.'
>  for example: "first.second.third".
>  This will enable users to search for a specific path, or provide a regular 
> expression to search for fields sharing the same name in different levels of 
> the document, filtering using the level key if needed.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12441) Add deeply nested documents URP

2018-07-09 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537051#comment-16537051
 ] 

mosh edited comment on SOLR-12441 at 7/9/18 3:17 PM:
-

{quote}See PathHierarchyTokenizerFactoryTest and the descendents vs ancestors 
distinction as well via two differently indexed fields for use-cases involving 
descendents and ancestors if we need that. With some tricks we could use one 
field if we need all 3 (exact, descendants, ancestors).
{quote}
Oh, this is perfect; it makes it so much easier. I was contemplating how the 
transformer could check for all three options (exact, descendants, ancestors).
 Do you have any suggestions? I have been trying to use the "!field" 
transformer with boolean operators to no avail.

Perhaps this discussion should be moved to the [ChildDocTransformer 
ticket|https://issues.apache.org/jira/browse/SOLR-12519]


was (Author: moshebla):
{quote}See PathHierarchyTokenizerFactoryTest and the descendents vs ancestors 
distinction as well via two differently indexed fields for use-cases involving 
descendents and ancestors if we need that. With some tricks we could use one 
field if we need all 3 (exact, descendants, ancestors).
{quote}
Oh, this is perfect; it makes it so much easier. I was contemplating how the 
transformer could check for all three options (exact, descendants, ancestors).
 Do you have any suggestions? I have been trying to use the "!field" 
transformer with boolean operators to no avail.

> Add deeply nested documents URP
> ---
>
> Key: SOLR-12441
> URL: https://issues.apache.org/jira/browse/SOLR-12441
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> As discussed in 
> [SOLR-12298|https://issues.apache.org/jira/browse/SOLR-12298], there ought to 
> be an URP to add metadata fields to childDocuments in order to allow a 
> transformer to rebuild the original document hierarchy.
> {quote}I propose we add the following fields:
>  # __nestParent__
>  # _nestLevel_
>  # __nestPath__
> __nestParent__: This field will store the document's parent docId, to be 
> used for building the whole hierarchy, using a new document transformer, as 
> suggested by Jan on the mailing list.
> _nestLevel_: This field will store the level of the specified field in the 
> document, using an int value. This field can be used for the parentFilter, 
> eliminating the need to provide a parentFilter, which will be set by default 
> as "_level_:queriedFieldLevel".
> __nestPath__: This field will contain the full path, separated by a specific 
> reserved char e.g., '.'
>  for example: "first.second.third".
>  This will enable users to search for a specific path, or provide a regular 
> expression to search for fields sharing the same name in different levels of 
> the document, filtering using the level key if needed.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12441) Add deeply nested documents URP

2018-07-09 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538078#comment-16538078
 ] 

mosh commented on SOLR-12441:
-

{quote}Having a query by ancestor ability would allow me to filter where 
"comment" is an ancestor.{quote}
Would this be a good fit for using PathHierarchyTokenizerFactory in conjunction 
with ToParentBlockJoinQuery?
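For illustration only (my own sketch, not from any patch): PathHierarchyTokenizer 
indexes every ancestor prefix of a path as its own term, so a filter like 
"comment is an ancestor" could become a plain term match on a _NEST_PATH_-style 
field, which a block join could then relate to parents or children. The field 
name and the example path below are assumptions:
{code:java}
import java.io.StringReader;
import org.apache.lucene.analysis.path.PathHierarchyTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class NestPathTokensSketch {
  public static void main(String[] args) throws Exception {
    // Default delimiter is '/': the tokenizer emits every ancestor prefix of the path.
    PathHierarchyTokenizer tokenizer = new PathHierarchyTokenizer();
    tokenizer.setReader(new StringReader("post/comments/author"));
    CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
    tokenizer.reset();
    while (tokenizer.incrementToken()) {
      System.out.println(term); // prints: post, post/comments, post/comments/author
    }
    tokenizer.end();
    tokenizer.close();
  }
}
{code}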

> Add deeply nested documents URP
> ---
>
> Key: SOLR-12441
> URL: https://issues.apache.org/jira/browse/SOLR-12441
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> As discussed in 
> [SOLR-12298|https://issues.apache.org/jira/browse/SOLR-12298], there ought to 
> be an URP to add metadata fields to childDocuments in order to allow a 
> transformer to rebuild the original document hierarchy.
> {quote}I propose we add the following fields:
>  # __nestParent__
>  # _nestLevel_
>  # __nestPath__
> __nestParent__: This field will store the document's parent docId, to be 
> used for building the whole hierarchy, using a new document transformer, as 
> suggested by Jan on the mailing list.
> _nestLevel_: This field will store the level of the specified field in the 
> document, using an int value. This field can be used for the parentFilter, 
> eliminating the need to provide a parentFilter, which will be set by default 
> as "_level_:queriedFieldLevel".
> __nestPath__: This field will contain the full path, separated by a specific 
> reserved char e.g., '.'
>  for example: "first.second.third".
>  This will enable users to search for a specific path, or provide a regular 
> expression to search for fields sharing the same name in different levels of 
> the document, filtering using the level key if needed.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12519) Support Deeply Nested Docs In Child Documents Transformer

2018-07-10 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538477#comment-16538477
 ] 

mosh commented on SOLR-12519:
-

{quote}Ultimately we'll want to utilize PathHierarchyTokenizer in some 
way.{quote}
I have been playing around with this tokenizer, which is rather useful.
The problem I am currently experiencing is that I have not been able to find a 
way to get only the children of a certain path, excluding the path itself, e.g. 
all descendants of toppings, excluding toppings.
{code:javascript}{!field f=_NEST_PATH_}toppings/{code}
The filter above returns descendants including toppings, even though I have 
appended the delimiter after the path.
I even tried the following filters, to no avail:
{code:javascript}((NOT _NEST_PATH_:"toppings") AND ({!field 
f=_NEST_PATH_}toppings)){code}
{code:javascript}((NOT _NEST_PATH_:"*toppings") AND ({!field 
f=_NEST_PATH_}toppings)){code}
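Just to spell out the intent, here is a sketch in plain Lucene terms, under the 
assumption that _NEST_PATH_ held the full path as a single untokenized value 
(which is not necessarily how the field is actually analyzed):
{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;

public class DescendantsOnlySketch {
  // "toppings/" as a prefix matches "toppings/cheese", "toppings/cheese/extra", ...
  // but not the exact value "toppings", so the path itself is excluded.
  public static Query descendantsOf(String path) {
    return new PrefixQuery(new Term("_NEST_PATH_", path + "/"));
  }
}
{code}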

> Support Deeply Nested Docs In Child Documents Transformer
> -
>
> Key: SOLR-12519
> URL: https://issues.apache.org/jira/browse/SOLR-12519
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>
> As discussed in SOLR-12298, to make use of the meta-data fields in 
> SOLR-12441, there needs to be a smarter child document transformer, which 
> provides the ability to rebuild the original nested documents' structure.
>  In addition, I propose that the transformer also have the ability to 
> bring only some of the original hierarchy, to prevent unnecessary block join 
> queries. e.g.
> {code}  {"a": "b", "c": [ {"e": "f"}, {"e": "g"} , {"h": "i"} ]} {code}
>  In case my query is for all the children of "a:b" which contain the key "e" 
> in them, the query will be broken into two parts:
>  1. The parent query "a:b"
>  2. The child query "e:*".
> If the only-children flag is on, the transformer will return the following 
> documents:
>  {code}[ {"e": "f"}, {"e": "g"} ]{code}
> In case the flag was not turned on (perhaps the default state), the whole 
> document hierarchy will be returned, containing only the matching children:
> {code}{"a": "b", "c": [ {"e": "f"}, {"e": "g"} ]}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12519) Support Deeply Nested Docs In Child Documents Transformer

2018-07-11 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540169#comment-16540169
 ] 

mosh commented on SOLR-12519:
-

{quote}if you put a trailing '/' at the end of the input, then it's retained in 
the indexed token for the full path and you can then query by this. In this 
way, if your query includes the trailing '/' then it means an exact match vs 
any descendent
{quote}
I have tried using this trick, to no avail, even when I added a trailing / using 
the URP or the PatternReplaceTokenizer. Even when my childFilter is 
_NEST_PATH_:toppings/, the query returns toppings' descendants.
Perhaps I have misconfigured the PathHierarchyTokenizer?
 I will try to look into this further; hopefully we won't have to add a new 
feature to the PathHierarchyTokenizer.

> Support Deeply Nested Docs In Child Documents Transformer
> -
>
> Key: SOLR-12519
> URL: https://issues.apache.org/jira/browse/SOLR-12519
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>
> As discussed in SOLR-12298, to make use of the meta-data fields in 
> SOLR-12441, there needs to be a smarter child document transformer, which 
> provides the ability to rebuild the original nested documents' structure.
>  In addition, I propose that the transformer also have the ability to 
> bring only some of the original hierarchy, to prevent unnecessary block join 
> queries. e.g.
> {code}  {"a": "b", "c": [ {"e": "f"}, {"e": "g"} , {"h": "i"} ]} {code}
>  In case my query is for all the children of "a:b" which contain the key "e" 
> in them, the query will be broken into two parts:
>  1. The parent query "a:b"
>  2. The child query "e:*".
> If the only-children flag is on, the transformer will return the following 
> documents:
>  {code}[ {"e": "f"}, {"e": "g"} ]{code}
> In case the flag was not turned on (perhaps the default state), the whole 
> document hierarchy will be returned, containing only the matching children:
> {code}{"a": "b", "c": [ {"e": "f"}, {"e": "g"} ]}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12519) Support Deeply Nested Docs In Child Documents Transformer

2018-07-11 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540169#comment-16540169
 ] 

mosh edited comment on SOLR-12519 at 7/11/18 2:35 PM:
--

{quote}if you put a trailing '/' at the end of the input, then it's retained in 
the indexed token for the full path and you can then query by this. In this 
way, if your query includes the trailing '/' then it means an exact match vs 
any descendent
{quote}
I have tried using this trick, to no avail, even when I added a trailing / using 
the URP or the PatternReplaceTokenizer. Even when my childFilter is 
_NEST_PATH_:toppings/, the query returns toppings' descendants.
 Perhaps I have misconfigured the PathHierarchyTokenizer?
 I will try to look into this further; hopefully we won't have to add a new 
feature to the PathHierarchyTokenizer.


was (Author: moshebla):
{quote}if you put a trailing '/' at the end of the input, then it's retained in 
the indexed token for the full path and you can then query by this. In this 
way, if your query includes the trailing '/' then it means an exact match vs 
any descendent
{quote}
I have tried using this trick, to no avail, even when I added a trailing / using 
the URP or the PatternReplaceTokenizer. Even when my childFilter is 
_NEST_PATH_:toppings/, the query returns toppings' descendants.
Perhaps I have misconfigured the PathHierarchyTokenizer?
 I will try to look into this further; hopefully we won't have to add a new 
feature to the PathHierarchyTokenizer.

> Support Deeply Nested Docs In Child Documents Transformer
> -
>
> Key: SOLR-12519
> URL: https://issues.apache.org/jira/browse/SOLR-12519
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>
> As discussed in SOLR-12298, to make use of the meta-data fields in 
> SOLR-12441, there needs to be a smarter child document transformer, which 
> provides the ability to rebuild the original nested documents' structure.
>  In addition, I propose that the transformer also have the ability to 
> bring only some of the original hierarchy, to prevent unnecessary block join 
> queries. e.g.
> {code}  {"a": "b", "c": [ {"e": "f"}, {"e": "g"} , {"h": "i"} ]} {code}
>  In case my query is for all the children of "a:b" which contain the key "e" 
> in them, the query will be broken into two parts:
>  1. The parent query "a:b"
>  2. The child query "e:*".
> If the only-children flag is on, the transformer will return the following 
> documents:
>  {code}[ {"e": "f"}, {"e": "g"} ]{code}
> In case the flag was not turned on (perhaps the default state), the whole 
> document hierarchy will be returned, containing only the matching children:
> {code}{"a": "b", "c": [ {"e": "f"}, {"e": "g"} ]}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12519) Support Deeply Nested Docs In Child Documents Transformer

2018-07-11 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-12519:

Attachment: (was: SOLR-12209.patch)

> Support Deeply Nested Docs In Child Documents Transformer
> -
>
> Key: SOLR-12519
> URL: https://issues.apache.org/jira/browse/SOLR-12519
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>
> As discussed in SOLR-12298, to make use of the meta-data fields in 
> SOLR-12441, there needs to be a smarter child document transformer, which 
> provides the ability to rebuild the original nested documents' structure.
>  In addition, I propose that the transformer also have the ability to 
> bring only some of the original hierarchy, to prevent unnecessary block join 
> queries. e.g.
> {code}  {"a": "b", "c": [ {"e": "f"}, {"e": "g"} , {"h": "i"} ]} {code}
>  In case my query is for all the children of "a:b" which contain the key "e" 
> in them, the query will be broken into two parts:
>  1. The parent query "a:b"
>  2. The child query "e:*".
> If the only-children flag is on, the transformer will return the following 
> documents:
>  {code}[ {"e": "f"}, {"e": "g"} ]{code}
> In case the flag was not turned on (perhaps the default state), the whole 
> document hierarchy will be returned, containing only the matching children:
> {code}{"a": "b", "c": [ {"e": "f"}, {"e": "g"} ]}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12519) Support Deeply Nested Docs In Child Documents Transformer

2018-07-11 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-12519:

Attachment: (was: SOLR-12519-no-commit.patch)

> Support Deeply Nested Docs In Child Documents Transformer
> -
>
> Key: SOLR-12519
> URL: https://issues.apache.org/jira/browse/SOLR-12519
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>
> As discussed in SOLR-12298, to make use of the meta-data fields in 
> SOLR-12441, there needs to be a smarter child document transformer, which 
> provides the ability to rebuild the original nested documents' structure.
>  In addition, I propose that the transformer also have the ability to 
> bring only some of the original hierarchy, to prevent unnecessary block join 
> queries. e.g.
> {code}  {"a": "b", "c": [ {"e": "f"}, {"e": "g"} , {"h": "i"} ]} {code}
>  In case my query is for all the children of "a:b" which contain the key "e" 
> in them, the query will be broken into two parts:
>  1. The parent query "a:b"
>  2. The child query "e:*".
> If the only-children flag is on, the transformer will return the following 
> documents:
>  {code}[ {"e": "f"}, {"e": "g"} ]{code}
> In case the flag was not turned on (perhaps the default state), the whole 
> document hierarchy will be returned, containing only the matching children:
> {code}{"a": "b", "c": [ {"e": "f"}, {"e": "g"} ]}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12519) Support Deeply Nested Docs In Child Documents Transformer

2018-07-11 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-12519:

Attachment: SOLR-12209.patch

> Support Deeply Nested Docs In Child Documents Transformer
> -
>
> Key: SOLR-12519
> URL: https://issues.apache.org/jira/browse/SOLR-12519
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>
> As discussed in SOLR-12298, to make use of the meta-data fields in 
> SOLR-12441, there needs to be a smarter child document transformer, which 
> provides the ability to rebuild the original nested documents' structure.
>  In addition, I propose that the transformer also have the ability to 
> bring only some of the original hierarchy, to prevent unnecessary block join 
> queries. e.g.
> {code}  {"a": "b", "c": [ {"e": "f"}, {"e": "g"} , {"h": "i"} ]} {code}
>  In case my query is for all the children of "a:b" which contain the key "e" 
> in them, the query will be broken into two parts:
>  1. The parent query "a:b"
>  2. The child query "e:*".
> If the only-children flag is on, the transformer will return the following 
> documents:
>  {code}[ {"e": "f"}, {"e": "g"} ]{code}
> In case the flag was not turned on (perhaps the default state), the whole 
> document hierarchy will be returned, containing only the matching children:
> {code}{"a": "b", "c": [ {"e": "f"}, {"e": "g"} ]}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12519) Support Deeply Nested Docs In Child Documents Transformer

2018-07-11 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-12519:

Attachment: SOLR-12519-no-commit.patch

> Support Deeply Nested Docs In Child Documents Transformer
> -
>
> Key: SOLR-12519
> URL: https://issues.apache.org/jira/browse/SOLR-12519
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
>
> As discussed in SOLR-12298, to make use of the meta-data fields in 
> SOLR-12441, there needs to be a smarter child document transformer, which 
> provides the ability to rebuild the original nested documents' structure.
>  In addition, I propose that the transformer also have the ability to 
> bring only some of the original hierarchy, to prevent unnecessary block join 
> queries. e.g.
> {code}  {"a": "b", "c": [ {"e": "f"}, {"e": "g"} , {"h": "i"} ]} {code}
>  In case my query is for all the children of "a:b" which contain the key "e" 
> in them, the query will be broken into two parts:
>  1. The parent query "a:b"
>  2. The child query "e:*".
> If the only-children flag is on, the transformer will return the following 
> documents:
>  {code}[ {"e": "f"}, {"e": "g"} ]{code}
> In case the flag was not turned on (perhaps the default state), the whole 
> document hierarchy will be returned, containing only the matching children:
> {code}{"a": "b", "c": [ {"e": "f"}, {"e": "g"} ]}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12519) Support Deeply Nested Docs In Child Documents Transformer

2018-07-11 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540169#comment-16540169
 ] 

mosh edited comment on SOLR-12519 at 7/11/18 4:27 PM:
--

{quote}if you put a trailing '/' at the end of the input, then it's retained in 
the indexed token for the full path and you can then query by this. In this 
way, if your query includes the trailing '/' then it means an exact match vs 
any descendent
{quote}
I have tried using this trick, to no avail, even when I added a trailing / using 
the URP or the PatternReplaceTokenizer. Even when my childFilter is 
_NEST_PATH_:toppings/, the query returns toppings' descendants.
 Perhaps I have misconfigured the PathHierarchyTokenizer?
 I will try to look into this further; hopefully we won't have to add a new 
feature to the PathHierarchyTokenizer.


was (Author: moshebla):
{quote}if you put a trailing '/' at the end of the input, then it's retained in 
the indexed token for the full path and you can then query by this. In this 
way, if your query includes the trailing '/' then it means an exact match vs 
any descendent
{quote}
I have tried using this trick, to no avail, even when I added a trailing / using 
the URP or the PatternReplaceTokenizer. Even when my childFilter is 
_NEST_PATH_:toppings/, the query returns toppings' descendants.
 Perhaps I have misconfigured the PathHierarchyTokenizer?
 I will try to look into this further; hopefully we won't have to add a new 
feature to the PathHierarchyTokenizer.

> Support Deeply Nested Docs In Child Documents Transformer
> -
>
> Key: SOLR-12519
> URL: https://issues.apache.org/jira/browse/SOLR-12519
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12519-no-commit.patch
>
>
> As discussed in SOLR-12298, to make use of the meta-data fields in 
> SOLR-12441, there needs to be a smarter child document transformer, which 
> provides the ability to rebuild the original nested documents' structure.
>  In addition, I propose that the transformer also have the ability to 
> bring only some of the original hierarchy, to prevent unnecessary block join 
> queries. e.g.
> {code}  {"a": "b", "c": [ {"e": "f"}, {"e": "g"} , {"h": "i"} ]} {code}
>  In case my query is for all the children of "a:b" which contain the key "e" 
> in them, the query will be broken into two parts:
>  1. The parent query "a:b"
>  2. The child query "e:*".
> If the only-children flag is on, the transformer will return the following 
> documents:
>  {code}[ {"e": "f"}, {"e": "g"} ]{code}
> In case the flag was not turned on (perhaps the default state), the whole 
> document hierarchy will be returned, containing only the matching children:
> {code}{"a": "b", "c": [ {"e": "f"}, {"e": "g"} ]}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12519) Support Deeply Nested Docs In Child Documents Transformer

2018-07-12 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541487#comment-16541487
 ] 

mosh commented on SOLR-12519:
-

Thanks a lot [~dsmiley], I have uploaded a WIP [pull 
request|https://github.com/apache/lucene-solr/pull/416], which is far from 
ready, just to make sure we are all on the same track.
Consider it an alpha version of the transformer.
{quote}If we also want to use the same field for finding ancestors{quote}
One major 
[obstacle|https://github.com/apache/lucene-solr/pull/416/files#diff-1477ca79d167daa1ca838fc323ef1badR97]
 I have yet to conquer is a way to return only the ancestors of children that 
matched the child filter. Currently I have only thought of doing a reverse 
lookup using the _nest_parent_ field, iterating over the ancestor keys' values, 
but that seems slow and cumbersome.
Hopefully we will be able to find a better way.
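To make the idea concrete, here is roughly the reverse lookup I was describing, 
as a hypothetical sketch (lookupNestParent stands in for however the 
_nest_parent_ value would actually be read per document):
{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

// Hedged sketch of the reverse-lookup idea above, not the PR code:
// climb from a matched child to the root by repeatedly resolving _nest_parent_.
class ReverseAncestorLookupSketch {
  Deque<String> ancestorsOf(String docId) {
    Deque<String> ancestors = new ArrayDeque<>();
    String parentId = lookupNestParent(docId); // hypothetical _nest_parent_ read
    while (parentId != null) {
      ancestors.addFirst(parentId);            // root ends up first
      parentId = lookupNestParent(parentId);
    }
    return ancestors;
  }

  String lookupNestParent(String docId) {
    throw new UnsupportedOperationException("hypothetical: read _nest_parent_ for docId");
  }
}
{code}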

> Support Deeply Nested Docs In Child Documents Transformer
> -
>
> Key: SOLR-12519
> URL: https://issues.apache.org/jira/browse/SOLR-12519
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12519-no-commit.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As discussed in SOLR-12298, to make use of the meta-data fields in 
> SOLR-12441, there needs to be a smarter child document transformer, which 
> provides the ability to rebuild the original nested documents' structure.
>  In addition, I propose that the transformer also have the ability to 
> bring only some of the original hierarchy, to prevent unnecessary block join 
> queries. e.g.
> {code}  {"a": "b", "c": [ {"e": "f"}, {"e": "g"} , {"h": "i"} ]} {code}
>  In case my query is for all the children of "a:b" which contain the key "e" 
> in them, the query will be broken into two parts:
>  1. The parent query "a:b"
>  2. The child query "e:*".
> If the only-children flag is on, the transformer will return the following 
> documents:
>  {code}[ {"e": "f"}, {"e": "g"} ]{code}
> In case the flag was not turned on (perhaps the default state), the whole 
> document hierarchy will be returned, containing only the matching children:
> {code}{"a": "b", "c": [ {"e": "f"}, {"e": "g"} ]}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12519) Support Deeply Nested Docs In Child Documents Transformer

2018-07-15 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16544538#comment-16544538
 ] 

mosh commented on SOLR-12519:
-

{quote}Detecting if child doc X has an ancestor of doc X + N is a matter of 
comparing if the path at X + N is a prefix of the path at X. You stop looping 
forward once you reach the root document – tracked in parentsFilter bits{quote}
The parentFilter bit set is generated in ToChildBlockJoinWeight#scorer and is a 
private member of the ToChildBlockJoinScorer class. Would this change be 
implemented at the transformer level, thus creating the parent BitSet twice, or 
would it be better to accumulate each BitSet and expose a getParentsBitSet method?
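To make sure I am reading the suggestion correctly, here is a rough sketch of 
that walk (pathOf is a hypothetical helper for reading _NEST_PATH_, and 
parentBitSet stands in for the parentsFilter bits):
{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.util.BitSet;

// Rough sketch of the prefix-comparison walk described above, not committed code.
// parentBitSet marks the root (parent) documents of the segment, and pathOf(docId)
// is a hypothetical helper returning the _NEST_PATH_ value stored for a docId.
class AncestorWalkSketch {
  List<Integer> ancestorsOf(int childDocId, String childPath, BitSet parentBitSet) {
    int rootDocId = parentBitSet.nextSetBit(childDocId); // stop once the root is reached
    List<Integer> ancestorDocIds = new ArrayList<>();
    for (int docId = childDocId + 1; docId <= rootDocId; docId++) {
      // a doc is an ancestor of the child iff its path is a prefix of the child's path
      if (childPath.startsWith(pathOf(docId))) {
        ancestorDocIds.add(docId);
      }
    }
    return ancestorDocIds;
  }

  String pathOf(int docId) {
    throw new UnsupportedOperationException("hypothetical _NEST_PATH_ lookup");
  }
}
{code}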

> Support Deeply Nested Docs In Child Documents Transformer
> -
>
> Key: SOLR-12519
> URL: https://issues.apache.org/jira/browse/SOLR-12519
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12519-no-commit.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As discussed in SOLR-12298, to make use of the meta-data fields in 
> SOLR-12441, there needs to be a smarter child document transformer, which 
> provides the ability to rebuild the original nested documents' structure.
>  In addition, I propose that the transformer also have the ability to 
> bring only some of the original hierarchy, to prevent unnecessary block join 
> queries. e.g.
> {code}  {"a": "b", "c": [ {"e": "f"}, {"e": "g"} , {"h": "i"} ]} {code}
>  In case my query is for all the children of "a:b" which contain the key "e" 
> in them, the query will be broken into two parts:
>  1. The parent query "a:b"
>  2. The child query "e:*".
> If the only-children flag is on, the transformer will return the following 
> documents:
>  {code}[ {"e": "f"}, {"e": "g"} ]{code}
> In case the flag was not turned on (perhaps the default state), the whole 
> document hierarchy will be returned, containing only the matching children:
> {code}{"a": "b", "c": [ {"e": "f"}, {"e": "g"} ]}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested

2018-09-02 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601188#comment-16601188
 ] 

mosh commented on SOLR-12685:
-

{quote}Perhaps there is a need though to only get that specific doc, and not 
the whole block, even though the document is in the block, perhaps in place 
update(I am not very familiar with those features, just to make sure)?
{quote}
Now that SOLR-12519 was committed to master, ChildDocTransformer requires 
SolrIndexSearcher.
 This requirement causes documents to skip transaction log lookup, instead 
using a SolrIndexSearcher for docId lookup.
{code:java}
// true in any situation where we have to use a realtime searcher rather then 
returning docs
// directly from the UpdateLog
final boolean mustUseRealtimeSearcher =
  // if we have filters, we need to check those against the indexed form of 
the doc
  (rb.getFilters() != null)
  || ((null != transformer) && transformer.needsSolrIndexSearcher());
{code}
{code:java}
if (mustUseRealtimeSearcher) {
// close handles to current searchers & result context
searcherInfo.clear();
resultContext = null;
ulog.openRealtimeSearcher();  // force open a new realtime searcher
o = null;  // pretend we never found this record and fall through to use 
the searcher
break;
}{code}
I am not quite sure of the performance implications of this requirement.
 In case these implications are not deemed a limiting factor, the trigger 
for block lookups could be determined purely by 
IndexSchema#isUsableForChildDocs,
removing the need for an additional flag.
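Roughly what I have in mind, as a rough sketch rather than the committed logic 
(assuming IndexSchema#isUsableForChildDocs and the IndexSchema.ROOT_FIELD_NAME 
constant; the _root_ check is one possible refinement):
{code:java}
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.schema.IndexSchema;

class BlockLookupTriggerSketch {
  // Hedged sketch, not the committed logic: trigger the block lookup when the
  // schema is nested-enabled and the stored doc actually belongs to a block
  // (i.e. it carries a _root_ value).
  boolean shouldResolveBlock(IndexSchema schema, SolrInputDocument oldDoc) {
    return schema.isUsableForChildDocs()
        && oldDoc.containsKey(IndexSchema.ROOT_FIELD_NAME); // ROOT_FIELD_NAME is "_root_"
  }
}
{code}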
 [~dsmiley],
your insights would be greatly appreciated.

> RTG should return the whole block if schema is nested
> -
>
> Key: SOLR-12685
> URL: https://issues.apache.org/jira/browse/SOLR-12685
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12638-no-commit.patch
>
>
> Currently Solr's RealTimeGet component returns the document if provided a 
> docId when consulting the index. For AtomicUpdates for child documents, RTG 
> should return the whole block when dealing with a nested schema.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12685) RTG should return the whole block if schema is nested

2018-09-02 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601188#comment-16601188
 ] 

mosh edited comment on SOLR-12685 at 9/2/18 12:00 PM:
--

{quote}Perhaps there is a need though to only get that specific doc, and not 
the whole block, even though the document is in the block, perhaps in place 
update(I am not very familiar with those features, just to make sure)?
{quote}
Now that SOLR-12519 was committed to master, ChildDocTransformer requires 
SolrIndexSearcher.
 This requirement causes documents to skip transaction log lookup, instead 
using a SolrIndexSearcher for docId lookup.
{code:java}
// true in any situation where we have to use a realtime searcher rather then 
returning docs
// directly from the UpdateLog
final boolean mustUseRealtimeSearcher =
  // if we have filters, we need to check those against the indexed form of 
the doc
  (rb.getFilters() != null)
  || ((null != transformer) && transformer.needsSolrIndexSearcher());
{code}
{code:java}
if (mustUseRealtimeSearcher) {
// close handles to current searchers & result context
searcherInfo.clear();
resultContext = null;
ulog.openRealtimeSearcher();  // force open a new realtime searcher
o = null;  // pretend we never found this record and fall through to use 
the searcher
break;
}{code}
I am not quite sure of the performance implications of this requirement.
 In case these implications are not deemed a limiting factor, the trigger 
for block lookups under the RTG component could be determined purely by the 
combination of IndexSchema#isUsableForChildDocs and the existence of the 
_root_ field on the specific doc, thus removing the need for an additional 
flag.
 [~dsmiley],
 your insights would be greatly appreciated.


was (Author: moshebla):
{quote}Perhaps there is a need though to only get that specific doc, and not 
the whole block, even though the document is in the block, perhaps in place 
update(I am not very familiar with those features, just to make sure)?
{quote}
Now that SOLR-12519 was committed to master, ChildDocTransformer requires 
SolrIndexSearcher.
 This requirement causes documents to skip transaction log lookup, instead 
using a SolrIndexSearcher for docId lookup.
{code:java}
// true in any situation where we have to use a realtime searcher rather then 
returning docs
// directly from the UpdateLog
final boolean mustUseRealtimeSearcher =
  // if we have filters, we need to check those against the indexed form of 
the doc
  (rb.getFilters() != null)
  || ((null != transformer) && transformer.needsSolrIndexSearcher());
{code}
{code:java}
if (mustUseRealtimeSearcher) {
// close handles to current searchers & result context
searcherInfo.clear();
resultContext = null;
ulog.openRealtimeSearcher();  // force open a new realtime searcher
o = null;  // pretend we never found this record and fall through to use 
the searcher
break;
}{code}
I am not quite sure of the performance implications of this requirement.
 In case these implications are not deemed a limiting factor, the trigger 
for block lookups could be determined purely by 
IndexSchema#isUsableForChildDocs,
removing the need for an additional flag.
 [~dsmiley],
your insights would be greatly appreciated.

> RTG should return the whole block if schema is nested
> -
>
> Key: SOLR-12685
> URL: https://issues.apache.org/jira/browse/SOLR-12685
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12638-no-commit.patch
>
>
> Currently Solr's RealTimeGet component returns the document if provided a 
> docId when consulting the index. For AtomicUpdates for child documents, RTG 
> should return the whole block when dealing with a nested schema.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12685) RTG should return the whole block if schema is nested

2018-09-02 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601188#comment-16601188
 ] 

mosh edited comment on SOLR-12685 at 9/2/18 12:19 PM:
--

{quote}Perhaps there is a need though to only get that specific doc, and not 
the whole block, even though the document is in the block, perhaps in place 
update(I am not very familiar with those features, just to make sure)?
{quote}
Now that SOLR-12519 was committed to master, ChildDocTransformer requires 
SolrIndexSearcher.
 This requirement causes documents to skip transaction log lookup, instead 
using a SolrIndexSearcher for docId lookup.
{code:java}
// true in any situation where we have to use a realtime searcher rather then 
returning docs
// directly from the UpdateLog
final boolean mustUseRealtimeSearcher =
  // if we have filters, we need to check those against the indexed form of 
the doc
  (rb.getFilters() != null)
  || ((null != transformer) && transformer.needsSolrIndexSearcher());
{code}
{code:java}
if (mustUseRealtimeSearcher) {
// close handles to current searchers & result context
searcherInfo.clear();
resultContext = null;
ulog.openRealtimeSearcher();  // force open a new realtime searcher
o = null;  // pretend we never found this record and fall through to use 
the searcher
break;
}{code}
I am not quite sure of the performance implications of this requirement.
 In case these implications are not deemed a limiting factor, the trigger 
for block lookups under the RTG component could be determined purely by the 
combination of IndexSchema#isUsableForChildDocs and the existence of the 
_root_ field on the specific doc, thus removing the need for an additional 
flag.
Otherwise, for performance reasons, an additional flag might be 
required.
 [~dsmiley],
 your insights would be greatly appreciated.


was (Author: moshebla):
{quote}Perhaps there is a need though to only get that specific doc, and not 
the whole block, even though the document is in the block, perhaps in place 
update(I am not very familiar with those features, just to make sure)?
{quote}
Now that SOLR-12519 was committed to master, ChildDocTransformer requires 
SolrIndexSearcher.
 This requirement causes documents to skip transaction log lookup, instead 
using a SolrIndexSearcher for docId lookup.
{code:java}
// true in any situation where we have to use a realtime searcher rather then 
returning docs
// directly from the UpdateLog
final boolean mustUseRealtimeSearcher =
  // if we have filters, we need to check those against the indexed form of 
the doc
  (rb.getFilters() != null)
  || ((null != transformer) && transformer.needsSolrIndexSearcher());
{code}
{code:java}
if (mustUseRealtimeSearcher) {
// close handles to current searchers & result context
searcherInfo.clear();
resultContext = null;
ulog.openRealtimeSearcher();  // force open a new realtime searcher
o = null;  // pretend we never found this record and fall through to use 
the searcher
break;
}{code}
I am not quite sure of the performance implications of this requirement.
 In case these implications are not deemed a limiting factor, the trigger 
for block lookups under the RTG component could be determined purely by the 
combination of IndexSchema#isUsableForChildDocs and the existence of the 
_root_ field on the specific doc, thus removing the need for an additional 
flag. Otherwise, for performance reasons, an additional flag might be 
required.
 [~dsmiley],
 your insights would be greatly appreciated.

> RTG should return the whole block if schema is nested
> -
>
> Key: SOLR-12685
> URL: https://issues.apache.org/jira/browse/SOLR-12685
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12638-no-commit.patch
>
>
> Currently Solr's RealTimeGet component returns the document if provided a 
> docId when consulting the index. For AtomicUpdates for child documents, RTG 
> should return the whole block when dealing with a nested schema.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12685) RTG should return the whole block if schema is nested

2018-09-02 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601188#comment-16601188
 ] 

mosh edited comment on SOLR-12685 at 9/2/18 12:19 PM:
--

{quote}Perhaps there is a need though to only get that specific doc, and not 
the whole block, even though the document is in the block, perhaps in place 
update(I am not very familiar with those features, just to make sure)?
{quote}
Now that SOLR-12519 was committed to master, ChildDocTransformer requires 
SolrIndexSearcher.
 This requirement causes documents to skip transaction log lookup, instead 
using a SolrIndexSearcher for docId lookup.
{code:java}
// true in any situation where we have to use a realtime searcher rather then 
returning docs
// directly from the UpdateLog
final boolean mustUseRealtimeSearcher =
  // if we have filters, we need to check those against the indexed form of 
the doc
  (rb.getFilters() != null)
  || ((null != transformer) && transformer.needsSolrIndexSearcher());
{code}
{code:java}
if (mustUseRealtimeSearcher) {
// close handles to current searchers & result context
searcherInfo.clear();
resultContext = null;
ulog.openRealtimeSearcher();  // force open a new realtime searcher
o = null;  // pretend we never found this record and fall through to use 
the searcher
break;
}{code}
I am not quite sure of the performance implications of this requirement.
 In case these implications are not deemed a limiting factor, the trigger 
for block lookups under the RTG component could be determined purely by the 
combination of IndexSchema#isUsableForChildDocs and the existence of the 
_root_ field on the specific doc, thus removing the need for an additional 
flag. Otherwise, for performance reasons, an additional flag might be 
required.
 [~dsmiley],
 your insights would be greatly appreciated.


was (Author: moshebla):
{quote}Perhaps there is a need though to only get that specific doc, and not 
the whole block, even though the document is in the block, perhaps in place 
update(I am not very familiar with those features, just to make sure)?
{quote}
Now that SOLR-12519 was committed to master, ChildDocTransformer requires 
SolrIndexSearcher.
 This requirement causes documents to skip transaction log lookup, instead 
using a SolrIndexSearcher for docId lookup.
{code:java}
// true in any situation where we have to use a realtime searcher rather then 
returning docs
// directly from the UpdateLog
final boolean mustUseRealtimeSearcher =
  // if we have filters, we need to check those against the indexed form of 
the doc
  (rb.getFilters() != null)
  || ((null != transformer) && transformer.needsSolrIndexSearcher());
{code}
{code:java}
if (mustUseRealtimeSearcher) {
// close handles to current searchers & result context
searcherInfo.clear();
resultContext = null;
ulog.openRealtimeSearcher();  // force open a new realtime searcher
o = null;  // pretend we never found this record and fall through to use 
the searcher
break;
}{code}
I am not quite sure of the performance implications of this requirement.
 In case these implications are not deemed a limiting factor, the trigger 
for block lookups under the RTG component could be determined purely by the 
combination of IndexSchema#isUsableForChildDocs and the existence of the 
_root_ field on the specific doc, thus removing the need for an additional 
flag.
 [~dsmiley],
 your insights would be greatly appreciated.

> RTG should return the whole block if schema is nested
> -
>
> Key: SOLR-12685
> URL: https://issues.apache.org/jira/browse/SOLR-12685
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12638-no-commit.patch
>
>
> Currently Solr's RealTimeGet component returns the document if provided a 
> docId when consulting the index. For AtomicUpdates for child documents, RTG 
> should return the whole block when dealing with a nested schema.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Deleted] (SOLR-12685) RTG should return the whole block if schema is nested

2018-09-02 Thread mosh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-12685:

Comment: was deleted

(was: Please correct me if I am mistaken.
I was thinking RTG should not return the whole block when queried directly by 
the RTG handler, but rather should explicitly perform these checks when running 
RealTimeGetComponent#getInputDocument, which is used by 
AtomicUpdateDocumentMerger.)

> RTG should return the whole block if schema is nested
> -
>
> Key: SOLR-12685
> URL: https://issues.apache.org/jira/browse/SOLR-12685
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12638-no-commit.patch
>
>
> Currently Solr's RealTimeGet component returns the document if provided a 
> docId when consulting the index. For AtomicUpdates for child documents, RTG 
> should return the whole block when dealing with a nested schema.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested

2018-09-02 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601759#comment-16601759
 ] 

mosh commented on SOLR-12685:
-

Please correct me if I am mistaken.
I was thinking RTG should not return the whole block when queried directly by 
the RTG handler, but rather should explicitly perform these checks when running 
RealTimeGetComponent#getInputDocument, which is used by 
AtomicUpdateDocumentMerger.

> RTG should return the whole block if schema is nested
> -
>
> Key: SOLR-12685
> URL: https://issues.apache.org/jira/browse/SOLR-12685
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12638-no-commit.patch
>
>
> Currently Solr's RealTimeGet component returns the document if provided a 
> docId when consulting the index. For AtomicUpdates for child documents, RTG 
> should return the whole block when dealing with a nested schema.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested

2018-09-03 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601844#comment-16601844
 ] 

mosh commented on SOLR-12685:
-

Please correct me if I am mistaken.
 I was thinking RTG should not return the whole block when queried directly by 
the RTG handler, but rather should explicitly perform these checks when running 
RealTimeGetComponent#getInputDocument, which is used by 
AtomicUpdateDocumentMerger.
{code:java}
SolrInputDocument oldDocument = RealTimeGetComponent.getInputDocument
  (cmd.getReq().getCore(), idBytes,
   null, // don't want the version to be returned
   true, // avoid stored fields from index
   updatedFields,
   true); // resolve the full document{code}
Unless, of course, RTG block lookup is needed by the replication process, 
which, unfortunately, I am unfamiliar with.
Running through the code, it seems the transaction log lookup is implemented 
twice: once in RealTimeGetComponent#getInputDocumentFromTlog and once in #process.
We could leverage that to ensure AtomicUpdateDocumentMerger gets the block when 
needed, avoiding further collisions and interference with the RealTimeGetHandler.

> RTG should return the whole block if schema is nested
> -
>
> Key: SOLR-12685
> URL: https://issues.apache.org/jira/browse/SOLR-12685
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12638-no-commit.patch
>
>
> Currently Solr's RealTimeGet component returns the document if provided a 
> docId when consulting the index. For AtomicUpdates for child documents, RTG 
> should return the whole block when dealing with a nested schema.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


