[jira] [Commented] (SOLR-5944) Support updates of numeric DocValues
[ https://issues.apache.org/jira/browse/SOLR-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194359#comment-15194359 ] Gopal Patwa commented on SOLR-5944: --- Great to see progress on this feature and that it is close to completion! Can this be part of the 6.0 release? Support updates of numeric DocValues Key: SOLR-5944 URL: https://issues.apache.org/jira/browse/SOLR-5944 Project: Solr Issue Type: New Feature Reporter: Ishan Chattopadhyaya Assignee: Shalin Shekhar Mangar Attachments: SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch LUCENE-5189 introduced support for updates to numeric docvalues. It would be really nice to have Solr support this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8176) Model distributed graph traversals with Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194132#comment-15194132 ] Gopal Patwa commented on SOLR-8176: --- Kevin, I am also interested in your solution using GraphQuery with Kafka. Model distributed graph traversals with Streaming Expressions Key: SOLR-8176 URL: https://issues.apache.org/jira/browse/SOLR-8176 Project: Solr Issue Type: New Feature Components: clients - java, SolrCloud, SolrJ Affects Versions: master Reporter: Joel Bernstein Labels: Graph Fix For: master I think it would be useful to model a few *distributed graph traversal* use cases with Solr's *Streaming Expression* language. This ticket will explore different approaches with a goal of implementing two or three common graph traversal use cases.
[jira] [Commented] (SOLR-7341) xjoin - join data from external sources
[ https://issues.apache.org/jira/browse/SOLR-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038800#comment-15038800 ] Gopal Patwa commented on SOLR-7341: --- [~Tomjon], could you make a patch that works with 5.x? We are exploring this option, but we are on version 5.3.1. xjoin - join data from external sources Key: SOLR-7341 URL: https://issues.apache.org/jira/browse/SOLR-7341 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.10.3 Reporter: Tom Winch Priority: Minor Fix For: Trunk Attachments: SOLR-7341.patch, SOLR-7341.patch, SOLR-7341.patch, SOLR-7341.patch, SOLR-7341.patch, SOLR-7341.patch, SOLR-7341.patch-trunk, SOLR-7341.patch-trunk, SOLR-7341.patch-trunk h2. XJoin The "xjoin" SOLR contrib allows external results to be joined with SOLR results in a query and the SOLR result set to be filtered by the results of an external query. Values from the external results are made available in the SOLR results and may also be used to boost the scores of corresponding documents during the search. The contrib consists of the Java classes XJoinSearchComponent, XJoinValueSourceParser and XJoinQParserPlugin (and associated classes), which must be configured in solrconfig.xml, and the interfaces XJoinResultsFactory and XJoinResults, which are implemented by the user to provide the link between SOLR and the external results source. External results and SOLR documents are matched via a single configurable attribute (the "join field"). The contrib JAR solr-xjoin-4.10.3.jar contains these classes and interfaces and should be included in SOLR's class path from solrconfig.xml, as should a JAR containing the user implementations of the previously mentioned interfaces. For example: {code:xml} > > .. > >/> > .. > > > .. > > {code} h2. Java classes and interfaces h3.
XJoinResultsFactory The user implementation of this interface is responsible for connecting to an external source to perform a query (or otherwise collect results). Parameters with prefix ".external." are passed from the SOLR query URL to parameterise the search. The interface has the following methods: * void init(NamedList args) - this is called during SOLR initialisation, and is passed parameters from the search component configuration (see below) * XJoinResults getResults(SolrParams params) - this is called during a SOLR search to generate external results, and is passed parameters from the SOLR query URL (as above) For example, the implementation might perform queries of an external source based on the 'q' SOLR query URL parameter (in full, name>.external.q). h3. XJoinResults A user implementation of this interface is returned by the getResults() method of the XJoinResultsFactory implementation. It has methods: * Object getResult(String joinId) - this should return a particular result given the value of the join attribute * Iterable getJoinIds() - this should return an ordered (ascending) list of the join attribute values for all results of the external search h3. XJoinSearchComponent This is the central Java class of the contrib. It is a SOLR search component, configured in solrconfig.xml and included in one or more SOLR request handlers. There is one XJoin search component per external source, and each has two main responsibilities: * Before the SOLR search, it connects to the external source and retrieves results, storing them in the SOLR request context * After the SOLR search, it matches SOLR documents in the result set to external results via the join field, adding attributes from the external results to documents in the SOLR result set It takes the following initialisation parameters: * factoryClass - this specifies the user-supplied class implementing XJoinResultsFactory, used to generate external results * joinField - this specifies the attribute on which to join between SOLR documents and external results * external - this parameter set is passed to configure the XJoinResultsFactory implementation For example, in solrconfig.xml: {code:xml} > class="org.apache.solr.search.xjoin.XJoinSearchComponent"> > test.TestXJoinResultsFactory > id > > 1,2,3 > > > {code} Here, the search component instantiates a new TestXJoinResultsFactory during initialisation, and passes it the "values" parameter (1, 2, 3) to configure it. To properly use the XJoinSearchComponent in a request handler, it must be included at the start and end of the component list, and may be configured with the following query parameters: * results - a comma-separated list of attributes from the XJoinResults implementation
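To make the two roles described above concrete (filtering the SOLR result set by the external join ids, and adding external attributes to matching documents via the join field), here is a small self-contained sketch. It does not use Solr: documents are plain `Map`s, and the `XJoinResults` interface below only mirrors the two methods listed in the description (the `String` type argument on `Iterable` is an assumption). `MapXJoinResults`, `XJoinSketch`, `filter`, `enrich` and the "price" attribute are hypothetical names, not part of the contrib.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Stand-in mirroring the two XJoinResults methods described above (not the contrib's interface).
interface XJoinResults {
    Object getResult(String joinId);
    Iterable<String> getJoinIds();
}

// Hypothetical in-memory results; a real implementation would query the external source.
class MapXJoinResults implements XJoinResults {
    private final TreeMap<String, Map<String, Object>> results = new TreeMap<>();

    MapXJoinResults put(String joinId, Map<String, Object> attributes) {
        results.put(joinId, attributes);
        return this;
    }

    @Override
    public Object getResult(String joinId) {
        return results.get(joinId);
    }

    // TreeMap iterates its keys in ascending order, as getJoinIds() requires.
    @Override
    public Iterable<String> getJoinIds() {
        return results.keySet();
    }
}

class XJoinSketch {
    // Filtering: keep documents whose join field value is one of the external join ids.
    static List<Map<String, Object>> filter(List<Map<String, Object>> docs, String joinField,
                                            XJoinResults external) {
        Set<String> ids = new HashSet<>();
        for (String id : external.getJoinIds()) {
            ids.add(id);
        }
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Object value = doc.get(joinField);
            if (value != null && ids.contains(value.toString())) {
                kept.add(doc);
            }
        }
        return kept;
    }

    // Enrichment: copy the attributes of the matching external result onto each document.
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> enrich(List<Map<String, Object>> docs, String joinField,
                                            XJoinResults external) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Map<String, Object> copy = new HashMap<>(doc);
            Object value = doc.get(joinField);
            Object match = (value == null) ? null : external.getResult(value.toString());
            if (match instanceof Map) {
                copy.putAll((Map<String, Object>) match);
            }
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        XJoinResults external = new MapXJoinResults()
                .put("3", Map.<String, Object>of("price", 7.0))
                .put("1", Map.<String, Object>of("price", 10.5));
        List<Map<String, Object>> docs = List.of(
                Map.<String, Object>of("id", "1", "title", "a"),
                Map.<String, Object>of("id", "2", "title", "b"),
                Map.<String, Object>of("id", "3", "title", "c"));
        System.out.println(external.getJoinIds()); // prints [1, 3]
        System.out.println(enrich(filter(docs, "id", external), "id", external));
    }
}
```

Returning a `TreeMap` key set is one easy way to honour the ascending-order contract of `getJoinIds()`; an implementation backed by a remote service would more likely sort the ids once when the external results are fetched.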
[jira] [Commented] (SOLR-4854) Query elevation [elevated] field always false with java binary communication
[ https://issues.apache.org/jira/browse/SOLR-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971426#comment-14971426 ] Gopal Patwa commented on SOLR-4854: --- Thanks Shalin, we will add a test and steps to reproduce soon. Query elevation [elevated] field always false with java binary communication Key: SOLR-4854 URL: https://issues.apache.org/jira/browse/SOLR-4854 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 4.3 Environment: tomcat 6.0.33, java 1.6.0_26_x64, solrj 4.3 Reporter: Istvan Hegedus Attachments: SOLR-4854.patch With XMLResponseParser there is no problem, but with the default BinaryResponseWriter [elevated] is always false.
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648285#comment-14648285 ] Gopal Patwa commented on SOLR-4787: --- If this has not been committed to Solr 5.x, does anyone have this join patch for Solr 5.x? Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: Trunk Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787-with-testcase-fix.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first difference is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from cores. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory than the JoinQParserPlugin to perform the join.
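As a rough illustration of the hash-based approach described above (not code from the patch), the sketch below builds a hash set from the join keys of the fromIndex matches and then keeps the main-query documents whose key is in that set. Plain `long[]` arrays stand in for the per-document key values that hjoin reads from the index, and `java.util.HashSet<Long>` stands in for whatever set structure the patch actually uses; `HashJoinSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HashJoinSketch {
    // Build side: collect the join key of every document matched in the fromIndex.
    // The set is presized for expectedKeys (HashSet grows once it is 75% full by default).
    static Set<Long> buildKeySet(long[] fromKeys, int expectedKeys) {
        Set<Long> keys = new HashSet<>((int) (expectedKeys / 0.75f) + 1);
        for (long key : fromKeys) {
            keys.add(key);
        }
        return keys;
    }

    // Probe side: return the doc ids of main-query documents whose join key is in the set.
    static List<Integer> filterMainQuery(long[] toKeysByDoc, Set<Long> keys) {
        List<Integer> matched = new ArrayList<>();
        for (int doc = 0; doc < toKeysByDoc.length; doc++) {
            if (keys.contains(toKeysByDoc[doc])) {
                matched.add(doc);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        long[] fromKeys = {10L, 42L, 42L, 7L};    // join keys of the fromIndex matches
        long[] toKeys = {7L, 8L, 42L, 99L, 10L};  // join key of each main-query doc, by doc id
        Set<Long> keys = buildKeySet(fromKeys, fromKeys.length);
        System.out.println(filterMainQuery(toKeys, keys)); // prints [0, 2, 4]
    }
}
```

Boxing every key into a `Long` keeps the sketch short; with millions of keys a primitive long hash set would need considerably less memory, which fits the description's point that hjoin trades memory for lookup speed.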
The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both Lucene query and PostFilter implementations. A *cost* greater than 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the Lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will set up a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then you can avoid hashset resizing, which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins. 5) Full caching support for the Lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string hjoin rather than join.
fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq\}user:customer1&qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1, applying the local fq parameter to filter the results. The Lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query whose to field value is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin: <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/> And the join contrib lib jars must be registered in the solrconfig.xml: <lib dir="../../../contrib/joins/lib" regex=".*\.jar" /> After issuing the ant dist command from inside the solr directory the joins contrib jar will
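The example above passes two request parameters, `fq` and `qq`, with `fq=$qq` inside the local params referring to the second one. When such a request is built programmatically, each value has to be URL-encoded and the pairs joined with `&`. A minimal JDK-only sketch (Java 10+ for the `Charset` overload of `URLEncoder.encode`); the `q=*:*` main query is an assumption, and `HjoinRequestSketch` is a hypothetical name:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class HjoinRequestSketch {
    // URL-encodes each name=value pair and joins the pairs with '&'.
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // preserves insertion order
        params.put("q", "*:*");                             // assumed main query
        // fq=$qq in the local params is resolved from the separate qq request parameter.
        params.put("fq", "{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq}user:customer1");
        params.put("qq", "group:5");
        System.out.println(toQueryString(params));
    }
}
```

Running it prints `q=*%3A*&fq=%7B%21hjoin+fromIndex%3Dcollection2+from%3Did_i+to%3Did_i+threads%3D6+fq%3D%24qq%7Duser%3Acustomer1&qq=group%3A5`. With SolrJ, the same parameters could instead be set on a `SolrQuery` through `setQuery`, `addFilterQuery` and `set("qq", ...)`, leaving the encoding to the client.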
[jira] [Commented] (SOLR-5961) Solr gets crazy on /overseer/queue state change
[ https://issues.apache.org/jira/browse/SOLR-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311681#comment-14311681 ] Gopal Patwa commented on SOLR-5961: --- Thanks Mark, here are more details about our production issue; I will try to reproduce it. Restart sequence: Solr - Restarted 02/03/2015 (8 nodes, 10 collections); ZKR - Restarted 02/04/2015 (5 nodes). Normal index size is approx. 5GB. Only a few nodes had this issue. While a replica was in recovery, its transaction log size was over 100GB; a possible reason is that it writes all updates sent by the leader during this period to the transaction log. Because the overseer queue was so large, the Admin UI Cloud tree view hangs, possibly similar to this JIRA issue: https://issues.apache.org/jira/browse/SOLR-6395 Exceptions during this time: ZooKeeper log: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /overseer/queue Solr log: 2015-02-05 23:23:13,174 [] priority=ERROR app_name= thread=RecoveryThread location=RecoveryStrategy line=142 Error while trying to recover. core=city_shard1_replica2:java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: I was asked to wait on state recovering for shard1 in city on srwp01usc002.stubprod.com:8080_solr but I still do not see the requested state.
I see state: active live:true leader from ZK: http://srwp01usc001.stubprod.com:8080/solr/city_shard1_replica1/ at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:188) at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:615) Solr gets crazy on /overseer/queue state change Key: SOLR-5961 URL: https://issues.apache.org/jira/browse/SOLR-5961 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7.1 Environment: CentOS, 1 shard - 3 replicas, ZK cluster with 3 nodes (separate machines) Reporter: Maxim Novikov Assignee: Shalin Shekhar Mangar Priority: Critical No idea how to reproduce it, but sometimes Solr starts littering the log with the following messages: 419158 [localhost-startStop-1-EventThread] INFO org.apache.solr.cloud.DistributedQueue ? LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged 419190 [Thread-3] INFO org.apache.solr.cloud.Overseer ? Update state numShards=1 message={ operation:state, state:recovering, base_url:http://${IP_ADDRESS}/solr;, core:${CORE_NAME}, roles:null, node_name:${NODE_NAME}_solr, shard:shard1, collection:${COLLECTION_NAME}, numShards:1, core_node_name:core_node2} It continues spamming these messages with no delay and restarting all the nodes does not help. I have even tried to stop all the nodes in the cluster first, but then when I start one, the behavior doesn't change, it gets crazy nuts with this /overseer/queue state again. PS The only way to handle this was to stop everything, manually clean up all the data in ZooKeeper related to Solr, and then rebuild everything from scratch. As you should understand, it is kinda unbearable in the production environment.
[jira] [Commented] (SOLR-5961) Solr gets crazy on /overseer/queue state change
[ https://issues.apache.org/jira/browse/SOLR-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308716#comment-14308716 ] Gopal Patwa commented on SOLR-5961: --- We also had a similar problem today in our production system, as Ugo mentioned. It was caused by rebooting the machines of our ZooKeeper ensemble (5 nodes) and our 8-node SolrCloud cluster (single shard) to install a Unix security patch. JDK 7, Solr 4.10.3, CentOS. After the reboot, we saw a huge number of messages in overseer/queue:
./zkCli.sh -server localhost:2181 ls /search/catalog/overseer/queue | sed 's/,/\n/g' | wc -l
178587
We have a very small cluster (8 nodes), so how can overseer/queue have 178k+ messages? Because of this, the leader node took a few hours to come back from recovery. Logs from ZooKeeper: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /overseer/queue
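For a cross-check of a queue size like the one above, the sketch below counts the entries of a zkCli `ls` listing in Java. It assumes the listing is printed as a single bracketed, comma-separated line (the sample input in `main` is illustrative, not captured output). Unlike the `sed | wc -l` pipeline, which counts every output line, it skips any other lines zkCli may print, so it counts only queue entries. `ZkLsCountSketch` is a hypothetical name.

```java
class ZkLsCountSketch {
    // Counts the children in a zkCli "ls" listing, assuming it appears as one bracketed,
    // comma-separated line such as "[qn-0000000001, qn-0000000002]".
    // Other lines (connection banner, watcher events) are skipped rather than counted.
    static int countChildren(String zkCliOutput) {
        for (String line : zkCliOutput.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("[") && trimmed.endsWith("]")) {
                String inner = trimmed.substring(1, trimmed.length() - 1).trim();
                return inner.isEmpty() ? 0 : inner.split(",").length;
            }
        }
        return 0; // no listing found
    }

    public static void main(String[] args) {
        String sample = "Connecting to localhost:2181\n"
                + "WATCHER::\n"
                + "[qn-0000000001, qn-0000000002, qn-0000000003]\n";
        System.out.println(countChildren(sample)); // prints 3
    }
}
```

In a Java tool, the ZooKeeper client's `getChildren(path, false)` returns the child list directly, which avoids parsing CLI output altogether.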
[jira] [Commented] (SOLR-5944) Support updates of numeric DocValues
[ https://issues.apache.org/jira/browse/SOLR-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258857#comment-14258857 ] Gopal Patwa commented on SOLR-5944: --- Not sure if this patch is complete, but it would be nice to have this in 5.0. Support updates of numeric DocValues Key: SOLR-5944 URL: https://issues.apache.org/jira/browse/SOLR-5944 Project: Solr Issue Type: New Feature Reporter: Ishan Chattopadhyaya Assignee: Shalin Shekhar Mangar Attachments: SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch
[jira] [Commented] (SOLR-4792) stop shipping a war in 5.0
[ https://issues.apache.org/jira/browse/SOLR-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229968#comment-14229968 ] Gopal Patwa commented on SOLR-4792: --- +1 to keeping the war in the Maven repo if possible, so users like us can easily migrate to Solr 5.0. We deploy Solr as a war because our company provides Tomcat as the deployment infrastructure, and it will take some time to change to another deployment model. stop shipping a war in 5.0 Key: SOLR-4792 URL: https://issues.apache.org/jira/browse/SOLR-4792 Project: Solr Issue Type: Task Components: Build Reporter: Robert Muir Assignee: Mark Miller Fix For: 5.0, Trunk Attachments: SOLR-4792.patch see the vote on the developer list. This is the first step: if we stop shipping a war then we are free to do anything we want.
[jira] [Commented] (SOLR-5963) Finalize interface and backport analytics component to 4x
[ https://issues.apache.org/jira/browse/SOLR-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050764#comment-14050764 ] Gopal Patwa commented on SOLR-5963: --- Hi Erick, could you add this patch as a contrib to 4.x so other folks can use it? I tried applying this patch to 4.9 but it did not work; maybe it was created before 4.9. Finalize interface and backport analytics component to 4x Key: SOLR-5963 URL: https://issues.apache.org/jira/browse/SOLR-5963 Project: Solr Issue Type: Improvement Affects Versions: 4.9, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-5963.patch, SOLR-5963.patch Now that we seem to have fixed up the test failures for trunk for the analytics component, we need to solidify the API and back-port it to 4x. For history, see SOLR-5302 and SOLR-5488. As far as I know, these are the merges that need to occur to do this (plus any that this JIRA brings up):
svn merge -c 1543651 https://svn.apache.org/repos/asf/lucene/dev/trunk
svn merge -c 1545009 https://svn.apache.org/repos/asf/lucene/dev/trunk
svn merge -c 1545053 https://svn.apache.org/repos/asf/lucene/dev/trunk
svn merge -c 1545054 https://svn.apache.org/repos/asf/lucene/dev/trunk
svn merge -c 1545080 https://svn.apache.org/repos/asf/lucene/dev/trunk
svn merge -c 1545143 https://svn.apache.org/repos/asf/lucene/dev/trunk
svn merge -c 1545417 https://svn.apache.org/repos/asf/lucene/dev/trunk
svn merge -c 1545514 https://svn.apache.org/repos/asf/lucene/dev/trunk
svn merge -c 1545650 https://svn.apache.org/repos/asf/lucene/dev/trunk
svn merge -c 1546074 https://svn.apache.org/repos/asf/lucene/dev/trunk
svn merge -c 1546263 https://svn.apache.org/repos/asf/lucene/dev/trunk
svn merge -c 1559770 https://svn.apache.org/repos/asf/lucene/dev/trunk
svn merge -c 1583636 https://svn.apache.org/repos/asf/lucene/dev/trunk
The only remaining thing I think needs to be done is to solidify the interface, see comments from [~yo...@apache.org] on the two JIRAs mentioned, although SOLR-5488 is the most relevant one. [~sbower], [~houstonputman] and [~yo...@apache.org] might be particularly interested here. I really want to put this to bed, so if we can get agreement on this soon I can make it march.
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941860#comment-13941860 ] Gopal Patwa commented on SOLR-4787: --- Thanks Kranti, here is my use case. Event collection: eventId=1 title=Lady Gaga date=06/03/2014. EventTicketStats collection: eventId=1 minPrice=200 minQuantity=5. When a user searches for "lady gaga" on Event documents using hjoin with EventTicketStats, the result should include the min price and quantity data from the joined core. Final result for the Event collection: eventId=1 title=Lady Gaga date=06/03/2014 minPrice=200 minQuantity=5. The user also has the option to filter results by price and quantity, e.g. show events for minPrice 100. The reason we keep EventTicketStats in separate documents is that our ticket data changes every 5 seconds, while Event data changes about twice a day. I considered using updatable numeric DocValues after denormalizing the Event documents with min price and quantity fields, but Solr does not support that feature yet, so I need to rely on a join. Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.8 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941359#comment-13941359 ] Gopal Patwa commented on SOLR-4787: --- I am trying to use this patch (27/Jan/14 12:26) but getting below exception using hjoin Solr 4.6 Schema.xml id field between parent and child field name=id type=long indexed=true stored=true required=true multiValued=false/ Caused by: java.lang.IllegalStateException: Type mismatch: id was indexed as SORTED at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:896) at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:875) at org.apache.solr.joins.HashSetJoinQParserPlugin$LongJoinRunner.init(HashSetJoinQParserPlugin.java:504) at org.apache.solr.joins.HashSetJoinQParserPlugin$LongHashSetJoinQuery.createWeight(HashSetJoinQParserPlugin.java:313) Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.8 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. 
The hjoin is similar in functionality to the JoinQParserPlugin, but the implementation differs in two important ways. First, the hjoin is designed to work with int and long join keys only, so to use it, int or long join keys must be present in both the "to" and the "from" core. Second, the hjoin builds in-memory structures that are used to quickly connect the join keys, so it will need more memory than the JoinQParserPlugin to perform the join.

The main advantage of the hjoin is that it can scale to join millions of keys between cores and still provide sub-second response times. The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query.

The hjoin supports the following features:
1) Both Lucene query and PostFilter implementations. A *cost* of 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down.
2) With the Lucene query implementation, there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading: for example, *threads=6* will use 6 threads to build the filter. This sets up a fixed thread pool with six threads to handle all hjoin requests. Once the thread pool is created, the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter.
3) The *size* local parameter can be used to set the initial size of the hash set used to perform the join. If this is set above the number of results from the fromIndex, you can avoid hash set resizing, which improves performance.
4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins.
5) Full caching support for the Lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. Only the queryResultCache comes into play with the PostFilter implementation, because PostFilters are not cacheable in the filterCache.

The syntax of the hjoin is similar to the JoinQParserPlugin, except that the plugin is referenced by the string hjoin rather than join.

fq={!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq}user:customer1&qq=group:5

The example filter query above will search the fromIndex (collection2) for user:customer1, applying the local fq parameter to filter the results. The Lucene filter query will be built using 6 threads. This query will
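Features 1 to 3 can be illustrated with a second sketch, again under stated assumptions rather than as the hjoin code. It assumes the main index is a plain `long[]` of join keys indexed by document id and uses `java.util.BitSet` in place of Lucene's filter structures; `ThreadedJoinFilterSketch` and its methods are hypothetical. `presizedKeySet` sizes the set up front (the idea behind *size*), `buildFilter` splits the document id range across a fixed thread pool (the idea behind *threads*), and `postFilterAccepts` is the per-document check a PostFilter makes instead of building the whole filter in advance.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: presized key set, threaded filter build, and a post-filter style check.
final class ThreadedJoinFilterSketch {
    // The *size* idea: presize so expectedSize keys fit without rehashing
    // (HashSet's default load factor is 0.75).
    static Set<Long> presizedKeySet(long[] fromKeys, int expectedSize) {
        Set<Long> keys = new HashSet<>((int) (expectedSize / 0.75f) + 1);
        for (long key : fromKeys) {
            keys.add(key);
        }
        return keys;
    }

    // The *threads* idea: one chunk of doc ids per task (threads must be >= 1),
    // each task fills its own BitSet, and the partial filters are OR-ed together.
    static BitSet buildFilter(long[] mainKeys, Set<Long> fromKeys, ExecutorService pool, int threads)
            throws Exception {
        int chunk = (mainKeys.length + threads - 1) / threads;
        List<Future<BitSet>> parts = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            int start = t * chunk;
            int end = Math.min(mainKeys.length, start + chunk);
            parts.add(pool.submit(() -> {
                BitSet local = new BitSet();
                for (int doc = start; doc < end; doc++) {
                    if (fromKeys.contains(mainKeys[doc])) {
                        local.set(doc);
                    }
                }
                return local;
            }));
        }
        BitSet filter = new BitSet(mainKeys.length);
        for (Future<BitSet> part : parts) {
            filter.or(part.get());
        }
        return filter;
    }

    // PostFilter style: no filter is built up front; each collected doc is checked on the fly.
    static boolean postFilterAccepts(long mainKey, Set<Long> fromKeys) {
        return fromKeys.contains(mainKey);
    }

    public static void main(String[] args) throws Exception {
        long[] mainKeys = {10L, 20L, 30L, 10L, 40L};
        Set<Long> fromKeys = presizedKeySet(new long[] {10L, 40L}, 2);
        // The description says hjoin creates its pool once and reuses it; this demo makes one call.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(buildFilter(mainKeys, fromKeys, pool, 2)); // {0, 3, 4}
        } finally {
            pool.shutdown();
        }
        System.out.println(postFilterAccepts(20L, fromKeys)); // false
    }
}
```

Because each task writes only to its own `BitSet` and the key set is only read after it has been built, no locking is needed.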
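In the example, `$qq` is a parameter reference: `qq` is a separate request parameter whose value is substituted into the local fq. When the request is built by hand, both values have to be URL-encoded. The sketch below uses only `java.net.URLEncoder`; the `HjoinRequestSketch` class is hypothetical, and the parameter values are taken from the example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that turns request parameters into an encoded query string.
final class HjoinRequestSketch {
    // Encode each name=value pair and join the pairs with '&' (map iteration order is kept).
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        // The hjoin filter; fq=$qq inside the local params refers to the separate qq parameter.
        params.put("fq", "{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq}user:customer1");
        params.put("qq", "group:5");
        System.out.println(toQueryString(params));
        // fq=%7B%21hjoin+fromIndex%3Dcollection2+from%3Did_i+to%3Did_i+threads%3D6+fq%3D%24qq%7Duser%3Acustomer1&qq=group%3A5
    }
}
```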
[jira] [Comment Edited] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941359#comment-13941359 ]

Gopal Patwa edited comment on SOLR-4787 at 3/20/14 4:54 AM:

I am trying to use this patch (27/Jan/14 12:26) with hjoin, and the type mismatch issue is resolved. It was my bad: I had join ids with different types.

Is it possible to collect data from the hjoin collection, i.e. the fromIndex, and append it to the main query result? In my use case I need to use hjoin and also show fields from the fromIndex.
[jira] [Commented] (SOLR-3855) DocValues support
[ https://issues.apache.org/jira/browse/SOLR-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13580021#comment-13580021 ]

Gopal Patwa commented on SOLR-3855:
---

Is there an example or test case showing how to update a DocValues field without updating the index or reopening the index searcher? Is this even possible?

DocValues support

Key: SOLR-3855
URL: https://issues.apache.org/jira/browse/SOLR-3855
Project: Solr
Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
Fix For: 4.2, 5.0
Attachments: SOLR-3855.patch, SOLR-3855.patch, SOLR-3855.patch, SOLR-3855.patch, SOLR-3855.patch, SOLR-3855.patch, SOLR-3855.patch, SOLR-3855.patch

It would be nice if Solr supported DocValues:
- for ID fields (fewer disk seeks when running distributed search),
- for sorting/faceting/function queries (faster warmup time than the FieldCache),
- for better on-disk and in-memory efficiency (you can use packed impls).
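The last point, packed implementations, refers to storing each numeric value with only as many bits as the range of values requires. The hypothetical `PackedLongs` class below is a minimal sketch of that idea, not Lucene's `PackedInts` API: each value is stored as an offset from the minimum, using `bitsPerValue` bits packed back to back into a `long[]`.

```java
// Hypothetical packed array of longs: each value is stored as (value - min) using only
// as many bits as (max - min) needs, packed back to back into a long[].
final class PackedLongs {
    private final long min;
    private final int bitsPerValue;
    private final long mask;
    private final long[] blocks;

    PackedLongs(long[] values) {
        long lo = 0, hi = 0;
        if (values.length > 0) {
            lo = Long.MAX_VALUE;
            hi = Long.MIN_VALUE;
            for (long v : values) {
                lo = Math.min(lo, v);
                hi = Math.max(hi, v);
            }
        }
        min = lo;
        // Treat the range as unsigned: deltas wrap modulo 2^64, so even a full
        // 64-bit range round-trips correctly with 64 bits per value.
        long range = hi - lo;
        bitsPerValue = Math.max(1, 64 - Long.numberOfLeadingZeros(range));
        mask = -1L >>> (64 - bitsPerValue);
        blocks = new long[(int) (((long) values.length * bitsPerValue + 63) / 64)];
        for (int i = 0; i < values.length; i++) {
            write(i, values[i] - lo);
        }
    }

    int bitsPerValue() {
        return bitsPerValue;
    }

    private void write(int index, long delta) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        blocks[block] |= (delta & mask) << shift;
        if (shift + bitsPerValue > 64) {
            // The value straddles two blocks: its high bits go at the start of the next block.
            blocks[block + 1] |= (delta & mask) >>> (64 - shift);
        }
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long bits = blocks[block] >>> shift;
        if (shift + bitsPerValue > 64) {
            bits |= blocks[block + 1] << (64 - shift);
        }
        return min + (bits & mask);
    }

    public static void main(String[] args) {
        PackedLongs p = new PackedLongs(new long[] {100L, 103L, 101L, 107L});
        System.out.println(p.bitsPerValue()); // 3 (bits per value instead of 64)
        System.out.println(p.get(3));         // 107
    }
}
```

With four values between 100 and 107, each entry needs 3 bits rather than 64; real implementations may layer further tricks on top, so treat this only as an illustration of the principle.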