[jira] [Commented] (SOLR-7734) MapReduce Indexer can error when using collection
[ https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982787#comment-14982787 ]

Mike Drob commented on SOLR-7734:
---------------------------------

[~gchanan] - did you have any other feedback on this?

> MapReduce Indexer can error when using collection
> -------------------------------------------------
>
>                 Key: SOLR-7734
>                 URL: https://issues.apache.org/jira/browse/SOLR-7734
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - MapReduce
>    Affects Versions: 5.2.1
>            Reporter: Mike Drob
>            Assignee: Gregory Chanan
>             Fix For: 5.4, Trunk
>
>         Attachments: SOLR-7734.branch5x.patch, SOLR-7734.patch,
> SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch,
> SOLR-7734.patch, SOLR-7734.patch
>
>
> When running the MapReduceIndexerTool, it will usually pull a
> {{solrconfig.xml}} from ZK for the collection that it is running against.
> This can be problematic for several reasons:
> * Performance: The configuration in ZK will likely have several query
> handlers, and lots of other components that don't make sense in an
> indexing-only use of EmbeddedSolrServer (ESS).
> * Classpath Resources: If the Solr services are using some kind of additional
> service (such as Sentry for auth) then the indexer will not have access to
> the necessary configurations without the user jumping through several hoops.
> * Distinct Configuration Needs: Enabling Soft Commits on the ESS doesn't make
> sense. There's other configurations that
> * Update Chain Behaviours: I'm under the impression that UpdateChains may
> behave differently in ESS than a SolrCloud cluster. Is it safe to depend on
> consistent behaviour here?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-7734) MapReduce Indexer can error when using collection
[ https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706781#comment-14706781 ]

Mike Drob commented on SOLR-7734:
---------------------------------

The patch applied cleanly to branch_5x for me, and the tests ran without issue. Is there something specific I can check?

[jira] [Commented] (SOLR-7734) MapReduce Indexer can error when using collection
[ https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706037#comment-14706037 ]

Gregory Chanan commented on SOLR-7734:
--------------------------------------

+1 lgtm. I'll commit to trunk assuming the tests/precommit pass. If you want it in 5.x as well please create a new patch (probably need to change the xml for the version).

[jira] [Commented] (SOLR-7734) MapReduce Indexer can error when using collection
[ https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702146#comment-14702146 ]

Gregory Chanan commented on SOLR-7734:
--------------------------------------

bq. 3) stopwords, synonyms, and protwords were all referenced from schema.xml via the text_en type. I copied that from the other mr conf, though. Can look into removing this if you think that's a worthwhile effort.

I saw stopwords_en.txt being referenced, not stopwords. But I just looked quickly -- if you try to remove and it doesn't work just leave it in.

bq. 4) You lost some of the leading spaces when copying to JIRA, but it's 4 spaces for a continuation (the try resource declaration) and then 2 spaces for the body of the try. What should it be?

My mistake.

[jira] [Commented] (SOLR-7734) MapReduce Indexer can error when using collection
[ https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702126#comment-14702126 ]

Mike Drob commented on SOLR-7734:
---------------------------------

1) Renamed to {{MiniMRBase.java}}
2) Fixed.
3) {{stopwords}}, {{synonyms}}, and {{protwords}} were all referenced from {{schema.xml}} via the {{text_en}} type. I copied that from the other mr conf, though. Can look into removing this if you think that's a worthwhile effort.
4) You lost some of the leading spaces when copying to JIRA, but it's 4 spaces for a continuation (the try resource declaration) and then 2 spaces for the body of the try. What should it be?
5) Fixed.

Will upload a new patch once I finish re-running tests with these changes.

[jira] [Commented] (SOLR-7734) MapReduce Indexer can error when using collection
[ https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700579#comment-14700579 ]

Gregory Chanan commented on SOLR-7734:
--------------------------------------

Looks good, I really like the new test. All of my previous comments seem to be addressed. Just a few minor issues/comments below:

Issue #1:
When I try ant test I get:
{code}
[junit4] Throwable #1: java.lang.RuntimeException: Suite class org.apache.solr.hadoop.MiniMRTest should be a concrete class (not abstract).
{code}

Issue #2:
{code}
public void useSolrHomeDir() throws Exception {
  String[] prepend = {"--solr-home-dir=" + DROPALL_CONF_DIR.getAbsolutePath()};
{code}
you can't actually tell if this is going to zk or not. Maybe overwrite zk with the good version or something beforehand?

Issue #3:
{code}
solr/contrib/morphlines-core/src/test-files/solr/dropall/conf/stopwords.txt
{code}
Do we need this file? I don't see it referenced.

Issue #4:
{code}
try (InputStream source = MapReduceIndexerTool.class.getResourceAsStream("/solrconfig.indexer.xml");
    FileOutputStream destination = new FileOutputStream(getSolrConfig(tmpSolrHomeDir))) {
  ByteStreams.copy(source, destination);
}
LOG.debug("Replaced zookeeper's solrconfig.xml with embedded version.");
{code}
This spacing looks funky here.

Issue #5:
{code}
SolrConfigMRTest
{code}
can you put the license first? all the other test have the license first (or after the package). I don't know if this fails the rat check or not, but seems good to be consistent.

[jira] [Commented] (SOLR-7734) MapReduce Indexer can error when using collection
[ https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627281#comment-14627281 ]

Gregory Chanan commented on SOLR-7734:
--------------------------------------

{code}+import com.google.common.base.Charsets;{code}
This is necessary?

{code}+ "may be downloaded from this ZooKeeper ensemble."));{code}
It's "may" because you might have specified --use-zk-solrconfig.xml? And you want to leave it vague because the help on --use-zk-solrconfig.xml is suppressed? This seems more confusing to me than just specifying everything in the help.

{code}
+if (!options.useZkSolrConfig) {
+  // replace downloaded solrconfig.xml with embedded one
+  InputStream source = MapReduceIndexerTool.class.getResourceAsStream("/solrconfig.indexer.xml");
+  FileOutputStream destination = new FileOutputStream(getSolrConfig(tmpSolrHomeDir));
+  ByteStreams.copy(source, destination);
+  destination.close();
+  source.close();
+}
{code}
The spacing looks off here. Maybe better to close everything in a finally as well.

{code}
+  <solr-jarify-filesets>
+    <fileset dir="src/resources" />
+  </solr-jarify-filesets>
{code}
When i try to run ant jar on the map-reduce contrib I get "solr/contrib/map-reduce/src/resources does not exist" -- did you mean for solrconfig.indexer.xml to be there?

{code}
+  <luceneMatchVersion>4.10.3</luceneMatchVersion>
{code}
Why the old version? Should this be 6.0.0 for trunk, 5.something for branch_5x? (I assume you want it in both, tell me if that's incorrect)

{code}
+       To enable dynamic schema REST APIs, use the following for <schemaFactory>:
+
+       <schemaFactory class="ManagedIndexSchemaFactory">
+         <bool name="mutable">true</bool>
+         <str name="managedSchemaResourceName">managed-schema</str>
+       </schemaFactory>
{code}
Does this work with managed schemas? What about if the resource name isn't the default?

{code}
+  <!-- JMX
+
+       This example enables JMX if and only if an existing MBeanServer
+       is found, use this if you want to configure JMX through JVM
+       parameters. Remove this to disable exposing Solr configuration
+       and statistics to JMX.
+
+       For more details see http://wiki.apache.org/solr/SolrJmx
+    -->
+  <jmx />
{code}
Do we want jmx? Is it even possible to use in an MR job?

{code}
+  <requestDispatcher handleSelect="false" >
+    <!-- Request Parsing
{code}
Do we need this whole section?

About testing:
I assume the existing tests now use the new (non-overwrite behavior). What about adding a test for the new option (--use-zk-solrconfig.xml). Maybe something simple like have your own update chain that adds a field/value that you expect to see. And possibly the converse, where you add an update.chain and check that the new behavior is actually working, i.e. that it doesn't use the solrconfig in zk.

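For readers following the "close everything in a finally" remark in this review: one common way to get that guarantee in Java 7+ is try-with-resources, which closes each declared stream even when the copy throws. The sketch below is only an illustration under stated assumptions, not the code from any SOLR-7734 patch. The class name {{EmbeddedConfigCopy}} and the method {{replaceConfig}} are hypothetical, an in-memory stream stands in for the embedded classpath resource, a temp file stands in for the solrconfig.xml downloaded from ZK, and a plain JDK read/write loop replaces Guava's {{ByteStreams.copy}} so the example has no dependencies.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of the SOLR-7734 patch.
final class EmbeddedConfigCopy {

  /**
   * Overwrites target with the bytes read from source.
   * try-with-resources closes both streams, in reverse order of
   * declaration, whether or not the copy throws.
   */
  static void replaceConfig(InputStream source, Path target) throws IOException {
    if (source == null) {
      // Class.getResourceAsStream returns null when the resource is missing.
      throw new IOException("embedded config not found");
    }
    try (InputStream in = source;
        OutputStream out = Files.newOutputStream(target)) {
      byte[] buffer = new byte[8192];
      int read;
      while ((read = in.read(buffer)) != -1) {
        out.write(buffer, 0, read);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Stand-in for the solrconfig.xml downloaded from ZooKeeper.
    Path target = Files.createTempFile("solrconfig", ".xml");
    Files.write(target, "<config>from zk</config>".getBytes(StandardCharsets.UTF_8));

    // Stand-in for the embedded classpath resource.
    InputStream embedded =
        new ByteArrayInputStream("<config>embedded</config>".getBytes(StandardCharsets.UTF_8));

    replaceConfig(embedded, target);
    System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    Files.delete(target);
  }
}
```

{{Files.newOutputStream}} with no options truncates an existing file, so the target ends up holding only the embedded content. The explicit null check is a design choice for this sketch: it turns a missing resource into a clear error before the target file is touched.
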