[jira] [Commented] (SOLR-7734) MapReduce Indexer can error when using collection

2015-10-30 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982787#comment-14982787
 ] 

Mike Drob commented on SOLR-7734:
-

[~gchanan] - did you have any other feedback on this?

> MapReduce Indexer can error when using collection
> -
>
> Key: SOLR-7734
> URL: https://issues.apache.org/jira/browse/SOLR-7734
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - MapReduce
>Affects Versions: 5.2.1
>Reporter: Mike Drob
>Assignee: Gregory Chanan
> Fix For: 5.4, Trunk
>
> Attachments: SOLR-7734.branch5x.patch, SOLR-7734.patch, 
> SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, 
> SOLR-7734.patch, SOLR-7734.patch
>
>
> When running the MapReduceIndexerTool, it will usually pull a 
> {{solrconfig.xml}} from ZK for the collection that it is running against. 
> This can be problematic for several reasons:
> * Performance: The configuration in ZK will likely have several query 
> handlers, and lots of other components that don't make sense in an 
> indexing-only use of EmbeddedSolrServer (ESS).
> * Classpath Resources: If the Solr services are using some kind of additional 
> service (such as Sentry for auth) then the indexer will not have access to 
> the necessary configurations without the user jumping through several hoops.
> * Distinct Configuration Needs: Enabling Soft Commits on the ESS doesn't make 
> sense. There's other configurations that 
> * Update Chain Behaviours: I'm under the impression that UpdateChains may 
> behave differently in ESS than a SolrCloud cluster. Is it safe to depend on 
> consistent behaviour here?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7734) MapReduce Indexer can error when using collection

2015-08-21 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706781#comment-14706781
 ] 

Mike Drob commented on SOLR-7734:
-

The patch applied cleanly to branch_5x for me, and the tests ran without issue. 
Is there something specific I can check?

 MapReduce Indexer can error when using collection
 -

 Key: SOLR-7734
 URL: https://issues.apache.org/jira/browse/SOLR-7734
 Project: Solr
  Issue Type: Bug
  Components: contrib - MapReduce
Affects Versions: 5.2.1
Reporter: Mike Drob
Assignee: Gregory Chanan
 Fix For: Trunk, 5.4

 Attachments: SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, 
 SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch








[jira] [Commented] (SOLR-7734) MapReduce Indexer can error when using collection

2015-08-20 Thread Gregory Chanan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706037#comment-14706037
 ] 

Gregory Chanan commented on SOLR-7734:
--

+1 lgtm.

I'll commit to trunk assuming the tests/precommit pass.  If you want it in 5.x 
as well, please create a new patch (you'll probably need to change the xml for 
the version).

 MapReduce Indexer can error when using collection
 -

 Key: SOLR-7734
 URL: https://issues.apache.org/jira/browse/SOLR-7734
 Project: Solr
  Issue Type: Bug
  Components: contrib - MapReduce
Affects Versions: 5.2.1
Reporter: Mike Drob
Assignee: Gregory Chanan
 Fix For: Trunk, 5.4

 Attachments: SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, 
 SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch








[jira] [Commented] (SOLR-7734) MapReduce Indexer can error when using collection

2015-08-18 Thread Gregory Chanan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702146#comment-14702146
 ] 

Gregory Chanan commented on SOLR-7734:
--

bq. 3) stopwords, synonyms, and protwords were all referenced from schema.xml 
via the text_en type. I copied that from the other mr conf, though. Can look 
into removing this if you think that's a worthwhile effort.

I saw stopwords_en.txt being referenced, not stopwords.  But I just looked 
quickly -- if you try to remove and it doesn't work just leave it in.

bq. 4) You lost some of the leading spaces when copying to JIRA, but it's 4 
spaces for a continuation (the try resource declaration) and then 2 spaces for 
the body of the try. What should it be?

My mistake.

 MapReduce Indexer can error when using collection
 -

 Key: SOLR-7734
 URL: https://issues.apache.org/jira/browse/SOLR-7734
 Project: Solr
  Issue Type: Bug
  Components: contrib - MapReduce
Affects Versions: 5.2.1
Reporter: Mike Drob
Assignee: Gregory Chanan
 Fix For: Trunk, 5.4

 Attachments: SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, 
 SOLR-7734.patch, SOLR-7734.patch








[jira] [Commented] (SOLR-7734) MapReduce Indexer can error when using collection

2015-08-18 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702126#comment-14702126
 ] 

Mike Drob commented on SOLR-7734:
-

1) Renamed to {{MiniMRBase.java}}
2) Fixed.
3) {{stopwords}}, {{synonyms}}, and {{protwords}} were all referenced from 
{{schema.xml}} via the {{text_en}} type. I copied that from the other mr conf, 
though. Can look into removing this if you think that's a worthwhile effort.
4) You lost some of the leading spaces when copying to JIRA, but it's 4 spaces 
for a continuation (the try resource declaration) and then 2 spaces for the 
body of the try. What should it be?
5) Fixed.

Will upload a new patch once I finish re-running tests with these changes.

 MapReduce Indexer can error when using collection
 -

 Key: SOLR-7734
 URL: https://issues.apache.org/jira/browse/SOLR-7734
 Project: Solr
  Issue Type: Bug
  Components: contrib - MapReduce
Affects Versions: 5.2.1
Reporter: Mike Drob
Assignee: Gregory Chanan
 Fix For: Trunk, 5.4

 Attachments: SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, 
 SOLR-7734.patch, SOLR-7734.patch








[jira] [Commented] (SOLR-7734) MapReduce Indexer can error when using collection

2015-08-17 Thread Gregory Chanan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700579#comment-14700579
 ] 

Gregory Chanan commented on SOLR-7734:
--

Looks good, I really like the new test.  All of my previous comments seem to be 
addressed.  Just a few minor issues/comments below:

Issue #1:
When I try ant test I get:
{code}
   [junit4] Throwable #1: java.lang.RuntimeException: Suite class 
org.apache.solr.hadoop.MiniMRTest should be a concrete class (not abstract).
{code}
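One possible reading of this failure, sketched under the assumption that the test runner selects suites by class name (for example, names ending in Test) and then refuses any selected class that is abstract. The selection rule and the SuiteCheck helper below are hypothetical; only the class names MiniMRTest, MiniMRBase and SolrConfigMRTest come from this thread.

```java
import java.lang.reflect.Modifier;

// Hypothetical stand-ins; only the names come from this thread.
abstract class MiniMRTest {}   // abstract, but the name ends in "Test"
abstract class MiniMRBase {}   // abstract, and the name no longer matches
class SolrConfigMRTest extends MiniMRBase {}

final class SuiteCheck {
  /** Assumed selection rule: a class is treated as a suite if its name ends in "Test". */
  static boolean selectedByName(Class<?> c) {
    return c.getSimpleName().endsWith("Test");
  }

  /** A selected suite must be instantiable, so a selected abstract class is an error. */
  static boolean wouldFail(Class<?> c) {
    return selectedByName(c) && Modifier.isAbstract(c.getModifiers());
  }

  public static void main(String[] args) {
    System.out.println(wouldFail(MiniMRTest.class));       // true
    System.out.println(wouldFail(MiniMRBase.class));       // false
    System.out.println(wouldFail(SolrConfigMRTest.class)); // false
  }
}
```

Under that assumption, renaming the abstract base class so it no longer matches the pattern keeps it out of suite selection while concrete subclasses are still picked up.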

Issue #2:
{code}
public void useSolrHomeDir() throws Exception {
  String[] prepend = {"--solr-home-dir=" + DROPALL_CONF_DIR.getAbsolutePath()};
{code}
you can't actually tell if this is going to zk or not.  Maybe overwrite zk with 
the good version or something beforehand?

Issue #3:
{code}
solr/contrib/morphlines-core/src/test-files/solr/dropall/conf/stopwords.txt 
{code}
Do we need this file?  I don't see it referenced.

Issue #4:
{code}
try (InputStream source = MapReduceIndexerTool.class.getResourceAsStream("/solrconfig.indexer.xml");
  FileOutputStream destination = new FileOutputStream(getSolrConfig(tmpSolrHomeDir))) {
ByteStreams.copy(source, destination);
  }
  LOG.debug("Replaced zookeeper's solrconfig.xml with embedded version.");
{code}
This spacing looks funky here.

Issue #5:
{code}
SolrConfigMRTest
{code}
Can you put the license first?  All the other tests have the license first (or 
after the package).  I don't know if this fails the RAT check or not, but it 
seems good to be consistent.



 MapReduce Indexer can error when using collection
 -

 Key: SOLR-7734
 URL: https://issues.apache.org/jira/browse/SOLR-7734
 Project: Solr
  Issue Type: Bug
  Components: contrib - MapReduce
Affects Versions: 5.2.1
Reporter: Mike Drob
Assignee: Gregory Chanan
 Fix For: Trunk, 5.4

 Attachments: SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch, 
 SOLR-7734.patch, SOLR-7734.patch








[jira] [Commented] (SOLR-7734) MapReduce Indexer can error when using collection

2015-07-14 Thread Gregory Chanan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627281#comment-14627281
 ] 

Gregory Chanan commented on SOLR-7734:
--

{code}+import com.google.common.base.Charsets;{code}
Is this necessary?

{code}+ may be downloaded from this ZooKeeper ensemble.));{code}
It's "may" because you might have specified --use-zk-solrconfig.xml?  And you 
want to leave it vague because the help on --use-zk-solrconfig.xml is 
suppressed?  This seems more confusing to me than just specifying everything in 
the help.

{code}
+if (!options.useZkSolrConfig) {
+  // replace downloaded solrconfig.xml with embedded one
+  InputStream source = MapReduceIndexerTool.class.getResourceAsStream("/solrconfig.indexer.xml");
+  FileOutputStream destination = new FileOutputStream(getSolrConfig(tmpSolrHomeDir));
+  ByteStreams.copy(source, destination);
+ destination.close();
+ source.close();
+}
{code}
The spacing looks off here.  Maybe better to close everything in a finally as 
well.
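
A minimal sketch of the "close everything" suggestion, assuming try-with-resources (Java 7+) is acceptable in place of an explicit finally block. The class and method names here are hypothetical, and java.nio's Files.copy stands in for the patch's Guava ByteStreams.copy so that the snippet needs no third-party dependency.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class ResourceCopy {
  /** Overwrites destination with the stream's contents; the stream is closed even if the copy throws. */
  public static void replaceFile(InputStream source, Path destination) throws IOException {
    // try-with-resources closes "in" on both the normal and the exceptional path
    try (InputStream in = source) {
      Files.copy(in, destination, StandardCopyOption.REPLACE_EXISTING);
    }
  }

  /** Looks up a classpath resource relative to anchor and copies it over destination. */
  public static void replaceWithResource(Class<?> anchor, String resourceName, Path destination)
      throws IOException {
    InputStream source = anchor.getResourceAsStream(resourceName);
    if (source == null) {
      // getResourceAsStream returns null rather than throwing when the resource is missing
      throw new IOException("Resource not found on classpath: " + resourceName);
    }
    replaceFile(source, destination);
  }
}
```

The null check is there because a missing resource would otherwise surface as a NullPointerException inside the copy rather than as a clear error.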

{code}
+  <solr-jarify-filesets>
+    <fileset dir="src/resources" />
+  </solr-jarify-filesets>
{code}
When I try to run ant jar on the map-reduce contrib I get 
"solr/contrib/map-reduce/src/resources does not exist" -- did you mean for 
solrconfig.indexer.xml to be there?

{code}
+  <luceneMatchVersion>4.10.3</luceneMatchVersion>
{code}
Why the old version?  Should this be 6.0.0 for trunk, 5.something for 
branch_5x?  (I assume you want it in both, tell me if that's incorrect)

{code}
To enable dynamic schema REST APIs, use the following for schemaFactory:
+
+   <schemaFactory class="ManagedIndexSchemaFactory">
+     <bool name="mutable">true</bool>
+     <str name="managedSchemaResourceName">managed-schema</str>
+   </schemaFactory>
{code}
Does this work with managed schemas?  What about if the resource name isn't 
the default?

{code}
+  <!-- JMX
+
+   This example enables JMX if and only if an existing MBeanServer
+   is found, use this if you want to configure JMX through JVM
+   parameters. Remove this to disable exposing Solr configuration
+   and statistics to JMX.
+
+   For more details see http://wiki.apache.org/solr/SolrJmx
+-->
+  <jmx />
{code}
Do we want jmx?  Is it even possible to use in an MR job?

{code}+  <requestDispatcher handleSelect="false" >
+    <!-- Request Parsing{code}
Do we need this whole section?

About testing: I assume the existing tests now use the new (non-overwrite) 
behavior.  What about adding a test for the new option 
(--use-zk-solrconfig.xml)?  Maybe something simple, like having your own update 
chain that adds a field/value that you expect to see.  And possibly the 
converse, where you add an update.chain and check that the new behavior is 
actually working, i.e. that it doesn't use the solrconfig in ZK.

 MapReduce Indexer can error when using collection
 -

 Key: SOLR-7734
 URL: https://issues.apache.org/jira/browse/SOLR-7734
 Project: Solr
  Issue Type: Bug
  Components: contrib - MapReduce
Affects Versions: 5.2.1
Reporter: Mike Drob
Assignee: Gregory Chanan
 Fix For: 5.3, Trunk

 Attachments: SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch




