Re: schemaless slow indexing

2015-03-23 Thread Alexandre Rafalovitch
I looked at SOLR-7290, but I think the discussion should stay on the mailing list for at least one more iteration. My understanding is that the reason copyField exists is so that search actually works out of the box. Without knowing the field names, one cannot say what to search. So, the

Re: schemaless slow indexing

2015-03-23 Thread Yonik Seeley
On Mon, Mar 23, 2015 at 1:54 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: I looked at SOLR-7290, but I think the discussion should stay on the mailing list for at least one more iteration. My understanding is that the reason copyField exists is so that search actually works out of the

Re: schemaless slow indexing

2015-03-23 Thread Alexandre Rafalovitch
Yonik, those are all facts, which I do not disagree with at all. But there are also consequences when you bring the rest of the facts, the assumptions, and the documented workflows into play. My comment was trying to address the situation on that level. I am all for improving performance. I am just

Re: schemaless slow indexing

2015-03-23 Thread Steve Rowe
On Mar 23, 2015, at 11:51 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: For example, I am not even sure if we can create a copyField definition via REST API yet. https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-AddaNewCopyFieldRule
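The Schema API page linked above describes adding a copyField rule over REST with a JSON command body. A minimal sketch of building such a request in Python, assuming the `add-copy-field` command shape (`source`, `dest`) from that page; the host, the collection name `gettingstarted`, and the `title` field are hypothetical placeholders, and the request is only prepared, not sent:

```python
import json
import urllib.request

# Hypothetical values: adjust to your own Solr host, collection, and field names.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "gettingstarted"


def copy_field_command(action, source, dest):
    """Build a Schema API JSON command such as {"add-copy-field": {...}}.

    action is assumed to be "add-copy-field" (or "delete-copy-field" on Solr
    versions that support it); dest may be one field name or a list of names.
    """
    return {action: {"source": source, "dest": dest}}


def schema_request(command, base=SOLR_BASE, collection=COLLECTION):
    """Prepare (but do not send) a JSON POST to the collection's /schema endpoint."""
    body = json.dumps(command).encode("utf-8")
    return urllib.request.Request(
        f"{base}/{collection}/schema",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Copy only one (hypothetical) field into the catch-all field,
    # instead of copying every field via source="*".
    req = schema_request(copy_field_command("add-copy-field", "title", "_text"))
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Against a running Solr, the request would be sent with urllib.request.urlopen(req).
```

Targeting a specific source field rather than `*` is the design point here: only fields someone actually wants in the catch-all get indexed twice.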

Re: schemaless slow indexing

2015-03-23 Thread Steve Rowe
On Mar 23, 2015, at 11:09 AM, Yonik Seeley ysee...@gmail.com wrote: On Mon, Mar 23, 2015 at 1:54 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: I looked at SOLR-7290, but I think the discussion should stay on the mailing list for at least one more iteration. My understanding is that

Re: schemaless slow indexing

2015-03-22 Thread Erick Erickson
I think you mean https://issues.apache.org/jira/browse/SOLR-7290? Erick On Sun, Mar 22, 2015 at 2:30 PM, Mike Murphy mmurphy3...@gmail.com wrote: That's it! I hand edited the file that says you are not supposed to edit it and removed that copyField. Indexing performance is now back to

schemaless slow indexing

2015-03-22 Thread Mike Murphy
I'm trying out schemaless in Solr 5.0, but indexing seems quite a bit slower than it was in the past on 4.10. Any pointers? --Mike

Re: schemaless slow indexing

2015-03-22 Thread Erick Erickson
Please review: http://wiki.apache.org/solr/UsingMailingLists You haven't quantified the slowdown. Or given any details on how you're measuring the slowdown. Or how you've configured your setups in 4.10 and 5.0. Or... As Hossman would say, details matter. Best, Erick On Sun, Mar 22, 2015 at 8:35

Re: schemaless slow indexing

2015-03-22 Thread Mike Murphy
I start up Solr schemaless and index a bunch of data, and it takes a lot longer to finish indexing. No configuration changes, just straight schemaless. --Mike On Sun, Mar 22, 2015 at 12:27 PM, Erick Erickson erickerick...@gmail.com wrote: Please review:

Re: schemaless slow indexing

2015-03-22 Thread Alexandre Rafalovitch
Same data, same version of Solr, with the only difference being schema vs. schemaless? How much longer: 10%, 2x, 20x? Schemaless mode has a much more complex UpdateRequestProcessor chain; that's partially what makes it schemaless. But I hesitate to point fingers at that without any real
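Answering "10%, 2x, 20x?" means timing the same documents against both setups and reporting a ratio. A minimal sketch of such a harness, assuming the caller supplies an indexing function for each setup; `fake_index` below is a hypothetical stand-in, not a Solr client:

```python
import time


def timed(index_fn, docs):
    """Run a caller-supplied indexing function over docs; return elapsed seconds."""
    start = time.perf_counter()
    index_fn(docs)
    return time.perf_counter() - start


def slowdown(baseline_seconds, candidate_seconds):
    """Express a candidate run as a multiple of the baseline (2.0 = twice as slow)."""
    if baseline_seconds <= 0:
        raise ValueError("baseline must be positive")
    return candidate_seconds / baseline_seconds


if __name__ == "__main__":
    docs = [{"id": str(i), "title": f"doc {i}"} for i in range(1000)]

    # Hypothetical stand-in: replace with code that sends docs to one Solr setup.
    def fake_index(batch):
        for _ in batch:
            pass

    elapsed = timed(fake_index, docs)
    print(f"{len(docs)} docs in {elapsed:.4f}s")
```

Running `timed` once with the schema setup and once with the schemaless setup, on identical data, and passing both results to `slowdown` gives the single number the thread asks for.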

Re: schemaless slow indexing

2015-03-22 Thread Yonik Seeley
I took a quick look at the stock schemaless configs... unfortunately they contain a performance trap. There's a copyField by default that copies *all* fields to a catch-all field called _text. IMO, that's not a great default. Double the index size (well, the index portion of it at least... not
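The trap described above is a copyField whose source is a wildcard, so every field is indexed a second time into the catch-all field. A minimal sketch that scans a schema XML file for such rules with the standard library's ElementTree; the sample schema below is a hypothetical excerpt modeled on the rule Yonik describes, not the shipped file:

```python
import xml.etree.ElementTree as ET


def wildcard_copy_fields(schema_xml):
    """Return (source, dest) pairs for copyField rules whose source contains '*'."""
    root = ET.fromstring(schema_xml.strip())
    rules = []
    for cf in root.iter("copyField"):
        source = cf.get("source", "")
        if "*" in source:
            rules.append((source, cf.get("dest")))
    return rules


# Hypothetical schema excerpt for illustration only.
SAMPLE = """
<schema name="example" version="1.5">
  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="_text" type="text_general" indexed="true" stored="false" multiValued="true"/>
  <copyField source="*" dest="_text"/>
  <copyField source="title" dest="title_sort"/>
</schema>
"""

if __name__ == "__main__":
    for source, dest in wildcard_copy_fields(SAMPLE):
        print(f"copyField source={source!r} dest={dest!r} duplicates every matching field")
```

Pointing the function at the text of a real schema file shows whether a catch-all rule like this is present before any data is indexed.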

Re: schemaless slow indexing

2015-03-22 Thread Mike Murphy
That's it! I hand edited the file that says you are not supposed to edit it and removed that copyField. Indexing performance is now back to expected levels. I created an issue for this, https://issues.apache.org/jira/browse/SOLR-7284 --Mike On Sun, Mar 22, 2015 at 3:29 PM, Yonik Seeley