Re: Commit Issue in Solr 3.4
Yes, it is Amazon EC2 indeed. To expand on that: this Solr deployment was working fine, handling the same load, on a 34 GB instance with EBS storage for quite some time. To reduce the time taken by a commit, I shifted it to a 30 GB SSD instance. It performed better in writes and commits, for sure. But since last week I have been facing this problem of infinite back-to-back commits. Not being able to resolve this, I have finally switched back to a 34 GB machine with EBS storage, and now the commits are working fine, though slow. Any thoughts?

On 6 Feb 2014 23:00, Shawn Heisey s...@elyograg.org wrote: On 2/6/2014 9:56 AM, samarth s wrote:
Size of index = 260 GB
Total docs = 100 million
Usual writing speed = 50K per hour
autoCommit maxDocs = 400,000
autoCommit maxTime = 1,500,000 ms (25 mins)
merge factor = 10
Machine memory = 30 GB, Xmx = 20 GB
Server = Jetty
OS = CentOS 6

With 30GB of RAM (is it Amazon EC2, by chance?) and a 20GB heap, you have about 10GB of RAM left for caching your Solr index. If that server has all 260GB of index, I am really surprised that you have only been having problems for a short time. I would have expected problems from day one. Even if it only has half or one quarter of the index, there is still a major discrepancy between RAM and index size. You either need more memory or you need to reduce the size of your index. The size of the indexed portion generally has more of an impact on performance than the size of the stored portion, but they both have an impact, especially on indexing and committing. With regular disks, it's best to have at least 50% of your index size available to the OS disk cache, but 100% is better. http://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache If you are already using SSD, you might think there can't be memory-related performance problems ... but you still need a pretty significant chunk of disk cache. https://wiki.apache.org/solr/SolrPerformanceProblems#SSD Thanks, Shawn
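For reference, the autoCommit settings quoted in the thread would correspond to a solrconfig.xml fragment roughly like this (values taken from the thread; the updateHandler class shown is the common default and an assumption here, not confirmed by the poster):

```xml
<!-- Hard-commit automatically after 400,000 docs or 25 minutes,
     whichever comes first. maxTime is in milliseconds. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>400000</maxDocs>
    <maxTime>1500000</maxTime>
  </autoCommit>
</updateHandler>
```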
Re: Index a new record in MySQL
But delta indexing is supposed to do that. Please share details of your dataimport config and the version of Solr for further help. -- View this message in context: http://lucene.472066.n3.nabble.com/Index-a-new-record-in-MySQL-tp4116164p4116184.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Index a new record in MySQL
Hi, the version of Solr is 4.6, and here is my db-import-config.xml:

<dataConfig>
  <dataSource name="dbDS" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="[fill me]" user="[fill me]" password="[fill me]"/>
  <document name="[example name]">
    <entity name="[table name]" pk="id" transformer="com.example.sometransformer"
            query="SELECT * from [table]"
            deltaImportQuery="SELECT * from [table] where [pk]='${dataimporter.delta.pk}'"
            deltaQuery="SELECT [pk] from [table] where last_modified &gt; '${dataimporter.last_index_time}'">
      <field column="[col]" name="[col name]"/>
      <field column="last_modified" name="lastModified"/>
    </entity>
  </document>
</dataConfig>

On Feb 8, 2014, at 7:29 AM, tamanjit.bin...@yahoo.co.in wrote: But delta indexing is supposed to do that. Please share details of your dataimport config and the version of Solr for further help.
Re: Index a new record in MySQL
The config seems OK. Have you checked the date in the dataimport.properties file before a delta-import? MySQL does sometimes give issues with the date format. Also check your web container/Solr logs to see whether they show any exception when you call a delta-import. -- View this message in context: http://lucene.472066.n3.nabble.com/Index-a-new-record-in-MySQL-tp4116164p4116193.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Composite Unique key from existing fields in schema
You could use a combination of all your composite key columns and put them in a single field in Solr, which can then be used as the unique key. For example, if you have two columns c1 and c2, you could have a field in Solr whose value is c1_c2, or something along those lines. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Composite-Unique-key-from-existing-fields-in-schema-tp4116036p4116195.html Sent from the Solr - User mailing list archive at Nabble.com.
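As a minimal sketch of that idea (the class name, column values, and "_" separator here are illustrative, not from the thread), the indexer can build the uniqueKey value before sending the document to Solr:

```java
public class CompositeKey {
    // Join the composite-key columns into a single Solr uniqueKey value.
    // The "_" separator is an arbitrary choice -- pick something that
    // cannot occur inside the column values themselves, or collisions
    // become possible (e.g. "a_b" + "c" vs "a" + "b_c").
    static String buildKey(String c1, String c2) {
        return c1 + "_" + c2;
    }

    public static void main(String[] args) {
        // e.g. id 42 in region "eu" -> "42_eu"
        System.out.println(buildKey("42", "eu"));
    }
}
```

If the data comes in through the DataImportHandler, the same thing can be done in the config with a TemplateTransformer, e.g. a field like `<field column="id" template="${entityName.c1}_${entityName.c2}"/>` (entity and column names here are illustrative).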
Re: Solr Composite Unique key from existing fields in schema
Also take a look at http://wiki.apache.org/solr/UniqueKey#Use_cases_which_require_a_unique_key_generated_from_data_in_the_document -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Composite-Unique-key-from-existing-fields-in-schema-tp4116036p4116198.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud: problems with delete
What version of Solr? If it works with commit=true, then it's working just fine. Index changes aren't visible to searchers until a commit occurs. How are you determining whether the doc is gone or not? A Solr search (which depends on commits), or examining the index at a low level? If the latter: docs are just marked as deleted; they aren't physically removed until segments are merged. Best, Erick On Fri, Feb 7, 2014 at 4:25 AM, ku3ia dem...@gmail.com wrote: autocommit and /update/?commit=true work fine. What I mean is, for example: I send 807,632 docs to index on my 3-shard cluster, and everything is fine. But when I try to remove them using POST requests with a small number of ids, say 100 per request, some docs remain in the index that should not. When I send a POST request with around 1K or 2K ids, all docs are deleted from the index. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-problems-with-delete-tp4116026p4116038.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19
If that is the case we really have to dig in. Given the error, the first thing I would assume is that you have an old solrj jar or something before 4.6.1 involved with a 4.6.1 solrj jar or install. - Mark http://about.me/markrmiller

On Feb 7, 2014, 7:15:24 PM, Mark Miller markrmil...@gmail.com wrote: Hey, yeah, blew it on this one. Someone just reported it the other day - the way that a bug was fixed was not backward and forward compatible. The first implementation was wrong. You have to update the other nodes to 4.6.1 as well. I’m going to look at some scripting test that can help check for this type of thing. - Mark http://about.me/markrmiller

On Feb 7, 2014, 7:01:24 PM, Brett Hoerner br...@bretthoerner.com wrote: I have Solr 4.6.1 on the server and just upgraded my indexer app to SolrJ 4.6.1, and indexing ceased (the indexer returned "No live servers for shard", but the real root cause from the Solr servers is below). Note that SolrJ 4.6.1 is fine for the query side, just not for adding documents.

21:35:21.508 [qtp1418442930-22296231] ERROR o.a.solr.servlet.SolrDispatchFilter - null:java.lang.RuntimeException: Unknown type 19
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:232)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:223)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:114)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
    at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
    at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:721)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:953)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:724)
Re: Tf-Idf for a specific query
David: If you're, say, faceting on fields with lots of unique values, this will be quite expensive. No idea whether you can tolerate slower queries or not, just sayin'. Erick On Fri, Feb 7, 2014 at 5:35 PM, David Miller davthehac...@gmail.com wrote: Thanks Mikhail, It seems that this is what I was looking for. Being new to this, I wasn't aware of such a use of facets. Now I can probably combine the term vectors and facets to fit my scenario. Regards, Dave On Fri, Feb 7, 2014 at 2:43 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: David, I can imagine that DF for a result set is facets! On Fri, Feb 7, 2014 at 11:26 PM, David Miller davthehac...@gmail.com wrote: Hi Mikhail, The DF seems to be based on the entire document set. What I require is based on the results of a single query. Suppose my Solr query returns a set of 50K documents from a superset of 10 million documents; I need to calculate the DF based just on those 50K documents. But currently it seems to be calculated on the entire doc set. So, is there any way to get the DF or IDF based only on the docs returned by the query? Regards, Dave On Fri, Feb 7, 2014 at 5:15 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello Dave, you can get DF from http://wiki.apache.org/solr/TermsComponent (invert it yourself); then, for a certain term, you can get the number of occurrences per document via http://wiki.apache.org/solr/FunctionQuery#tf On Fri, Feb 7, 2014 at 3:58 AM, David Miller davthehac...@gmail.com wrote: Hi guys, I need to obtain a tf-idf score from Solr for a certain set of documents. The catch is that I need the IDF (or DF) to be calculated on the documents returned by the specific query, not the entire corpus. Please give me a hint on whether Solr has this feature, or whether I can use the Lucene API directly to achieve this.
Thanks in advance, Dave -- Sincerely yours, Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
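To make the distinction in this thread concrete, here is a toy sketch (plain Java, no Solr involved, with made-up documents) of corpus-wide DF versus DF restricted to a query's result set, the latter being the number a facet count reports:

```java
import java.util.*;
import java.util.stream.Collectors;

public class ResultSetDf {
    // DF of a term over a collection of tokenized documents: the number
    // of documents containing the term at least once.
    static long df(String term, List<Set<String>> docs) {
        return docs.stream().filter(d -> d.contains(term)).count();
    }

    public static void main(String[] args) {
        List<Set<String>> corpus = Arrays.asList(
                new HashSet<>(Arrays.asList("solr", "index")),
                new HashSet<>(Arrays.asList("solr", "commit")),
                new HashSet<>(Arrays.asList("lucene", "index")));

        // Corpus-wide DF of "index": 2 of the 3 docs contain it.
        System.out.println(df("index", corpus));  // prints 2

        // DF restricted to the "result set" of a query for "solr":
        // only 1 of the 2 matching docs contains "index" -- this is
        // what a facet on that query's results would count.
        List<Set<String>> results = corpus.stream()
                .filter(d -> d.contains("solr"))
                .collect(Collectors.toList());
        System.out.println(df("index", results));  // prints 1
    }
}
```

In Solr terms, the second number is what you get by running the query and faceting on the text field, which is why it can be expensive on high-cardinality fields, as noted above.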
Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19
Hmmm, I'm assembling into an uberjar that forces uniqueness of classes. I verified 4.6.1 is definitely winning and included alone when it breaks. On Sat, Feb 8, 2014 at 9:44 AM, Mark Miller markrmil...@gmail.com wrote: If that is the case we really have to dig in. Given the error, the first thing I would assume is that you have an old solrj jar or something before 4.6.1 involved with a 4.6.1 solrj jar or install. - Mark http://about.me/markrmiller On Feb 7, 2014, 7:15:24 PM, Mark Miller markrmil...@gmail.com wrote: Hey, yeah, blew it on this one. Someone just reported it the other day - the way that a bug was fixed was not back and forward compatible. The first implementation was wrong. You have to update the other nodes to 4.6.1 as well. I’m going to look at some scripting test that can help check for this type of thing. - Mark http://about.me/markrmiller On Feb 7, 2014, 7:01:24 PM, Brett Hoerner br...@bretthoerner.com wrote: I have Solr 4.6.1 on the server and just upgraded my indexer app to SolrJ 4.6.1 and indexing ceased (indexer returned No live servers for shard but the real root from the Solr servers is below). Note that SolrJ 4.6.1 is fine for the query side, just not adding documents. 
21:35:21.508 [qtp1418442930-22296231] ERROR o.a.solr.servlet.SolrDispatchFilter - null:java.lang.RuntimeException: Unknown type 19
[stack trace snipped; identical to the one quoted earlier in the thread]
Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19
If you look at the stack trace, the line numbers match 4.6.0 in the src, but not 4.6.1. That code couldn’t have been 4.6.1 it seems. - Mark http://about.me/markrmiller On Feb 8, 2014, at 11:12 AM, Brett Hoerner br...@bretthoerner.com wrote: Hmmm, I'm assembling into an uberjar that forces uniqueness of classes. I verified 4.6.1 is definitely winning and included alone when it breaks. On Sat, Feb 8, 2014 at 9:44 AM, Mark Miller markrmil...@gmail.com wrote: If that is the case we really have to dig in. Given the error, the first thing I would assume is that you have an old solrj jar or something before 4.6.1 involved with a 4.6.1 solrj jar or install. - Mark http://about.me/markrmiller On Feb 7, 2014, 7:15:24 PM, Mark Miller markrmil...@gmail.com wrote: Hey, yeah, blew it on this one. Someone just reported it the other day - the way that a bug was fixed was not back and forward compatible. The first implementation was wrong. You have to update the other nodes to 4.6.1 as well. I’m going to look at some scripting test that can help check for this type of thing. - Mark http://about.me/markrmiller On Feb 7, 2014, 7:01:24 PM, Brett Hoerner br...@bretthoerner.com wrote: I have Solr 4.6.1 on the server and just upgraded my indexer app to SolrJ 4.6.1 and indexing ceased (indexer returned No live servers for shard but the real root from the Solr servers is below). Note that SolrJ 4.6.1 is fine for the query side, just not adding documents. 
21:35:21.508 [qtp1418442930-22296231] ERROR o.a.solr.servlet.SolrDispatchFilter - null:java.lang.RuntimeException: Unknown type 19
[stack trace snipped; identical to the one quoted earlier in the thread]
Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19
Oh, I was talking about my indexer. That stack is from my Solr servers, very weird since it's *supposed* to be 4.6.1 . I'll dig in more, thanks. On Sat, Feb 8, 2014 at 10:21 AM, Mark Miller markrmil...@gmail.com wrote: If you look at the stack trace, the line numbers match 4.6.0 in the src, but not 4.6.1. That code couldn’t have been 4.6.1 it seems. - Mark http://about.me/markrmiller On Feb 8, 2014, at 11:12 AM, Brett Hoerner br...@bretthoerner.com wrote: Hmmm, I'm assembling into an uberjar that forces uniqueness of classes. I verified 4.6.1 is definitely winning and included alone when it breaks. On Sat, Feb 8, 2014 at 9:44 AM, Mark Miller markrmil...@gmail.com wrote: If that is the case we really have to dig in. Given the error, the first thing I would assume is that you have an old solrj jar or something before 4.6.1 involved with a 4.6.1 solrj jar or install. - Mark http://about.me/markrmiller On Feb 7, 2014, 7:15:24 PM, Mark Miller markrmil...@gmail.com wrote: Hey, yeah, blew it on this one. Someone just reported it the other day - the way that a bug was fixed was not back and forward compatible. The first implementation was wrong. You have to update the other nodes to 4.6.1 as well. I’m going to look at some scripting test that can help check for this type of thing. - Mark http://about.me/markrmiller On Feb 7, 2014, 7:01:24 PM, Brett Hoerner br...@bretthoerner.com wrote: I have Solr 4.6.1 on the server and just upgraded my indexer app to SolrJ 4.6.1 and indexing ceased (indexer returned No live servers for shard but the real root from the Solr servers is below). Note that SolrJ 4.6.1 is fine for the query side, just not adding documents. 
21:35:21.508 [qtp1418442930-22296231] ERROR o.a.solr.servlet.SolrDispatchFilter - null:java.lang.RuntimeException: Unknown type 19
[stack trace snipped; identical to the one quoted earlier in the thread]
Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19
Mark, you were correct. I realized I was still running a prerelease of 4.6.1 (by a handful of commits). Bounced them with proper 4.6.1 and we're all good, sorry for the spam. :) On Sat, Feb 8, 2014 at 10:29 AM, Brett Hoerner br...@bretthoerner.com wrote: Oh, I was talking about my indexer. That stack is from my Solr servers, very weird since it's *supposed* to be 4.6.1. I'll dig in more, thanks. On Sat, Feb 8, 2014 at 10:21 AM, Mark Miller markrmil...@gmail.com wrote: If you look at the stack trace, the line numbers match 4.6.0 in the src, but not 4.6.1. That code couldn’t have been 4.6.1 it seems. - Mark http://about.me/markrmiller On Feb 8, 2014, at 11:12 AM, Brett Hoerner br...@bretthoerner.com wrote: Hmmm, I'm assembling into an uberjar that forces uniqueness of classes. I verified 4.6.1 is definitely winning and included alone when it breaks. On Sat, Feb 8, 2014 at 9:44 AM, Mark Miller markrmil...@gmail.com wrote: If that is the case we really have to dig in. Given the error, the first thing I would assume is that you have an old solrj jar or something before 4.6.1 involved with a 4.6.1 solrj jar or install. - Mark http://about.me/markrmiller On Feb 7, 2014, 7:15:24 PM, Mark Miller markrmil...@gmail.com wrote: Hey, yeah, blew it on this one. Someone just reported it the other day - the way that a bug was fixed was not back and forward compatible. The first implementation was wrong. You have to update the other nodes to 4.6.1 as well. I’m going to look at some scripting test that can help check for this type of thing. - Mark http://about.me/markrmiller On Feb 7, 2014, 7:01:24 PM, Brett Hoerner br...@bretthoerner.com wrote: I have Solr 4.6.1 on the server and just upgraded my indexer app to SolrJ 4.6.1 and indexing ceased (indexer returned No live servers for shard but the real root from the Solr servers is below). Note that SolrJ 4.6.1 is fine for the query side, just not adding documents.
21:35:21.508 [qtp1418442930-22296231] ERROR o.a.solr.servlet.SolrDispatchFilter - null:java.lang.RuntimeException: Unknown type 19
[stack trace snipped; identical to the one quoted earlier in the thread]
Re: Commit Issue in Solr 3.4
On 2/8/2014 1:40 AM, samarth s wrote: Yes, it is Amazon EC2 indeed. To expand on that: this Solr deployment was working fine, handling the same load, on a 34 GB instance on EBS storage for quite some time. To reduce the time taken by a commit, I shifted this to a 30 GB SSD instance. It performed better in writes and commits for sure. But since last week I started facing this problem of infinite back-to-back commits. Not being able to resolve this, I have finally switched back to a 34 GB machine with EBS storage, and now the commits are working fine, though slow. The extra 4GB of RAM is almost guaranteed to be the difference. If your index continues to grow, you'll probably be having problems very soon even with 34GB of RAM. If you could put it on a box with 128 to 256GB of RAM, you'd likely see your performance increase dramatically. Can you share your solrconfig.xml file? I may be able to confirm a couple of things I suspect, and depending on what's there, may be able to offer some ideas to help a little bit. It's best if you use a file sharing site like dropbox - the list doesn't deal with attachments very well. Sometimes they work, but most of the time they don't. I will reiterate my main point -- you really need a LOT more memory. Another option is to shard your index across multiple servers. This doesn't actually reduce the TOTAL memory requirement, but it is sometimes easier to get management to agree to buy more servers than it is to get them to agree to buy really large servers. It's a paradox that doesn't make any sense to me, but I've seen it over and over. Thanks, Shawn
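As a footnote on the sharding suggestion: in Solr 3.x, distributed search is driven by listing the shard cores in a `shards` request parameter on the query. A minimal sketch, with placeholder hostnames:

```shell
# Solr 3.x distributed search: the query fans out to every core listed
# in the "shards" parameter. Hostnames and ports below are placeholders.
SHARDS="solr1:8983/solr,solr2:8983/solr,solr3:8983/solr"
QUERY_URL="http://solr1:8983/solr/select?q=*:*&shards=${SHARDS}"
echo "$QUERY_URL"
```

Each shard still needs enough RAM for its own slice of the index, which is Shawn's point about the total memory requirement not shrinking.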
Re: Commit Issue in Solr 3.4
On 2/8/2014 10:22 AM, Shawn Heisey wrote: Can you share your solrconfig.xml file? I may be able to confirm a couple of things I suspect, and depending on what's there, may be able to offer some ideas to help a little bit. It's best if you use a file sharing site like dropbox - the list doesn't deal with attachments very well. Sometimes they work, but most of the time they don't. One additional idea: Unless you know through actual testing that you really do need a 20GB heap, try reducing it. You have 100 million documents, so perhaps you really do need a heap that big. In addition to the solrconfig.xml, I would also be interested in knowing what memory/GC tuning options you've used to start your java instance, and I'd like to see a sampling of typical and worst-case query parameters or URL strings. I'd need to see all parameters and know which request handler you used, so I can cross-reference with the config. Thanks, Shawn
Re: Commit Issue in Solr 3.4
I would be curious what the cause is. Samarth says that it worked for over a year (and supposedly docs were being added all the time). Did the index grow considerably in the last period? Perhaps he could attach visualvm while it is in the 'black hole' state to see what is actually going on. I don't know if the instance is also used for searching, but if it's only indexing, maybe just shorter commit intervals would alleviate the problem. To add context, our indexer is configured with a 16 GB heap, on a machine with 64 GB RAM, but a busy one, so sometimes there is no cache to spare for the OS. The index is 300 GB (out of which 140 GB is stored values), and it is working just 'fine' - 30 docs/s on average, but our docs are large (0.5 MB on avg) and fetched from two databases, so the slowness is outside Solr. I didn't see big improvements with a bigger heap, but I don't remember exact numbers. This is Solr 4. Roman On 8 Feb 2014 12:23, Shawn Heisey s...@elyograg.org wrote: On 2/8/2014 1:40 AM, samarth s wrote: Yes, it is Amazon EC2 indeed. To expand on that: this Solr deployment was working fine, handling the same load, on a 34 GB instance on EBS storage for quite some time. To reduce the time taken by a commit, I shifted this to a 30 GB SSD instance. It performed better in writes and commits for sure. But since last week I started facing this problem of infinite back-to-back commits. Not being able to resolve this, I have finally switched back to a 34 GB machine with EBS storage, and now the commits are working fine, though slow. The extra 4GB of RAM is almost guaranteed to be the difference. If your index continues to grow, you'll probably be having problems very soon even with 34GB of RAM. If you could put it on a box with 128 to 256GB of RAM, you'd likely see your performance increase dramatically. Can you share your solrconfig.xml file? I may be able to confirm a couple of things I suspect, and depending on what's there, may be able to offer some ideas to help a little bit.
It's best if you use a file sharing site like dropbox - the list doesn't deal with attachments very well. Sometimes they work, but most of the time they don't. I will reiterate my main point -- you really need a LOT more memory. Another option is to shard your index across multiple servers. This doesn't actually reduce the TOTAL memory requirement, but it is sometimes easier to get management to agree to buy more servers than it is to get them to agree to buy really large servers. It's a paradox that doesn't make any sense to me, but I've seen it over and over. Thanks, Shawn
Re: Commit Issue in Solr 3.4
On 2/8/2014 11:02 AM, Roman Chyla wrote: I would be curious what the cause is. Samarth says that it worked for over a year (and supposedly docs were being added all the time). Did the index grow considerably in the last period? Perhaps he could attach visualvm while it is in the 'black hole' state to see what is actually going on. I don't know if the instance is also used for searching, but if it's only indexing, maybe just shorter commit intervals would alleviate the problem. To add context, our indexer is configured with a 16 GB heap, on a machine with 64 GB RAM, but a busy one, so sometimes there is no cache to spare for the OS. The index is 300 GB (out of which 140 GB is stored values), and it is working just 'fine' - 30 docs/s on average, but our docs are large (0.5 MB on avg) and fetched from two databases, so the slowness is outside Solr. I didn't see big improvements with a bigger heap, but I don't remember exact numbers. This is Solr 4. For this discussion, refer to this image, or the Google Books link where I originally found it: https://dl.dropboxusercontent.com/u/97770508/performance-dropoff-graph.png http://books.google.com/books?id=dUiNGYCiWg0C&pg=PA33#v=onepage&q&f=false Computer systems have had a long history of performance curves like this. Everything goes really well, possibly for a really long time, until you cross some threshold where a resource cannot keep up with the demands being placed on it. That threshold is usually something you can't calculate in advance. Once it is crossed, even by a tiny amount, performance drops VERY quickly. I do recommend that people closely analyze their GC characteristics, but jconsole, jvisualvm, and other tools like that are actually not very good at this task. You can only get summary info -- how many GCs occurred and total amount of time spent doing GC, often with a useless granularity -- jconsole reports the time in minutes on a system that has been running for any length of time.
I *was* having occasional super-long GC pauses (15 seconds or more), but I did not know it, even though I had religiously looked at GC info in jconsole and jstat. I discovered the problem indirectly, and had to find additional tools to quantify it. After discovering it, I tuned my garbage collection and have not had the problem since. If you have detailed GC logs enabled, this is a good free tool for offline analysis: https://code.google.com/p/gclogviewer/ I have also had good results with this free tool, but it requires a little more work to set up: http://www.azulsystems.com/jHiccup Azul Systems has an alternate Java implementation for Linux that virtually eliminates GC pauses, but it isn't free. I do not have any information about how much it costs. We found our own solution, but for those who can throw money at the problem, I've heard good things about it. Thanks, Shawn
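For reference, "detailed GC logs" on the HotSpot JVMs of this era are enabled with startup flags along these lines (a sketch; the log path is a placeholder to adjust for your system):

```shell
# HotSpot (Java 6/7) GC logging flags producing logs that offline tools
# like gclogviewer can analyze; change the -Xloggc path as needed.
GC_LOG_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/solr/gc.log"
echo "$GC_LOG_OPTS"
```

These would be appended to the java command line that starts Jetty/Solr; -XX:+PrintGCApplicationStoppedTime is what surfaces the long pauses discussed above.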
Re: Max Limit to Schema Fields - Solr 4.X
That was the original plan. However, it's important to preserve the originating field that loaded the value. The data is very fine and granular and each field stores a particular value. When searching the data against Solr, it would be important to know what docs contain that particular data from that particular field (fielda:value, fieldb:value), whereas searching field:value would hide what originating field loaded this value. I'm going to try loading all 3000 fields in the schema and see how that goes. Only concern is doing boolean searches and whether or not I'll run into URL length issues, but I guess I'll find out soon. Thanks again! Sent from my iPhone On Feb 6, 2014, at 1:02 PM, Erick Erickson erickerick...@gmail.com wrote: Sometimes you can spoof the many fields problem by using prefixes on the data. Rather than fielda, fieldb... Have field and index values like fielda_value, fieldb_value into a single field. Then do the right thing when searching. Watch tokenization though. Best Erick On Feb 5, 2014 4:59 AM, Mike L. javaone...@yahoo.com wrote: Thanks Shawn. This is good to know. Sent from my iPhone On Feb 5, 2014, at 12:53 AM, Shawn Heisey s...@elyograg.org wrote: On 2/4/2014 8:00 PM, Mike L. wrote: I'm just wondering here if there is any defined limit to how many fields can be created within a schema? I'm sure the configuration maintenance of a schema like this would be a nightmare, but would like to know if it's at all possible in the first place before it is attempted. There are no hard limits on the number of fields, whether they are dynamically defined or not. Several thousand fields should be no problem. If you have enough system resources and you don't run into an unlikely bug, there's no reason it won't work. As you've already been told, there are potential performance concerns. Depending on the exact nature of your queries, you might need to increase maxBooleanClauses.
The only hard limitation that Lucene really has (and by extension, Solr also has that limitation) is that a single index cannot have more than about two billion documents in it - the inherent limitation on a Java int type. Solr can use indexes larger than this through sharding. See the very end of this page: https://lucene.apache.org/core/4_6_0/core/org/apache/lucene/codecs/lucene46/package-summary.html#Limitations Thanks, Shawn
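For the maxBooleanClauses setting mentioned above, the solrconfig.xml entry is a single element; 4096 here is only an illustrative value, sized to taste:

```xml
<!-- solrconfig.xml: raise the clause cap from the default of 1024;
     size this to your worst-case boolean query -->
<maxBooleanClauses>4096</maxBooleanClauses>
```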
Re: Commit Issue in Solr 3.4
Thanks for the links. I think it would be worth getting more detailed info. Because it could be the performance threshold, or it could be something else (such as an updated Java version, or something else loosely related to RAM, e.g. what is held in memory before the commit, what is cached, leaked custom query objects holding on to some big object, etc.). Btw, if I study the graph, I see that there *are* warning signs. That's the point of testing/measuring after all, IMHO. --roman On 8 Feb 2014 13:51, Shawn Heisey s...@elyograg.org wrote: On 2/8/2014 11:02 AM, Roman Chyla wrote: I would be curious what the cause is. Samarth says that it worked for over a year (and supposedly docs were being added all the time). Did the index grow considerably in the last period? Perhaps he could attach visualvm while it is in the 'black hole' state to see what is actually going on. I don't know if the instance is also used for searching, but if it's only indexing, maybe just shorter commit intervals would alleviate the problem. To add context, our indexer is configured with a 16 GB heap, on a machine with 64 GB RAM, but a busy one, so sometimes there is no cache to spare for the OS. The index is 300 GB (out of which 140 GB is stored values), and it is working just 'fine' - 30 docs/s on average, but our docs are large (0.5 MB on avg) and fetched from two databases, so the slowness is outside Solr. I didn't see big improvements with a bigger heap, but I don't remember exact numbers. This is Solr 4. For this discussion, refer to this image, or the Google Books link where I originally found it: https://dl.dropboxusercontent.com/u/97770508/performance-dropoff-graph.png http://books.google.com/books?id=dUiNGYCiWg0C&pg=PA33#v=onepage&q&f=false Computer systems have had a long history of performance curves like this. Everything goes really well, possibly for a really long time, until you cross some threshold where a resource cannot keep up with the demands being placed on it.
That threshold is usually something you can't calculate in advance. Once it is crossed, even by a tiny amount, performance drops VERY quickly. I do recommend that people closely analyze their GC characteristics, but jconsole, jvisualvm, and other tools like that are actually not very good at this task. You can only get summary info -- how many GCs occurred and total amount of time spent doing GC, often with a useless granularity -- jconsole reports the time in minutes on a system that has been running for any length of time. I *was* having occasional super-long GC pauses (15 seconds or more), but I did not know it, even though I had religiously looked at GC info in jconsole and jstat. I discovered the problem indirectly, and had to find additional tools to quantify it. After discovering it, I tuned my garbage collection and have not had the problem since. If you have detailed GC logs enabled, this is a good free tool for offline analysis: https://code.google.com/p/gclogviewer/ I have also had good results with this free tool, but it requires a little more work to set up: http://www.azulsystems.com/jHiccup Azul Systems has an alternate Java implementation for Linux that virtually eliminates GC pauses, but it isn't free. I do not have any information about how much it costs. We found our own solution, but for those who can throw money at the problem, I've heard good things about it. Thanks, Shawn
A bit lost in the land of schemaless Solr
Say that I have 10 fieldTypes for 10 languages. Is there a way to associate a naming convention from field names to field types so that I can avoid bothering with all those dynamic fields?
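For what it's worth, the stock mechanism for exactly this kind of naming convention is one dynamicField rule per language suffix in schema.xml, which is the boilerplate being asked about. A sketch, assuming field types named text_en, text_fr, and so on:

```xml
<!-- schema.xml: any field named *_txt_fr is created on the fly with the
     French analysis chain; one rule per language -->
<dynamicField name="*_txt_en" type="text_en" indexed="true" stored="true"/>
<dynamicField name="*_txt_fr" type="text_fr" indexed="true" stored="true"/>
```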
(lack) of error for missing library?
<!-- an exact 'path' can be used instead of a 'dir' to specify a specific jar file. This will cause a serious error to be logged if it can't be loaded. --> is the comment, but when I put a completely missing path in there -- no error. Should I file a JIRA?
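For context, the directive under test is presumably along these lines (the jar path is a deliberately broken placeholder, not a real file):

```xml
<!-- solrconfig.xml: 'path' names one specific jar; per the comment,
     a missing file should produce a logged error -->
<lib path="/opt/solr/lib/does-not-exist.jar" />
```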
Re: Max Limit to Schema Fields - Solr 4.X
It's safe to say that you are on thin ice - try it and if it works well for your specific data and requirements, great. But if it eventually fails, you can't say we didn't warn you! Generally, I would say that a few hundred fields is the recommended limit - not that more will definitely cause problems, but because you will be beyond common usage and increasingly sensitive to amount of data and Java/JVM performance capabilities. -- Jack Krupansky -Original Message- From: Mike L. Sent: Saturday, February 8, 2014 2:12 PM To: solr-user@lucene.apache.org Cc: solr-user@lucene.apache.org Subject: Re: Max Limit to Schema Fields - Solr 4.X That was the original plan. However, it's important to preserve the originating field that loaded the value. The data is very fine and granular and each field stores a particular value. When searching the data against Solr, it would be important to know what docs contain that particular data from that particular field (fielda:value, fieldb:value), whereas searching field:value would hide what originating field loaded this value. I'm going to try loading all 3000 fields in the schema and see how that goes. Only concern is doing boolean searches and whether or not I'll run into URL length issues, but I guess I'll find out soon. Thanks again! Sent from my iPhone On Feb 6, 2014, at 1:02 PM, Erick Erickson erickerick...@gmail.com wrote: Sometimes you can spoof the many fields problem by using prefixes on the data. Rather than fielda, fieldb... Have field and index values like fielda_value, fieldb_value into a single field. Then do the right thing when searching. Watch tokenization though. Best Erick On Feb 5, 2014 4:59 AM, Mike L. javaone...@yahoo.com wrote: Thanks Shawn. This is good to know. Sent from my iPhone On Feb 5, 2014, at 12:53 AM, Shawn Heisey s...@elyograg.org wrote: On 2/4/2014 8:00 PM, Mike L. wrote: I'm just wondering here if there is any defined limit to how many fields can be created within a schema?
I'm sure the configuration maintenance of a schema like this would be a nightmare, but would like to know if it's at all possible in the first place before it is attempted. There are no hard limits on the number of fields, whether they are dynamically defined or not. Several thousand fields should be no problem. If you have enough system resources and you don't run into an unlikely bug, there's no reason it won't work. As you've already been told, there are potential performance concerns. Depending on the exact nature of your queries, you might need to increase maxBooleanClauses. The only hard limitation that Lucene really has (and by extension, Solr also has that limitation) is that a single index cannot have more than about two billion documents in it - the inherent limitation on a Java int type. Solr can use indexes larger than this through sharding. See the very end of this page: https://lucene.apache.org/core/4_6_0/core/org/apache/lucene/codecs/lucene46/package-summary.html#Limitations Thanks, Shawn
Re: A bit lost in the land of schemaless Solr
File a Jira? (Really!) Yeah, the absolute separation of field and field type can be a bit annoying in some of these dynamic cases. It might be nice to have a hybrid - like a dynamic field with type details as nested content. I mean, it does seem like one of the common use cases of dynamic fields - the pattern suffix specifies the specific type. Or, maybe the ability to specify a dynamic field by directly referencing the type, like: type:text_fr. Or, just completely merge field and type so you can have any degree of hybrid you want, including pure field, pure type, field plus analyzer, etc. More like type as merely inheriting attributes from another field/type. -- Jack Krupansky -Original Message- From: Benson Margulies Sent: Saturday, February 8, 2014 2:37 PM To: solr-user@lucene.apache.org Subject: A bit lost in the land of schemaless Solr Say that I have 10 fieldTypes for 10 languages. Is there a way to associate a naming convention from field names to field types so that I can avoid bothering with all those dynamic fields?
Re: Max Limit to Schema Fields - Solr 4.X
On 2/8/2014 12:12 PM, Mike L. wrote: I'm going to try loading all 3000 fields in the schema and see how that goes. Only concern is doing boolean searches and whether or not I'll run into URL length issues, but I guess I'll find out soon. It will likely work without a problem. As already mentioned, you may need to increase maxBooleanClauses in solrconfig.xml beyond the default of 1024. The max URL size is configurable with any decent servlet container, including the Jetty that comes with the Solr example. In the part of the Jetty config that adds the connector, this increases the max HTTP header size to 32K, and the size for the entire HTTP buffer to 64K. These may not be big enough with 3000 fields, but it gives you the general idea:

<Set name="requestHeaderSize">32768</Set>
<Set name="requestBufferSize">65536</Set>

Another option is to use a POST request instead of a GET request, with the parameters in the posted body. The default POST buffer size in Jetty is 200K. In newer versions of Solr, the limit is actually set by Solr, not the servlet container, and defaults to 2MB. I believe that if you are using SolrJ, it uses POST requests by default. Thanks, Shawn
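To make the GET-vs-POST point concrete, here is a rough sketch of how quickly a many-clause boolean query outgrows a typical 8KB request-header budget (the field names and the 3000-clause count are illustrative, echoing the thread):

```shell
# Build a 3000-clause boolean query string and measure it; even short
# field/value names blow well past an 8KB header limit.
QUERY="field0:value0"
i=1
while [ "$i" -lt 3000 ]; do
  QUERY="$QUERY OR field$i:value$i"
  i=$((i + 1))
done
printf 'query length: %s bytes\n' "${#QUERY}"
```

A query this size would have to travel in a POST body, where Jetty's ~200K default (or Solr's own 2MB limit in newer versions) applies instead of the header limit.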
RE: Highlight results in Arabic are backword
Thank you both for responding. Is there a way to specify to Solr to add those attributes on the field when it returns results (e.g. language is Arabic or English, or direction is LTR or RTL)? Right now I only have Arabic content indexed, but we plan to add English in the near future. I don't want to have to re-do everything later if there is a better way of designing this now. Regards, Fatima -Original Message- From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Friday, February 07, 2014 3:48 AM To: solr-user@lucene.apache.org Subject: Re: Highlight results in Arabic are backword Arabic is complex. Basically, don't trust anything you see until you put that content on the screen with the surrounding tag marked with attribute dir='rtl' (e.g. <p dir='rtl'>arabic test</p>). Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, Feb 6, 2014 at 10:12 PM, Steve Rowe sar...@gmail.com wrote: Hi Fatima, I don’t think there’s an actual problem, it just looks like it because the program you’re using to look at the JSON makes a different choice for laying out the highlighting results than it does for the field values. In fact, all the bytes are the same, and in the same order for both the “author” field text and the highlighting text, though some space characters are ASCII space (U+0020) in one and non-breaking space (U+00A0) in the other. By the way, I see the same thing as you in my email client (OS X Mail.app). I assume there is a rule shared by our programs about complex layout like this, where right-to-left text is mixed with left-to-right text, likely based on the proportion of each, that triggers a left-to-right word sequencing instead of the expected right-to-left word sequencing.
Anyway, I pulled out the author field and highlighting texts into an HTML document and viewed it in my browser (Safari), and both are laid out the same (with the exception of the emphasis given the highlighted word): —— <html> <body> <p>author: د. فيشر السعر,</p> <p>highlighting: { 1: { author: [ د. <em>فيشر</em> السعر ] } }</p> </body> </html> —— Steve On Feb 6, 2014, at 8:23 AM, Fatima Issawi issa...@qu.edu.qa wrote: Hello, I am getting highlight results in Arabic, but the order of the words is backwards. Querying on that field gives me the correct result, though. Is there a setting I’m missing? An extract from an example query from my Solr Console is below: { responseHeader: { status: 0, QTime: 1, params: { indent: true, q: author:"فيشر", _: 1391692704242, hl.simple.pre: <em>, hl.simple.post: </em>, hl.fl: author, wt: json, hl: true } }, response: { numFound: 4, start: 0, docs: [ { pagenumber: 1, id: 1, author: د. فيشر السعر, author_s: د. فيشر السعر, collector: فاطمة عيساوي, }, highlighting: { 1: { author: [ د. <em>فيشر</em> السعر ]
Re: Highlight results in Arabic are backword
You will most probably put your English and Arabic content into different fields. Mostly because you will want to apply different field type definitions to your English and Arabic text (tokenizers, etc). Also, I would search around the web for articles on multilingual approaches to Solr, if you are doing some deliberate design now. There are some deeper issues. Some good questions are covered here: http://info.basistech.com/blog/bid/171842/Indexing-Strategies-for-Multilingual-Search-with-Solr-and-Rosette (even if it is talking about the commercial tool). There is also a series of 12 blog posts on dealing with Solr for CJK in the libraries. Your issues will be different, but there will be overlap. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Sun, Feb 9, 2014 at 12:56 PM, Fatima Issawi issa...@qu.edu.qa wrote: Thank you both for responding. Is there a way to specify to Solr to add those attributes on the field when it returns results (e.g. language is Arabic or English, or direction is LTR or RTL)? Right now I only have Arabic content indexed, but we plan to add English in the near future. I don't want to have to re-do everything later if there is a better way of designing this now. Regards, Fatima -Original Message- From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Friday, February 07, 2014 3:48 AM To: solr-user@lucene.apache.org Subject: Re: Highlight results in Arabic are backword Arabic is complex. Basically, don't trust anything you see until you put that content on the screen with the surrounding tag marked with attribute dir='rtl' (e.g. <p dir='rtl'>arabic test</p>). Regards, Alex.
Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, Feb 6, 2014 at 10:12 PM, Steve Rowe sar...@gmail.com wrote: Hi Fatima, I don’t think there’s an actual problem, it just looks like it because the program you’re using to look at the JSON makes a different choice for laying out the highlighting results than it does for the field values. In fact, all the bytes are the same, and in the same order for both the “author” field text and the highlighting text, though some space characters are ASCII space (U+0020) in one and non-breaking space (U+00A0) in the other. By the way, I see the same thing as you in my email client (OS X Mail.app). I assume there is a rule shared by our programs about complex layout like this, where right-to-left text is mixed with left-to-right text, likely based on the proportion of each, that triggers a left-to-right word sequencing instead of the expected right-to-left word sequencing. Anyway, I pulled out the author field and highlighting texts into an HTML document and viewed it in my browser (Safari), and both are laid out the same (with the exception of the emphasis given the highlighted word): —— <html> <body> <p>author: د. فيشر السعر,</p> <p>highlighting: { 1: { author: [ د. <em>فيشر</em> السعر ] } }</p> </body> </html> —— Steve On Feb 6, 2014, at 8:23 AM, Fatima Issawi issa...@qu.edu.qa wrote: Hello, I am getting highlight results in Arabic, but the order of the words is backwards. Querying on that field gives me the correct result, though. Is there a setting I’m missing?
An extract from an example query from my Solr Console is below: { responseHeader: { status: 0, QTime: 1, params: { indent: true, q: author:"فيشر", _: 1391692704242, hl.simple.pre: <em>, hl.simple.post: </em>, hl.fl: author, wt: json, hl: true } }, response: { numFound: 4, start: 0, docs: [ { pagenumber: 1, id: 1, author: د. فيشر السعر, author_s: د. فيشر السعر, collector: فاطمة عيساوي, }, highlighting: { 1: { author: [ د. <em>فيشر</em> السعر ]
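Following the separate-fields advice, a schema.xml sketch might look like this; text_ar and text_en are the analyzer chains shipped in the stock Solr example schema, and the field names are assumptions:

```xml
<!-- One stored/indexed field per language so each text gets the right
     tokenizer and filters (Arabic normalization vs. English stemming) -->
<field name="content_ar" type="text_ar" indexed="true" stored="true"/>
<field name="content_en" type="text_en" indexed="true" stored="true"/>
```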
RE: Highlight results in Arabic are backword
Thank you. I will look into that. -Original Message- From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Sunday, February 09, 2014 9:35 AM To: solr-user@lucene.apache.org Subject: Re: Highlight results in Arabic are backword You will most probably put your English and Arabic content into different fields. Mostly because you will want to apply different field type definitions to your English and Arabic text (tokenizers, etc). Also, I would search around the web for articles on multilingual approaches to Solr, if you are doing some deliberate design now. There are some deeper issues. Some good questions are covered here: http://info.basistech.com/blog/bid/171842/Indexing-Strategies-for-Multilingual-Search-with-Solr-and-Rosette (even if it is talking about the commercial tool). There is also a series of 12 blog posts on dealing with Solr for CJK in the libraries. Your issues will be different, but there will be overlap. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Sun, Feb 9, 2014 at 12:56 PM, Fatima Issawi issa...@qu.edu.qa wrote: Thank you both for responding. Is there a way to specify to Solr to add those attributes on the field when it returns results (e.g. language is Arabic or English, or direction is LTR or RTL)? Right now I only have Arabic content indexed, but we plan to add English in the near future. I don't want to have to re-do everything later if there is a better way of designing this now. Regards, Fatima -Original Message- From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Friday, February 07, 2014 3:48 AM To: solr-user@lucene.apache.org Subject: Re: Highlight results in Arabic are backword Arabic is complex.
Basically, don't trust anything you see until you put that content on the screen with the surrounding tag marked with attribute dir='rtl' (e.g. <p dir='rtl'>arabic test</p>). Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, Feb 6, 2014 at 10:12 PM, Steve Rowe sar...@gmail.com wrote: Hi Fatima, I don’t think there’s an actual problem, it just looks like it because the program you’re using to look at the JSON makes a different choice for laying out the highlighting results than it does for the field values. In fact, all the bytes are the same, and in the same order for both the “author” field text and the highlighting text, though some space characters are ASCII space (U+0020) in one and non-breaking space (U+00A0) in the other. By the way, I see the same thing as you in my email client (OS X Mail.app). I assume there is a rule shared by our programs about complex layout like this, where right-to-left text is mixed with left-to-right text, likely based on the proportion of each, that triggers a left-to-right word sequencing instead of the expected right-to-left word sequencing. Anyway, I pulled out the author field and highlighting texts into an HTML document and viewed it in my browser (Safari), and both are laid out the same (with the exception of the emphasis given the highlighted word): —— <html> <body> <p>author: د. فيشر السعر,</p> <p>highlighting: { 1: { author: [ د. <em>فيشر</em> السعر ] } }</p> </body> </html> —— Steve On Feb 6, 2014, at 8:23 AM, Fatima Issawi issa...@qu.edu.qa wrote: Hello, I am getting highlight results in Arabic, but the order of the words is backwards. Querying on that field gives me the correct result, though. Is there a setting I’m missing?
An extract from an example query from my Solr Console is below: { responseHeader: { status: 0, QTime: 1, params: { indent: true, q: author:"فيشر", _: 1391692704242, hl.simple.pre: <em>, hl.simple.post: </em>, hl.fl: author, wt: json, hl: true } }, response: { numFound: 4, start: 0, docs: [ { pagenumber: 1, id: 1, author: د. فيشر السعر, author_s: د. فيشر السعر, collector: فاطمة عيساوي, }, highlighting: { 1: { author: [ د. <em>فيشر</em> السعر ]