Re: IF function and FieldList
Thanks Erick.

Arcadius.

On 22 May 2014 22:14, Erick Erickson erickerick...@gmail.com wrote:

Why not just return them all and sort it out on the app layer? Seems easier. Or consider doc transformers, I suppose.

Best,
Erick

On Thu, May 22, 2014 at 10:20 AM, Arcadius Ahouansou arcad...@menelic.com wrote:

Hello. I need a dynamically assigned field list (fl) depending on the existence of a field in the response. I need to do something like:

fl=if(exists(field0),field0 field1,field2 field3)

The problem is that the if function does not like the space. I have tried many combinations, like single or double quotes around the field list:

fl=if(exists(field0),'field0 field1','field2 field3')

or

fl=if(exists(field0),field0,field1,field2,field3)

or parentheses, etc. Any help would be very appreciated.

Thanks.
Arcadius.
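Erick's app-layer suggestion could look roughly like this: request all four fields from Solr and pick the wanted pair per document in the client. A minimal sketch (field names taken from the question; the selection logic is an assumption about what the fl=if(...) attempt was meant to express):

```python
def select_fields(doc):
    """Pick (field0, field1) when field0 exists in the returned doc,
    otherwise (field2, field3) -- the logic fl=if(...) cannot express."""
    wanted = ("field0", "field1") if "field0" in doc else ("field2", "field3")
    return {k: doc[k] for k in wanted if k in doc}

# Two documents as they might come back from a fl=field0,field1,field2,field3 query
docs = [
    {"field0": "a", "field1": "b", "field2": "c", "field3": "d"},
    {"field2": "c", "field3": "d"},
]
picked = [select_fields(d) for d in docs]
```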
index a repository of documents(.doc) without using post.jar
Hello, I need to index a repository of documents (.doc) without using post.jar. I'm using Solr with Tomcat 6. Maybe it is possible with an HTTP REST API, but how do I use it? Thanks for your answer. Best regards, Anass BENJELLOUN -- View this message in context: http://lucene.472066.n3.nabble.com/index-a-repository-of-documents-doc-without-using-post-jar-tp4137798.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to Disable Commit Option and Just Manage it via SolrConfig?
Hi Michael;

I've written an API that users send their requests to. I forward their queries to Solr, manage which collection is theirs, and drop query parameters related to commit. However, users can send the commitWithin option within their request data, and I would have to analyze the data inside the request to disallow it. That's why I'm looking for a solution for it within Solr, without customizing it.

Thanks;
Furkan KAMACI

2014-05-22 22:16 GMT+03:00 Michael Della Bitta michael.della.bi...@appinions.com:

Just a thought: if your users can send updates and you can't trust them, how can you keep them from deleting all your data? I would consider using a servlet filter to inspect the request. That would probably be non-trivial if you plan to accept javabin requests as well.

Michael Della Bitta
Applications Developer
o: +1 646 532 3062
appinions inc. "The Science of Influence Marketing"
18 East 41st Street, New York, NY 10017
t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com

On Thu, May 22, 2014 at 6:36 AM, Furkan KAMACI furkankam...@gmail.com wrote:

Hi All;

I've designed a system that allows people to use a search service from SolrCloud. However, I think that I should disable the commit option to avoid performance issues (many users can send commit requests, and this may cause performance problems). I'll configure the solr config file with autocommit and not let people commit individually. I've done some implementation for it: people can not send a commit request by GET like

localhost:8983/solr/update?commit=true

and they can not use:

HttpSolrServer solrServer = new HttpSolrServer("http://localhost:8983/solr");
solrServer.commit();

However, there is another way to send a commit request to Solr. It is something like:

{"add":{"doc":{"id":"change.me","title":"change.me"},"boost":1.0,"overwrite":true,"commitWithin":1000}}

I want to stop that usage, and my current implementation does not cover it. My question is: is there any way I can close the commit option for Solr from clients/the outside world and manage that option only via solr config?

Thanks;
Furkan KAMACI
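Furkan's API-layer approach (dropping commit parameters before forwarding the request to Solr) could be sketched like this for JSON update bodies. This is a minimal illustration of the scrubbing step only, not Solr code; the function name is made up:

```python
import json

def strip_commit_params(body: str) -> str:
    """Drop commitWithin (and commit) keys anywhere in a JSON update
    request before forwarding it on to Solr."""
    def scrub(obj):
        if isinstance(obj, dict):
            return {k: scrub(v) for k, v in obj.items()
                    if k not in ("commitWithin", "commit")}
        if isinstance(obj, list):
            return [scrub(v) for v in obj]
        return obj
    return json.dumps(scrub(json.loads(body)))

# The request body from the question, with commitWithin smuggled in
raw = ('{"add":{"doc":{"id":"change.me","title":"change.me"},'
       '"boost":1.0,"overwrite":true,"commitWithin":1000}}')
clean = strip_commit_params(raw)
```

Note this only covers JSON; as Michael points out, javabin requests would need separate handling.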
Re: index a repository of documents(.doc) without using post.jar
Post.jar is just there for convenience. Look at the relevant wiki pages for actual URL examples: https://wiki.apache.org/solr/UpdateXmlMessages

Regards,
Alex
Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Fri, May 23, 2014 at 3:36 PM, benjelloun anass@gmail.com wrote:

Hello, I need to index a repository of documents (.doc) without using post.jar; I'm using Solr with Tomcat 6. [...]
Import data from Mysql concat issues
Hi,

I'm trying to index data from MySQL. The indexing is successful. Then I tried to use the MySQL CONCAT function (in data-config.xml) to concatenate a custom string with a field, like this:

CONCAT('(', CAST('ΤΜΗΜΑ' AS CHAR CHARACTER SET utf8), ' ', apofasi_tmima, ')')

The custom string ('ΤΜΗΜΑ') is in Greek. When I query the field, Solr returns "?" instead of ΤΜΗΜΑ. I have also used CONCAT('(', 'ΤΜΗΜΑ', ' ', apofasi_tmima, ')') with no success. The data-config.xml file is UTF-8 encoded, and at the beginning there is the <?xml version="1.0" encoding="UTF-8"?> directive. I have also tried setting "characterEncoding=utf8" in the dataSource url, but then indexing fails. What am I missing here? Is there any workaround for this? Below is a snippet from data-config.xml:

<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource type="JdbcDataSource" autoCommit="true" batchSize="-1" convertType="false"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://127.0.0.1:3306/apofaseis?zeroDateTimeBehavior=convertToNull"
              user="root" password="" name="db"/>
  <dataSource name="fieldReader" type="FieldStreamDataSource"/>
  <document>
    <entity name="apofaseis_2000" dataSource="db" transformer="HTMLStripTransformer"
            query="select id, CONCAT_WS('', CONCAT(apofasi_number, '/', apofasi_date, ' ', (CASE apofasi_tmima WHEN NULL THEN '' WHEN '' THEN '' ELSE CONCAT('(', CAST('ΤΜΗΜΑ' AS CHAR CHARACTER SET utf8), ' ', apofasi_tmima, ')') END))) AS grid_title, CAST(CONCAT_WS('_',id,model) AS CHAR) AS solr_id, apofasi_number, apofasi_date, apofasi_tmima, CONCAT(IFNULL(apofasi_thema, ''), ' ', IFNULL(apofasi_description, ''), ' ', apofasi_body) AS content, type, model, url, search_tag, last_modified, CONCAT_WS('', CONCAT(apofasi_number, '/', apofasi_date, ' ', (CASE apofasi_tmima WHEN NULL THEN '' WHEN '' THEN '' ELSE CONCAT('(', CAST('ΤΜΗΜΑ' AS CHAR CHARACTER SET utf8), ' ', apofasi_tmima, ')') END))) AS title from apofaseis_2000 where type = 'text'">
    ...
    ...

Regards,
anarchos78
-- View this message in context: http://lucene.472066.n3.nabble.com/Import-data-from-Mysql-concat-issues-tp4137814.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to Disable Commit Option and Just Manage it via SolrConfig?
There is no direct Solr configuration option to disable commit requests that I know of. Maybe you could do it with an update processor. The processAdd method is called to process a document; it is passed an AddUpdateCommand object for a single document, which has a field for the commitWithin setting. I don't see a public method for zapping commitWithin, but I didn't look too deeply. Worst case, you would need to substitute your own equivalent of RunUpdateProcessorFactory that ignores the commitWithin setting on processAdd; maybe you could subclass and extend the existing class, or maybe you would have to copy and edit it. Also, note that the delete command has a commitWithin setting as well.

-- Jack Krupansky

-----Original Message----- From: Furkan KAMACI Sent: Thursday, May 22, 2014 6:36 AM To: solr-user@lucene.apache.org Subject: How to Disable Commit Option and Just Manage it via SolrConfig?

Hi All; I've designed a system that allows people to use a search service from SolrCloud. However I think that I should disable commit option for people to avoid performance issues [...]
Re: index a repository of documents(.doc) without using post.jar
Is there a particular reason you are averse to using post.jar? I mean, if there is some bug or inconvenience, let us know so we can fix it! The Solr server itself does not provide any ability to crawl file systems (LucidWorks Search does); post.jar provides that convenience.

-- Jack Krupansky

-----Original Message----- From: benjelloun Sent: Friday, May 23, 2014 4:36 AM To: solr-user@lucene.apache.org Subject: index a repository of documents(.doc) without using post.jar

Hello, I need to index a repository of documents(.doc) without using post.jar, i'm using Solr with Tomcat6. [...]
java.io.EOFException: seek past EOF
Hi,

We are getting the "seek past EOF" exception in Solr. This occurs randomly, and after a reindex we are able to access the data again. After running CheckIndex, we got no corrupt blocks. Kindly throw light on the issue. The following is the error log:

2014-05-21 13:57:29,172 INFO processor.LogUpdateProcessor - [LucidWorksLogs] webapp= path=/update params={waitSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false&update.chain=lucid-update-chain} {commit=} 0 14
2014-05-21 13:57:56,139 ERROR core.SolrCore - java.io.EOFException: seek past EOF: MMapIndexInput(path=/xxx/xxx/xxx/LucidWorks/LucidWorksSearch/conf/solr/test_Index/data/index.20140515122858307/_cgx_Lucene41_0.doc)
    at org.apache.lucene.store.ByteBufferIndexInput.seek(ByteBufferIndexInput.java:174)
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.reset(Lucene41PostingsReader.java:407)
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.docs(Lucene41PostingsReader.java:293)
    at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.docs(BlockTreeTermsReader.java:2188)
    at org.apache.lucene.search.FieldCacheImpl$SortedDocValuesCache.createValue(FieldCacheImpl.java:1240)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:212)
    at org.apache.lucene.search.FieldCacheImpl.getTermsIndex(FieldCacheImpl.java:1167)
    at org.apache.lucene.search.FieldCacheImpl.getTermsIndex(FieldCacheImpl.java:1147)
    at org.apache.lucene.search.FieldComparator$TermOrdValComparator.setNextReader(FieldComparator.java:1056)
    at org.apache.lucene.search.grouping.AbstractFirstPassGroupingCollector.setNextReader(AbstractFirstPassGroupingCollector.java:332)
    at org.apache.lucene.search.grouping.term.TermFirstPassGroupingCollector.setNextReader(TermFirstPassGroupingCollector.java:89)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:615)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
    at org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:426)
    at org.apache.solr.search.Grouping.execute(Grouping.java:348)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:408)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
    at org.eclipse.jetty.server.Server.handle(Server.java:351)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944) at
Re: index a repository of documents(.doc) without using post.jar
Hello, There is no inconvenience; I just need to index some files from the system using JEE and Tomcat 6. Maybe there is a function which calls HTTP REST, or maybe there is a way to integrate post.jar into Tomcat 6. If you know any solution to my problem, please suggest it. Thanks, Best regards, Anass BENJELLOUN -- View this message in context: http://lucene.472066.n3.nabble.com/index-a-repository-of-documents-doc-without-using-post-jar-tp4137797p4137848.html Sent from the Solr - User mailing list archive at Nabble.com.
Fwd: Question to send
Hi,

I have one running Solr core with some data indexed on the Solr server. This core is designed to provide OpenNLP functionality for indexing and searching, so I have kept the following binary models at this location: \apache-tomcat-7.0.53\solr\collection1\conf\opennlp

- en-sent.bin
- en-token.bin
- en-pos-maxent.bin
- en-ner-person.bin
- en-ner-location.bin

My problem is: when I unload the running core and try to delete the conf directory from it, it does not allow me to delete the directory, with a prompt that en-sent.bin and en-token.bin are in use. If I have unloaded the core, why has it not released its hold on those files? Is this a known issue with the OpenNLP binaries? How can I release the connection between the unloaded core and the conf directory (especially the binary models)? Please provide me some pointers on this.

Thanks in advance
Re: index a repository of documents(.doc) without using post.jar
Feel free to look at the source code for post.jar. I mean, all it is really doing is scanning the directory (optionally recursively) and then streaming each file to Solr.

-- Jack Krupansky

-----Original Message----- From: benjelloun Sent: Friday, May 23, 2014 8:15 AM To: solr-user@lucene.apache.org Subject: Re: index a repository of documents(.doc) without using post.jar

Hello, There is no inconvenience, i just need to index some files from the system using JEE and tomcat6, maybe there is a fonction which call HTTP REST. [...]
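The scan-and-stream behaviour Jack describes can be sketched in a few lines. This is a rough Python illustration, not the actual post.jar code; the posting URL in the comment is a placeholder (in stock Solr, .doc files would typically go to the extracting request handler):

```python
import tempfile
from pathlib import Path

def find_docs(root, suffix=".doc"):
    """Recursively collect files with the given suffix -- roughly the
    directory scan post.jar performs before streaming each file."""
    return sorted(p for p in Path(root).rglob("*" + suffix) if p.is_file())

# Streaming would then be one HTTP request per file, e.g. (placeholder URL):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:8080/solr/update/extract?literal.id=doc1&commit=true",
#       data=path.read_bytes(),
#       headers={"Content-Type": "application/msword"})
#   urllib.request.urlopen(req)

# Demo on a throwaway directory (illustrative only):
demo = Path(tempfile.mkdtemp())
(demo / "a.doc").write_text("stub")
(demo / "b.pdf").write_text("stub")
found = [p.name for p in find_docs(demo)]
```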
Re: index a repository of documents(.doc) without using post.jar
Hey Anass,

Have a look at another Apache project: http://manifoldcf.apache.org

It works with Tomcat/Solr and is handy for handling deletions and incremental updates.

On Friday, May 23, 2014 3:41 PM, benjelloun anass@gmail.com wrote:

Hello, There is no inconvenience, i just need to index some files from the system using JEE and tomcat6, maybe there is a fonction which call HTTP REST. [...]
Change the group.field name in the solr response
Hi,

How can I change the field name in the grouped section of the Solr response? I know that for changing field names where Solr returns documents, you can make a query with fl mappings: fl=mapping1:fieldname1,mapping2:fieldname2. How do I achieve the same thing for grouping?

For example, if Solr returns the response below for the grouped section when I send a query with group.field=fieldname:

"grouped": {"fieldname": {"matches": 1, "ngroups": 1, "groups": [{"groupValue": 11254, "doclist": {"numFound": 1, "start": 0, "docs": [{"store_id": 101, "name": "tubelight", "fieldname": 14}]}}]}}

I want Solr to change "fieldname" in the response to some other value I specify in the query. How can I achieve this?

Thanks,
Prathik
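As far as I know, fl-style aliasing does not apply to the grouped section, so one workaround is to rename the key client-side after parsing the response. A minimal sketch (the function name is made up; field names come from the question):

```python
def rename_group_field(response, old, new):
    """Rename the group.field key inside the 'grouped' section of a parsed
    Solr response -- done client-side, since fl aliases don't reach here."""
    grouped = response.get("grouped", {})
    if old in grouped:
        grouped[new] = grouped.pop(old)
    return response

# A trimmed-down grouped response like the one in the question
resp = {"grouped": {"fieldname": {"matches": 1, "ngroups": 1, "groups": []}}}
resp = rename_group_field(resp, "fieldname", "alias")
```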
Re: index a repository of documents(.doc) without using post.jar
Hello,

I looked at the source code of post.jar; that was very interesting. I looked at Apache ManifoldCF too; that was interesting as well. But what I want to do is index some files using HTTP REST. This is my request, which doesn't work; maybe this way is the easiest to implement:

put: localhost:8080/solr/update?commit=true

<add>
  <doc>
    <field name="title">khalid</field>
    <field name="description">bouchna9</field>
    <field name="date">23/05/2014</field>
  </doc>
</add>

I'm using Dev HTTP Client for testing.

Thanks,
Anass BENJELLOUN
-- View this message in context: http://lucene.472066.n3.nabble.com/index-a-repository-of-documents-doc-without-using-post-jar-tp4137797p4137881.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How does query on few-hits AND many-hits work
I can answer some of this myself now that I have dived into it to understand what Solr/Lucene does and to see if it can be done better:

* In current Solr/Lucene (or at least in 4.4), indices on both no_dlng_doc_ind_sto and timestamp_dlng_doc_ind_sto are used, and the doc-id sets found are intersected to get the final set of doc-ids.
* It IS more efficient to just use the index for the no_dlng_doc_ind_sto part of the request to get the doc-ids that match that part, and then fetch timestamp doc-values for those doc-ids to filter out the docs that do not match the timestamp_dlng_doc_ind_sto part of the query. I have made changes to our version of Solr (and Lucene) to do that, and response times go from about 10 secs to about 1 sec (of course dependent on what's in the file cache etc.) in cases where no_dlng_doc_ind_sto hits about 500-1000 docs and timestamp_dlng_doc_ind_sto hits about 3-4 billion.

Regards, Per Steffensen

On 19/05/14 13:33, Per Steffensen wrote:

Hi

Let's say I have a Solr collection (running across several servers) containing 5 billion documents. Among others, each document has a value for field no_dlng_doc_ind_sto (a long) and field timestamp_dlng_doc_ind_sto (also a long). Both are doc-value, indexed and stored. Like this in schema.xml:

<dynamicField name="*_dlng_doc_ind_sto" type="dlng" indexed="true" stored="true" required="true" docValues="true"/>
<fieldType name="dlng" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0" docValuesFormat="Disk"/>

I make queries like this: no_dlng_doc_ind_sto:(NO) AND timestamp_dlng_doc_ind_sto:([TIME_START TO TIME_END])

* The no_dlng_doc_ind_sto:(NO) part of a typical query will hit between 500 and 1000 documents out of the total 5 billion
* The timestamp_dlng_doc_ind_sto:([TIME_START TO TIME_END]) part of a typical query will hit between 3-4 billion documents out of the total 5 billion

The question is how Solr/Lucene deals with such requests. I am thinking that using the indices on both fields to get two sets of doc-ids and then making an intersection of those might not be the most efficient: you are intersecting two doc-id sets of size 500-1000 and 3-4 billion. It might be faster to just use the index for no_dlng_doc_ind_sto to get the doc-ids for the 500-1000 documents, then for each of those fetch their timestamp_dlng_doc_ind_sto value (using doc-values) to filter out the ones among the 500-1000 that do not match the timestamp part of the query. But what does Solr/Lucene actually do? Is it Solr or Lucene code that makes the decision on what to do? Can you somehow hint the search engine that you want one or the other method used? Solr 4.4 (and corresponding Lucene), BTW, if that makes a difference.

Regards, Per Steffensen
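The strategy Per describes can be modeled in miniature. A toy illustration (pure Python, nothing to do with actual Lucene internals): instead of intersecting a tiny posting list with a multi-billion-hit range result, walk only the small list and check each doc's timestamp doc-value against the range.

```python
def filter_by_docvalue(small_hits, timestamp_dv, t_start, t_end):
    """small_hits: doc ids matching the selective term query (500-1000 in
    Per's case); timestamp_dv: the doc-values column, doc id -> timestamp.
    We only look up the doc-value for each of the few candidate docs,
    never materializing the huge range-query result."""
    return [d for d in small_hits if t_start <= timestamp_dv[d] <= t_end]

# Tiny example: four candidate docs, range [200, 1000]
dv = {1: 100, 7: 250, 9: 900, 12: 1500}
hits = filter_by_docvalue([1, 7, 9, 12], dv, 200, 1000)
```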
RE: How does query on few-hits AND many-hits work
Per Steffensen [st...@designware.dk] wrote:

* It IS more efficient to just use the index for the no_dlng_doc_ind_sto part of the request to get doc-ids that match that part and then fetch timestamp doc-values for those doc-ids to filter out the docs that do not match the timestamp_dlng_doc_ind_sto part of the query.

Thank you for the follow-up. It sounds rather special-case though, with the requirement of DocValues for the range field. Do you think this can be generalized?

- Toke Eskildsen
Re: How does query on few-hits AND many-hits work
On Fri, May 23, 2014 at 11:37 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote:

Per Steffensen [st...@designware.dk] wrote: * It IS more efficient to just use the index for the no_dlng_doc_ind_sto-part of the request to get doc-ids that match that part and then fetch timestamp-doc-values for those doc-ids to filter out the docs that does not match the timestamp_dlng_doc_ind_sto-part of the query. Thank you for the follow up. It sounds rather special-case though, with requirement of DocValues for the range-field. Do you think this can be generalized?

Maybe it already is? http://heliosearch.org/advanced-filter-caching-in-solr/

Something like this:

fq={!frange cache=false cost=150 v=timestampField l=beginTime u=endTime}

-Yonik
http://heliosearch.org - facet functions, subfacets, off-heap filters/fieldCache
Internals about Too many values for UnInvertedField faceting on field xxx
Could anybody tell us some internals about "Too many values for UnInvertedField faceting on field xxx"?

We have two Solr servers.

Solr A: 128G RAM, 60M docs, 2600 different terms in field "code"; every term of field "code" has fixed length 6; the total token count for field "code" is 9 billion; the total space used by field "code" is 50 billion.

Solr B: 128G RAM, 140M docs, 1600 different terms in field "code"; every term of field "code" has fixed length 6; the total token count for field "code" is 18 billion; the total space used by field "code" is 90 billion.

When we do the facet query q=*:*&wt=xml&indent=true&facet=true&facet.field=code, Solr B is OK, BUT Solr A meets an exception with the message "Too many values for UnInvertedField faceting on field code". Now we think the limitation of UnInvertedField is related to the number of different terms in one field. Could anybody tell us some internals about this problem? We don't want to use facet.method=enum because it's too slow.

Thanks!
RE: Internals about Too many values for UnInvertedField faceting on field xxx
张月祥 [zhan...@calis.edu.cn] wrote: Could anybody tell us some internals about Too many values for UnInvertedField faceting on field xxx ? I must admit I do not fully understand it in detail, but it is a known problem with Field Cache (facet.method=fc) faceting. The remedy is to use DocValues, which does not have the same limitation. This should also result in lower heap usage. You will have to re-index everything though. We have successfully used DocValues on an index with 400M documents and 300M unique values on a single facet field. - Toke Eskildsen
Solr 4.7.2 ValueSourceParser classCast exception
Hi All,

I have my own popularity value source class, and I let Solr know about it via solrconfig.xml:

<valueSourceParser name="popularity" class="mysolr.sources.PopValueSourceParser"/>

But then I get the class cast exception below. I have tried to make sure there are no old Solr jar files in the classpath. Why would this be happening? I even tried to use the <lib> tag to hard-code the Solr and SolrJ jars for 4.7.2.

org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, mysolr.sources.PopValueSourceParser failed to instantiate org.apache.solr.search.ValueSourceParser
    at org.apache.solr.core.SolrCore.init(SolrCore.java:844)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:630)
    at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:562)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:597)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:258)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:250)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)
Caused by: org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, mysolr.sources.PopValueSourceParser failed to instantiate org.apache.solr.search.ValueSourceParser
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:552)
    at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:587)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2191)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2185)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2218)
    at org.apache.solr.core.SolrCore.initValueSourceParsers(SolrCore.java:2130)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:765)
    ... 13 more
Caused by: java.lang.ClassCastException: class mysolr.sources.PopValueSourceParser
    at java.lang.Class.asSubclass(Class.java:3018)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:454)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:401)
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:531)
    ... 19 more

MySolr[46778:5844 0] 2014/05/22 15:47:28 717.16 MB/4.09 GB ERROR org.apache.solr.core.CoreContainer - null:org.apache.solr.common.SolrException: Unable to create core: core1
    at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:989)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:606)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:258)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:250)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)

Thanks,
Summer
Re: Import data from Mysql concat issues
A couple of possibilities:

1. The data in Solr is fine; however, your browser is getting the proper characters back but is not set up to handle that character set, so it displays "?".
2. Your servlet container is not set up (either inbound or outbound) to handle the character set you're sending it.

Best,
Erick

On Fri, May 23, 2014 at 3:18 AM, anarchos78 rigasathanasio...@hotmail.com wrote:

Hi, I'm trying to index data from mysql. The indexing is successful. Then I tried to use the mysql concat function (data-config.xml) in order to concatenate a custom string with a field like this: CONCAT('(', CAST('ΤΜΗΜΑ' AS CHAR CHARACTER SET utf8), ' ', apofasi_tmima, ')'). The custom string ('ΤΜΗΜΑ') in Greek. Then when I try to query the field solr returns ? instead of ΤΜΗΜΑ. [...]
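The "?" output Erick describes is the classic symptom of a lossy single-byte transcoding somewhere between MySQL, JDBC, the servlet container and the browser. A tiny illustration of the symptom only (pure Python, just to show what any non-Unicode-aware hop does to Greek text):

```python
# A codec that cannot represent Greek replaces every character with "?"
# -- exactly the garbling reported in the question.
garbled = "ΤΜΗΜΑ".encode("latin-1", errors="replace").decode("latin-1")
```

The usual fix is making every hop UTF-8-aware end to end: the JDBC connection, the servlet container's URI/response encoding, and the client displaying the results.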
Re: java.io.EOFException: seek past EOF
What version of Solr are you using? There were some issues like this in the 4.1 time-frame.

Best,
Erick

On Fri, May 23, 2014 at 3:39 AM, aarthi aarthiran...@gmail.com wrote:

Hi,

We are getting the seek past EOF exception in Solr. This occurs randomly, and after a reindex we are able to access the data again. After running CheckIndex, we got no corrupt blocks. Kindly throw some light on the issue. The following is the error log:

2014-05-21 13:57:29,172 INFO processor.LogUpdateProcessor - [LucidWorksLogs] webapp= path=/update params={waitSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false&update.chain=lucid-update-chain} {commit=} 0 14
2014-05-21 13:57:56,139 ERROR core.SolrCore - java.io.EOFException: seek past EOF: MMapIndexInput(path=/xxx/xxx/xxx/LucidWorks/LucidWorksSearch/conf/solr/test_Index/data/index.20140515122858307/_cgx_Lucene41_0.doc)
    at org.apache.lucene.store.ByteBufferIndexInput.seek(ByteBufferIndexInput.java:174)
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.reset(Lucene41PostingsReader.java:407)
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.docs(Lucene41PostingsReader.java:293)
    at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.docs(BlockTreeTermsReader.java:2188)
    at org.apache.lucene.search.FieldCacheImpl$SortedDocValuesCache.createValue(FieldCacheImpl.java:1240)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:212)
    at org.apache.lucene.search.FieldCacheImpl.getTermsIndex(FieldCacheImpl.java:1167)
    at org.apache.lucene.search.FieldCacheImpl.getTermsIndex(FieldCacheImpl.java:1147)
    at org.apache.lucene.search.FieldComparator$TermOrdValComparator.setNextReader(FieldComparator.java:1056)
    at org.apache.lucene.search.grouping.AbstractFirstPassGroupingCollector.setNextReader(AbstractFirstPassGroupingCollector.java:332)
    at org.apache.lucene.search.grouping.term.TermFirstPassGroupingCollector.setNextReader(TermFirstPassGroupingCollector.java:89)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:615)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
    at org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:426)
    at org.apache.solr.search.Grouping.execute(Grouping.java:348)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:408)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
    at org.eclipse.jetty.server.Server.handle(Server.java:351)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
    at
Re: Import data from Mysql concat issues
I think that it happens at index time, because when I query for the specific field, Solr returns the ? string!
--
View this message in context: http://lucene.472066.n3.nabble.com/Import-data-from-Mysql-concat-issues-tp4137814p4137908.html
Re: index a repository of documents(.doc) without using post.jar
There's an example of using curl to make a REST call to update a core on this page: https://wiki.apache.org/solr/UpdateXmlMessages

If that doesn't help, please let us know what error you're receiving.

Michael Della Bitta
Applications Developer
o: +1 646 532 3062
appinions inc. "The Science of Influence Marketing"
18 East 41st Street, New York, NY 10017
t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/

On Fri, May 23, 2014 at 10:42 AM, benjelloun anass@gmail.com wrote:

Hello,

I looked at the source code of post.jar; that was very interesting. I looked at Apache ManifoldCF, and that was interesting too. But what I want to do is index some files over HTTP/REST. This is my request, which doesn't work; maybe this way is the easiest to implement:

PUT: localhost:8080/solr/update?commit=true

<add>
  <doc>
    <field name="title">khalid</field>
    <field name="description">bouchna9</field>
    <field name="date">23/05/2014</field>
  </doc>
</add>

I'm using Dev HTTP Client for testing.

Thanks,
Anass BENJELLOUN
--
View this message in context: http://lucene.472066.n3.nabble.com/index-a-repository-of-documents-doc-without-using-post-jar-tp4137797p4137881.html
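Alongside the curl approach above, here is a minimal sketch of building the same <add> update message programmatically. The field names and the localhost:8080 URL are taken from the question and are assumptions about the poster's setup; this is not Solr's own client API (SolrJ would normally be used for that).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SolrAddXml {

    // Minimal escaping for element text (&, <, >); field names are assumed
    // to be plain identifiers with no characters needing attribute escaping.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build a Solr <add> update message from a list of field maps.
    static String buildAddXml(List<Map<String, String>> docs) {
        StringBuilder sb = new StringBuilder("<add>");
        for (Map<String, String> doc : docs) {
            sb.append("<doc>");
            for (Map.Entry<String, String> e : doc.entrySet()) {
                sb.append("<field name=\"").append(e.getKey()).append("\">")
                  .append(escape(e.getValue())).append("</field>");
            }
            sb.append("</doc>");
        }
        return sb.append("</add>").toString();
    }

    public static void main(String[] args) {
        // The document from the question:
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "khalid");
        doc.put("description", "bouchna9");
        doc.put("date", "23/05/2014");
        System.out.println(buildAddXml(List.of(doc)));
        // POST the result with Content-Type: text/xml to
        // http://localhost:8080/solr/update?commit=true (the poster's URL),
        // e.g. via curl --data-binary or java.net.HttpURLConnection.
        // Note: a POST, not a PUT, is what the update handler expects.
    }
}
```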
Re: Solr 4.7.2 ValueSourceParser classCast exception
Are you sure that you compiled your code with the proper Solr jars, so that the class signature (extends, implements, and constructors) matches the Solr 4.7.2 jars? I mean, Java is simply complaining that your class is not a valid value source class of the specified type.

-- Jack Krupansky

-----Original Message-----
From: Summer Shire
Sent: Friday, May 23, 2014 12:40 PM
To: solr-user@lucene.apache.org
Subject: Solr 4.7.2 ValueSourceParser classCast exception

Hi All,

I have my own popularity value source class and I let Solr know about it via solrconfig.xml:

<valueSourceParser name="popularity" class="mysolr.sources.PopValueSourceParser"/>

But then I get the following class cast exception. I have tried to make sure there are no old Solr jar files in the classpath. Why would this be happening? I even tried to use the <lib> tag to hard code the solr and solrj jars for 4.7.2.

org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, mysolr.sources.PopValueSourceParser failed to instantiate org.apache.solr.search.ValueSourceParser
    at org.apache.solr.core.SolrCore.init(SolrCore.java:844)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:630)
    at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:562)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:597)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:258)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:250)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)
Caused by: org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, mysolr.sources.PopValueSourceParser failed to instantiate org.apache.solr.search.ValueSourceParser
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:552)
    at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:587)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2191)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2185)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2218)
    at org.apache.solr.core.SolrCore.initValueSourceParsers(SolrCore.java:2130)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:765)
    ... 13 more
Caused by: java.lang.ClassCastException: class mysolr.sources.PopValueSourceParser
    at java.lang.Class.asSubclass(Class.java:3018)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:454)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:401)
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:531)
    ... 19 more

MySolr[46778:5844 0] 2014/05/22 15:47:28 717.16 MB/4.09 GB ERROR org.apache.solr.core.CoreContainer - null:org.apache.solr.common.SolrException: Unable to create core: core1
    at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:989)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:606)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:258)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:250)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)

Thanks,
Summer
Re: Question to send
You'll have better luck asking the folks at OpenNLP. This isn't really a Solr question.

On Fri, May 23, 2014 at 6:38 PM, rashi gandhi gandhirash...@gmail.com wrote:

Hi,

I have one running Solr core with some data indexed on the Solr server. This core is designed to provide OpenNLP functionality for indexing and searching, so I have kept the following binary models at this location:

\apache-tomcat-7.0.53\solr\collection1\conf\opennlp
 - en-sent.bin
 - en-token.bin
 - en-pos-maxent.bin
 - en-ner-person.bin
 - en-ner-location.bin

My problem is: when I unload the running core and try to delete the conf directory from it, I am not allowed to, with a prompt saying that en-sent.bin and en-token.bin are in use. If I have unloaded the core, why has it not released its lock on these files? Is this a known issue with the OpenNLP binaries? How can I release the connection between the unloaded core and the conf directory (especially the binary models)?

Please provide me some pointers on this.

Thanks in advance

--
Regards,
Shalin Shekhar Mangar.
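For what it's worth, a Windows "file in use" prompt like the one described usually means some code still holds an open stream (or memory-mapped handle) on the .bin models. This is a generic Java sketch, not OpenNLP's or Solr's actual loading code: it shows that once the stream is closed, the file can be deleted. A temp file stands in for en-sent.bin.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ModelLockSketch {

    // Opens a stream over a file (as model loading would), closes it via
    // try-with-resources, then attempts deletion. On Windows the deletion
    // succeeds only because the OS-level handle was released first; an
    // unclosed stream is exactly what produces the "in use" prompt.
    static boolean deleteAfterClose() throws IOException {
        Path model = Files.createTempFile("en-sent", ".bin"); // stand-in for the real model
        Files.write(model, new byte[] {1, 2, 3});
        try (InputStream in = Files.newInputStream(model)) {
            in.read(); // real code would do something like: new SentenceModel(in)
        }
        return Files.deleteIfExists(model);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(deleteAfterClose());
    }
}
```

If the locked handles belong to the Solr/OpenNLP integration rather than your own code, the lock would only be released when the holding code closes its streams or the JVM exits, which is why unloading the core alone may not free the files.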
Re: Import data from Mysql concat issues
bq: I think that it happens at index time

How do you know that? If you're looking at the results in a browser, you do _not_ know that. If you're looking at the raw values in, say, SolrJ, then you _might_ know that; there's still the question of whether your servlet container is munging the docs as you send them to Solr. In this latter case, you're right that the info is indexed that way.

Either way, it's not Solr's problem. Solr works just fine with UTF-8. So if the data is getting into the index with weird characters, it's your setup. If it's just a browser problem, it's _still_ a problem with your setup.

FWIW,
Erick

On Fri, May 23, 2014 at 10:36 AM, anarchos78 rigasathanasio...@hotmail.com wrote:
I think that it happens at index time. The reason is that when I query for the specific field, Solr returns the ? string!
--
View this message in context: http://lucene.472066.n3.nabble.com/Import-data-from-Mysql-concat-issues-tp4137814p4137908.html
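A tiny illustration of the point above, that a literal '?' is produced by a charset mismatch somewhere in the chain (file encoding, JDBC connection, servlet container) rather than by Solr itself. The Greek literal is the one hard-coded in the poster's data-config.xml.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMangle {

    // Encoding through a charset with no Greek letters substitutes the
    // replacement byte '?' for every character; this is how "?????"
    // ends up in an index when some layer defaults to a non-UTF-8 charset.
    static String mangle(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII); // unmappable chars become '?'
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String greek = "ΤΜΗΜΑ"; // literal from the data-config.xml in this thread

        // A UTF-8 round trip is lossless: Solr itself handles UTF-8 fine.
        String roundTrip = new String(greek.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        System.out.println(roundTrip.equals(greek)); // true

        System.out.println(mangle(greek)); // ?????
    }
}
```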
Re: Import data from Mysql concat issues
The Tomcat setup is fine. I insist that it's Solr's issue. The whole index consists of Greek ("funny" characters) and Solr returns them normally. The problem is that I cannot concatenate hard-coded Greek characters in data-config.xml.
--
View this message in context: http://lucene.472066.n3.nabble.com/Import-data-from-Mysql-concat-issues-tp4137814p4137939.html
SolrCloud Nodes autoSoftCommit and (temporary) missing documents
Hey all,

I've got a number of nodes (Solr 4.4 Cloud) that I'm balancing with HAProxy for queries. I'm indexing pretty much constantly, and have autoCommit and autoSoftCommit on for near-realtime searching. All works nicely, except that occasionally the auto-commit cycles are far enough apart that one node will return a document that another node doesn't. I don't want to have to add something like timestamp:[* TO NOW-30MINUTE] to every query to make sure that all the nodes have the record. Ideas? autoSoftCommit more often?

<autoCommit>
  <maxDocs>10</maxDocs>
  <maxTime>720</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>3</maxTime>
  <maxDocs>5000</maxDocs>
</autoSoftCommit>

Thanks,
M.
fw: (Issue) How improve solr facet performance
Hi Solr developers,

Thanks very much for your timely reply.

1. I'm sorry, I made a mistake: the total number of documents is 32 million, not 320 million.
2. The system memory is large for the Solr index; the OS has 256G in total, and I set the Solr Tomcat HEAPSIZE="-Xms25G -Xmx100G".

- How many fields are you faceting on?
Reply: I facet on 9 fields.

- How many unique values does your facet fields have (approximately)?
Reply: 3 facet fields have about one hundred unique values; the other 6 facet fields have between 3 and 15 unique values.

- What is the content of your facets (Strings, numbers?)
Reply: All 9 fields are numbers.

- Which facet.method do you use?
Reply: We used the default, facet.method=fc. We also tested this scenario: for facet fields with few unique values, adding facet.method=enum improves performance a little.

- What is the response time with faceting and a few thousand hits?
Reply: <result name="response" numFound="2925" start="0"> with a QTime of <int name="QTime">6</int>

Best Regards,
Alice Yang
+86-021-51530666*41493
Floor 19, KaiKai Plaza, 888 Wanhandu Rd, Shanghai (200042)

-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Friday, May 23, 2014 8:08 PM
To: d...@lucene.apache.org
Subject: Re: (Issue) How improve solr facet performance

On Fri, 2014-05-23 at 11:45 +0200, Alice.H.Yang (mis.cnsh04.Newegg) 41493 wrote:
> We are blocked by solr facet performance when query hits many documents. (about 10,000,000)

[320M documents, immediate response for plain search with 1M hits]

> But when we add several facet.field to do facet, QTime increases to 220ms or more.

It is not clear whether your observation of increased response time is due to many hits or faceting in itself.

- How many fields are you faceting on?
- How many unique values does your facet fields have (approximately)?
- What is the content of your facets (Strings, numbers?)
- Which facet.method do you use?
- What is the response time with faceting and a few thousand hits?

> Do you have some advice on how to improve the facet performance when hitting many documents?

That depends on whether your bottleneck is the hitcount itself, the number of unique facet values or something third like I/O.

- Toke Eskildsen, State and University Library, Denmark

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
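To A/B the facet.method settings discussed above without hand-editing URLs, the parameters can be assembled programmatically. This is a generic sketch; the field names are hypothetical stand-ins, not from the thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

public class FacetParams {

    // Assemble Solr facet query parameters for quick experiments.
    // "enum" tends to suit fields with few unique values (like the six
    // small fields in the thread); "fc" suits higher-cardinality fields.
    static String facetQuery(String q, List<String> fields, String method, int limit)
            throws UnsupportedEncodingException {
        List<String> params = new ArrayList<>();
        params.add("q=" + URLEncoder.encode(q, "UTF-8"));
        params.add("facet=true");
        params.add("facet.method=" + method);
        params.add("facet.limit=" + limit);
        for (String f : fields) {
            params.add("facet.field=" + URLEncoder.encode(f, "UTF-8"));
        }
        return String.join("&", params);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Hypothetical field names; append the result to /select?
        System.out.println(facetQuery("*:*", List.of("price_tier", "brand_id"), "enum", 100));
    }
}
```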
Re: java.io.EOFException: seek past EOF
We are using Solr version 4.4
--
View this message in context: http://lucene.472066.n3.nabble.com/java-io-EOFException-seek-past-EOF-tp4137817p4137959.html