Re: Upgrading solr from 3.3 to 3.4
Hi, Yes, we need to upgrade, but my question is whether reindexing of all cores is required, or whether we can directly use the already-indexed data folders from Solr 3.3 with Solr 3.4. Thanks, Isan Fulia.

On 19 September 2011 11:03, Wyhw Whon w...@microgle.com wrote: If you are already using Apache Lucene 3.1, 3.2 or 3.3, we strongly recommend you upgrade to 3.4.0 because of the index corruption bug on OS or computer crash or power loss (LUCENE-3418), now fixed in 3.4.0.

2011/9/19 Isan Fulia isan.fu...@germinait.com: Hi all, does upgrading Solr from 3.3 to 3.4 require reindexing of all the cores, or can we directly copy the data folders to the new Solr? -- Thanks & Regards, Isan Fulia.
Re: Is it possible to use different types of datasource in DIH?
I did some more testing, and it seems that as soon as you use FileDataSource it overrides any other dataSource.

<dataConfig>
  <dataSource type="HttpDataSource" name="url" encoding="UTF-8" connectionTimeout="3" readTimeout="3"/>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="xmlroot" datasource="url" rootEntity="false"
            url="http://www.server.com/rss.xml"
            processor="XPathEntityProcessor"
            forEach="/rss/channel/item">
      <field column="link" xpath="/rss/channel/item/link"/>
    </entity>
  </document>
</dataConfig>

will not work unless you remove the FileDataSource. Does anyone know a way to fix this (other than removing FileDataSource)?
java.io.CharConversionException While Indexing in Solr 3.4
Hi List, I tried Solr 3.4.0 today and while indexing I got the error: java.lang.RuntimeException: [was class java.io.CharConversionException] Invalid UTF-8 middle byte 0x73 (at char #66611, byte #65289). My earlier version was Solr 1.4, and this same document went into the index successfully. Looking around, I see issue https://issues.apache.org/jira/browse/SOLR-2381, which seems to fix this. I thought that patch was already applied in Solr 3.4.0. Is there something I am missing? Is there anything else I should provide - logs, my document details, etc.? *Pranav Prakash* temet nosce. Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com | Google http://www.google.com/profiles/pranny
Re: Lucene-SOLR transition
On Sep 18, 2011, at 19:43, Michael Sokolov wrote: On 9/15/2011 8:30 PM, Scott Smith wrote: 2. Assuming that the answer to 1 is correct, is there an easy way to take a Lucene query (with nested Boolean queries, filter queries, etc.) and generate a Solr query string with q and fq components? I believe that Query.toString() will probably get you back something that can be parsed in turn by the traditional Lucene QueryParser, thus completing the circle and returning your original Query. But why would you want to do that?

No, you can't rely on Query.toString() round-tripping (think stemming, for example - and there are many other cases that won't round-trip either). What you can do, since you know Lucene's API well, is write a QParser(Plugin) that takes request parameters as strings and generates the Query from them, just as your Lucene app does now. Erik
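A minimal sketch of the QParserPlugin approach Erik describes, against the Solr 3.x API - the class name, field names, and query structure here are illustrative, not from the thread:

import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

// Builds the Query programmatically instead of parsing a query string.
public class ProgrammaticQParserPlugin extends QParserPlugin {
  @Override
  public void init(NamedList args) {}

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      @Override
      public Query parse() throws ParseException {
        // Assemble the same Query tree the standalone Lucene app built.
        BooleanQuery bq = new BooleanQuery();
        bq.add(new TermQuery(new Term("title", qstr)), BooleanClause.Occur.SHOULD);
        bq.add(new TermQuery(new Term("body", qstr)), BooleanClause.Occur.SHOULD);
        return bq;
      }
    };
  }
}

It would be registered in solrconfig.xml with <queryParser name="prog" class="com.example.ProgrammaticQParserPlugin"/> and selected per request with defType=prog (or {!prog} in q).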
Re: indexing data from rich documents - Tika with solr3.1
On Sep 18, 2011, at 21:52, scorpking wrote: Hi Erik, I tried indexing from your URL, but I have a problem. In your case, you knew the files' absolute paths (Dir.new("/Users/erikhatcher/apache-solr-3.3.0/docs")), so you could index them. In my case, I don't know the files' absolute paths; I only know the HTTP address where the files live (for example: http://www.lc.unsw.edu.au/onlib/pdf/). Is there another way? Thanks.

Write a little script that takes an HTTP directory listing like that, and then uses stream.url (rather than stream.file, as my example used). Erik
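A minimal sketch of such a script in Java, assuming the stock /update/extract handler is configured and remote streaming is enabled in solrconfig.xml (enableRemoteStreaming="true"); the listing URL, href pattern, and literal.id scheme are illustrative:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndexHttpDirectory {
  public static void main(String[] args) throws Exception {
    String listingUrl = "http://www.lc.unsw.edu.au/onlib/pdf/";
    String listing = fetch(new URL(listingUrl));
    // Pull .pdf hrefs out of the directory listing page.
    Matcher m = Pattern.compile("href=\"([^\"]+\\.pdf)\"",
        Pattern.CASE_INSENSITIVE).matcher(listing);
    int n = 0;
    while (m.find()) {
      // Resolve relative hrefs against the listing URL.
      String docUrl = new URL(new URL(listingUrl), m.group(1)).toString();
      // Solr fetches the remote file itself via stream.url and runs it through Tika.
      fetch(new URL("http://localhost:8983/solr/update/extract"
          + "?literal.id=pdf-" + (n++)
          + "&commit=true"
          + "&stream.url=" + URLEncoder.encode(docUrl, "UTF-8")));
    }
    System.out.println("Submitted " + n + " documents");
  }

  static String fetch(URL url) throws IOException {
    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
    StringBuilder sb = new StringBuilder();
    for (String line; (line = in.readLine()) != null; ) sb.append(line).append('\n');
    in.close();
    return sb.toString();
  }
}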
Re: Upgrading solr from 3.3 to 3.4
Reindexing is not necessary. Drop in 3.4 and go. For this sort of scenario, it's easy enough to try using a copy of your SOLR_HOME directory with an instance of the newest release of Solr. If the release notes don't say a reindex is necessary, then it's not - but it's always a good idea to try it and run any tests you have handy. Erik

On Sep 19, 2011, at 00:02, Isan Fulia wrote: Hi, Yes, we need to upgrade, but my question is whether reindexing of all cores is required, or whether we can directly use the already-indexed data folders from Solr 3.3 with Solr 3.4. [...]
Re: indexing data from rich documents - Tika with solr3.1
Yeah, I want to use DIH, and I tried to configure my data-config file, but it is wrong. This is my config:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://ipAddress;databaseName=VTC_Edu"
              user="myuser" password="mypass" name="VTCEduDocument"/>
  <dataSource type="BinURLDataSource" name="dsurl"/>
  <document>
    <entity name="VTCEduDocument" pk="pk_document_id"
            query="select TOP 10 pk_document_id, s_path_origin from [VTC_Edu].[dbo].[tbl_Document]"
            transformer="vn.vtc.solr.transformer.ImageFilter,vn.vtc.solr.transformer.RemoveHTML,RegexTransformer,TemplateTransformer,vn.vtc.solr.transformer.vntransformer,vn.vtc.solr.correctUnicodeString.correctUnicodeString,vn.vtc.solr.unescapeHtmlString.UnescapeHtmlString,vn.vtc.solr.correctISOString.correctISOString">
      <field column="pk_document_id" name="pk_document_id"/>
      <field column="s_path_origin" name="s_path_origin"/>
    </entity>
    <entity processor="TikaEntityProcessor" dataSource="dsurl" format="text"
            url="http://media.gox.vn/edu/document/original/${VTCEduDocument.s_path_origin}">
      <field column="Author" name="author" meta="true"/>
      <field column="title" name="title" meta="true"/>
      <field column="text" name="text"/>
    </entity>
  </document>
</dataConfig>

And here is the error:

SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in invoking url null Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
    at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:89)
    at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:38)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)
Caused by: java.net.MalformedURLException: no protocol: nullselect TOP 10 pk_document_id, s_path_origin from [VTC_Edu].[dbo].[tbl_Document]
    at java.net.URL.<init>(URL.java:567)
    at java.net.URL.<init>(URL.java:464)
    at java.net.URL.<init>(URL.java:413)
    at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:81)
    ... 10 more

Any ideas? Thanks.
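A hedged reading of that trace: the root entity's SQL is being handed to BinURLDataSource ("no protocol: nullselect TOP 10 ..."), which suggests the root entity is not bound to the JDBC data source, and the Tika entity, being a sibling, cannot see ${VTCEduDocument.s_path_origin} per row. A sketch of the usual shape - an explicit dataSource on the root entity and the Tika entity nested inside it - reusing the names from the config above (not a verified fix):

<document>
  <entity name="VTCEduDocument" dataSource="VTCEduDocument" pk="pk_document_id"
          query="select TOP 10 pk_document_id, s_path_origin from [VTC_Edu].[dbo].[tbl_Document]">
    <field column="pk_document_id" name="pk_document_id"/>
    <!-- nested, so ${VTCEduDocument.s_path_origin} resolves for each row -->
    <entity name="tika" processor="TikaEntityProcessor" dataSource="dsurl" format="text"
            url="http://media.gox.vn/edu/document/original/${VTCEduDocument.s_path_origin}">
      <field column="Author" name="author" meta="true"/>
      <field column="title" name="title" meta="true"/>
      <field column="text" name="text"/>
    </entity>
  </entity>
</document>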
Re: java.io.CharConversionException While Indexing in Solr 3.4
Just in case someone might be interested, here is the log:

SEVERE: java.lang.RuntimeException: [was class java.io.CharConversionException] Invalid UTF-8 middle byte 0x73 (at char #66641, byte #65289)
    at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
    at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
    at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
    at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
    at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:287)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:146)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.io.CharConversionException: Invalid UTF-8 middle byte 0x73 (at char #66641, byte #65289)
    at com.ctc.wstx.io.UTF8Reader.reportInvalidOther(UTF8Reader.java:313)
    at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:204)
    at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
    at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
    at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
    at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
    at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
    at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
    at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
    at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
    ... 26 more

Also, is there a setting so I can change the level of the backtrace? It would be helpful to see the complete stack instead of "... 26 more".
*Pranav Prakash* temet nosce

On Mon, Sep 19, 2011 at 14:16, Pranav Prakash pra...@gmail.com wrote: Hi List, I tried Solr 3.4.0 today and while indexing I got the error: java.lang.RuntimeException: [was class java.io.CharConversionException] Invalid UTF-8 middle byte 0x73 (at char #66611, byte #65289). [...]
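Since the parser reports only character and byte offsets, one way to pin down the offending bytes before they reach Solr is to run the update XML through a strict UTF-8 decoder. A minimal, self-contained Java sketch (the file path argument is a placeholder):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {
  public static void main(String[] args) throws IOException {
    byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
    CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)        // fail instead of silently replacing
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    ByteBuffer in = ByteBuffer.wrap(bytes);
    CharBuffer out = CharBuffer.allocate(4096);
    while (true) {
      CoderResult r = decoder.decode(in, out, true);
      if (r.isError()) {
        System.out.println("Invalid UTF-8 at byte offset " + in.position());
        return;
      }
      if (r.isUnderflow()) break;  // all input consumed cleanly
      out.clear();                 // OVERFLOW: discard decoded chars, keep scanning
    }
    System.out.println("Valid UTF-8 (" + bytes.length + " bytes)");
  }
}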
term vector parser in solr.NET
Hi, I was wondering whether there is any way to get the term vector list back from Solr through SolrNet. Looking at the SolrNet source code, I couldn't find any term vector parser. -- JAME
Re: Is it possible to use different types of datasource in DIH?
I did some more testing, and it seems that as soon as you use FileDataSource it overrides any other dataSource. [...] will not work unless you remove the FileDataSource. Does anyone know a way to fix this (other than removing FileDataSource)?

Did you try giving the FileDataSource a name? e.g.

<dataSource type="FileDataSource" encoding="UTF-8" name="fileData"/>
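A sketch of that suggestion applied to the config from the original post - both data sources named and the entity bound explicitly (the name values here are illustrative):

<dataConfig>
  <dataSource type="HttpDataSource" name="web" encoding="UTF-8" connectionTimeout="3" readTimeout="3"/>
  <dataSource type="FileDataSource" name="fileData" encoding="UTF-8"/>
  <document>
    <entity name="xmlroot" dataSource="web" rootEntity="false"
            url="http://www.server.com/rss.xml"
            processor="XPathEntityProcessor" forEach="/rss/channel/item">
      <field column="link" xpath="/rss/channel/item/link"/>
    </entity>
  </document>
</dataConfig>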
Re: Upgrading solr from 3.3 to 3.4
Thanks, Erik.

On 19 September 2011 15:10, Erik Hatcher erik.hatc...@gmail.com wrote: Reindexing is not necessary. Drop in 3.4 and go. [...]

-- Thanks & Regards, Isan Fulia.
Re: Is it possible to use different types of datasource in DIH?
Yeah, naming datasources maybe only works when they are of the same type. I got this to work with URLDataSource and url=file:///${crawl.fileAbsolutePath} for the local files (two forward slashes doesn't work).
dataimport.properties still updated on error
Hi, I am currently using the DIH to connect to and import data from a MS SQL Server, and in general full, delta and delete imports seem to work perfectly. The issue is that I spotted some errors being logged in the Tomcat logs for Solr:

19-Sep-2011 07:45:25 org.apache.solr.common.SolrException log
SEVERE: Exception in entity : product:org.apache.solr.handler.dataimport.DataImportHandlerException: com.microsoft.sqlserver.jdbc.SQLServerException: Transaction (Process ID 125) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:64)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:339)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$600(JdbcDataSource.java:228)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:262)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:77)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
    at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:302)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:178)
    at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:390)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:429)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Transaction (Process ID 125) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
    at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:213)
    at com.microsoft.sqlserver.jdbc.SQLServerResultSet$FetchBuffer.nextRow(SQLServerResultSet.java:4713)
    at com.microsoft.sqlserver.jdbc.SQLServerResultSet.fetchBufferNext(SQLServerResultSet.java:1671)
    at com.microsoft.sqlserver.jdbc.SQLServerResultSet.next(SQLServerResultSet.java:944)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:331)
    ... 11 more
19-Sep-2011 07:45:25 org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Delta Import completed successfully
19-Sep-2011 07:45:25 org.apache.solr.handler.dataimport.DocBuilder finish

Now, this SQL error I can deal with - I will probably switch to snapshot isolation, as these are constantly updated tables. But my issue is not the SQL error: it is that the delta import still reported it had completed successfully and still wrote the last-updated time to the dataimport.properties file, so the next time it ran it missed a bunch of documents that should have been indexed. If it had failed, rolled back the changes, and not updated dataimport.properties, it would (assuming no more deadlocks) have caught all of the missed documents on the next delta import.
My connection to MS SQL uses the responseBuffering=adaptive setting to reduce memory overhead. So I guess what I am asking is: is there any way I can make the DIH roll back the import if an error occurs, and not update the dataimport.properties file? Any help or suggestions would be appreciated. Thanks, Barry H
Re: OutOfMemoryError coming from TermVectorsReader
Please include information about your heap size (and other Java command-line arguments), as well as your platform OS (version, swap size, etc.), Java version, and underlying hardware (RAM, etc.), so we can better help you. From the information you have given, increasing your heap size should help. Thanks, Glen http://zzzoot.blogspot.com/

On Mon, Sep 19, 2011 at 1:34 AM, anand.ni...@rbs.com wrote: Hi, I am new to Solr. I am trying to index text documents of large size. On searching the indexed documents I am getting the following OutOfMemoryError. Please help me in resolving this issue. The field which stores file content is configured in schema.xml as below:

<field name="Content" type="text_token" indexed="true" stored="true" omitNorms="true" termVectors="true" termPositions="true" termOffsets="true"/>

and highlighting is configured as below:

<str name="hl">on</str>
<str name="hl.fl">${all.fields.list}</str>
<str name="f.Content.hl.fragsize">500</str>
<str name="f.Content.hl.useFastVectorHighlighter">true</str>

2011-09-16 09:38:45.763 [http-thread-pool-9091(5)] ERROR - java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:503)
    at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:263)
    at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:284)
    at org.apache.lucene.index.SegmentReader.getTermFreqVector(SegmentReader.java:759)
    at org.apache.lucene.index.DirectoryReader.getTermFreqVector(DirectoryReader.java:510)
    at org.apache.solr.search.SolrIndexReader.getTermFreqVector(SolrIndexReader.java:234)
    at org.apache.lucene.search.vectorhighlight.FieldTermStack.<init>(FieldTermStack.java:83)
    at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getFieldFragList(FastVectorHighlighter.java:175)
    at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getBestFragments(FastVectorHighlighter.java:166)
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastVectorHighlighter(DefaultSolrHighlighter.java:509)
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:376)
    at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:215)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:279)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:655)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:595)
    at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:98)
    at com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessionLockingStandardPipeline.java:91)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:162)
    at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:326)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:227)
    at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:170)
    at com.sun.grizzly.http.ProcessorTask.invokeAdapter(ProcessorTask.java:822)
    at com.sun.grizzly.http.ProcessorTask.doProcess(ProcessorTask.java:719)
    at com.sun.grizzly.http.ProcessorTask.process(ProcessorTask.java:1013)

Thanks & Regards, Anand Nigam, Developer
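For reference, the heap ceiling Glen mentions is a JVM flag passed when the servlet container starts; with the Jetty-based Solr example distribution that looks like the following (the 2g value is purely illustrative - size it to your machine and term-vector volume):

java -Xmx2g -Xms2g -jar start.jar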
Re: Lucene-SOLR transition
On 9/19/2011 5:27 AM, Erik Hatcher wrote: No, you can't rely on Query.toString() round-tripping (think stemming, for example - and there are many other cases that won't round-trip either). [...]

Oops - thanks for clearing that up, Erik.
Two unrelated questions
Hi all - I'm not sure if I should break this out into two separate questions to the list for searching purposes, or if one message is more acceptable (I don't want to flood). I have two (hopefully) straightforward questions:

1. Is it possible to expose the unique ID of a document to a DIH query? The reason I want to do this is that I use the unique ID of the row in the table as the unique ID of the Lucene document, but I've noticed that the count of documents doesn't match the count in the table; I'd like to add the missing rows and was hoping to avoid writing a custom SolrJ app to do it.

2. Is there any limit to the number of conditions in a Boolean search? We're working on a new project where the user can choose, for example, Ford Vehicles, in which case I can simply search for Ford; but if the user chooses specific makes and models, then I have to say something like Crown Vic OR Focus OR Taurus OR F-150, etc., where they could theoretically choose every model of Ford ever made except one. This could lead to a *very* large query, and I was worried both about whether it is even possible and about the impact on performance.

Thanks, and I apologize if this really should be two separate messages. Ron
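On question 2: Lucene caps the number of clauses in a BooleanQuery and throws a TooManyClauses exception past the cap. Solr exposes the cap in solrconfig.xml, where the example configuration ships with:

<maxBooleanClauses>1024</maxBooleanClauses>

Raising the value lifts the limit at some cost in memory and query time; the 4096 here is purely illustrative:

<maxBooleanClauses>4096</maxBooleanClauses>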
Different Solr versions between Master and Slave(s)
Hi all, while thinking about a migration plan for a Solr 1.4.1 master/slave architecture (1 master with N slaves, already in production) to Solr 3.x, I imagined a graceful migration: start by migrating only one or two slaves, run the needed tests on those, and meanwhile keep offering indexing and searching on top of the 1.4.1 instances. I did a small test of this plan, but I see that the 'javabin' format used by the replication handler has changed (version 1 in 1.4.1, version 2 in 3.x), so slaves at 3.x seem unable to replicate from the master at 1.4.1. Is it possible to use the older 'javabin' version in order to enable replication from the master at 1.4.1 to slaves at 3.x? Or is there a migration approach that works better for the above scenario? Thanks in advance for your help. Cheers, Tommaso
Re: Different Solr versions between Master and Slave(s)
Neither the javabin versions nor the index formats are compatible. I don't think it will even work. Can you not reindex the master on a 3.x version?

On Monday 19 September 2011 18:17:45 Tommaso Teofili wrote: Hi all, while thinking about a migration plan for a Solr 1.4.1 master/slave architecture (1 master with N slaves, already in production) to Solr 3.x, I imagined a graceful migration, starting with migrating only one or two slaves [...]

-- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
RE: Example setting TieredMergePolicy for Solr 3.3 or 3.4?
Thanks Robert. Removing "set" from setMaxMergedSegmentMB and using maxMergedSegmentMB fixed the problem. (Sorry about the multiple posts - our mail server was being flaky and the client lied to me about whether the message had been sent.)

I'm still confused about the mergeFactor=10 setting in the example configuration. I took a quick look at the code, but I'm obviously looking in the wrong place. Is mergeFactor=10 interpreted by TieredMergePolicy as segmentsPerTier=10 and maxMergeAtOnce=10? If I specify values for these, is the mergeFactor setting ignored? Tom

-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Friday, September 16, 2011 7:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

On Fri, Sep 16, 2011 at 6:53 PM, Burton-West, Tom tburt...@umich.edu wrote: Hello, the TieredMergePolicy has become the default with Solr 3.3, but the configuration in the example uses the mergeFactor setting, which applies to the LogByteSizeMergePolicy. How is mergeFactor interpreted by the TieredMergePolicy? Is there an example somewhere showing how to configure the Solr TieredMergePolicy to set the parameters setMaxMergeAtOnce, setSegmentsPerTier, and setMaxMergedSegmentMB?

An example is here: http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/test-files/solr/conf/solrconfig-mergepolicy.xml

I tried setting setMaxMergedSegmentMB in Solr 3.3:

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">20</int>
  <int name="segmentsPerTier">40</int>
  <!-- 400GB / 20 = 20GB or 2MB -->
  <double name="setMaxMergedSegmentMB">2</double>
</mergePolicy>

and got this error message:

SEVERE: java.lang.RuntimeException: no setter corrresponding to 'setMaxMergedSegmentMB' in org.apache.lucene.index.TieredMergePolicy

Right, I think it should be:

<double name="maxMergedSegmentMB">2</double>

-- lucidimagination.com
XPath value passed to SQL query
Hi, After a little struggle I figured out a way of joining XML files with the database, but for some reason it is not working: after the import, only the content from the XML is present in my index; the MySQL contents are missing. To debug, I replaced the parameterized query with a simple select statement, and it worked well. As a next step, I purposely created a syntax error in the SQL and tried again. This time the import failed as expected, printing the values in the log file. What I found interesting is that all the values (e.g. brochure_id) are substituted into the query with enclosing square brackets. For example:

SELECT * FROM accommodation_attribute_content where accommodation_code = '[7850]' and brochure_year = [12] and brochure_id = '[55]'

I have the following in the schema.xml:

<field indexed="true" multiValued="true" name="c_brochure_id" omitNorms="false" omitTermFreqAndPositions="false" stored="true" termVectors="false" type="int"/>
<field indexed="true" multiValued="true" name="c_brochure_year" omitNorms="false" omitTermFreqAndPositions="false" stored="true" termVectors="false" type="int"/>
<field indexed="true" multiValued="true" name="c_accommodation_code" omitNorms="false" omitTermFreqAndPositions="false" stored="true" termVectors="false" type="int"/>

And my data configuration (dataconfig.xml):

<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource name="mysqlDS" batchSize="-1" convertType="true" driver="com.mysql.jdbc.Driver" password="stage" url="jdbc:mysql://x.x.x.x:3306/stagedb?useOldAliasMetadataBehavior=true" user="dev_stage"/>
  <dataSource type="FileDataSource"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor" baseDir="/root/csvs/sample/output" fileName=".*xml" newerThan="'NOW-5DAYS'" recursive="true" rootEntity="false" dataSource="null">
      <entity name="x" processor="XPathEntityProcessor" forEach="/packages/record" url="${f.fileAbsolutePath}" stream="true" logLevel="debug">
        <field column="id" xpath="/packages/record/id"/>
        <field column="c_brochure_id" xpath="/packages/record/brochure_id"/>
        <field column="c_brochure_year" xpath="/packages/record/brochure_year"/>
        <field column="c_accommodation_code" xpath="/packages/record/accommodation_code"/>
        <entity name="accommodationAttribute" dataSource="mysqlDS"
                query="SELECT * FROM accommodation_attribute_content where accommodation_code = '${x.c_accommodation_code}' and brochure_year = ${x.c_brochure_year} and brochure_id = '${x.c_brochure_id}'">
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>

Any idea why I am getting this weird substitution? Thanks, Srikanth
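A hedged observation: '[7850]' is exactly what java.util.List.toString() produces for a one-element list, which suggests each placeholder is resolving to a list rather than a scalar - plausible given that the fields are declared multiValued. If that is the cause, one possible workaround (a sketch, not a verified fix) is a DIH ScriptTransformer on entity x that unwraps the lists before the child entity's query is built:

<script><![CDATA[
  function unwrapLists(row) {
    // Replace single-element lists with their first value so that
    // ${x.c_accommodation_code} etc. expand without brackets.
    var keys = ['c_accommodation_code', 'c_brochure_year', 'c_brochure_id'];
    for (var i = 0; i < keys.length; i++) {
      var v = row.get(keys[i]);
      if (v instanceof java.util.List && v.size() > 0) row.put(keys[i], v.get(0));
    }
    return row;
  }
]]></script>

The <script> element goes directly under <dataConfig>, and entity x would gain transformer="script:unwrapLists".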
RE: Lucene-SOLR transition
OK. Thanks for all of the suggestions. Cheers, Scott

-----Original Message-----
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: Monday, September 19, 2011 3:27 AM
To: solr-user@lucene.apache.org
Subject: Re: Lucene-SOLR transition

[...] What you can do, since you know Lucene's API well, is write a QParser(Plugin) that takes request parameters as strings and generates the Query from them, just as your Lucene app does now. Erik
JSON indexing failing...
Hello, I am running a simple test after reading http://wiki.apache.org/solr/UpdateJSON. I am using only one object from a large JSON file to see whether the indexing works:

curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @productSample.json -H 'Content-type:application/json'

The data is from bbyopen.com; I've attached the single object that I'm testing with. The indexing process fails with:

Sep 19, 2011 2:37:54 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: invalid key: url [1701]
    at org.apache.solr.handler.JsonLoader.parseDoc(JsonLoader.java:355)

I thought that any JSON attributes that did not have a mapping in the schema.xml file would simply not get indexed. (a) Is this not true? This error made me retry after adding url to the schema.xml file:

<field name="url" type="string" indexed="false" stored="true"/>

I retried after a restart, but I still keep getting the same error! (b) Can someone wise perhaps point me in the right direction for troubleshooting this issue? Thank You! - Pulkit

Attachment: productSample.json (application/json)
How does Solr deal with JSON data?
Hello Everyone, I'm quite curious about how the following data gets understood and indexed by Solr:

[{
  "id": "Fubar",
  "url": null,
  "regularPrice": 3.99,
  "offers": [
    {
      "url": "",
      "text": "On Sale",
      "id": "OS"
    }
  ]
}]

1) The field id is present both in the main object and in a nested offers object, so how does Solr make sense of it?

2) Is the data under offers expected to be stored as multi-value in Solr? Or am I supposed to create offerURL, offerText and offerId fields in schema.xml? Even if I do that, how do I tell Solr which data to match up where?

Please be kind - I know I'm not thinking about this in the right manner; just gently set me straight about all this :) - Pulkit
Re: JSON indexing failing...
Ok, a little bit of deleting lines from the JSON file led me to realize that Solr isn't happy with the following:

"offers": [
  {
    "url": "",
    "text": "On Sale",
    "id": "OS"
  }
],

But as to why, or what to do to remedy this... I have no clue :( - Pulkit

On Mon, Sep 19, 2011 at 2:45 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Hello, I am running a simple test after reading http://wiki.apache.org/solr/UpdateJSON. [...]
Re: JSON indexing failing...
So, I'm not an expert in the Solr JSON update message - I've never used it myself. It's documented here: http://wiki.apache.org/solr/UpdateJSON

But Solr is not a structured data store like MongoDB or the like: you can send it an update command in JSON as a convenience, but don't let that make you think it can store arbitrarily nested structured data like MongoDB or CouchDB can. Solr has a single flat list of indexed fields, and its stored fields are also a single flat list per document. You can format your update message as JSON in Solr 3.x, but you still can't tell Solr to do something it's incapable of. If a field is multi-valued, according to the documentation, the JSON value can be an array of values. But if the JSON value is a hash, there's nothing Solr can do with it; that's not how Solr works. From the documentation, it looks like the value can sometimes be a hash when you're communicating other metadata to Solr, like field boosts:

"my_boosted_field": {    /* use a map with boost/value for a boosted field */
  "boost": 2.3,
  "value": "test"
},

But you can't just give it arbitrary JSON - you have to give it JSON of the sort it expects, which does not include arbitrarily nested data hashes. Jonathan
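To make that concrete, one conventional flattening of the offers data from the original question - the offerId/offerUrl/offerText field names are hypothetical and would need to be defined as multiValued in schema.xml - looks like:

[
  {
    "id": "Fubar",
    "regularPrice": 3.99,
    "offerId": ["OS"],
    "offerUrl": [""],
    "offerText": ["On Sale"]
  }
]

Each parallel array holds one entry per offer; correlating them (which url goes with which text) is then by position only, which is exactly the flat-field limitation Jonathan describes.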
How To perform SQL Like Join
Hi, Just as we join two or more tables in SQL, can we join two or more indexes in Solr as well? If yes, in which version? Regards, Ahsan
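For readers of the archive: Solr 3.x has no SQL-style join, and the usual advice is to denormalize at index time. A join query parser was later added under SOLR-2272 and shipped in Solr 4.0; a pseudo-join there looks like the following (field names taken from the stock example schema):

q={!join from=manu_id to=id}ipod

which returns documents whose id matches the manu_id of the documents matching ipod.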