Re: Upgrading solr from 3.3 to 3.4

2011-09-19 Thread Isan Fulia
Hi ,

Yes, we need to upgrade, but my question is whether reindexing of all cores is
required, or whether we can directly use the already-indexed data folders from
Solr 3.3 with Solr 3.4.

Thanks,
Isan Fulia.





On 19 September 2011 11:03, Wyhw Whon w...@microgle.com wrote:

 If you are already using Apache Lucene 3.1, 3.2 or 3.3, we strongly
 recommend you upgrade to 3.4.0 because of the index corruption bug on
 OS or computer crash or power loss (LUCENE-3418), now fixed in 3.4.0.

 2011/9/19 Isan Fulia isan.fu...@germinait.com

  Hi all,
 
  Does upgrading Solr from 3.3 to 3.4 require reindexing of all the cores, or
  can we directly copy the data folders to the new Solr?
 
 
  --
  Thanks & Regards,
  Isan Fulia.
 




-- 
Thanks & Regards,
Isan Fulia.


Re: Is it possible to use different types of datasource in DIH?

2011-09-19 Thread O. Klein
I did some more testing and it seems that as soon as you use FileDataSource
it overrides any other dataSource.

<dataConfig>
  <dataSource type="HttpDataSource" name="url" encoding="UTF-8"
              connectionTimeout="3" readTimeout="3"/>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="xmlroot" datasource="url" rootEntity="false"
            url="http://www.server.com/rss.xml" processor="XPathEntityProcessor"
            forEach="/rss/channel/item">
      <field column="link" xpath="/rss/channel/item/link"/>
    </entity>
  </document>
</dataConfig>

will not work unless you remove the FileDataSource. Does anyone know a way to
fix this (other than removing the FileDataSource)?



java.io.CharConversionException While Indexing in Solr 3.4

2011-09-19 Thread Pranav Prakash
Hi List,

I tried Solr 3.4.0 today and while indexing I got the error
java.lang.RuntimeException: [was class java.io.CharConversionException]
Invalid UTF-8 middle byte 0x73 (at char #66611, byte #65289)

My earlier version was Solr 1.4, and this same document went into the index
successfully. Looking around, I see issue
https://issues.apache.org/jira/browse/SOLR-2381, which seems to fix the
issue. I thought this patch was already applied in Solr 3.4.0. Is there
something I am missing?

Is there anything else I need to mention? Logs, my document details, etc.?

*Pranav Prakash*

temet nosce

Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com |
Google http://www.google.com/profiles/pranny


Re: Lucene-SOLR transition

2011-09-19 Thread Erik Hatcher

On Sep 18, 2011, at 19:43 , Michael Sokolov wrote:

 On 9/15/2011 8:30 PM, Scott Smith wrote:
 
 2.   Assuming that the answer to 1 is correct, then is there an easy 
 way to take a lucene query (with nested Boolean queries, filter queries, 
 etc.) and generate a SOLR query string with q and fq components?
 
 
 I believe that Query.toString() will probably get you back something that can 
 be parsed in turn by the traditional lucene QueryParser, thus completing the 
 circle and returning your original Query.  But why would you want to do that?

No, you can't rely on Query.toString() round-tripping (think stemming, for
example, and there are many other cases that won't round-trip either).

What you can do, since you know Lucene's API well, is write a QParser(Plugin)
that takes request parameters as strings and generates the Query from them, just
as you do now with your Lucene app.

Erik
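
A minimal sketch of how such a parser would be registered, assuming a hypothetical plugin class name (the Java class itself would build the Lucene Query from the request parameters, as described above):

  <!-- solrconfig.xml: register a custom query parser plugin (class name is hypothetical) -->
  <queryParser name="myparser" class="com.example.MyQParserPlugin"/>

Requests would then select it with local params, e.g. q={!myparser}your input, and the actual query construction lives in the plugin's createParser()/parse() implementation.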



Re: indexing data from rich documents - Tika with solr3.1

2011-09-19 Thread Erik Hatcher

On Sep 18, 2011, at 21:52 , scorpking wrote:

 Hi Erik,
 I tried indexing from your URL, but I have a problem. In your case you knew the
 files' absolute path (Dir.new("/Users/erikhatcher/apache-solr-3.3.0/docs")), so
 you could index them. In my case I don't know the files' absolute paths; I only
 know the HTTP address where the files live (e.g.
 http://www.lc.unsw.edu.au/onlib/pdf/). Is there another way? Thanks

Write a little script that takes the HTTP directory listing like that, and then 
uses stream.url (rather than stream.file as my example used).

Erik
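
For reference, a hedged sketch of what one such per-URL request could look like against the /update/extract handler (this assumes remote streaming is enabled in solrconfig.xml; the document file name and literal.id value are placeholders):

  curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true&stream.url=http://www.lc.unsw.edu.au/onlib/pdf/some-file.pdf"

The script would loop over the URLs scraped from the directory listing and issue one such request per file.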



Re: Upgrading solr from 3.3 to 3.4

2011-09-19 Thread Erik Hatcher
Reindexing is not necessary.  Drop in 3.4 and go.  

For this sort of scenario, it's easy enough to try using a copy of your 
SOLR_HOME directory with an instance of the newest release of Solr.  If the 
release notes don't say a reindex is necessary, then it's not, but it's always a
good idea to try it and run any tests you have handy.

Erik



On Sep 19, 2011, at 00:02 , Isan Fulia wrote:

 Hi ,
 
 Ya we need to upgrade but my question is whether  reindexing of all cores is
 required
 or
 we can directly use already indexed data folders of solr 3.3 to solr 3.4.
 
 Thanks,
 Isan Fulia.
 
 
 
 
 
 On 19 September 2011 11:03, Wyhw Whon w...@microgle.com wrote:
 
 If you are already using Apache Lucene 3.1, 3.2 or 3.3, we strongly
 recommend you upgrade to 3.4.0 because of the index corruption bug on
 OS or computer crash or power loss (LUCENE-3418), now fixed in 3.4.0.
 
 2011/9/19 Isan Fulia isan.fu...@germinait.com
 
 Hi all,
 
 Does upgrading solr from 3.3 to 3.4 requires reindexing of all the cores
 or
 we can directly copy the data folders to
 the new solr ?
 
 
 --
  Thanks & Regards,
 Isan Fulia.
 
 
 
 
 
 -- 
  Thanks & Regards,
 Isan Fulia.



Re: indexing data from rich documents - Tika with solr3.1

2011-09-19 Thread scorpking
Yeah, I want to use DIH, and I tried to configure my data-config file, but it is
wrong. This is my config:

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://ipAddress;databaseName=VTC_Edu" user="myuser"
              password="mypass" name="VTCEduDocument"/>

  <dataSource type="BinURLDataSource" name="dsurl"/>

  <document>

    <entity name="VTCEduDocument" pk="pk_document_id"
            query="select TOP 10 pk_document_id, s_path_origin from [VTC_Edu].[dbo].[tbl_Document]"
            transformer="vn.vtc.solr.transformer.ImageFilter,vn.vtc.solr.transformer.RemoveHTML,RegexTransformer,TemplateTransformer,vn.vtc.solr.transformer.vntransformer,vn.vtc.solr.correctUnicodeString.correctUnicodeString,vn.vtc.solr.unescapeHtmlString.UnescapeHtmlString,vn.vtc.solr.correctISOString.correctISOString">

      <field column="pk_document_id" name="pk_document_id"/>
      <field column="s_path_origin" name="s_path_origin"/>
    </entity>

    <entity processor="TikaEntityProcessor" dataSource="dsurl" format="text"
            url="http://media.gox.vn/edu/document/original/${VTCEduDocument.s_path_origin}">
      <field column="Author" name="author" meta="true"/>
      <field column="title" name="title" meta="true"/>
      <field column="text" name="text"/>
    </entity>

  </document>
</dataConfig>

And here is the error:
SEVERE: Full Import
failed:org.apache.solr.handler.dataimport.DataImportHandlerException:
Exception in invoking url null Processing Document # 1
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at
org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:89)
at
org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:38)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)
Caused by: java.net.MalformedURLException: no protocol: nullselect TOP 10
pk_document_id, s_path_origin from [VTC_Edu].[dbo].[tbl_Document]
at java.net.URL.init(URL.java:567)
at java.net.URL.init(URL.java:464)
at java.net.URL.init(URL.java:413)
at
org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:81)
... 10 more

???
Thanks
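
One hedged reading of that error (the SQL query appears to get handed to the BinURLDataSource as a URL): the SQL entity never names its JDBC dataSource, and the Tika entity sits beside rather than inside it, so ${VTCEduDocument.s_path_origin} has nothing to resolve against. A sketch of the wiring that is usually intended, untested, with the names taken from the config above:

  <document>
    <entity name="VTCEduDocument" dataSource="VTCEduDocument" pk="pk_document_id"
            query="select TOP 10 pk_document_id, s_path_origin from [VTC_Edu].[dbo].[tbl_Document]">
      <field column="pk_document_id" name="pk_document_id"/>
      <field column="s_path_origin" name="s_path_origin"/>

      <!-- nested so the URL variable is resolved once per SQL row -->
      <entity name="tika" processor="TikaEntityProcessor" dataSource="dsurl" format="text"
              url="http://media.gox.vn/edu/document/original/${VTCEduDocument.s_path_origin}">
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>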



Re: java.io.CharConversionException While Indexing in Solr 3.4

2011-09-19 Thread Pranav Prakash
Just in case someone might be interested, here is the log:

SEVERE: java.lang.RuntimeException: [was class
java.io.CharConversionException] Invalid UTF-8 middle byte 0x73 (at char
#66641, byte #65289)
 at
com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
 at
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
 at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:287)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:146)
 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
 at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
 at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
 at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
 at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
 at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
 at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.io.CharConversionException: Invalid UTF-8 middle byte 0x73
(at char #66641, byte #65289)
 at com.ctc.wstx.io.UTF8Reader.reportInvalidOther(UTF8Reader.java:313)
at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:204)
 at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
 at
com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
 at
com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
at
com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
 at
com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
at
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
 ... 26 more


Also, is there a setting so I can change the level of backtrace? This would
be helpful in showing the complete stack instead of "... 26 more".

*Pranav Prakash*

temet nosce

Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com |
Google http://www.google.com/profiles/pranny


On Mon, Sep 19, 2011 at 14:16, Pranav Prakash pra...@gmail.com wrote:


 Hi List,

 I tried Solr 3.4.0 today and while indexing I got the error
 java.lang.RuntimeException: [was class java.io.CharConversionException]
 Invalid UTF-8 middle byte 0x73 (at char #66611, byte #65289)

 My earlier version was Solr 1.4 and this same document went into index
 successfully. Looking around, I see issue
 https://issues.apache.org/jira/browse/SOLR-2381 which seems to fix the
 issue. I thought this patch is already applied to Solr 3.4.0. Is there
 something I am missing?

 Is there anything else I need to mention? Logs/ My document details etc.?

 *Pranav Prakash*

 temet nosce

 Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com |
 Google http://www.google.com/profiles/pranny



term vector parser in solr.NET

2011-09-19 Thread jame vaalet
Hi,
I was wondering if there is any method to get back the term vector list from
Solr through SolrNet. From the SolrNet source code I couldn't find any term
vector parser.

-- 

-JAME
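
For context, term vectors are exposed on the Solr side by the TermVectorComponent; a minimal server-side sketch follows (the handler name /tvrh is an assumption, and a client library would still have to parse the termVectors section of the response itself):

  <searchComponent name="tvComponent" class="solr.TermVectorComponent"/>

  <requestHandler name="/tvrh" class="solr.SearchHandler">
    <lst name="defaults">
      <bool name="tv">true</bool>
    </lst>
    <arr name="last-components">
      <str>tvComponent</str>
    </arr>
  </requestHandler>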


Re: Is it possible to use different types of datasource in DIH?

2011-09-19 Thread Ahmet Arslan
 I did some more testing and it seems that as soon as you use FileDataSource
 it overrides any other dataSource.

 <dataConfig>
   <dataSource type="HttpDataSource" name="url" encoding="UTF-8"
               connectionTimeout="3" readTimeout="3"/>
   <dataSource type="FileDataSource" encoding="UTF-8"/>
   <document>
     <entity name="xmlroot" datasource="url" rootEntity="false"
             url="http://www.server.com/rss.xml" processor="XPathEntityProcessor"
             forEach="/rss/channel/item">
       <field column="link" xpath="/rss/channel/item/link"/>
     </entity>
   </document>
 </dataConfig>

 will not work unless you remove the FileDataSource. Does anyone know a way to
 fix this (other than removing the FileDataSource)?

Did you try to give a name to FileDataSource? e.g.

<dataSource type="FileDataSource" encoding="UTF-8" name="fileData"/>
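
A hedged sketch of the config with both dataSources named and the entity pointing at one of them explicitly (note the camel-cased dataSource attribute on the entity; untested):

  <dataConfig>
    <dataSource type="HttpDataSource" name="url" encoding="UTF-8"
                connectionTimeout="3" readTimeout="3"/>
    <dataSource type="FileDataSource" name="fileData" encoding="UTF-8"/>
    <document>
      <entity name="xmlroot" dataSource="url" rootEntity="false"
              url="http://www.server.com/rss.xml" processor="XPathEntityProcessor"
              forEach="/rss/channel/item">
        <field column="link" xpath="/rss/channel/item/link"/>
      </entity>
    </document>
  </dataConfig>

Any entity that should read local files would then name the other source with dataSource="fileData".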


Re: Upgrading solr from 3.3 to 3.4

2011-09-19 Thread Isan Fulia
Thanks Erik.


On 19 September 2011 15:10, Erik Hatcher erik.hatc...@gmail.com wrote:

 Reindexing is not necessary.  Drop in 3.4 and go.

 For this sort of scenario, it's easy enough to try using a copy of your
 SOLR_HOME directory with an instance of the newest release of Solr.  If
 the release notes don't say a reindex is necessary, then it's not, but
 always a good idea to try it and run any tests you have handy.

Erik



 On Sep 19, 2011, at 00:02 , Isan Fulia wrote:

  Hi ,
 
  Ya we need to upgrade but my question is whether  reindexing of all cores
 is
  required
  or
  we can directly use already indexed data folders of solr 3.3 to solr 3.4.
 
  Thanks,
  Isan Fulia.
 
 
 
 
 
  On 19 September 2011 11:03, Wyhw Whon w...@microgle.com wrote:
 
  If you are already using Apache Lucene 3.1, 3.2 or 3.3, we strongly
  recommend you upgrade to 3.4.0 because of the index corruption bug on
  OS or computer crash or power loss (LUCENE-3418), now fixed in 3.4.0.
 
  2011/9/19 Isan Fulia isan.fu...@germinait.com
 
  Hi all,
 
  Does upgrading solr from 3.3 to 3.4 requires reindexing of all the
 cores
  or
  we can directly copy the data folders to
  the new solr ?
 
 
  --
  Thanks & Regards,
  Isan Fulia.
 
 
 
 
 
  --
  Thanks & Regards,
  Isan Fulia.




-- 
Thanks & Regards,
Isan Fulia.


Re: Is it possible to use different types of datasource in DIH?

2011-09-19 Thread O. Klein
Yeah, naming dataSources maybe only works when they are of the same type.

I got this to work with URLDataSource and
url="file:///${crawl.fileAbsolutePath}" (two forward slashes doesn't work)
for the local files.
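
A hedged sketch of that workaround, assuming the local files are listed by a FileListEntityProcessor entity named crawl (the baseDir, fileName, and forEach values are placeholders; untested):

  <dataSource type="URLDataSource" name="url" encoding="UTF-8"/>
  <document>
    <entity name="crawl" processor="FileListEntityProcessor"
            baseDir="/path/to/files" fileName=".*xml" rootEntity="false" dataSource="null">
      <!-- the child reads each listed file through the URLDataSource via a
           file:/// URL (three slashes), as described above -->
      <entity name="x" processor="XPathEntityProcessor" dataSource="url"
              url="file:///${crawl.fileAbsolutePath}" forEach="/record">
        <field column="id" xpath="/record/id"/>
      </entity>
    </entity>
  </document>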







dataimport.properties still updated on error

2011-09-19 Thread Barry Harding
Hi, I am currently using the DIH to connect to and import data from an MS SQL
Server, and in general full imports, delta imports, and deletes all seem to work
perfectly.

The issue is that I spotted some errors being logged in the Tomcat logs for
Solr, which are:

19-Sep-2011 07:45:25 org.apache.solr.common.SolrException log
SEVERE: Exception in entity : 
product:org.apache.solr.handler.dataimport.DataImportHandlerException: 
com.microsoft.sqlserver.jdbc.SQLServerException: Transaction (Process ID 125) 
was deadlocked on lock resources with another process and has been chosen as 
the deadlock victim. Rerun the transaction.
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:64)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:339)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$600(JdbcDataSource.java:228)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:262)
at 
org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:77)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
at 
org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:302)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:178)
at 
org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:390)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:429)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Transaction 
(Process ID 125) was deadlocked on lock resources with another process and has 
been chosen as the deadlock victim. Rerun the transaction.
at 
com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:213)
at 
com.microsoft.sqlserver.jdbc.SQLServerResultSet$FetchBuffer.nextRow(SQLServerResultSet.java:4713)
at 
com.microsoft.sqlserver.jdbc.SQLServerResultSet.fetchBufferNext(SQLServerResultSet.java:1671)
at 
com.microsoft.sqlserver.jdbc.SQLServerResultSet.next(SQLServerResultSet.java:944)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:331)
... 11 more

19-Sep-2011 07:45:25 org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Delta Import completed successfully
19-Sep-2011 07:45:25 org.apache.solr.handler.dataimport.DocBuilder finish


Now, this SQL error I can deal with, and I will probably switch to snapshot
isolation since these are constantly updated tables. My issue is not the SQL
error, though, but the fact that the delta import still reported that it had
imported successfully and still wrote the last-updated time to the
dataimport.properties file, so the next time it ran it missed a bunch of
documents that should have been indexed.

If it had failed, rolled back the changes, and not updated the
dataimport.properties file, it would (assuming no more deadlocks) have caught
all of the missed documents on the next delta import.

My connection to MS SQL uses the responseBuffering=adaptive setting to reduce
memory overhead. So I guess what I am asking is: is there any way I can cause
the DIH to roll back the import if an error occurs, and to not update the
dataimport.properties file?

Any help or suggestions would be appreciated

Thanks

Barry H
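
One hedged thing to try (not verified to also prevent the dataimport.properties update): DIH entities accept an onError attribute (abort, skip, or continue), so forcing an explicit abort on the affected entity might make the deadlock fail the whole delta import instead of letting it complete. A sketch, with the entity name taken from the log above and the dataSource name and queries elided:

  <entity name="product" dataSource="..." onError="abort"
          query="..." deltaQuery="..." deltaImportQuery="...">
    ...
  </entity>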
 


Re: OutOfMemoryError coming from TermVectorsReader

2011-09-19 Thread Glen Newton
Please include information about your heap size (and other Java
command-line arguments), as well as your platform OS (version, swap size,
etc.), Java version, and underlying hardware (RAM, etc.), so we can better
help you.

From the information you have given, increasing your heap size should help.

Thanks,
Glen

http://zzzoot.blogspot.com/


On Mon, Sep 19, 2011 at 1:34 AM,  anand.ni...@rbs.com wrote:
 Hi,

 I am new to Solr. I am trying to index large text documents. On searching the
 indexed documents I am getting the following OutOfMemoryError. Please help me
 in resolving this issue.

 The field which stores the file content is configured in schema.xml as below:


 <field name="Content" type="text_token" indexed="true" stored="true"
        omitNorms="true" termVectors="true" termPositions="true" termOffsets="true"/>

 and Highlighting is configured as below:


 <str name="hl">on</str>
 <str name="hl.fl">${all.fields.list}</str>
 <str name="f.Content.hl.fragsize">500</str>
 <str name="f.Content.hl.useFastVectorHighlighter">true</str>



 2011-09-16 09:38:45.763 [http-thread-pool-9091(5)] ERROR - 
 java.lang.OutOfMemoryError: Java heap space
        at 
 org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:503)
        at 
 org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:263)
        at 
 org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:284)
        at 
 org.apache.lucene.index.SegmentReader.getTermFreqVector(SegmentReader.java:759)
        at 
 org.apache.lucene.index.DirectoryReader.getTermFreqVector(DirectoryReader.java:510)
        at 
 org.apache.solr.search.SolrIndexReader.getTermFreqVector(SolrIndexReader.java:234)
        at 
 org.apache.lucene.search.vectorhighlight.FieldTermStack.init(FieldTermStack.java:83)
        at 
 org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getFieldFragList(FastVectorHighlighter.java:175)
        at 
 org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getBestFragments(FastVectorHighlighter.java:166)
        at 
 org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastVectorHighlighter(DefaultSolrHighlighter.java:509)
        at 
 org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:376)
        at 
 org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116)
        at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
        at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
        at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:215)
        at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:279)
        at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
        at 
 org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:655)
        at 
 org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:595)
        at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:98)
        at 
 com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessionLockingStandardPipeline.java:91)
        at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:162)
        at 
 org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:326)
        at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:227)
        at 
 com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:170)
        at 
 com.sun.grizzly.http.ProcessorTask.invokeAdapter(ProcessorTask.java:822)
        at com.sun.grizzly.http.ProcessorTask.doProcess(ProcessorTask.java:719)
        at com.sun.grizzly.http.ProcessorTask.process(ProcessorTask.java:1013)

 Thanks & Regards
 Anand Nigam
 Developer



Re: Lucene-SOLR transition

2011-09-19 Thread Michael Sokolov

On 9/19/2011 5:27 AM, Erik Hatcher wrote:

On Sep 18, 2011, at 19:43 , Michael Sokolov wrote:


On 9/15/2011 8:30 PM, Scott Smith wrote:

2.   Assuming that the answer to 1 is correct, then is there an easy way 
to take a lucene query (with nested Boolean queries, filter queries, etc.) and generate a 
SOLR query string with q and fq components?



I believe that Query.toString() will probably get you back something that can 
be parsed in turn by the traditional lucene QueryParser, thus completing the 
circle and returning your original Query.  But why would you want to do that?

No, you can't rely on Query.toString() roundtripping (think stemming, for 
example - but many other examples that won't work that way too).

Oops - thanks for clearing that up, Erik



Two unrelated questions

2011-09-19 Thread Olson, Ron
Hi all-

I'm not sure if I should break this out into two separate questions to the list 
for searching purposes, or if one is more acceptable (don't want to flood).

I have two (hopefully) straightforward questions:

1. Is it possible to expose the unique ID of a document to a DIH query? The
reason I want to do this is that I use the unique ID of the row in the table
as the unique ID of the Lucene document, but I've noticed that the document
count doesn't match the row count in the table; I'd like to add the missing
rows and was hoping to avoid writing a custom SolrJ app to do it.

2. Is there any limit to the number of conditions in a Boolean search? We're
working on a new project where the user can choose, for example, all Ford
vehicles, in which case I can simply search for Ford; but if the user chooses
specific makes and models, then I have to say something like Crown Vic OR Focus
OR Taurus OR F-150, etc., where they could theoretically choose every model of
Ford ever made except one. This could lead to a *very* large query, and I was
worried both about whether it is even possible and about the impact on
performance.


Thanks, and I apologize if this really should be two separate messages.

Ron
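
On the limit in question 2: the number of clauses allowed in a single BooleanQuery is capped by a solrconfig.xml setting, so very large OR lists eventually hit it unless it is raised. The stock value is shown below; raising it trades memory and query cost for headroom:

  <maxBooleanClauses>1024</maxBooleanClauses>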



Different Solr versions between Master and Slave(s)

2011-09-19 Thread Tommaso Teofili
Hi all,
while thinking about a migration plan for a Solr 1.4.1 master/slave
architecture (1 master with N slaves, already in production) to Solr 3.x, I
planned a graceful migration: start by migrating only one or two slaves, run
the needed tests on those, and keep offering indexing and searching on the
1.4.1 instances in the meantime.
I did a small test of this migration plan, but I see that the 'javabin'
format used by the replication handler has changed (version 1 in 1.4.1,
version 2 in 3.x), so the slaves at 3.x seem unable to replicate from the
master (at 1.4.1).
Is it possible to use the older 'javabin' version in order to enable
replication from the master at 1.4.1 towards the slave at 3.x ?
Or is there a better migration approach that sounds better for the above
scenario?
Thanks in advance for your help.
Cheers,
Tommaso


Re: Different Solr versions between Master and Slave(s)

2011-09-19 Thread Markus Jelsma
The javabin versions are not compatible, and neither is the index format. I
don't think it will even work.

Can you not reindex the master on a 3.x version?

On Monday 19 September 2011 18:17:45 Tommaso Teofili wrote:
 Hi all,
 while thinking about a migration plan of a Solr 1.4.1 master / slave
 architecture (1 master with N slaves already in production) to Solr 3.x I
 imagined to go for a graceful migration, starting with migrating only
 one/two slaves, making the needed tests on those while still offering the
 indexing and searching capabilities on top of the 1.4.1 instances.
 I did a small test of this migration plan but I see that the 'javabin'
 format used by the replication handler has changed (version 1 in 1.4.1,
 version 2 in 3.x) so the slaves at 3.x seem not able to replicate from the
 master (at 1.4.1).
 Is it possible to use the older 'javabin' version in order to enable
 replication from the master at 1.4.1 towards the slave at 3.x ?
 Or is there a better migration approach that sounds better for the above
 scenario?
 Thanks in advance for your help.
 Cheers,
 Tommaso

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


RE: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-19 Thread Burton-West, Tom
Thanks Robert,

Removing "set" from setMaxMergedSegmentMB and using maxMergedSegmentMB
fixed the problem.
( Sorry about the multiple posts.  Our mail server was being flaky and the 
client lied to me about whether the message had been sent.)

I'm still confused about the mergeFactor=10 setting in the example
configuration. I took a quick look at the code, but I'm obviously looking in the
wrong place. Is mergeFactor=10 interpreted by TieredMergePolicy as
segmentsPerTier=10 and maxMergeAtOnce=10? If I specify values for these, is
the mergeFactor setting ignored?

Tom
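
For reference, the pieces quoted below combine into a config like this (per Robert's correction, the element name drops the "set" prefix); whether explicit segmentsPerTier/maxMergeAtOnce values make a configured mergeFactor irrelevant is exactly the open question above:

  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">20</int>
    <int name="segmentsPerTier">40</int>
    <double name="maxMergedSegmentMB">2</double>
  </mergePolicy>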



-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: Friday, September 16, 2011 7:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

On Fri, Sep 16, 2011 at 6:53 PM, Burton-West, Tom tburt...@umich.edu wrote:
 Hello,

 The TieredMergePolicy has become the default with Solr 3.3, but the
 configuration in the example uses the mergeFactor setting, which applies to the
 LogByteSizeMergePolicy.

 How is the mergeFactor interpreted by the TieredMergePolicy?

 Is there an example somewhere showing how to configure the Solr 
 TieredMergePolicy to set the parameters:
 setMaxMergeAtOnce, setSegmentsPerTier, and setMaxMergedSegmentMB?

an example is here:
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/test-files/solr/conf/solrconfig-mergepolicy.xml


 I tried setting setMaxMergedSegmentMB in Solr 3.3
 <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
   <int name="maxMergeAtOnce">20</int>
   <int name="segmentsPerTier">40</int>
   <!-- 400GB / 20 = 20GB or 2MB -->
   <double name="setMaxMergedSegmentMB">2</double>
 </mergePolicy>


 and got this error message
 SEVERE: java.lang.RuntimeException: no setter corrresponding to 
 'setMaxMergedSegmentMB' in org.apache.lucene.index.TieredMergePolicy

Right, i think it should be:

<double name="maxMergedSegmentMB">2</double>


-- 
lucidimagination.com


XPath value passed to SQL query

2011-09-19 Thread ntsrikanth
Hi, 

After a little struggle I figured out a way of joining XML files with the
database, but for some reason it is not working. After the import, only the
content from the XML is present in my index; the MySQL contents are missing.

To debug, I replaced the parameterized query with a simple select statement
and it worked well. As a next step, I purposely introduced a syntax error in
the SQL and tried again. This time the import failed as expected, printing
the values in the log file.

What I found interesting is that all the values (e.g. brochure_id) are
substituted into the query enclosed in square brackets, for example:

SELECT * FROM accommodation_attribute_content where accommodation_code =
'[7850]' and brochure_year = [12] and brochure_id = '[55]'


I have the following in the schema.xml 
<field indexed="true" multiValued="true" name="c_brochure_id"
       omitNorms="false" omitTermFreqAndPositions="false" stored="true"
       termVectors="false" type="int"/>
<field indexed="true" multiValued="true" name="c_brochure_year"
       omitNorms="false" omitTermFreqAndPositions="false" stored="true"
       termVectors="false" type="int"/>
<field indexed="true" multiValued="true" name="c_accommodation_code"
       omitNorms="false" omitTermFreqAndPositions="false" stored="true"
       termVectors="false" type="int"/>


And my data configuration: 


dataconfig.xml:

<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource name="mysqlDS" batchSize="-1" convertType="true"
              driver="com.mysql.jdbc.Driver" password="stage"
              url="jdbc:mysql://x.x.x.x:3306/stagedb?useOldAliasMetadataBehavior=true"
              user="dev_stage"/>

  <dataSource type="FileDataSource"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="/root/csvs/sample/output" fileName=".*xml" newerThan="'NOW-5DAYS'"
            recursive="true" rootEntity="false" dataSource="null">
      <entity name="x" processor="XPathEntityProcessor"
              forEach="/packages/record" url="${f.fileAbsolutePath}" stream="true"
              logLevel="debug">
        <field column="id" xpath="/packages/record/id"/>
        <field column="c_brochure_id" xpath="/packages/record/brochure_id"/>
        <field column="c_brochure_year" xpath="/packages/record/brochure_year"/>
        <field column="c_accommodation_code" xpath="/packages/record/accommodation_code"/>

        <entity name="accommodationAttribute" query="SELECT * FROM
                accommodation_attribute_content where accommodation_code =
                '${x.c_accommodation_code}' and brochure_year = ${x.c_brochure_year} and
                brochure_id = '${x.c_brochure_id}'" dataSource="mysqlDS">
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>


Any idea why I am getting this weird substitution?

Thanks, 
Srikanth 



RE: Lucene-SOLR transition

2011-09-19 Thread Scott Smith
OK.  Thanks for all of the suggestions.

Cheers

Scott

-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Sent: Monday, September 19, 2011 3:27 AM
To: solr-user@lucene.apache.org
Subject: Re: Lucene-SOLR transition


On Sep 18, 2011, at 19:43 , Michael Sokolov wrote:

 On 9/15/2011 8:30 PM, Scott Smith wrote:
 
 2.   Assuming that the answer to 1 is correct, then is there an easy 
 way to take a lucene query (with nested Boolean queries, filter queries, 
 etc.) and generate a SOLR query string with q and fq components?
 
 
 I believe that Query.toString() will probably get you back something that can 
 be parsed in turn by the traditional lucene QueryParser, thus completing the 
 circle and returning your original Query.  But why would you want to do that?

No, you can't rely on Query.toString() roundtripping (think stemming, for 
example - but many other examples that won't work that way too).

What you can do, since you know Lucene's API well, is write a QParser(Plugin) 
that takes request parameters as strings and generates the Query from that like 
you are now with your Lucene app.

Erik



JSON indexing failing...

2011-09-19 Thread Pulkit Singhal
Hello,

I am running a simple test after reading:
http://wiki.apache.org/solr/UpdateJSON

I am only using one object from a large json file to test and see if
the indexing works:
curl 'http://localhost:8983/solr/update/json?commit=true'
--data-binary @productSample.json -H 'Content-type:application/json'

The data is from bbyopen.com, I've attached the one single object that
I'm testing with.

The indexing process fails with:
Sep 19, 2011 2:37:54 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: invalid key: url [1701]
at org.apache.solr.handler.JsonLoader.parseDoc(JsonLoader.java:355)

I thought that any json attributes that did not have a mapping in the
schema.xml file would simply not get indexed.
(a) Is this not true?

But this error made me retry after adding url to schema.xml file:
<field name="url" type="string" indexed="false" stored="true"/>
I retried after a restart but I still keep getting the same error!
(b) Can someone wise perhaps point me in the right direction for
troubleshooting this issue?

Thank You!
- Pulkit


[Attachment: productSample.json, application/json]


How does Solr deal with JSON data?

2011-09-19 Thread Pulkit Singhal
Hello Everyone,

I'm quite curious about how does the following data get understood and
indexed by Solr?
[{
  "id": "Fubar",
  "url": null,
  "regularPrice": 3.99,
  "offers": [
    {
      "url": "",
      "text": "On Sale",
      "id": "OS"
    }
  ]
}]

1) The field id is present as part of the main object and as part of
a nested offers object, so how does Solr make sense of it?
2) Is the data under offers expected to be stored as multi-value in
Solr? Or am I supposed to create offerURL, offerText and offerId
fields in schema.xml? Even if I do that how do I tell Solr what data
to match up where?

Please be kind, I know I'm not thinking about this in the right
manner, just gently set me straight about all this :)
- Pulkit


Re: JSON indexing failing...

2011-09-19 Thread Pulkit Singhal
OK, a little bit of deleting lines from the JSON file led me to realize
that Solr isn't happy with the following:
  "offers": [
    {
      "url": "",
      "text": "On Sale",
      "id": "OS"
    }
  ],
But as to why? Or what to do to remedy this ... I have no clue :(

- Pulkit

On Mon, Sep 19, 2011 at 2:45 PM, Pulkit Singhal pulkitsing...@gmail.com wrote:
 Hello,

 I am running a simple test after reading:
 http://wiki.apache.org/solr/UpdateJSON

 I am only using one object from a large json file to test and see if
 the indexing works:
 curl 'http://localhost:8983/solr/update/json?commit=true'
 --data-binary @productSample.json -H 'Content-type:application/json'

 The data is from bbyopen.com, I've attached the one single object that
 I'm testing with.

 The indexing process fails with:
 Sep 19, 2011 2:37:54 PM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: invalid key: url [1701]
        at org.apache.solr.handler.JsonLoader.parseDoc(JsonLoader.java:355)

 I thought that any json attributes that did not have a mapping in the
 schema.xml file would simply not get indexed.
 (a) Is this not true?

 But this error made me retry after adding url to schema.xml file:
  <field name="url" type="string" indexed="false" stored="true"/>
 I retried after a restart but I still keep getting the same error!
 (b) Can someone wise perhaps point me in the right direction for
 troubleshooting this issue?

 Thank You!
 - Pulkit



Re: JSON indexing failing...

2011-09-19 Thread Jonathan Rochkind
So I'm not an expert in the Solr JSON update message, never used it 
before myself. It's documented here:


http://wiki.apache.org/solr/UpdateJSON

But Solr is not a structured data store like MongoDB or CouchDB; you can send
it an update command in JSON as a convenience, but don't let that make you
think it can store arbitrarily nested structured data the way those systems
can.


Solr has a single flat list of indexed fields, as well as stored fields, which
are also a single flat list per document. You can format your update message
as JSON in Solr 3.x, but you still can't tell it to do something it's
incapable of. If a field is multi-valued, according to the documentation, the
JSON value can be an array of values. But if the JSON value is a hash, there's
nothing Solr can do with it; that's not how Solr works.



It looks from the documentation that the value can sometimes be a hash 
when you're communicating other meta-data to Solr, like field boosts:


"my_boosted_field": {    /* use a map with boost/value for a boosted field */
  "boost": 2.3,
  "value": "test"
},

But you can't just give it arbitrary JSON, you have to give it JSON of 
the sort it expects. Which does not include arbitrarily nested data hashes.


Jonathan
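
As a hedged illustration for the document in question, the nested offers object would have to arrive already flattened into top-level (possibly multi-valued) fields, each declared in schema.xml; the offer_* field names below are made up for the example:

  [
    {
      "id": "Fubar",
      "regularPrice": 3.99,
      "offer_id": ["OS"],
      "offer_text": ["On Sale"],
      "offer_url": [""]
    }
  ]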



How To perform SQL Like Join

2011-09-19 Thread Ahson Iqbal
Hi

As we join two or more tables in SQL, can we join two or more indexes in Solr
as well? If yes, then in which version?

Regards
Ahsan