Re: Question about Fuzzy search in Solr

2012-09-17 Thread Rahul Warawdekar
Thanks.
Is any extra configuration needed on the Solr side to make this work ?
Any additional text files like synonyms.txt, any additional fields, or any
changes in schema.xml or solrconfig.xml ?

On Mon, Sep 17, 2012 at 4:45 PM, Rafał Kuć r@solr.pl wrote:

 Hello!

 Is this what you are looking for

 https://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/queryparsersyntax.html#Fuzzy%20Searches
 ?

 --
 Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

  Hi,

  I need to know how we can implement fuzzy searches using Solr.
  Can someone provide any links to any relevant documentation ?




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Question about Fuzzy search in Solr

2012-09-17 Thread Rahul Warawdekar
Got it.
Thanks Rafał !

On Mon, Sep 17, 2012 at 6:37 PM, Rafał Kuć r@solr.pl wrote:

 Hello!

 There is no need to make any changes or add any components to have
 fuzzy search working in Solr.

 --
 Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

  Thanks.
  Is any extra configuration needed on the Solr side to make this work ?
  Any additional text files like synonyms.txt, any additional fields, or any
  changes in schema.xml or solrconfig.xml ?

  On Mon, Sep 17, 2012 at 4:45 PM, Rafał Kuć r@solr.pl wrote:

  Hello!
 
  Is this what you are looking for
 
 
 https://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/queryparsersyntax.html#Fuzzy%20Searches
  ?
 
  --
  Regards,
   Rafał Kuć
   Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
 ElasticSearch
 
   Hi,
 
   I need to know how we can implement fuzzy searches using Solr.
   Can someone provide any links to any relevant documentation ?
 
 





-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Question about Fuzzy search in Solr

2012-09-17 Thread Rahul Warawdekar
Thanks Jack.
We are using Solr 3.4.

On Mon, Sep 17, 2012 at 8:18 PM, Jack Krupansky j...@basetechnology.com wrote:

 That doc is out of date for 4.0. See the 4.0 Javadoc on FuzzyQuery for
 updated info. The tilde's right operand is now an integer edit distance
 (the number of times to insert a char, delete a char, change a char, or
 transpose two adjacent chars to map an index term to the query term),
 limited to 2.

 Be aware that if you use fuzzy query in 3.6/3.6.1 or earlier, it will
 change when you go to 4.0.
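
 For example, in the Lucene query syntax (the field name "title" is only
 an illustration):

   title:roam~0.8   (3.x syntax: optional similarity threshold between 0 and 1)
   title:roam~1     (4.0 syntax: integer edit distance, at most 2)

 Both forms match terms within a small edit distance of "roam" (e.g.
 "foam", "roams").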

 -- Jack Krupansky

 -Original Message- From: Rafał Kuć
 Sent: Monday, September 17, 2012 7:15 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Question about Fuzzy search in Solr


 Hello!

 Is this what you are looking for
 https://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/queryparsersyntax.html#Fuzzy%20Searches
 ?

 --
 Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

  Hi,


  I need to know how we can implement fuzzy searches using Solr.
 Can someone provide any links to any relevant documentation ?





-- 
Thanks and Regards
Rahul A. Warawdekar


Re: DIH XML configs for multi environment

2012-07-11 Thread Rahul Warawdekar
Hi Pranav,

If you are using Tomcat to host Solr, you can define your data source in
the context.xml file under the Tomcat configuration.
Refer to this datasource by the same name from the DIH data-config.xml in
all 3 environments.
The context.xml file itself will then vary across the 3 environments,
holding different credentials for dev, staging, and prod.

e.g.
DIH data-config.xml will refer to the datasource as listed below:

  <dataSource jndiName="java:comp/env/YOUR_DATASOURCE_NAME"
      type="JdbcDataSource" readOnly="true" />

The context.xml file, located under the /TOMCAT_HOME/conf folder, will
have the resource entry as follows:

  <Resource name="YOUR_DATASOURCE_NAME" auth="Container"
      type="..." username="X" password="X"
      driverClassName="..."
      url="..."
      maxActive="8"
  />
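
For example, for a MySQL datasource the filled-in entry might look like
this (hypothetical host, database, and credentials):

  <Resource name="YOUR_DATASOURCE_NAME" auth="Container"
      type="javax.sql.DataSource" username="dev_user" password="dev_pass"
      driverClassName="com.mysql.jdbc.Driver"
      url="jdbc:mysql://dev-db-host:3306/mydb"
      maxActive="8"
  />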

On Wed, Jul 11, 2012 at 1:31 PM, Pranav Prakash pra...@gmail.com wrote:

 The DIH XML config file has to be specified dataSource. In my case, and
 possibly with many others, the logon credentials as well as mysql server
 paths would differ based on environments (dev, stag, prod). I don't want to
 end up coming with three different DIH config files, three different
 handlers and so on.

 What is a good way to deal with this?


 *Pranav Prakash*

 temet nosce




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: DIH XML configs for multi environment

2012-07-11 Thread Rahul Warawdekar
http://wiki.eclipse.org/Jetty/Howto/Configure_JNDI_Datasource
http://docs.codehaus.org/display/JETTY/DataSource+Examples
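
As a sketch (following the Jetty docs above; the class name is the Jetty 7+
form and the datasource name and MySQL details are placeholders), the
equivalent JNDI entry would live in jetty-env.xml or jetty.xml:

  <New id="YOUR_DATASOURCE_NAME" class="org.eclipse.jetty.plus.jndi.Resource">
    <Arg>jdbc/YOUR_DATASOURCE_NAME</Arg>
    <Arg>
      <New class="com.mysql.jdbc.jdbc2.optional.MysqlConnectionPoolDataSource">
        <Set name="Url">jdbc:mysql://dev-db-host:3306/mydb</Set>
        <Set name="User">dev_user</Set>
        <Set name="Password">dev_pass</Set>
      </New>
    </Arg>
  </New>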


On Wed, Jul 11, 2012 at 2:30 PM, Pranav Prakash pra...@gmail.com wrote:

 That's cool. Is there something similar for Jetty as well? We use Jetty!

 *Pranav Prakash*

 temet nosce



 On Wed, Jul 11, 2012 at 1:49 PM, Rahul Warawdekar 
 rahul.warawde...@gmail.com wrote:

  Hi Pranav,
 
  If you are using Tomcat to host Solr, you can define your data source in
  the context.xml file under the Tomcat configuration.
  Refer to this datasource by the same name from the DIH data-config.xml
  in all 3 environments.
  The context.xml file itself will then vary across the 3 environments,
  holding different credentials for dev, staging, and prod.
 
  e.g.
  DIH data-config.xml will refer to the datasource as listed below:
 
    <dataSource jndiName="java:comp/env/YOUR_DATASOURCE_NAME"
        type="JdbcDataSource" readOnly="true" />
 
  The context.xml file, located under the /TOMCAT_HOME/conf folder, will
  have the resource entry as follows:
 
    <Resource name="YOUR_DATASOURCE_NAME" auth="Container"
        type="..." username="X" password="X"
        driverClassName="..."
        url="..."
        maxActive="8"
    />
 
  On Wed, Jul 11, 2012 at 1:31 PM, Pranav Prakash pra...@gmail.com
 wrote:
 
   The DIH XML config file has to be specified dataSource. In my case, and
   possibly with many others, the logon credentials as well as mysql
 server
   paths would differ based on environments (dev, stag, prod). I don't
 want
  to
   end up coming with three different DIH config files, three different
   handlers and so on.
  
   What is a good way to deal with this?
  
  
   *Pranav Prakash*
  
   temet nosce
  
 
 
 
  --
  Thanks and Regards
  Rahul A. Warawdekar
 




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Can't index sub-entitties in DIH

2012-06-05 Thread Rahul Warawdekar
Hi,

One of the possibilities for this kind of issue may be the case
sensitivity of column names in Oracle.
Can you apply a transformer and inspect the entity row map, which holds
the actual keys and their values ?
Also, please try specifying upper-case field names for Oracle and see if
that works,
something like:

  <entity name="tipodocumento" query="SELECT NOMBRE FROM
      tipodocumento where IDTIPODOCUMENTO = '${documento.TIPODOCUMENTO}'">
    <field column="NOMBRE" name="nombre" />
  </entity>
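
A minimal sketch of such a debugging transformer (the class name
RowKeyDebugTransformer is hypothetical; DIH invokes any class exposing a
transformRow method, registered via transformer="..." on the entity):

  import java.util.Map;

  public class RowKeyDebugTransformer {
      // Log the keys DIH actually received for each row, so Oracle
      // upper/lower-case mismatches in column names become visible.
      public Object transformRow(Map<String, Object> row) {
          System.out.println("DIH row keys: " + row.keySet());
          return row;
      }
  }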

On Tue, Jun 5, 2012 at 9:57 AM, Rafael Taboada kaliman.fore...@gmail.com wrote:

 Hi Gora,


  Your configuration files look fine. It would seem that something
  is going wrong with the SELECT in Oracle, or with the JDBC
  driver used to access Oracle. Could you try:

 * Manually doing the SELECT for the entity, and sub-entity
   to ensure that things are working.
 

 The SELECTs are working OK.



  * Check the JDBC settings.
 

 I'm using tha last version of jdbc6.jar for Oracle 11g. It seems JDBC
 setting is OK because solr brings data.



  Sorry, I do not have access to Oracle so that I cannot try this
  out myself.
 
  Also, have you checked the Solr logs for any error messages?
  Finally, I just noticed that you have extra quotes in:
  ...where usuario_idusuario = '${usuario.idusuario}'
  I doubt that is the cause of your problem, but you could try
  removing them.
 

 If I remove quotes, there is an error about this:

 SEVERE: Full Import failed: java.lang.RuntimeException:
 java.lang.RuntimeException:
 org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
 execute query: SELECT nombre FROM tipodocumento WHERE idtipodocumento =
 Processing Document # 1
   at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:264)
   at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
   at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
   at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
 Caused by: java.lang.RuntimeException:
 org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
 execute query: SELECT nombre FROM tipodocumento WHERE idtipodocumento =
 Processing Document # 1
   at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:621)
   at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
   at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
   ... 3 more
 Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
 Unable to execute query: SELECT nombre FROM tipodocumento WHERE
 idtipodocumento =  Processing Document # 1
   at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
   at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
   at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
   at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
   at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
   at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
   at org.apache.solr.handler.dataimport.EntityProcessorWrapper.pullRow(EntityProcessorWrapper.java:330)
   at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:296)
   at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:683)
   at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:709)
   at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
   ... 5 more
 Caused by: java.sql.SQLSyntaxErrorException: ORA-00936: missing expression
   at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)
   at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
   at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:879)
   at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:450)
   at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
   at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
   at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:193)
   at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:873)
   at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1167)
   at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1289)
   at oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1909)
   at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1871)
   at oracle.jdbc.driver.OracleStatementWrapper.execute(OracleStatementWrapper.java:318)
   at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246)
 My config files using Oracle are:


 db-data-config.xml:
 <dataConfig>

Re: how to show DIH query sql in log file

2012-06-01 Thread Rahul Warawdekar
Hi,

Turn the Solr logging level to FINE for the DIH packages/classes, and the
queries will show up in the log:
http://hostname:port/solr/core/admin/logging
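
For example, setting the category org.apache.solr.handler.dataimport (the
DIH package) to FINE on that page makes the generated SQL statements show
up in the log.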

On Fri, Jun 1, 2012 at 9:34 AM, wangjing ppm10...@gmail.com wrote:

 how to show DIH query's sql in log file for troubleshooting?

 thanks.




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: possible status codes from solr during a (DIH) data import process

2012-05-31 Thread Rahul Warawdekar
Hi,

That's correct.
For failure, you have to check for the text *Indexing failed. Rolled back
changes* under the <lst name="statusMessages"> tag.
One more thing to note: there may be a point during the indexing process
where indexing is complete but the index is not yet committed and
optimized.
You would need to check that the response listed below is present along
with the success message before calling it a complete success.

*<str name="Committed">2012-05-31 15:10:45</str>
<str name="Optimized">2012-05-31 15:10:45</str>*
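
A minimal polling sketch along these lines (bash; SERVER, PORT and the
core name are placeholders, and the grep patterns assume the XML
responses shown above):

  #!/bin/bash
  # Poll the DIH status URL until the import leaves the "busy" state.
  STATUS_URL="http://${SERVER}:${PORT}/somecore/dataimport?command=status"
  while true; do
    OUTPUT=$(curl -s "$STATUS_URL")
    echo "$OUTPUT" | grep -q '<str name="status">busy</str>' || break
    sleep 10
  done
  # Import finished; decide success vs failure from statusMessages.
  if echo "$OUTPUT" | grep -q 'Indexing failed. Rolled back'; then
    echo "import FAILED"
  elif echo "$OUTPUT" | grep -q '<str name="Committed">'; then
    echo "import succeeded and was committed"
  else
    echo "import finished but not yet committed/optimized"
  fi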

On Thu, May 31, 2012 at 3:42 PM, geeky2 gee...@hotmail.com wrote:

 hello all,

 i have been asked to write a small polling script (bash) to periodically
 check the status of an import on our Master.  our import times are small,
 but there are business reasons why we want to know the status of an import
 after a specified amount of time.

 i need to perform certain actions based on the status of the import, and
 therefore need to quantify which tags to check and their appropriate
 states.

 i am using the command from the DataImportHandler HTTP API to get the
 status
 of the import:

 OUTPUT=$(curl -v
 http://${SERVER}:${PORT}/somecore/dataimport?command=status)




 can someone tell me if i have these rules correct?

 1) during an import - the status tag will have a busy state:

 example:

  <str name="status">busy</str>

 2) at the completion of an import (regardless of failure or success) the
 status tag will have an idle state:

 example:

  <str name="status">idle</str>


 3) to determine if an import failed or succeeded - you must interrogate the
 tags under <lst name="statusMessages"> and specifically look for:

 success:
 <str name="">Indexing completed. Added/Updated: 603378 documents. Deleted 0
 documents.</str>

 failure:
 <str name="">Indexing completed. Added/Updated: 603378 documents. Deleted 0
 documents.</str>

 thank you,


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/possible-status-codes-from-solr-during-a-DIH-data-import-process-tp3987110.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Not able to use the highlighting feature! Want to return snippets of text

2012-05-21 Thread Rahul Warawdekar
Hi,

Can you please provide the definitions of the following 3 objects from your
solrconfig.xml ?

<str name="hl.fragListBuilder">simple</str>
<str name="hl.fragmentsBuilder">colored</str>
<str name="hl.fragmenter">regex</str>


For example,
the "simple" hl.fragListBuilder should be defined as mentioned below in
your solrconfig.xml:

   <fragListBuilder name="simple"
       class="org.apache.solr.highlight.SimpleFragListBuilder" default="true"/>


On Mon, May 21, 2012 at 2:06 PM, 12rad prama.an...@gmail.com wrote:

 The field I am trying to highlight is stored.


  <field name="text" type="text_en" required="false" compressed="false"
      omitNorms="false" indexed="true" stored="true" multiValued="true"
      termVectors="true" termPositions="true" termOffsets="true"/>


 In the searchHandler i've set the parameters as follows:

    <str name="hl">on</str>
    <str name="hl.fl">text</str>
    <str name="hl.snippets">5</str>
    <str name="hl.fragsize">1000</str>
    <str name="hl.maxAnalyzedChars">51</str>
    <str name="hl.requireFieldMatch">true</str>
    <str name="hl.fragmenter">regex</str>
    <str name="hl.fragListBuilder">simple</str>
    <str name="hl.fragmentsBuilder">colored</str>
    <str name="hl.phraseLimit">1000</str>
    <str name="hl.usePhraseHighlighter">true</str>
    <str name="hl.highlightMultiTerm">true</str>
    <str name="hl.useFastVectorHighlighter">true</str>


 I still don't see any highlighting. I've managed to get snippets of text
 but
 the actual word is not highlighted. I don't know where I am going wrong?

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Not-able-to-use-the-highlighting-feature-Want-to-return-snippets-of-text-Urgent-tp3985012p3985174.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Not able to use the highlighting feature! Want to return snippets of text

2012-05-21 Thread Rahul Warawdekar
Hi,

I believe, in your "colored" fragmentsBuilder definition, you have not
put anything in the pre and post tags, and that may be why you are
getting snippets of text without highlighting.
Please refer http://wiki.apache.org/solr/HighlightingParameters and check
the hl.fragmentsBuilder section.
Try specifying the pre and post tags with information as mentioned below.
(same as wiki link above)

<!-- multi-colored tag FragmentsBuilder -->
<fragmentsBuilder name="colored"
    class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"><![CDATA[
         <b style="background:yellow">,<b style="background:lawgreen">,
         <b style="background:aquamarine">,<b style="background:magenta">,
         <b style="background:palegreen">,<b style="background:coral">,
         <b style="background:wheat">,<b style="background:khaki">,
         <b style="background:lime">,<b style="background:deepskyblue">]]></str>
    <str name="hl.tag.post"><![CDATA[</b>]]></str>
  </lst>
</fragmentsBuilder>


On Mon, May 21, 2012 at 3:52 PM, 12rad prama.an...@gmail.com wrote:

 For the fragListBuilder it's:

 <fragListBuilder name="simple"
     default="true"
     class="solr.highlight.SimpleFragListBuilder"/>

 fragment builder is:

 <fragmentsBuilder name="colored"
     class="solr.highlight.ScoreOrderFragmentsBuilder">
   <lst name="defaults">
     <str name="hl.tag.pre"></str>
     <str name="hl.tag.post"></str>
   </lst>
 </fragmentsBuilder>

 <fragmenter name="regex"
     class="solr.highlight.RegexFragmenter">
   <lst name="defaults">
     <int name="hl.fragsize">70</int>
     <float name="hl.regex.slop">0.5</float>
     <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
   </lst>
 </fragmenter>


 Thanks!

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Not-able-to-use-the-highlighting-feature-Want-to-return-snippets-of-text-Urgent-tp3985012p3985212.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks and Regards
Rahul A. Warawdekar


Issue with DIH when database is down

2012-05-17 Thread Rahul Warawdekar
Hi,

I am using Solr 3.4 on Tomcat 6 and using DIH to index data from a MS SQL
Server 2008 database.

In case my database is down, or is refusing connections due to any reason,
DIH throws an exception as mentioned below

org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: ...

Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1368)

But when the database is back up and running and the next indexing job
runs, it gives me the same error.
I need to restart Tomcat in order to successfully connect to the database
again.

My dataSource settings in data-config.xml are as follows:
<dataSource jndiName="java:comp/env/jdbc/XXX" type="JdbcDataSource"
    readOnly="true" />

Has anyone come across this issue before ?
If yes, what is the resolution ?
Am I missing anything in the dataSource attributes (autoCommit="true") ?
-- 
Thanks and Regards
Rahul A. Warawdekar


Solr request tracking

2012-05-16 Thread Rahul Warawdekar
Hi,

Is there any mechanism by which we can track and trend the incoming Solr
search requests ?
For example, logging all incoming Solr requests to a log file separate
from Tomcat's, with a tool to trend the patterns ?


-- 
Thanks and Regards
Rahul A. Warawdekar


Re: how to limit solr indexing to specific number of rows

2012-05-03 Thread Rahul Warawdekar
Hi,

What is the error that you are getting ?
ROWNUM works fine with DIH, I have tried and tested it with Solr 3.1.

One thing that comes to my mind is the query that you are using to
implement ROWNUM.
Did you replace the '<' in the query with '&lt;' in data-config.xml,
like ROWNUM &lt;= 100 ?
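
For example (a sketch; the table and column names are placeholders):

  <entity name="item"
      query="select id, name from items where ROWNUM &lt;= 100">
    ...
  </entity>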

On Thu, May 3, 2012 at 4:11 PM, srini softtec...@gmail.com wrote:

 I am doing a database import using Solr DIH. I would like to limit the
 solr indexing to a specific number. In other words, if Solr reaches 100
 indexed records, I want the database import to stop importing.

 Not sure if there is any particular setting that would tell solr that I
 only
 want to import 100 rows from database and index those 100 records.

 I tried to give a select query with ROWNUM<=100 (using oracle) in
 data-config.xml, but it gave an error. Any ideas!!!

 Thanks in Advance
 Srini

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/how-to-limit-solr-indexing-to-specific-number-of-rows-tp3960344.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: solr replication failing with error: Master at: is not available. Index fetch failed

2012-04-25 Thread Rahul Warawdekar
Hi,

Is the replication still failing or working fine with that change ?

On Tue, Apr 24, 2012 at 2:16 PM, geeky2 gee...@hotmail.com wrote:

 that was it!

 thank you.

 i did notice something else in the logs now ...

 what is the meaning or implication of the message, Connection reset.?



 2012-04-24 12:59:19,996 INFO  [org.apache.solr.handler.SnapPuller]
 (pool-12-thread-1) Slave in sync with master.
 2012-04-24 12:59:39,998 INFO  [org.apache.solr.handler.SnapPuller]
 (pool-12-thread-1) Slave in sync with master.
 *2012-04-24 12:59:59,997 SEVERE [org.apache.solr.handler.SnapPuller]
 (pool-12-thread-1) Master at:
 http://bogus:bogusport/somepath/somecore/replication/ is not available.
 Index fetch failed. Exception: Connection reset*
 2012-04-24 13:00:19,998 INFO  [org.apache.solr.handler.SnapPuller]
 (pool-12-thread-1) Slave in sync with master.
 2012-04-24 13:00:40,004 INFO  [org.apache.solr.handler.SnapPuller]
 (pool-12-thread-1) Slave in sync with master.
 2012-04-24 13:00:59,992 INFO  [org.apache.solr.handler.SnapPuller]
 (pool-12-thread-1) Slave in sync with master.
 2012-04-24 13:01:19,993 INFO  [org.apache.solr.handler.SnapPuller]
 (pool-12-thread-1) Slave in sync with master.
 2012-04-24 13:01:39,992 INFO  [org.apache.solr.handler.SnapPuller]
 (pool-12-thread-1) Slave in sync with master.
 2012-04-24 13:01:59,989 INFO  [org.apache.solr.handler.SnapPuller]
 (pool-12-thread-1) Slave in sync with master.
 2012-04-24 13:02:19,990 INFO  [org.apache.solr.handler.SnapPuller]
 (pool-12-thread-1) Slave in sync with master.
 2012-04-24 13:02:39,989 INFO  [org.apache.solr.handler.SnapPuller]
 (pool-12-thread-1) Slave in sync with master.
 2012-04-24 13:02:59,991 INFO  [org.a

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/solr-replication-failing-with-error-Master-at-is-not-available-Index-fetch-failed-tp3932921p3936107.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: solr replication failing with error: Master at: is not available. Index fetch failed

2012-04-24 Thread Rahul Warawdekar
Hi,

In the Solr wiki, for replication, the master URL is defined as follows:
<str name="masterUrl">http://master_host:port/solr/corename/replication</str>

This URL does not contain "admin" in its path, whereas the master URL you
provided has an additional "admin" segment.
Not entirely sure whether this is the issue, but you can try removing
"admin" and check if replication works.
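
For reference, the wiki's slave-side configuration looks like this (host,
port and core name are placeholders; pollInterval is optional):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master_host:port/solr/corename/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>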


On Tue, Apr 24, 2012 at 11:49 AM, geeky2 gee...@hotmail.com wrote:

 hello,

 thank you for the reply,

 yes - master has been indexed.

 ok - makes sense - the polling interval needs to change

 i did check the solr war file on both boxes (master and slave).  they are
 identical.  actually - if they were not identical - this would point to a
 different issue altogether - since our deployment infrastructure - rolls
 the
 war file to the slaves when you do a deployment on the master.

 this has me stumped - not sure what to check next.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/solr-replication-failing-with-error-Master-at-is-not-available-Index-fetch-failed-tp3932921p3935699.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Solr with UIMA

2012-04-19 Thread Rahul Warawdekar
Hi Divakar,

Try making your updateRequestProcessorChain the default. Simply add
default="true" as follows and check if that works:

<updateRequestProcessorChain name="uima" default="true">


On Thu, Apr 19, 2012 at 12:01 PM, dsy99 ds...@rediffmail.com wrote:

 Hi Chris,
 Have you been able to successfully integrate UIMA in Solr?

 I too tried to integrate UIMA in Solr by following the instructions
 provided in the README, i.e. the following four steps:

 Step1. I set <lib/> tags in solrconfig.xml appropriately to point to the
 jar files.

   <lib dir="../../contrib/uima/lib" />
   <lib dir="../../dist/" regex="apache-solr-uima-\d.*\.jar" />

 Step2. modified my schema.xml adding the fields I wanted to hold
 metadata, specifying proper values for type, indexed, stored and
 multiValued options as follows:

   <field name="language" type="string" indexed="true" stored="true"
    required="false"/>
   <field name="concept" type="string" indexed="true" stored="true"
    multiValued="true" required="false"/>
   <field name="sentence" type="text" indexed="true" stored="true"
    multiValued="true" required="false" />

 Step3. modified my solrconfig.xml adding the following snippet:

   <updateRequestProcessorChain name="uima">
     <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
       <lst name="uimaConfig">
         <lst name="runtimeParameters">
           <str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
           <str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
           <str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
           <str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
           <str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
           <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
         </lst>
         <str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
         <bool name="ignoreErrors">true</bool>
         <lst name="analyzeFields">
           <bool name="merge">false</bool>
           <arr name="fields">
             <str>text</str>
           </arr>
         </lst>
         <lst name="fieldMappings">
           <lst name="type">
             <str name="name">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
             <lst name="mapping">
               <str name="feature">text</str>
               <str name="field">concept</str>
             </lst>
           </lst>
           <lst name="type">
             <str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
             <lst name="mapping">
               <str name="feature">language</str>
               <str name="field">language</str>
             </lst>
           </lst>
           <lst name="type">
             <str name="name">org.apache.uima.SentenceAnnotation</str>
             <lst name="mapping">
               <str name="feature">coveredText</str>
               <str name="field">sentence</str>
             </lst>
           </lst>
         </lst>
       </lst>
     </processor>
     <processor class="solr.LogUpdateProcessorFactory" />
     <processor class="solr.RunUpdateProcessorFactory" />
   </updateRequestProcessorChain>

 Step 4: and finally created a new UpdateRequestHandler with the following:

   <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
     <lst name="defaults">
       <str name="update.processor">uima</str>
     </lst>
   </requestHandler>


 Further I indexed a word file called text.docx using the following
 command:

 curl "http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true"
 -F "myfile=@UIMA_sample_test.docx"

 When I searched the file, I was not able to see the additional UIMA fields.

 Can you please help if you have been able to solve the problem.


 With Regds  Thanks
 Divakar

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3923443.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: DataImportHandler w/ multivalued fields

2011-12-01 Thread Rahul Warawdekar
Hi Briggs,

By saying multivalued fields are not getting indexed properly, do you
mean that you are not able to search on those fields ?
Have you actually tried searching your Solr index for those multivalued
terms to make sure it returns results ?

One possibility could be that the multivalued fields are getting indexed
correctly and are searchable.
However, since your schema.xml has a raw_tag field whose stored
attribute is set to false, you may not be able to see those fields.
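
For example (hypothetical term), a query like
q=raw_tag:shoes&fl=raw_tag_string would search the indexed raw_tag field
while returning the stored raw_tag_string copy for display.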



On Thu, Dec 1, 2011 at 1:43 PM, Briggs Thompson w.briggs.thomp...@gmail.com
 wrote:

 In addition, I tried a query like below and changed the column definition
 to
   <field column="raw_tag" name="raw_tag" splitBy=", " />
 and still no luck. It is indexing the full content now but not multivalued.
 It seems like the splitBy isn't working properly.

select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.*
 from site
 left outer join
  (freetags inner join freetagged_objects)
 on (freetags.id = freetagged_objects.tag_id
   and site.siteId = freetagged_objects.object_id)
 group  by site.siteId

 Am I doing something wrong?
 Thanks,
 Briggs Thompson

 On Thu, Dec 1, 2011 at 11:46 AM, Briggs Thompson 
 w.briggs.thomp...@gmail.com wrote:

  Hello Solr Community!
 
  I am implementing a data connection to Solr through the Data Import
  Handler and non-multivalued fields are working correctly, but multivalued
  fields are not getting indexed properly.
 
  I am new to DataImportHandler, but from what I could find, the entity is
  the way to go for multivalued field. The weird thing is that data is
 being
  indexed for one row, meaning first raw_tag gets populated.
 
 
  Anyone have any ideas?
  Thanks,
  Briggs
 
  This is the relevant part of the schema:
 
  <field name="raw_tag" type="text_en_lessAggressive" indexed="true"
   stored="false" multivalued="true"/>
  <field name="raw_tag_string" type="string" indexed="false"
   stored="true" multivalued="true"/>
  <copyField source="raw_tag" dest="raw_tag_string"/>
 
  And the relevant part of data-import.xml:
 
  <document name="merchant">
    <entity name="site"
            query="select * from site ">
      <field column="siteId" name="siteId" />
      <field column="domain" name="domain" />
      <field column="aliasFor" name="aliasFor" />
      <field column="title" name="title" />
      <field column="description" name="description" />
      <field column="requests" name="requests" />
      <field column="requiresModeration" name="requiresModeration" />
      <field column="blocked" name="blocked" />
      <field column="affiliateLink" name="affiliateLink" />
      <field column="affiliateTracker" name="affiliateTracker" />
      <field column="affiliateNetwork" name="affiliateNetwork" />
      <field column="cjMerchantId" name="cjMerchantId" />
      <field column="thumbNail" name="thumbNail" />
      <field column="updateRankings" name="updateRankings" />
      <field column="couponCount" name="couponCount" />
      <field column="category" name="category" />
      <field column="adult" name="adult" />
      <field column="rank" name="rank" />
      <field column="redirectsTo" name="redirectsTo" />
      <field column="wwwRequired" name="wwwRequired" />
      <field column="avgSavings" name="avgSavings" />
      <field column="products" name="products" />
      <field column="nameChecked" name="nameChecked" />
      <field column="tempFlag" name="tempFlag" />
      <field column="created" name="created" />
      <field column="enableSplitTesting" name="enableSplitTesting" />
      <field column="affiliateLinklock" name="affiliateLinklock" />
      <field column="hasMobileSite" name="hasMobileSite" />
      <field column="blockSite" name="blockSite" />
      <entity name="merchant_tags" pk="siteId"
              query="select raw_tag, freetags.id,
                     freetagged_objects.object_id as siteId
                     from freetags
                     inner join freetagged_objects
                     on freetags.id=freetagged_objects.tag_id
                     where freetagged_objects.object_id='${site.siteId}'">
        <field column="raw_tag" name="raw_tag"/>
      </entity>
    </entity>
  </document>
 




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Architecture and Capacity planning for large Solr index

2011-11-21 Thread Rahul Warawdekar
Thanks !

My business requirements have changed a bit.
We need one year rolling data in Production.
The index size for the same comes to approximately 200 - 220 GB.
I am planning to address this using Solr distributed search as follows.

1. Whole index to be split up between 3 shards, with 3 masters and 6 slaves
(load balanced)
2. Master configuration
 will be 4 CPU


On Tue, Oct 11, 2011 at 2:05 PM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:

 Hi Rahul,

 This is unfortunately not enough information for anyone to give you very
 precise answers, so I'll just give some rough ones:

 * best disk - SSD :)
 * CPU - multicore, depends on query complexity, concurrency, etc.
 * sharded search and failover - start with SolrCloud, there are a couple
 of pages about it on the Wiki and
 http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/


 
 From: Rahul Warawdekar rahul.warawde...@gmail.com
 To: solr-user solr-user@lucene.apache.org
 Sent: Tuesday, October 11, 2011 11:47 AM
 Subject: Architecture and Capacity planning for large Solr index
 
 Hi All,
 
 I am working on a Solr search based project, and would highly appreciate
 help/suggestions from you all regarding Solr architecture and capacity
 planning.
 Details of the project are as follows
 
 1. There are 2 databases from which, data needs to be indexed and made
 searchable,
 - Production
 - Archive
 2. Production database will retain 6 months old data and archive data
 every
 month.
 3. Archive database will retain 3 years old data.
 4. Database is SQL Server 2008 and Solr version is 3.1
 
 Data to be indexed contains a huge volume of attachments (PDF, Word, excel
 etc..), approximately 200 GB per month.
 We are planning to do a full index every month (multithreaded) and
 incremental indexing on a daily basis.
 The Solr index size is coming to approximately 25 GB per month.
 
 If we were to use distributed search, what would be the best configuration
 for Production as well as Archive indexes ?
 What would be the best CPU/RAM/Disk configuration ?
 How can I implement failover mechanism for sharded searches ?
 
 Please let me know in case I need to share more information.
 
 
 --
 Thanks and Regards
 Rahul A. Warawdekar
 
 
 




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Architecture and Capacity planning for large Solr index

2011-11-21 Thread Rahul Warawdekar
Thanks Otis !
Please ignore my earlier email which does not have all the information.

My business requirements have changed a bit.
We now need one year rolling data in Production, with the following details
- Number of records - 1.2 million
- Solr index size for these records comes to approximately 200 - 220
GB. (includes large attachments)
- Approx 250 users who will be searching the application, with a peak of
1 search request every 40 seconds.

I am planning to address this using Solr distributed search on a VMWare
virtualized environment as follows.

1. Whole index to be split up between 3 shards, with 3 masters and 6 slaves
(load balanced)

2. Master configuration for each server is as follows
- 4 CPUs
- 16 GB RAM
- 300 GB disk space

3. Slave configuration for each server is as follows
- 4 CPUs
- 16 GB RAM
- 150 GB disk space

4. I am planning to use SAN instead of local storage to store Solr index.

And my questions are as follows:
Will 3 shards serve the purpose here ?
Is SAN a good option for storing the Solr index, given the high index
volume ?




On Mon, Nov 21, 2011 at 3:05 PM, Rahul Warawdekar 
rahul.warawde...@gmail.com wrote:

 Thanks !

 My business requirements have changed a bit.
 We need one year rolling data in Production.
 The index size for the same comes to approximately 200 - 220 GB.
 I am planning to address this using Solr distributed search as follows.

 1. Whole index to be split up between 3 shards, with 3 masters and 6
 slaves (load balanced)
 2. Master configuration
  will be 4 CPU



 On Tue, Oct 11, 2011 at 2:05 PM, Otis Gospodnetic 
 otis_gospodne...@yahoo.com wrote:

 Hi Rahul,

 This is unfortunately not enough information for anyone to give you very
 precise answers, so I'll just give some rough ones:

 * best disk - SSD :)
 * CPU - multicore, depends on query complexity, concurrency, etc.
 * sharded search and failover - start with SolrCloud, there are a couple
 of pages about it on the Wiki and
 http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/


 
 From: Rahul Warawdekar rahul.warawde...@gmail.com
 To: solr-user solr-user@lucene.apache.org
 Sent: Tuesday, October 11, 2011 11:47 AM
 Subject: Architecture and Capacity planning for large Solr index
 
 Hi All,
 
 I am working on a Solr search based project, and would highly appreciate
 help/suggestions from you all regarding Solr architecture and capacity
 planning.
 Details of the project are as follows
 
 1. There are 2 databases from which, data needs to be indexed and made
 searchable,
 - Production
 - Archive
 2. Production database will retain 6 months old data and archive data
 every
 month.
 3. Archive database will retain 3 years old data.
 4. Database is SQL Server 2008 and Solr version is 3.1
 
 Data to be indexed contains a huge volume of attachments (PDF, Word,
 excel
 etc..), approximately 200 GB per month.
 We are planning to do a full index every month (multithreaded) and
 incremental indexing on a daily basis.
 The Solr index size is coming to approximately 25 GB per month.
 
 If we were to use distributed search, what would be the best
 configuration
 for Production as well as Archive indexes ?
 What would be the best CPU/RAM/Disk configuration ?
 How can I implement failover mechanism for sharded searches ?
 
 Please let me know in case I need to share more information.
 
 
 --
 Thanks and Regards
 Rahul A. Warawdekar
 
 
 




 --
 Thanks and Regards
 Rahul A. Warawdekar




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Ordered proximity search

2011-11-04 Thread Rahul Warawdekar
Hi Thomas,

Do you always need the ordered proximity search by default ?
You may want to check SpanNearQuery at
http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/.

We are using the edismax query parser provided by Solr.
I had a similar requirement in our project, and here is how we addressed
it:

1. Wrote a customized query parser similar to edismax.
2. Identified the method in the code which takes care of PhraseQuery and
replaced it with a snippet of SpanNearQuery code.

Please check more on SpanNearQuery if that works for you.
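
A minimal sketch of the SpanNearQuery replacement (Lucene 3.x API; the
field name "content" is illustrative):

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.spans.SpanNearQuery;
  import org.apache.lucene.search.spans.SpanQuery;
  import org.apache.lucene.search.spans.SpanTermQuery;

  public class OrderedProximity {
      public static SpanQuery build() {
          return new SpanNearQuery(
              new SpanQuery[] {
                  new SpanTermQuery(new Term("content", "term1")),
                  new SpanTermQuery(new Term("content", "term2"))
              },
              Integer.MAX_VALUE, // slop: any distance, like "term1 term2"~MAX
              true);             // inOrder = true: term1 must precede term2
      }
  }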



On Thu, Nov 3, 2011 at 2:11 PM, LT.thomas t.latu...@itspree.pl wrote:

 Hi,

 By ordered I mean term1 will always come before term2 in the document.

 I have two documents:
 1. By ordered I mean term1 will always come before term2 in the document
 2. By ordered I mean term2 will always come before term1 in the document

 if I make the query:

 term1 term2~Integer.MAX_VALUE

 my results is: 2 documents

 How can I query to have one result (only if term1 come before term2):
 By ordered I mean term1 will always come before term2 in the document

 Thanks

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Ordered-proximity-search-tp3477946p3477946.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks and Regards
Rahul A. Warawdekar


Issue with Shard configuration in solrconfig.xml (Solr 3.1)

2011-10-20 Thread Rahul Warawdekar
Hi,

I am trying to evaluate distributed search for my project by splitting up
our single index on 2 shards with Solr 3.1
When I query the first solr server by passing the shards parameter, I get
correct search results from both shards.
(
http://server1:8080/solr/test/select/?shards=server1:8080/solr/test,server2:8080/solr/test&q=solr&start=0&rows=20
)

I want to avoid the use of this shards parameter in the http url and specify
it in solrconfig.xml as follows.

<requestHandler name="my_custom_handler" class="solr.SearchHandler"
    default="true">
  <str name="shards">server1:8080/solr/test,server2:8080/solr/test</str>
  ..
</requestHandler>

After adding the shards parameter in solrconfig.xml, I get search results
only from the first shard and not from the second one.
Am I missing any configuration ?

Also, can the urls with the shard parameter be load balanced for a failover
mechanism ?
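
For comparison, SearchHandler parameters are commonly declared under a
defaults list (a sketch; this variant nests the shards parameter under
defaults):

<requestHandler name="my_custom_handler" class="solr.SearchHandler"
    default="true">
  <lst name="defaults">
    <str name="shards">server1:8080/solr/test,server2:8080/solr/test</str>
  </lst>
</requestHandler>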



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Trouble configuring multicore / accessing admin page

2011-09-28 Thread Rahul Warawdekar
Hi Joshua,

Can you try updating your solr.xml as follows:
Specify
<core name="core0" instanceDir="/core0" /> instead of
<core name="core0" instanceDir="cores/core0" />

Basically, remove the extra text "cores" from the instanceDir attribute
of the core element.

Just try and let us know if it works.

On Wed, Sep 28, 2011 at 3:40 PM, Joshua Miller jos...@itsecureadmin.com wrote:

 Hello,

 I am trying to get SOLR working with multiple cores and have a problem
 accessing the admin page once I configure multiple cores.

 Problem:
 When accessing the admin page via http://solrhost:8080/solr/admin, I get a
 404, missing core name in path.

 Question:  when using the multicore option, is the standard admin page
 still available?

 Environment:
 - solr 1.4.1
 - Windows server 2008 R2
 - Java SE 1.6u27
 - Tomcat 6.0.33
 - Solr Experience:  none

 I have set -Dsolr.solr.home=c:\solr and within that I have a solr.xml with
 the following contents:

 <solr persistent="true" sharedLib="lib">
   <cores adminPath="/admij/cores">
     <core name="core0" instanceDir="cores/core0" />
     <core name="core1" instanceDir="cores/core1" />
   </cores>
 </solr>

 I have copied the example/solr directory to c:\solr and have populated that
 directory with the cores/{core{0,1}} as well as the proper configs and data
 directories within.

 When I restart tomcat, it shows a couple of exceptions related to
 queryElevationComponent and null pointers that I think are due to the DB not
 yet being available but I see that the cores appear to initialize properly
 other than that

 So the problem I'm looking to solve/clarify here is the admin page - should
 that remain available and usable when using the multicore configuration or
 am I doing something wrong?  Do I need to use the CoreAdminHandler type
 requests to manage multicore instead?

 Thanks,
 --
 Josh Miller
 Open Source Solutions Architect
 (425) 737-2590
 http://itsecureadmin.com/




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Solr stopword problem in Query

2011-09-27 Thread Rahul Warawdekar
Hi Isan,

The schema.xml seems OK to me.

Is textForQuery the only field you are searching in ?
Are you also searching on any other non text based fields ? If yes, please
provide schema description for those fields also.
Also, provide your solrconfig.xml file.


On Tue, Sep 27, 2011 at 1:12 AM, Isan Fulia isan.fu...@germinait.com wrote:

 Hi Rahul,

 I also tried searching Coke Studio MTV but no documents were returned.

 Here is the snippet of my schema file.

 <fieldType name="text" class="solr.TextField"
     positionIncrementGap="100" autoGeneratePhraseQueries="true">

   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory"
         ignoreCase="true"
         words="stopwords_en.txt"
         enablePositionIncrements="true"
     />
     <filter class="solr.WordDelimiterFilterFactory"
         generateWordParts="1" generateNumberParts="1" catenateWords="1"
         catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory"
         protected="protwords.txt"/>
     <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>

   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory"
         synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory"
         ignoreCase="true"
         words="stopwords_en.txt"
         enablePositionIncrements="true"
     />
     <filter class="solr.WordDelimiterFilterFactory"
         generateWordParts="1" generateNumberParts="1" catenateWords="0"
         catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory"
         protected="protwords.txt"/>
     <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>

 </fieldType>


 <field name="content" type="text" indexed="false" stored="true"
  multiValued="false"/>
 <field name="title" type="text" indexed="false" stored="true"
  multiValued="false"/>

 <field name="textForQuery" type="text" indexed="true" stored="false"
  multiValued="true" omitTermFreqAndPositions="true"/>

 <copyField source="content" dest="textForQuery"/>
 <copyField source="title" dest="textForQuery"/>


 Thanks,
 Isan Fulia.


 On 26 September 2011 21:19, Rahul Warawdekar rahul.warawde...@gmail.com
 wrote:

  Hi Isan,
 
  Does your search return any documents when you remove the 'at' keyword
 and
  just search for Coke studio MTV ?
  Also, can you please provide the snippet of schema.xml file where you
 have
  mentioned this field name and its type description ?
 
  On Mon, Sep 26, 2011 at 6:09 AM, Isan Fulia isan.fu...@germinait.com
  wrote:
 
   Hi all,
  
   I have a text field named *textForQuery*.
   Following content has been indexed into solr in field textForQuery
   *Coke Studio at MTV*
  
   when i fired the query as
   textForQuery:("coke studio at mtv"), the results showed 0 documents
  
   After running the same query in debugMode i got the following results:
  
   <result name="response" numFound="0" start="0"/>
   <lst name="debug">
   <str name="rawquerystring">textForQuery:("coke studio at mtv")</str>
   <str name="querystring">textForQuery:("coke studio at mtv")</str>
   <str name="parsedquery">PhraseQuery(textForQuery:"coke studio ? mtv")</str>
   <str name="parsedquery_toString">textForQuery:"coke studio ? mtv"</str>
  
   Why did the query not match any document even when there is a document
   with value of textForQuery as "Coke Studio at MTV"?
   Is this because of the stopword "at" present in the stopword list?
  
  
  
   --
   Thanks  Regards,
   Isan Fulia.
  
 
 
 
  --
  Thanks and Regards
  Rahul A. Warawdekar
 



 --
 Thanks  Regards,
 Isan Fulia.




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Solr stopword problem in Query

2011-09-26 Thread Rahul Warawdekar
Hi Isan,

Does your search return any documents when you remove the 'at' keyword and
just search for Coke studio MTV ?
Also, can you please provide the snippet of schema.xml file where you have
mentioned this field name and its type description ?

On Mon, Sep 26, 2011 at 6:09 AM, Isan Fulia isan.fu...@germinait.com wrote:

 Hi all,

 I have a text field named *textForQuery*.
 Following content has been indexed into solr in field textForQuery
 *Coke Studio at MTV*

 when i fired the query as
 textForQuery:("coke studio at mtv"), the results showed 0 documents

 After running the same query in debugMode i got the following results:

 <result name="response" numFound="0" start="0"/>
 <lst name="debug">
 <str name="rawquerystring">textForQuery:("coke studio at mtv")</str>
 <str name="querystring">textForQuery:("coke studio at mtv")</str>
 <str name="parsedquery">PhraseQuery(textForQuery:"coke studio ? mtv")</str>
 <str name="parsedquery_toString">textForQuery:"coke studio ? mtv"</str>

 Why did the query not match any document even when there is a document
 with value of textForQuery as "Coke Studio at MTV"?
 Is this because of the stopword "at" present in the stopword list?



 --
 Thanks  Regards,
 Isan Fulia.




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: JdbcDataSource and threads

2011-09-23 Thread Rahul Warawdekar
I am using Solr 3.1.
But you can surely try the patch with 3.3.

On Fri, Sep 23, 2011 at 1:35 PM, Vazquez, Maria (STM) 
maria.vazq...@dexone.com wrote:

 Thanks Rahul.
 Are you using 3.3 or 3.4? I'm on 3.3 right now
 I will try the patch today
 Thanks again,
 Maria


 -Original Message-
 From: Rahul Warawdekar [mailto:rahul.warawde...@gmail.com]
 Sent: Thursday, September 22, 2011 12:46 PM
 To: solr-user@lucene.apache.org
 Subject: Re: JdbcDataSource and threads

 Hi,

 Have you applied the patch that is provided with the Jira you mentioned
 ?
 https://issues.apache.org/jira/browse/SOLR-2233

 Please apply the patch and check if you are getting the same exceptions.
 It has worked well for me till now.

 On Thu, Sep 22, 2011 at 3:17 PM, Vazquez, Maria (STM) 
 maria.vazq...@dexone.com wrote:

  Hi!
 
  So as of 3.4 JdbcDataSource doesn't work with threads, correct?
 
 
 
  https://issues.apache.org/jira/browse/SOLR-2233
 
 
 
  I'm using Microsoft SQL Server, my data-config.xml has a lot of very
  complex SQL queries and it takes a long time to index.
 
  I'm migrating from Lucene to Solr and the Lucene code uses threads so
 it
  takes little time to index, now in Solr if I add threads=xx to my
  rootEntity I get lots of errors about connections being closed.
 
 
 
  Thanks a lot,
 
  Maria
 
 


 --
 Thanks and Regards
 Rahul A. Warawdekar




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: How to get the fields that match the request?

2011-09-22 Thread Rahul Warawdekar
Hi,

Before considering highlighting to address this requirement, you also need
to consider the performance implications of highlighting for large text
fields.

On Thu, Sep 22, 2011 at 11:42 AM, Nicolas Martin nmar...@doyousoft.com wrote:

 yes, highlighting can help to do that, but if you want to paginate your
 results, you can't use hl.

 It'd be great to have a scoring average by fields...





 On 22/09/2011 17:37, Tanner Postert wrote:

 this would be useful to me as well.

 even when searching with q=test, I know it defaults to the default search
 field, but it would be helpful to know what field(s) match the query term.

 On Thu, Sep 22, 2011 at 3:29 AM, Nicolas Martin nmar...@doyousoft.com
 wrote:



 Hi everyBody,

 I need your help to get more information in my solR query's response.

 i've got a simple input text which allows me to query several fields in
 the
 same query.

 So my query  looks like this
 q=email:martyn+OR+name:martynn+OR+commercial:martyn ...

 Is it possible in the response to know the fields where martynn has
 been
 found ?

 Thanks a Lot :-)









-- 
Thanks and Regards
Rahul A. Warawdekar


Re: JdbcDataSource and threads

2011-09-22 Thread Rahul Warawdekar
Hi,

Have you applied the patch that is provided with the Jira you mentioned ?
https://issues.apache.org/jira/browse/SOLR-2233

Please apply the patch and check if you are getting the same exceptions.
It has worked well for me till now.

On Thu, Sep 22, 2011 at 3:17 PM, Vazquez, Maria (STM) 
maria.vazq...@dexone.com wrote:

 Hi!

 So as of 3.4 JdbcDataSource doesn't work with threads, correct?



 https://issues.apache.org/jira/browse/SOLR-2233



 I'm using Microsoft SQL Server, my data-config.xml has a lot of very
 complex SQL queries and it takes a long time to index.

 I'm migrating from Lucene to Solr and the Lucene code uses threads so it
 takes little time to index, now in Solr if I add threads=xx to my
 rootEntity I get lots of errors about connections being closed.



 Thanks a lot,

 Maria




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: DIH delta last_index_time

2011-09-14 Thread Rahul Warawdekar
Hi Maria/Gora,

I see this as more of a problem with the timezones in which the Solr server
and the database server are located.
Is this true ?
If yes, one more possibility for handling this scenario would be to
customize the DataImportHandler code as follows:

1. Add one more configuration property named "dbTimeZone" at the entity
level in the data-config.xml file.
2. While saving the lastIndexTime in the properties file, save it
according to the timezone specified in the config, so that it stays in
sync with the database server time.

Basically, customize the code so that all time-related updates to the
dataimport.properties file are timezone-specific; a sketch of the
formatting step is shown below.
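
A sketch of that timezone-aware formatting (Java; dbTimeZone is the
proposed config key from point 1, and the date pattern is the one DIH
writes to dataimport.properties):

  import java.text.SimpleDateFormat;
  import java.util.Date;
  import java.util.TimeZone;

  public class TimeZoneAwareFormat {
      // DIH stores last_index_time as "yyyy-MM-dd HH:mm:ss".
      private static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

      public static String format(Date lastIndexTime, String dbTimeZone) {
          SimpleDateFormat fmt = new SimpleDateFormat(PATTERN);
          // e.g. dbTimeZone = "America/New_York"
          fmt.setTimeZone(TimeZone.getTimeZone(dbTimeZone));
          return fmt.format(lastIndexTime);
      }
  }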


On Wed, Sep 14, 2011 at 4:31 AM, Gora Mohanty g...@mimirtech.com wrote:

 On Wed, Sep 14, 2011 at 11:23 AM, Maria Vazquez
 maria.vazq...@dexone.com wrote:
  Hi,
  How do you handle the situation where the time on the server running Solr
  doesn't match the time in the database?

 Firstly, why is that the case? NTP is pretty universal
 these days.

  I'm using the last_index_time saved by Solr in the delta query, checking
  it against the lastModifiedDate field in the database, but the times are
  not in sync so I might lose some changes.
  Can we use something else other than last_index_time? Maybe something
  like last_pk.

 One possible way is to edit dataimport.properties, manually or through
 a script, to put the last_index_time back to a safe value.

 Regards,
 Gora




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Index not getting refreshed

2011-09-14 Thread Rahul Warawdekar
Hi Pawan,

Can you please share more details on the indexing mechanism ? (DIH,  SolrJ
or any other)
Please let us know the configuration details.


On Wed, Sep 14, 2011 at 12:48 PM, Pawan Darira pawan.dar...@gmail.com wrote:

 Hi

 I am using Solr 3.2 on a live website. i get live user's data of about 2000
 per day. I do an incremental index every 8 hours. but my search results
 always show the same result with same sorting order. when i check the same
 search from corresponding db, it gives me different results always (as new
 data regularly gets added)

 please suggest what might be the issue. is there any cache related problem
 at SOLR level

 thanks
 pawan




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: FastVectorHighlighter with wildcard queries

2011-09-12 Thread Rahul Warawdekar
Hi Koji,

Thanks for the information !
I will try the patches provided by you.

On 9/8/11, Koji Sekiguchi k...@r.email.ne.jp wrote:
 (11/09/09 6:16), Rahul Warawdekar wrote:
 Hi,

 I am currently evaluating the FastVectorHighlighter in a Solr search based
 project and have a couple of questions

 1. Is there any specific reason why the FastVectorHighlighter does not
 provide support for multiterm(wildcard) queries ?
 2. What are the other constraints when using FastVectorHighlighter ?


 FVH used to have typical constraints:

 1. supports only TermQuery and PhraseQuery (and
 BooleanQuery/DisjunctionMaxQuery that
 include TQ and PQ)
 2. ignores word boundary

 But now for 1, FVH will support other queries:

 https://issues.apache.org/jira/browse/LUCENE-1889

 I believe it is almost closed to be fixed. For 2, FVH in the latest
 trunk/3x, pays
 regard to word or sentence boundary through BoundaryScanner:

 https://issues.apache.org/jira/browse/LUCENE-1824

 koji
 --
 Check out Query Log Visualizer for Apache Solr
 http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
 http://www.rondhuit.com/en/



-- 
Thanks and Regards
Rahul A. Warawdekar


Solr: Return field names that contain search term

2011-09-12 Thread Rahul Warawdekar
Hi,

I have a a query on Solr search as follows.

I am indexing an entity which includes a multivalued field using DIH.
This multivalued field contains content from multiple attachments for
a single entity.

Now, e.g., if I search for the term "solr", will I be able to know
which field contains this search term ?
And if it is a multivalued field, which position in that
multivalued field contains the search term ?

Currently, to achieve this, I am using a workaround using the
highlighting feature.
I am indexing all the multiple attachments within a single entity and
document as dynamic fields attachment_id_i.

While searching, I am highlighting on these dynamic fields (hl.fl=*_i)
and from the highlighitng section in the results, I am able to get the
attachment number which contains the search term.
But since this approach involves highlighting large attachments, the
search response times are very slow.

Would highly appreciate if someone can suggest other efficient ways to
address this kind of a requirement.

-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Solr: Return field names that contain search term

2011-09-12 Thread Rahul Warawdekar
Thanks Chris !

Will try out the second approach you suggested and share my findings.

On Mon, Sep 12, 2011 at 5:03 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 :  Would highly appreciate if someone can suggest other efficient ways to
 :  address this kind of a requirement.

 one approach would be to index each attachment as its own document and
 search those.  you could then use things like the group collapsing
 features to return only the main type documents when multiple
 attachments match.

 similarly: you could still index each main document with a giant
 text field containing all of the attachment text, *and* you could index
 each attachment as its own document.  You would search on the main docs
 as you do now, but then your app could issue a secondary request searching
 for all attachment docs that match on one of the main docIds in a
 special field, and use the results to note which attachment of each doc
 (if any) caused the match.

 -Hoss




-- 
Thanks and Regards
Rahul A. Warawdekar


FastVectorHighlighter with wildcard queries

2011-09-08 Thread Rahul Warawdekar
Hi,

I am currently evaluating the FastVectorHighlighter in a Solr search based
project and have a couple of questions

1. Is there any specific reason why the FastVectorHighlighter does not
provide support for multiterm(wildcard) queries ?
2. What are the other constraints when using FastVectorHighlighter ?

-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Delta import issue

2011-07-12 Thread Rahul Warawdekar
Hi Peter,

Try adding the primary key attribute to the root entity 'ad' and check if
delta import works.
By the way, which database are you using ?

On Tue, Jul 12, 2011 at 10:27 AM, PeterKerk vettepa...@hotmail.com wrote:


 I'm having an issue with a delta import.

 I have the following in my data-config.xml:

    <document name="ads">
        <entity name="ad"
            query="select * from ads WHERE approvedate &gt; '1/1/1900' and
            publishdate &lt; getdate() AND depublishdate &gt; getdate() and
            deletedate = '1/1/1900'"
            deltaImportQuery="select * from ads WHERE approvedate &gt;
            '1/1/1900' and publishdate &lt; getdate() AND depublishdate &gt;
            getdate() and deletedate = '1/1/1900' and
            id='${dataimporter.delta.id}'"
            deltaQuery="select id from ads where updatedate &gt;
            '${dataimporter.last_index_time}'">

            <entity name="photo"
                query="select locpath as locpath FROM ad_photos
                where adid=${ad.id}"
                deltaImportQuery="select locpath as locpath FROM ad_photos
                where adid='${dataimporter.delta.id}'"
                deltaQuery="select locpath as locpath FROM ad_photos
                where createdate &gt; '${dataimporter.last_index_time}'">
                <field name="photos" column="locpath" />
            </entity>

        </entity>
    </document>

 Now, when I add a new photo to the ad_photos table, it's not indexed when
 I perform a delta import like so:
 http://localhost:8983/solr/i2m/dataimport?command=delta-import.
 When I do a FULL import I do see the new images.


 Here's the definition of ad_photos table:

 CREATE TABLE [dbo].[ad_photos](
[id] [int] IDENTITY(1,1) NOT NULL,
[adid] [int] NOT NULL,
[locpath] [nvarchar](150) NOT NULL,
[title] [nvarchar](50) NULL,
[createdate] [datetime] NOT NULL,
  CONSTRAINT [PK_ad_photos] PRIMARY KEY CLUSTERED
 (
[id] ASC
 )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY =
 OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
 ) ON [PRIMARY]

 GO



 What am I doing wrong?

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Delta-import-issue-tp3162581p3162581.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Delta import issue

2011-07-12 Thread Rahul Warawdekar
<entity pk="id" name="ad" ... >

On Tue, Jul 12, 2011 at 11:34 AM, PeterKerk vettepa...@hotmail.com wrote:

 Hi Rahul,

 Not sure how I would do this Try adding the primary key attribute to the
 root entity 'ad'?

 In my entity ad I already have these fields (I left those out earlier for
 readability):
 <field name="id" column="ID" />   -- this is the primary key of the ads table
 <field name="userid" column="userid" />
 <field name="title" column="title" />

 Is that what you mean?

 And I'm using MSSQL2008


 Thanks!

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Delta-import-issue-tp3162581p3162809.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks and Regards
Rahul A. Warawdekar


Solr Multithreading

2011-06-19 Thread Rahul Warawdekar
Hi,

I am currently working on a search based project which involves
indexing data from a SQL Server database including attachments using
DIH.
For indexing attachments (varbinary DB objects), I am using TikaEntityProcessor.

I am trying to use multithreading to speed up the indexing, but it
seems to fail when indexing attachments, even after applying a few Solr
fix patches.

My question is, Is the current multithreading feature stable in Solr
3.1 or it needs further enhancements ?

-- 
Thanks and Regards
Rahul A. Warawdekar


Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor

2011-05-26 Thread Rahul Warawdekar
Hi All,

I am using Solr 3.1 for one of our search based applications.
We are using DIH to index our data and TikaEntityProcessor to index
attachments.
Currently we are running into an issue while extracting content from one of
our MS Excel 2007 files, using TikaEntityProcessor.

The issue is that the TikaEntityProcessor hangs without throwing any
exception, which in turn causes the indexing to hang on the server.

Has anyone faced a similar kind of issue in the past with
TikaEntityProcessor ?

Also, does someone know of a way to just skip this type of behaviour for
that file and move to the next document to be indexed ?



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor

2011-05-26 Thread Rahul Warawdekar
Hi Markus,

It is Tika.
I tried using tika standalone.
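
For reference, a stand-alone check can be run with the tika-app jar
(the version number is illustrative):

  java -jar tika-app-0.9.jar --text problem-file.xlsx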

On 5/26/11, Markus Jelsma markus.jel...@openindex.io wrote:
 Can you rule out Tika or Solr by trying to parse the file with a stand-alone
 Tika?

 Hi All,

 I am using Solr 3.1 for one of our search based applications.
 We are using DIH to index our data and TikaEntityProcessor to index
 attachments.
 Currently we are running into an issue while extracting content from one
 of
 our MS Excel 2007 files, using TikaEntityProcessor.

 The issue is the TikaEntityProcessor is hung without throwing any
 exception
 which in tuen causes the indexing to be hung on the server.

 Has anyone faced a similar kind of issue in the past with
 TikaEntityProcessor ?

 Also, does someone know of a way to just skip this type of behaviour for
 that file and move to the next document to be indexed ?



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: 2 index within the same Solr server ?

2011-03-29 Thread Rahul Warawdekar
Please refer
http://wiki.apache.org/solr/MultipleIndexes
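
For example, a minimal two-core solr.xml (core names are placeholders):

  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <core name="index1" instanceDir="index1" />
      <core name="index2" instanceDir="index2" />
    </cores>
  </solr>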

On 3/29/11, Amel Fraisse amel.frai...@gmail.com wrote:
 Hello every body,

 Is it possible to create 2 index within the same Solr server ?

 Thank you.

 Amel.



-- 
Thanks and Regards
Rahul A. Warawdekar


Query regarding search term count in Solr

2011-02-09 Thread Rahul Warawdekar
Hi All,

This is Rahul and am using Solr for one of my upcoming projects.
I had a query regarding search term count using Solr.
We have a requirement in one of our search based projects to search the
results based on search term counts per document.

For e.g.,
if a user searches for something like "solr"[4:9], this query should return
only documents in which "solr" appears between 4 and 9 times (inclusively).
If a user searches for something like "solr lucene"[4:9], this query should
return only documents in which the phrase "solr lucene" appears between 4
and 9 times (inclusively).

Is there any way from Solr to return results based on the search term and
phrase counts ?
If  not, can it be customized by extending existing Solr/Lucene libraries ?


-- 
Thanks and Regards
Rahul A. Warawdekar