Re: input XSLT

2009-03-13 Thread CIF Search
There is a fundamental problem with using the 'pull' approach of DIH.
Normally people want delta imports, which are done using a timestamp field.
Now it may not always be possible for application servers to sync their
timestamps (given protocol restrictions due to security reasons). Because of
this, the Solr application is likely to miss a few records occasionally. Such a
problem does not arise if applications themselves identify their records
and post them. Should we not have such a feature in Solr, which would allow users
to push data onto the index in whichever format they wish? This would also
facilitate plugging Solr in seamlessly with all kinds of applications.

Regards,
CI

On Wed, Mar 11, 2009 at 11:52 PM, Noble Paul നോബിള്‍ नोब्ळ् 
noble.p...@gmail.com wrote:

  On Tue, Mar 10, 2009 at 12:17 PM, CIF Search cifsea...@gmail.com wrote:
  Just as you have an xslt response writer to convert Solr xml response to
  make it compatible with any application, on the input side do you have an
  xslt module that will parse xml documents to solr format before posting
 them
  to solr indexer. I have gone through dataimporthandler, but it works in
 data
  'pull' mode i.e. solr pulls data from the given location. I would still
 want
  to work with applications 'posting' documents to solr indexer as and when
  they want.
 It is a limitation of DIH, but if you can put your XML in a file
 behind an HTTP server then you can fire a command to DIH to pull the data
 from the URL quite easily.
 
  Regards,
  CI
 



 --
 --Noble Paul



Re: How to correctly boost results in Solr Dismax query

2009-03-13 Thread dabboo

Hi Pete,

The bq parameter works with the q.alt query parameter. If you are passing the
search criteria using the q.alt query parameter, then the bq parameter comes
into the picture. Also, q.alt doesn't support field boosting.

If you want to boost records by their field values then you must use the q
query parameter instead of q.alt. The 'q' parameter actually uses the qf
parameters from solrconfig for field boosting.
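For reference, a minimal sketch of how qf boosts are wired up in solrconfig.xml. The handler name and the fields/boosts here are illustrative, not taken from the original messages:

```xml
<!-- Illustrative dismax handler definition for solrconfig.xml.
     Handler name and the fields/boosts in qf are examples only. -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- matches in title count roughly four times as much as description -->
    <str name="qf">title^2.0 description^0.5</str>
  </lst>
</requestHandler>
```

A request like select?qt=dismax&q=indiana then searches both fields with those boosts applied.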

Let me know if you have any questions.

Thanks,
Amit Garg





Pete Smith-3 wrote:
 
 Hi,
 
 I have managed to build an index in Solr which I can search on keyword,
 produce facets, query facets etc. This is all working great. I have
 implemented my search using a dismax query so it searches predetermined
 fields.
 
 However, my results are coming back sorted by score which appears to be
 calculated by keyword relevancy only. I would like to adjust the score
 where fields have pre-determined values. I think I can do this with
 boost query and boost functions but the documentation here:
 
 http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3
 
 Is not particularly helpful. I tried adding a bq argument to my
 search: 
 
 bq=media:DVD^2
 
 (yes, this is an index of films!) but I find when I start adding more
 and more:
 
 bq=media:DVD^2&bq=media:BLU-RAY^1.5
 
 I find negative results - e.g. films that are DVD but not
 BLU-RAY get negatively affected in their score. In the end it all seems
 to even out and my score is as it was before I started boosting.
 
 I must be doing this wrong and I wonder whether boost function comes
 in somewhere. Any ideas on how to correctly use boost?
 
 Cheers,
 Pete
 
 -- 
 Pete Smith
 Developer
 
 No.9 | 6 Portal Way | London | W3 6RU |
 T: +44 (0)20 8896 8070 | F: +44 (0)20 8896 8111
 
 LOVEFiLM.com
 
 

-- 
View this message in context: 
http://www.nabble.com/How-to-correctly-boost-results-in-Solr-Dismax-query-tp22476204p22490850.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: input XSLT

2009-03-13 Thread Shalin Shekhar Mangar
On Fri, Mar 13, 2009 at 11:36 AM, CIF Search cifsea...@gmail.com wrote:

 There is a fundamental problem with using 'pull' approach using DIH.
 Normally people want a delta imports which are done using a timestamp
 field.
 Now it may not always be possible for application servers to sync their
 timestamps (given protocol restrictions due to security reasons). Due to
 this Solr application is likely to miss a few records occasionally. Such a
 problem does not arise if applications themselves identify their records
 and post. Should we not have such a feature in Solr, which will allow users
 to push data onto the index in whichever format they wish to? This will
 also
 facilitate plugging in solr seamlessly with all kinds of applications.


You can of course push your documents to Solr using the XML/CSV update (or
using the solrj client). It's just that you can't push documents with DIH.

http://wiki.apache.org/solr/#head-98c3ee61c5fc837b09e3dfe3fb420491c9071be3
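As a sketch, the XML update message Shalin refers to wraps documents in an &lt;add&gt; envelope and is POSTed to the /update handler; the field names below are illustrative:

```xml
<!-- Illustrative <add> message for Solr's XML update handler.
     POST this to http://host:port/solr/update, then send <commit/>. -->
<add>
  <doc>
    <field name="id">doc1</field>
    <field name="title">An example document</field>
  </doc>
</add>
```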

-- 
Regards,
Shalin Shekhar Mangar.


Re: input XSLT

2009-03-13 Thread CIF Search
But these documents have to be converted to a particular format before being
posted. Not just any XML document can be posted to Solr (with the XSLT handled
by Solr internally).
DIH handles any XML format, but it operates in pull mode.


On Fri, Mar 13, 2009 at 11:45 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Fri, Mar 13, 2009 at 11:36 AM, CIF Search cifsea...@gmail.com wrote:

  There is a fundamental problem with using 'pull' approach using DIH.
  Normally people want a delta imports which are done using a timestamp
  field.
  Now it may not always be possible for application servers to sync their
  timestamps (given protocol restrictions due to security reasons). Due to
  this Solr application is likely to miss a few records occasionally. Such
 a
  problem does not arise if applications themselves identify their records
  and post. Should we not have such a feature in Solr, which will allow
 users
  to push data onto the index in whichever format they wish to? This will
  also
  facilitate plugging in solr seamlessly with all kinds of applications.
 

 You can of course push your documents to Solr using the XML/CSV update (or
 using the solrj client). It's just that you can't push documents with DIH.

 http://wiki.apache.org/solr/#head-98c3ee61c5fc837b09e3dfe3fb420491c9071be3

 --
 Regards,
 Shalin Shekhar Mangar.



Re: Compound word search (maybe DisMaxQueryPaser problem)

2009-03-13 Thread Tobias Dittrich
First of all: sorry Chris, Walter .. I did not mean to put 
pressure on anyone. It's just that when you're stuck with 
something and you have that little needle stinging, saying: 
maybe you're just too damn stupid for this ... :) So, thanks 
a lot for your answers.


As for index time expansion using synonyms: I think this is 
not an option for me since it would mean that I have to a) 
find all such words that might cause problems and b) find 
every variant that might possibly be used by customers. And 
then in the end I have to keep all my synonym files 
up-to-date. But the main design goal for my search 
implementation is little to no maintenance.


My original assumption about the DisMax handler was that it 
would just take the original query string and pass it to 
every field in its field list using the field's configured 
analyzer stack, maybe in the end add some stuff for the 
special options and so on ... and then send the query to 
Lucene. Can you explain why this approach was not chosen?


Thanks
Tobi


Chris Hostetter schrieb:

: Hmmm was my mail so weird or my question so stupid ... or is there simply
: no one with an answer? Not even a hint? :(

patience my friend, i've got a backlog of ~500 Lucene-related messages in 
my INBOX, and i was just reading your original email when this reply came 
in.


In general this is a fairly hard problem ... the easiest solution i know 
of that works in most cases is to do index-time expansion using the 
SynonymFilter, so regardless of whether a document contains usbcable, 
usb-cable, or usb cable, all three variants get indexed, and then the 
user can search for any of them.


the downside is that it can throw off your tf/idf stats for some terms (if 
they appear by themselves, and as part of a compound) and it can result in 
false positives for esoteric phrase searches (but that tends to be more of 
a theoretical problem than an actual one).
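A sketch of the index-time expansion Hoss describes, as a schema.xml field type. The type name and file name are illustrative:

```xml
<!-- Illustrative schema.xml fragment: synonyms are expanded at index time
     only, so the query analyzer needs no synonym filter. synonyms.txt would
     contain a line such as: usbcable, usb-cable, usb cable -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```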


:  But this never happens since with the DisMax Searcher the parser produces a
:  query like this:
:  
:  ((category:blue | name:blue)~0.1 (category:tooth | name:tooth)~0.1)

...
:  to deal with this compound word problem? Is there another query parser that
:  already does the trick?

take a look at the FieldQParserPlugin ... it passes the raw query string 
to the analyzer of a specified field -- this would let your TokenFilters 
see the stream of tokens (which isn't possible with the conventional 
QueryParser tokenization rules) but it doesn't have any of the 
field/query matrix cross-product goodness of dismax -- you'd only be 
able to query the one field.
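For illustration, the field parser can be selected per request with Solr's local-params syntax (the field name here is just an example):

```
q={!field f=name}usb cable
```

This hands the whole string "usb cable" to the analyzer of the name field, rather than splitting it on whitespace first.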


(Hmmm i wonder if DisMaxQParser 2.0 could have an option to let you 
specify a FieldType whose analyzer was used to tokenize the query string 
instead of using the Lucene QueryParser JavaCC tokenization, and *then* 
the tokens resulting from that initial analyzer could be passed to the 
analyzers of the various qf fields ... hmmm, that might be just crazy 
enough to be too crazy to work)





-Hoss






Re: SolrJ : EmbeddedSolrServer and database data indexing

2009-03-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
nope ..
But you can still use SolrJ to invoke DIH:
create a ModifiableSolrParams with the required request parameters,
create a QueryRequest with the params, and then set the path as /dataimport,

and invoke the command with CommonsHttpSolrServer#request().
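A sketch of those steps against the Solr 1.3 SolrJ API. The server URL is an assumption, and a running Solr instance with DIH registered at /dataimport is required for the request to do anything:

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class DihTrigger {
    public static void main(String[] args) throws Exception {
        // 1. request parameters for DIH
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("command", "full-import");   // or "delta-import"

        // 2. a QueryRequest carrying the params, pointed at the DIH path
        QueryRequest request = new QueryRequest(params);
        request.setPath("/dataimport");

        // 3. fire it at the server
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        server.request(request);
    }
}
```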



On Fri, Mar 13, 2009 at 8:40 AM, Ashish P ashish.ping...@gmail.com wrote:

 Is there any API in SolrJ that calls the DataImportHandler to execute
 commands like full-import and delta-import?
 Please help..


 Ashish P wrote:

 Is it possible to index DB data directly to Solr using EmbeddedSolrServer?
 I tried using a data-config file and the full-import command; it works. So I
 am assuming that using CommonsHttpSolrServer will also work. But can I do it
 with EmbeddedSolrServer?

 Thanks in advance...
 Ashish


 --
 View this message in context: 
 http://www.nabble.com/SolrJ-%3A-EmbeddedSolrServer-and-database-data-indexing-tp22488697p22489420.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
--Noble Paul


Two way Synonyms in Solr

2009-03-13 Thread dabboo

Hi,

I am implementing 2-way synonyms in Solr using the q query parameter. One-way
synonyms are working fine with the q query parameter, but 2-way is not working.

For e.g.,
if I define 2-way synonyms in the file like:
value1, value2

it doesn't show any results for either of the values.

Please suggest.

Thanks,
Amit Garg
-- 
View this message in context: 
http://www.nabble.com/Two-way-Synonyms-in-Solr-tp22492439p22492439.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Two way Synonyms in Solr

2009-03-13 Thread Koji Sekiguchi

dabboo wrote:

Hi,

I am implementing 2-way synonyms in Solr using the q query parameter. One-way
synonyms are working fine with the q query parameter, but 2-way is not working.


For e.g.,
if I define 2-way synonyms in the file like:
value1, value2

it doesn't show any results for either of the values.


Please suggest.

Thanks,
Amit Garg
  


Are you sure you have expand=true on your synonym definition?

Also you can use /admin/analysis.jsp for debugging the field.
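For reference, the behavior difference in synonyms.txt (entries illustrative):

```
# With expand="true", every term on a comma-separated line is
# indexed/matched as all of the terms:
value1, value2
# With expand="false", terms on the line are reduced to the first term only.
# An explicit one-way mapping can also be written with "=>":
colour => color
```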


Koji




Phrase Synonyms in solr

2009-03-13 Thread dabboo

Hi,

Can someone please tell me how to implement phrase synonyms in solr.

Thanks,
Amit
-- 
View this message in context: 
http://www.nabble.com/Phrase-Synonyms-in-solr-tp22492440p22492440.html
Sent from the Solr - User mailing list archive at Nabble.com.



Solr: ERRORs at Startup

2009-03-13 Thread Giovanni De Stefano
Hello everybody,

I am currently using:

   - Solr v1.3.0
   - Jboss jboss-5.0.1.GA
   - Java jdk 1.5_06


When I start Solr within JBoss I see a lot of errors in the log, but Solr
seems to be working (meaning I can see the admin interface, but I cannot index
my DB...but that is another story :-) ).

Attached is the log file. Here just some of the error messages I see:
...
10:51:19,976 INFO  [ConnectionFactoryBindingService] Bound ConnectionManager
'jboss.jca:service=ConnectionFactoryBinding,name=JmsXA' to JNDI name
'java:JmsXA'
10:51:20,006 INFO  [TomcatDeployment] deploy, ctxPath=/
10:51:20,126 INFO  [TomcatDeployment] deploy, ctxPath=/jmx-console
10:51:20,525 INFO  [TomcatDeployment] deploy, ctxPath=/solr
10:51:20,617 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
10:51:20,631 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
org.apache.solr.core.SolrResourceLoader locateInstanceDir
INFO: No /solr/home in JNDI
10:51:20,631 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
org.apache.solr.core.SolrResourceLoader locateInstanceDir
INFO: using system property solr.solr.home:
/home/giovanni/development/search/solr
10:51:20,637 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
org.apache.solr.core.CoreContainer$Initializer initialize
INFO: looking for solr.xml: /home/giovanni/development/search/solr/solr.xml
10:51:20,637 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to '/home/giovanni/development/search/solr/'
10:51:20,710 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
org.apache.solr.core.SolrResourceLoader createClassLoader
INFO: Adding 'file:/home/giovanni/development/search/solr/lib/ojdbc14.jar'
to Solr classloader
10:51:20,734 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
org.apache.solr.core.SolrResourceLoader locateInstanceDir
INFO: No /solr/home in JNDI
10:51:20,734 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
org.apache.solr.core.SolrResourceLoader locateInstanceDir
INFO: using system property solr.solr.home:
/home/giovanni/development/search/solr
10:51:20,735 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to '/home/giovanni/development/search/solr/'
10:51:20,736 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
org.apache.solr.core.SolrResourceLoader createClassLoader
INFO: Adding 'file:/home/giovanni/development/search/solr/lib/ojdbc14.jar'
to Solr classloader
10:51:21,964 ERROR [STDERR] Mar 13, 2009 10:51:21 AM
org.apache.solr.core.SolrConfig init
INFO: Loaded SolrConfig: solrconfig.xml
10:51:21,977 ERROR [STDERR] Mar 13, 2009 10:51:21 AM
org.apache.solr.core.SolrCore init
INFO: Opening new SolrCore at /home/giovanni/development/search/solr/,
dataDir=./solr/data/
10:51:21,991 ERROR [STDERR] Mar 13, 2009 10:51:21 AM
org.apache.solr.schema.IndexSchema readSchema
INFO: Reading Solr Schema
10:51:22,027 ERROR [STDERR] Mar 13, 2009 10:51:22 AM
org.apache.solr.schema.IndexSchema readSchema
INFO: Schema name=search
10:51:22,051 ERROR [STDERR] Mar 13, 2009 10:51:22 AM
org.apache.solr.util.plugin.AbstractPluginLoader load
INFO: created string: org.apache.solr.schema.StrField
10:51:22,061 ERROR [STDERR] Mar 13, 2009 10:51:22 AM
org.apache.solr.util.plugin.AbstractPluginLoader load
INFO: created boolean: org.apache.solr.schema.BoolField
10:51:22,067 ERROR [STDERR] Mar 13, 2009 10:51:22 AM
org.apache.solr.util.plugin.AbstractPluginLoader load
INFO: created integer: org.apache.solr.schema.IntField

10:51:22,472 ERROR [STDERR] Mar 13, 2009 10:51:22 AM
org.apache.solr.util.plugin.AbstractPluginLoader load
INFO: created ignored: org.apache.solr.schema.StrField
10:51:22,483 ERROR [STDERR] Mar 13, 2009 10:51:22 AM
org.apache.solr.schema.IndexSchema readSchema
INFO: default search field is text
10:51:22,485 ERROR [STDERR] Mar 13, 2009 10:51:22 AM
org.apache.solr.schema.IndexSchema readSchema
INFO: query parser default operator is OR
10:51:22,486 ERROR [STDERR] Mar 13, 2009 10:51:22 AM
org.apache.solr.schema.IndexSchema readSchema
INFO: unique key field: uri
10:51:22,541 ERROR [STDERR] Mar 13, 2009 10:51:22 AM
org.apache.solr.core.JmxMonitoredMap init
INFO: JMX monitoring is enabled. Adding Solr mbeans to JMX Server:
org.jboss.mx.server.mbeanserveri...@3deff3[ defaultDomain='jboss' ]
10:51:22,543 ERROR [STDERR] Mar 13, 2009 10:51:22 AM
org.apache.solr.core.SolrCore parseListener
INFO: Searching for listeners: //listen...@event=firstSearcher]
10:51:22,564 ERROR [STDERR] Mar 13, 2009 10:51:22 AM
org.apache.solr.core.SolrCore parseListener
INFO: Added SolrEventListener:
org.apache.solr.core.QuerySenderListener{queries=[{q=fast_warm,start=0,rows=10},
{q=static firstSearcher warming query from solrconfig.xml}]}

What am I missing? :-(

Any idea?

thanks in advance.

Giovanni
10:50:41,204 INFO  [ServerImpl] Starting JBoss (Microcontainer)...
10:50:41,207 INFO  [ServerImpl] Release ID: JBoss [Morpheus] 5.0.1.GA (build: 
SVNTag=JBoss_5_0_1_GA date=200902231221)
10:50:41,208 

Re: How to correctly boost results in Solr Dismax query

2009-03-13 Thread Pete Smith
Hi Amit,

Thanks very much for your reply. What you said makes things a bit
clearer but I am still a bit confused.

On Thu, 2009-03-12 at 23:14 -0700, dabboo wrote:
 If you want to boost the records with their field value then you must use q
 query parameter instead of q.alt. 'q' parameter actually uses qf parameters
 from solrConfig for field boosting.

From the documentation for Dismax queries, I thought that q is simply
a keyword parameter:

From http://wiki.apache.org/solr/DisMaxRequestHandler:
q
The guts of the search defining the main query. This is designed to
support raw input strings provided by users with no special escaping.
'+' and '-' characters are treated as mandatory and prohibited
modifiers for the subsequent terms. Text wrapped in balanced quote
characters (") is treated as phrases; any query containing an odd
number of quote characters is evaluated as if there were no quote
characters at all. Wildcards in this q parameter are not supported. 

And I thought 'qf' is a list of fields and boost scores:

From http://wiki.apache.org/solr/DisMaxRequestHandler:
qf (Query Fields)
List of fields and the boosts to associate with each of them when
building DisjunctionMaxQueries from the user's query. The format
supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates that
fieldOne has a boost of 2.3, fieldTwo has the default boost, and
fieldThree has a boost of 0.4 ... this indicates that matches in
fieldOne are much more significant than matches in fieldTwo, which are
more significant than matches in fieldThree. 

But if I want to, say, search for films with 'indiana' in the title,
with media=DVD scoring higher than media=BLU-RAY then do I need to do
something like:

solr/select?q=indiana

And in my config:

<str name="qf">media^2</str>

But I don't see where the actual *contents* of the media field would
determine the boost.

Sorry if I have misunderstood what you mean.

Cheers,
Pete

 Pete Smith-3 wrote:
  
  Hi,
  
  I have managed to build an index in Solr which I can search on keyword,
  produce facets, query facets etc. This is all working great. I have
  implemented my search using a dismax query so it searches predetermined
  fields.
  
  However, my results are coming back sorted by score which appears to be
  calculated by keyword relevancy only. I would like to adjust the score
  where fields have pre-determined values. I think I can do this with
  boost query and boost functions but the documentation here:
  
  http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3
  
  Is not particularly helpful. I tried adding a bq argument to my
  search: 
  
  bq=media:DVD^2
  
  (yes, this is an index of films!) but I find when I start adding more
  and more:
  
  bq=media:DVD^2&bq=media:BLU-RAY^1.5
  
  I find negative results - e.g. films that are DVD but not
  BLU-RAY get negatively affected in their score. In the end it all seems
  to even out and my score is as it was before I started boosting.
  
  I must be doing this wrong and I wonder whether boost function comes
  in somewhere. Any ideas on how to correctly use boost?
  
  Cheers,
  Pete
  
  -- 
  Pete Smith
  Developer
  
  No.9 | 6 Portal Way | London | W3 6RU |
  T: +44 (0)20 8896 8070 | F: +44 (0)20 8896 8111
  
  LOVEFiLM.com
  
  
 
-- 
Pete Smith
Developer

No.9 | 6 Portal Way | London | W3 6RU |
T: +44 (0)20 8896 8070 | F: +44 (0)20 8896 8111

LOVEFiLM.com


Re: Solr: ERRORs at Startup

2009-03-13 Thread Toby Cole

Hi Giovanni,
	It looks like logging is configured strangely. Those messages in my  
Solr setup (on Tomcat 6 or Jetty) appear as INFO-level messages.
It could have something to do with your SLF4J setup, but I'm no expert  
on that side of things.
I wouldn't worry too much; the content of the messages doesn't imply  
anything bad going on.


Toby.

On 13 Mar 2009, at 09:57, Giovanni De Stefano wrote:


Hello everybody,

I am currently using:
Solr v1.3.0
Jboss jboss-5.0.1.GA
Java jdk 1.5_06

When I start Solr within Jboss I see a lot of errors in the log but  
Solr seems working (meaning I can see the admin interface but I  
cannot index my DB...but that is another story :-) ).


Attached is the log file. Here just some of the error messages I see:
...
10:51:19,976 INFO  [ConnectionFactoryBindingService] Bound  
ConnectionManager  
'jboss.jca:service=ConnectionFactoryBinding,name=JmsXA' to JNDI name  
'java:JmsXA'

10:51:20,006 INFO  [TomcatDeployment] deploy, ctxPath=/
10:51:20,126 INFO  [TomcatDeployment] deploy, ctxPath=/jmx-console
10:51:20,525 INFO  [TomcatDeployment] deploy, ctxPath=/solr
10:51:20,617 ERROR [STDERR] Mar 13, 2009 10:51:20 AM  
org.apache.solr.servlet.SolrDispatchFilter init

INFO: SolrDispatchFilter.init()
10:51:20,631 ERROR [STDERR] Mar 13, 2009 10:51:20 AM  
org.apache.solr.core.SolrResourceLoader locateInstanceDir

INFO: No /solr/home in JNDI
10:51:20,631 ERROR [STDERR] Mar 13, 2009 10:51:20 AM  
org.apache.solr.core.SolrResourceLoader locateInstanceDir
INFO: using system property solr.solr.home: /home/giovanni/ 
development/search/solr
10:51:20,637 ERROR [STDERR] Mar 13, 2009 10:51:20 AM  
org.apache.solr.core.CoreContainer$Initializer initialize
INFO: looking for solr.xml: /home/giovanni/development/search/solr/ 
solr.xml
10:51:20,637 ERROR [STDERR] Mar 13, 2009 10:51:20 AM  
org.apache.solr.core.SolrResourceLoader init

INFO: Solr home set to '/home/giovanni/development/search/solr/'
10:51:20,710 ERROR [STDERR] Mar 13, 2009 10:51:20 AM  
org.apache.solr.core.SolrResourceLoader createClassLoader
INFO: Adding 'file:/home/giovanni/development/search/solr/lib/ 
ojdbc14.jar' to Solr classloader
10:51:20,734 ERROR [STDERR] Mar 13, 2009 10:51:20 AM  
org.apache.solr.core.SolrResourceLoader locateInstanceDir

INFO: No /solr/home in JNDI
10:51:20,734 ERROR [STDERR] Mar 13, 2009 10:51:20 AM  
org.apache.solr.core.SolrResourceLoader locateInstanceDir
INFO: using system property solr.solr.home: /home/giovanni/ 
development/search/solr
10:51:20,735 ERROR [STDERR] Mar 13, 2009 10:51:20 AM  
org.apache.solr.core.SolrResourceLoader init

INFO: Solr home set to '/home/giovanni/development/search/solr/'
10:51:20,736 ERROR [STDERR] Mar 13, 2009 10:51:20 AM  
org.apache.solr.core.SolrResourceLoader createClassLoader
INFO: Adding 'file:/home/giovanni/development/search/solr/lib/ 
ojdbc14.jar' to Solr classloader
10:51:21,964 ERROR [STDERR] Mar 13, 2009 10:51:21 AM  
org.apache.solr.core.SolrConfig init

INFO: Loaded SolrConfig: solrconfig.xml
10:51:21,977 ERROR [STDERR] Mar 13, 2009 10:51:21 AM  
org.apache.solr.core.SolrCore init
INFO: Opening new SolrCore at /home/giovanni/development/search/ 
solr/, dataDir=./solr/data/
10:51:21,991 ERROR [STDERR] Mar 13, 2009 10:51:21 AM  
org.apache.solr.schema.IndexSchema readSchema

INFO: Reading Solr Schema
10:51:22,027 ERROR [STDERR] Mar 13, 2009 10:51:22 AM  
org.apache.solr.schema.IndexSchema readSchema

INFO: Schema name=search
10:51:22,051 ERROR [STDERR] Mar 13, 2009 10:51:22 AM  
org.apache.solr.util.plugin.AbstractPluginLoader load

INFO: created string: org.apache.solr.schema.StrField
10:51:22,061 ERROR [STDERR] Mar 13, 2009 10:51:22 AM  
org.apache.solr.util.plugin.AbstractPluginLoader load

INFO: created boolean: org.apache.solr.schema.BoolField
10:51:22,067 ERROR [STDERR] Mar 13, 2009 10:51:22 AM  
org.apache.solr.util.plugin.AbstractPluginLoader load

INFO: created integer: org.apache.solr.schema.IntField

10:51:22,472 ERROR [STDERR] Mar 13, 2009 10:51:22 AM  
org.apache.solr.util.plugin.AbstractPluginLoader load

INFO: created ignored: org.apache.solr.schema.StrField
10:51:22,483 ERROR [STDERR] Mar 13, 2009 10:51:22 AM  
org.apache.solr.schema.IndexSchema readSchema

INFO: default search field is text
10:51:22,485 ERROR [STDERR] Mar 13, 2009 10:51:22 AM  
org.apache.solr.schema.IndexSchema readSchema

INFO: query parser default operator is OR
10:51:22,486 ERROR [STDERR] Mar 13, 2009 10:51:22 AM  
org.apache.solr.schema.IndexSchema readSchema

INFO: unique key field: uri
10:51:22,541 ERROR [STDERR] Mar 13, 2009 10:51:22 AM  
org.apache.solr.core.JmxMonitoredMap init
INFO: JMX monitoring is enabled. Adding Solr mbeans to JMX Server:  
org.jboss.mx.server.mbeanserveri...@3deff3[ defaultDomain='jboss' ]
10:51:22,543 ERROR [STDERR] Mar 13, 2009 10:51:22 AM  
org.apache.solr.core.SolrCore parseListener

INFO: Searching for listeners: //listen...@event=firstSearcher]
10:51:22,564 ERROR [STDERR] Mar 13, 2009 10:51:22 AM  

Re: How to correctly boost results in Solr Dismax query

2009-03-13 Thread dabboo

Pete,

Sorry if I wasn't clear. Here is the explanation.

Suppose you have 2 records and they have films and media as 2 columns.

Now the first record has values like films=Indiana and media=blue ray,
and the 2nd record has values like films=Bond and media=Indiana.

Values for the qf parameter:

<str name="qf">media^2.0 films^1.0</str>

Now search for q=Indiana .. it should display both of the records, but
record #2 will display above the 1st.

Let me know if you still have questions.

Cheers,
amit


Pete Smith-3 wrote:
 
 Hi Amit,
 
 Thanks very much for your reply. What you said makes things a bit
 clearer but I am still a bit confused.
 
 On Thu, 2009-03-12 at 23:14 -0700, dabboo wrote:
 If you want to boost the records with their field value then you must use
 q
 query parameter instead of q.alt. 'q' parameter actually uses qf
 parameters
 from solrConfig for field boosting.
 
From the documentation for Dismax queries, I thought that q is simply
 a keyword parameter:
 
From http://wiki.apache.org/solr/DisMaxRequestHandler:
 q
 The guts of the search defining the main query. This is designed to
 support raw input strings provided by users with no special escaping.
 '+' and '-' characters are treated as mandatory and prohibited
 modifiers for the subsequent terms. Text wrapped in balanced quote
 characters (") is treated as phrases; any query containing an odd
 number of quote characters is evaluated as if there were no quote
 characters at all. Wildcards in this q parameter are not supported. 
 
 And I thought 'qf' is a list of fields and boost scores:
 
From http://wiki.apache.org/solr/DisMaxRequestHandler:
 qf (Query Fields)
 List of fields and the boosts to associate with each of them when
 building DisjunctionMaxQueries from the user's query. The format
 supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates that
 fieldOne has a boost of 2.3, fieldTwo has the default boost, and
 fieldThree has a boost of 0.4 ... this indicates that matches in
 fieldOne are much more significant than matches in fieldTwo, which are
 more significant than matches in fieldThree. 
 
 But if I want to, say, search for films with 'indiana' in the title,
 with media=DVD scoring higher than media=BLU-RAY then do I need to do
 something like:
 
 solr/select?q=indiana
 
 And in my config:
 
 <str name="qf">media^2</str>
 
 But I don't see where the actual *contents* of the media field would
 determine the boost.
 
 Sorry if I have misunderstood what you mean.
 
 Cheers,
 Pete
 
 Pete Smith-3 wrote:
  
  Hi,
  
  I have managed to build an index in Solr which I can search on keyword,
  produce facets, query facets etc. This is all working great. I have
  implemented my search using a dismax query so it searches predetermined
  fields.
  
  However, my results are coming back sorted by score which appears to be
  calculated by keyword relevancy only. I would like to adjust the score
  where fields have pre-determined values. I think I can do this with
  boost query and boost functions but the documentation here:
  
 
 http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3
  
   Is not particularly helpful. I tried adding a bq argument to my
  search: 
  
  bq=media:DVD^2
  
  (yes, this is an index of films!) but I find when I start adding more
  and more:
  
   bq=media:DVD^2&bq=media:BLU-RAY^1.5
  
  I find the negative results - e.g. films that are DVD but are not
  BLU-RAY get negatively affected in their score. In the end it all seems
  to even out and my score is as it was before i started boosting.
  
  I must be doing this wrong and I wonder whether boost function comes
  in somewhere. Any ideas on how to correctly use boost?
  
  Cheers,
  Pete
  
  -- 
  Pete Smith
  Developer
  
  No.9 | 6 Portal Way | London | W3 6RU |
  T: +44 (0)20 8896 8070 | F: +44 (0)20 8896 8111
  
  LOVEFiLM.com
  
  
 
 -- 
 Pete Smith
 Developer
 
 No.9 | 6 Portal Way | London | W3 6RU |
 T: +44 (0)20 8896 8070 | F: +44 (0)20 8896 8111
 
 LOVEFiLM.com
 
 

-- 
View this message in context: 
http://www.nabble.com/How-to-correctly-boost-results-in-Solr-Dismax-query-tp22476204p22493646.html
Sent from the Solr - User mailing list archive at Nabble.com.



DIH with outer joins

2009-03-13 Thread Rui António da Cruz Pereira
I have queries with outer joins defined in some entities and for the 
same root object I can have two or more lines with different objects, 
for example:


Taking the following 3 tables, and a query defined in the entity with outer 
joins between the tables:

Table1 - Table2 - Table3

I can have the following lines returned by the query:
Table1Instance1 - Table2Instance1 - Table3Instance1
Table1Instance1 - Table2Instance1 - Table3Instance2
Table1Instance1 - Table2Instance2 - Table3Instance3
Table1Instance2 - Table2Instance3 - Table3Instance4

I wanted to have a single document per root object instance (in this 
case per Table1 instance) but with the values from the different lines 
returned.


Is it possible to have this behavior in DataImportHandler? How?

Thanks in advance,
  Rui Pereira


Re: How to correctly boost results in Solr Dismax query

2009-03-13 Thread Pete Smith
Hi Amit,

Thanks again for your reply. I am understanding it a bit better but I
think it would help if I posted an example. Say I have three records:

<doc>
<long name="id">1</long>
<str name="media">BLU-RAY</str>
<str name="title">Indiana Jones and the Kingdom of the Crystal Skull</str>
</doc>
<doc>
<long name="id">2</long>
<str name="media">DVD</str>
<str name="title">Indiana Jones and the Kingdom of the Crystal Skull</str>
</doc>
<doc>
<long name="id">3</long>
<str name="media">DVD</str>
<str name="title">Casino Royale</str>
</doc>

Now, if I search for indiana: select?q=indiana

I want the first two rows to come back (not the third, as it does not
contain 'indiana'). I would like record 2 to be scored higher than
record 1, as its media type is DVD.

At the moment I have in my config:

<str name="qf">title</str>

And i was trying to boost by media having a specific value by using 'bq'
but from what you told me that is incorrect.

Cheers,
Pete


On Fri, 2009-03-13 at 03:21 -0700, dabboo wrote:
 Pete,
 
 Sorry if I wasn't clear. Here is the explanation.
 
 Suppose you have 2 records and they have films and media as 2 columns.
 
 Now the first record has values like films=Indiana and media=blue ray,
 and the 2nd record has values like films=Bond and media=Indiana.
 
 Values for the qf parameter:
 
 <str name="qf">media^2.0 films^1.0</str>
 
 Now search for q=Indiana .. it should display both of the records, but
 record #2 will display above the 1st.
 
 Let me know if you still have questions.
 
 Cheers,
 amit
 
 
 Pete Smith-3 wrote:
  
  Hi Amit,
  
  Thanks very much for your reply. What you said makes things a bit
  clearer but I am still a bit confused.
  
  On Thu, 2009-03-12 at 23:14 -0700, dabboo wrote:
  If you want to boost the records with their field value then you must use
  q
  query parameter instead of q.alt. 'q' parameter actually uses qf
  parameters
  from solrConfig for field boosting.
  
 From the documentation for Dismax queries, I thought that q is simply
  a keyword parameter:
  
 From http://wiki.apache.org/solr/DisMaxRequestHandler:
  q
  The guts of the search defining the main query. This is designed to be
  support raw input strings provided by users with no special escaping.
  '+' and '-' characters are treated as mandatory and prohibited
  modifiers for the subsequent terms. Text wrapped in balanced quote
  characters '"' are treated as phrases, any query containing an odd
  number of quote characters is evaluated as if there were no quote
  characters at all. Wildcards in this q parameter are not supported. 
  
  And I thought 'qf' is a list of fields and boost scores:
  
 From http://wiki.apache.org/solr/DisMaxRequestHandler:
  qf (Query Fields)
  List of fields and the boosts to associate with each of them when
  building DisjunctionMaxQueries from the user's query. The format
  supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates that
  fieldOne has a boost of 2.3, fieldTwo has the default boost, and
  fieldThree has a boost of 0.4 ... this indicates that matches in
  fieldOne are much more significant than matches in fieldTwo, which are
  more significant than matches in fieldThree. 
  
  But if I want to, say, search for films with 'indiana' in the title,
  with media=DVD scoring higher than media=BLU-RAY then do I need to do
  something like:
  
  solr/select?q=indiana
  
  And in my config:
  
  <str name="qf">media^2</str>
  
  But I don't see where the actual *contents* of the media field would
  determine the boost.
  
  Sorry if I have misunderstood what you mean.
  
  Cheers,
  Pete
  
  Pete Smith-3 wrote:
   
   Hi,
   
   I have managed to build an index in Solr which I can search on keyword,
   produce facets, query facets etc. This is all working great. I have
   implemented my search using a dismax query so it searches predetermined
   fields.
   
   However, my results are coming back sorted by score which appears to be
   calculated by keyword relevancy only. I would like to adjust the score
   where fields have pre-determined values. I think I can do this with
   boost query and boost functions but the documentation here:
   
  
  http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3
   
   Is not particularly helpful. I tried adding a bq argument to my
   search: 
   
   bq=media:DVD^2
   
   (yes, this is an index of films!) but I find when I start adding more
   and more:
   
   bq=media:DVD^2&bq=media:BLU-RAY^1.5
   
   I find the negative results - e.g. films that are DVD but are not
   BLU-RAY get negatively affected in their score. In the end it all seems
   to even out and my score is as it was before i started boosting.
   
   I must be doing this wrong and I wonder whether boost function comes
   in somewhere. Any ideas on how to correctly use boost?
   
   Cheers,
   Pete
   
   -- 
   Pete Smith
   Developer
   
   No.9 | 6 Portal Way | London | W3 6RU |
   T: +44 (0)20 8896 8070 | F: +44 (0)20 8896 8111
   
   LOVEFiLM.com
   
   
  
  -- 
  Pete Smith
  Developer
  
  No.9 | 

Re: How to correctly boost results in Solr Dismax query

2009-03-13 Thread dabboo

Pete,

bq works only with q.alt queries and not with q queries. So, in your case,
since you are using the qf parameter for field boosting, you will have to give
both fields in the qf parameter, i.e. both title and media.

try this

<str name="qf">media^1.0 title^100.0</str>



Pete Smith-3 wrote:
 
 Hi Amit,
 
 Thanks again for your reply. I am understanding it a bit better but I
 think it would help if I posted an example. Say I have three records:
 
 <doc>
 <long name="id">1</long>
 <str name="media">BLU-RAY</str>
 <str name="title">Indiana Jones and the Kingdom of the Crystal
 Skull</str>
 </doc>
 <doc>
 <long name="id">2</long>
 <str name="media">DVD</str>
 <str name="title">Indiana Jones and the Kingdom of the Crystal
 Skull</str>
 </doc>
 <doc>
 <long name="id">3</long>
 <str name="media">DVD</str>
 <str name="title">Casino Royale</str>
 </doc>
 
 Now, if I search for indiana: select?q=indiana
 
 I want the first two rows to come back (not the third as it does not
 contain 'indiana'). I would like record 2 to be scored higher than
 record 1 as its media type is DVD.
 
 At the moment I have in my config:
 
 <str name="qf">title</str>
 
 And i was trying to boost by media having a specific value by using 'bq'
 but from what you told me that is incorrect.
 
 Cheers,
 Pete
 
 
 On Fri, 2009-03-13 at 03:21 -0700, dabboo wrote:
 Pete,
 
 Sorry if I wasn't clear. Here is the explanation.
 
 Suppose you have 2 records and they have films and media as 2 columns.
 
 Now first record has values like films=Indiana and media=blue ray
 and 2nd record has values like films=Bond and media=Indiana
 
 Values for qf parameters
 
 <str name="qf">media^2.0 films^1.0</str>
 
 Now, search for q=Indiana: it should display both of the records, but
 record #2 will be displayed above the 1st.
 
 Let me know if you still have questions.
 
 Cheers,
 amit
 
 
 Pete Smith-3 wrote:
  
  Hi Amit,
  
  Thanks very much for your reply. What you said makes things a bit
  clearer but I am still a bit confused.
  
  On Thu, 2009-03-12 at 23:14 -0700, dabboo wrote:
  If you want to boost the records with their field value then you must
 use
  q
  query parameter instead of q.alt. 'q' parameter actually uses qf
  parameters
  from solrConfig for field boosting.
  
 From the documentation for Dismax queries, I thought that q is simply
  a keyword parameter:
  
 From http://wiki.apache.org/solr/DisMaxRequestHandler:
  q
  The guts of the search defining the main query. This is designed to
 be
  support raw input strings provided by users with no special escaping.
  '+' and '-' characters are treated as mandatory and prohibited
  modifiers for the subsequent terms. Text wrapped in balanced quote
   characters '"' are treated as phrases, any query containing an odd
  number of quote characters is evaluated as if there were no quote
  characters at all. Wildcards in this q parameter are not supported. 
  
  And I thought 'qf' is a list of fields and boost scores:
  
 From http://wiki.apache.org/solr/DisMaxRequestHandler:
  qf (Query Fields)
  List of fields and the boosts to associate with each of them when
  building DisjunctionMaxQueries from the user's query. The format
  supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates that
  fieldOne has a boost of 2.3, fieldTwo has the default boost, and
  fieldThree has a boost of 0.4 ... this indicates that matches in
  fieldOne are much more significant than matches in fieldTwo, which are
  more significant than matches in fieldThree. 
  
  But if I want to, say, search for films with 'indiana' in the title,
  with media=DVD scoring higher than media=BLU-RAY then do I need to do
  something like:
  
  solr/select?q=indiana
  
  And in my config:
  
  <str name="qf">media^2</str>
  
  But I don't see where the actual *contents* of the media field would
  determine the boost.
  
  Sorry if I have misunderstood what you mean.
  
  Cheers,
  Pete
  
  Pete Smith-3 wrote:
   
   Hi,
   
   I have managed to build an index in Solr which I can search on
 keyword,
   produce facets, query facets etc. This is all working great. I have
   implemented my search using a dismax query so it searches
 predetermined
   fields.
   
   However, my results are coming back sorted by score which appears to
 be
   calculated by keyword relevancy only. I would like to adjust the
 score
   where fields have pre-determined values. I think I can do this with
   boost query and boost functions but the documentation here:
   
  
 
 http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3
   
    Is not particularly helpful. I tried adding a bq argument to
 my
   search: 
   
   bq=media:DVD^2
   
   (yes, this is an index of films!) but I find when I start adding
 more
   and more:
   
    bq=media:DVD^2&bq=media:BLU-RAY^1.5
   
   I find the negative results - e.g. films that are DVD but are not
   BLU-RAY get negatively affected in their score. In the end it all
 seems
   to even out and my score is as it was before i started boosting.
   
   I must be doing this 

Solr: is there a default ClobTransformer?

2009-03-13 Thread Giovanni De Stefano
Hello all,

I am trying to index an Oracle DB with some Clob columns.

Following the doc I see that I need to transform my entity with a
ClobTransformer.

Now, my log says the following:

12:05:52,901 ERROR [STDERR] Mar 13, 2009 12:05:52 PM
org.apache.solr.handler.dataimport.EntityProcessorBase loadTransformers
SEVERE: Unable to load Transformer: ClobTransformer
java.lang.ClassNotFoundException: Unable to load ClobTransformer or
org.apache.solr.handler.dataimport.ClobTransformer
at
org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:587)
at
org.apache.solr.handler.dataimport.EntityProcessorBase.loadTransformers(EntityProcessorBase.java:96)
at
org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:159)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:80)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)

this is pretty easy to understand: no ClobTransformer implementation is
found in the classpath.

The question is: is there any default ClobTransformer shipped with Solr or
do I have to implement a custom one?

Thanks,
Giovanni


Re: How to correctly boost results in Solr Dismax query

2009-03-13 Thread Pete Smith
Hi,

On Fri, 2009-03-13 at 03:57 -0700, dabboo wrote:
 bq works only with q.alt queries and not with q queries. So, in your case,
 since you are using the qf parameter for field boosting, you will have to give
 both fields in the qf parameter, i.e. both title and media.
 
 try this
 
 <str name="qf">media^1.0 title^100.0</str>

But with that, how will it know to rank media:DVD higher than
media:BLU-RAY?

Cheers,
Pete


 Pete Smith-3 wrote:
  
  Hi Amit,
  
  Thanks again for your reply. I am understanding it a bit better but I
  think it would help if I posted an example. Say I have three records:
  
  <doc>
  <long name="id">1</long>
  <str name="media">BLU-RAY</str>
  <str name="title">Indiana Jones and the Kingdom of the Crystal
  Skull</str>
  </doc>
  <doc>
  <long name="id">2</long>
  <str name="media">DVD</str>
  <str name="title">Indiana Jones and the Kingdom of the Crystal
  Skull</str>
  </doc>
  <doc>
  <long name="id">3</long>
  <str name="media">DVD</str>
  <str name="title">Casino Royale</str>
  </doc>
  
  Now, if I search for indiana: select?q=indiana
  
  I want the first two rows to come back (not the third as it does not
  contain 'indiana'). I would like record 2 to be scored higher than
  record 1 as its media type is DVD.
  
  At the moment I have in my config:
  
  <str name="qf">title</str>
  
  And i was trying to boost by media having a specific value by using 'bq'
  but from what you told me that is incorrect.
  
  Cheers,
  Pete
  
  
  On Fri, 2009-03-13 at 03:21 -0700, dabboo wrote:
  Pete,
  
  Sorry if I wasn't clear. Here is the explanation.
  
  Suppose you have 2 records and they have films and media as 2 columns.
  
  Now first record has values like films=Indiana and media=blue ray
  and 2nd record has values like films=Bond and media=Indiana
  
  Values for qf parameters
  
  <str name="qf">media^2.0 films^1.0</str>
  
  Now, search for q=Indiana: it should display both of the records, but
  record #2 will be displayed above the 1st.
  
  Let me know if you still have questions.
  
  Cheers,
  amit
  
  
  Pete Smith-3 wrote:
   
   Hi Amit,
   
   Thanks very much for your reply. What you said makes things a bit
   clearer but I am still a bit confused.
   
   On Thu, 2009-03-12 at 23:14 -0700, dabboo wrote:
   If you want to boost the records with their field value then you must
  use
   q
   query parameter instead of q.alt. 'q' parameter actually uses qf
   parameters
   from solrConfig for field boosting.
   
  From the documentation for Dismax queries, I thought that q is simply
   a keyword parameter:
   
  From http://wiki.apache.org/solr/DisMaxRequestHandler:
   q
   The guts of the search defining the main query. This is designed to
  be
   support raw input strings provided by users with no special escaping.
   '+' and '-' characters are treated as mandatory and prohibited
   modifiers for the subsequent terms. Text wrapped in balanced quote
    characters '"' are treated as phrases, any query containing an odd
   number of quote characters is evaluated as if there were no quote
   characters at all. Wildcards in this q parameter are not supported. 
   
   And I thought 'qf' is a list of fields and boost scores:
   
  From http://wiki.apache.org/solr/DisMaxRequestHandler:
   qf (Query Fields)
   List of fields and the boosts to associate with each of them when
   building DisjunctionMaxQueries from the user's query. The format
   supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates that
   fieldOne has a boost of 2.3, fieldTwo has the default boost, and
   fieldThree has a boost of 0.4 ... this indicates that matches in
   fieldOne are much more significant than matches in fieldTwo, which are
   more significant than matches in fieldThree. 
   
   But if I want to, say, search for films with 'indiana' in the title,
   with media=DVD scoring higher than media=BLU-RAY then do I need to do
   something like:
   
   solr/select?q=indiana
   
   And in my config:
   
   <str name="qf">media^2</str>
   
   But I don't see where the actual *contents* of the media field would
   determine the boost.
   
   Sorry if I have misunderstood what you mean.
   
   Cheers,
   Pete
   
   Pete Smith-3 wrote:

Hi,

I have managed to build an index in Solr which I can search on
  keyword,
produce facets, query facets etc. This is all working great. I have
implemented my search using a dismax query so it searches
  predetermined
fields.

However, my results are coming back sorted by score which appears to
  be
calculated by keyword relevancy only. I would like to adjust the
  score
where fields have pre-determined values. I think I can do this with
boost query and boost functions but the documentation here:

   
  
  http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3

    Is not particularly helpful. I tried adding a bq argument to
  my
search: 

bq=media:DVD^2

(yes, this is an index of films!) but I find when I start adding
  more

Re: DIH with outer joins

2009-03-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
it is not very clear to me how it works

probably you can put in the queries here.

you can do all the joins in the db in one complex query and use that
straightaway in an entity. You do not have to do any joins inside DIH
itself

On Fri, Mar 13, 2009 at 4:47 PM, Rui António da Cruz Pereira
ruipereira...@gmail.com wrote:
 I have queries with outer joins defined in some entities and for the same
 root object I can have two or more lines with different objects, for
 example:

 Taking the following 3 tables, and a query defined in the entity with outer
 joins between tables:
 Table1 - Table2 - Table3

 I can have the following lines returned by the query:
 Table1Instance1 - Table2Instance1 - Table3Instance1
 Table1Instance1 - Table2Instance1 - Table3Instance2
 Table1Instance1 - Table2Instance2 - Table3Instance3
 Table1Instance2 - Table2Instance3 - Table3Instance4

 I wanted to have a single document per root object instance (in this case
 per Table1 instance) but with the values from the different lines returned.

 Is it possible to have this behavior in DataImportHandler? How?

 Thanks in advance,
  Rui Pereira




-- 
--Noble Paul


Re: Solr: is there a default ClobTransformer?

2009-03-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
ClobTransformer is a Solr 1.4 feature. Which one are you using?

On Fri, Mar 13, 2009 at 4:39 PM, Giovanni De Stefano
giovanni.destef...@gmail.com wrote:
 Hello all,

 I am trying to index an Oracle DB with some Clob columns.

 Following the doc I see that I need to transform my entity with a
 ClobTransformer.

 Now, my log says the following:

 12:05:52,901 ERROR [STDERR] Mar 13, 2009 12:05:52 PM
 org.apache.solr.handler.dataimport.EntityProcessorBase loadTransformers
 SEVERE: Unable to load Transformer: ClobTransformer
 java.lang.ClassNotFoundException: Unable to load ClobTransformer or
 org.apache.solr.handler.dataimport.ClobTransformer
    at
 org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:587)
    at
 org.apache.solr.handler.dataimport.EntityProcessorBase.loadTransformers(EntityProcessorBase.java:96)
    at
 org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:159)
    at
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:80)
    at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
    at
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
    at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
    at
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)

 this is pretty easy to understand: no ClobTransformer implementation is
 found in the classpath.

 The question is: is there any default ClobTransformer shipped with Solr or
 do I have to implement a custom one?

 Thanks,
 Giovanni




-- 
--Noble Paul


Re: DIH with outer joins

2009-03-13 Thread Rui António da Cruz Pereira
I thought that I could remove the uniqueKey in Solr and then have more 
than one document with the same id, but then I don't know whether, in 
delta-imports, outdated or deleted documents are updated (an updated 
document is added, and then we would have both the outdated and the 
updated document in the index) or removed.


Noble Paul നോബിള്‍ नोब्ळ् wrote:

it is not very clear to me how it works

probably you can put in the queries here.

you can do all the joins in the db in one complex query and use that
straightaway in an entity. You do not have to do any joins inside DIH
itself

On Fri, Mar 13, 2009 at 4:47 PM, Rui António da Cruz Pereira
ruipereira...@gmail.com wrote:
  

I have queries with outer joins defined in some entities and for the same
root object I can have two or more lines with different objects, for
example:

Taking the following 3 tables, and a query defined in the entity with outer
joins between tables:
Table1 - Table2 - Table3

I can have the following lines returned by the query:
Table1Instance1 - Table2Instance1 - Table3Instance1
Table1Instance1 - Table2Instance1 - Table3Instance2
Table1Instance1 - Table2Instance2 - Table3Instance3
Table1Instance2 - Table2Instance3 - Table3Instance4

I wanted to have a single document per root object instance (in this case
per Table1 instance) but with the values from the different lines returned.

Is it possible to have this behavior in DataImportHandler? How?

Thanks in advance,
 Rui Pereira






  




Re: Two way Synonyms in Solr

2009-03-13 Thread dabboo

Yes, I have defined expand=true for synonym definition.
But still, 2 way synonym are not working.

Also, is there any way to make phrase synonyms work?
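For what it's worth, with expand=true a line like `value1, value2` should make the two terms interchangeable: each occurrence of either term is expanded to both. A rough sketch of that expansion behavior (this mimics the idea only, not Solr's actual SynonymFilter implementation):

```python
def expand_synonyms(tokens, synonym_groups):
    """Mimic a two-way (expand=true) synonym filter: every token that
    belongs to a group is replaced by the whole group, emitted at the
    same token position."""
    out = []
    for tok in tokens:
        for group in synonym_groups:
            if tok in group:
                out.append(sorted(group))  # all members at one position
                break
        else:
            out.append([tok])  # token has no synonyms; pass it through
    return out

groups = [{"value1", "value2"}]
print(expand_synonyms(["value1", "other"], groups))
```

If /admin/analysis.jsp shows the index-time tokens already expanded this way but queries still miss, the mismatch is often that only one of the index/query analyzer chains has the synonym filter configured.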




Koji Sekiguchi-2 wrote:
 
 dabboo wrote:
 Hi,

 I am implementing 2 way synonyms in solr using q query parameter. One way
 synonym is working fine with q query parameter but 2 way is not working. 

 for e.g.
 If I defined 2 way synonyms in the file like:
 value1, value2

 It doesn't show any results for either of the values.

 Please suggest.

 Thanks,
 Amit Garg
   
 
 Are you sure you have expand=true on your synonym definition?
 
 Also you can use /admin/analysis.jsp for debugging the field.
 
 
 Koji
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Two-way-Synonyms-in-Solr-tp22492439p22494772.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DIH with outer joins

2009-03-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
have one root entity which just does a 'select id from Table1'.
Then have a child entity which does all the joins and returns all the other
columns for that 'id'.
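In data-config.xml terms, that suggestion might look roughly like the following (table, column, and entity names here are assumptions, not from the thread); the root entity yields one document per Table1 row, and the child entity's columns — mapped to multiValued fields in schema.xml — absorb the extra rows produced by the outer joins:

```xml
<document>
  <!-- one row per root object => one Solr document per Table1 instance -->
  <entity name="root" query="select id from Table1">
    <!-- child entity runs once per root id and may return several rows;
         map its columns to multiValued fields in schema.xml -->
    <entity name="joined"
            query="select t2.col_a, t3.col_b
                     from Table1 t1
                     left outer join Table2 t2 on t2.t1_id = t1.id
                     left outer join Table3 t3 on t3.t2_id = t2.id
                    where t1.id = '${root.id}'">
      <field column="col_a" name="col_a"/>
      <field column="col_b" name="col_b"/>
    </entity>
  </entity>
</document>
```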

On Fri, Mar 13, 2009 at 5:10 PM, Rui António da Cruz Pereira
ruipereira...@gmail.com wrote:
 I thought that I could remove the uniqueKey in Solr and then have more than
 one document with the same id, but then I don't know whether, in delta-imports,
 outdated or deleted documents are updated (an updated document is added, and
 then we would have both the outdated and the updated document in the index) or
 removed.

 Noble Paul നോബിള്‍ नोब्ळ् wrote:

 it is not very clear to me how it works

 probably you can put in the queries here.

 you can do all the joins in the db in one complex query and use that
 straightaway in an entity. You do not have to do any joins inside DIH
 itself

 On Fri, Mar 13, 2009 at 4:47 PM, Rui António da Cruz Pereira
 ruipereira...@gmail.com wrote:


 I have queries with outer joins defined in some entities and for the same
 root object I can have two or more lines with different objects, for
 example:

 Taking the following 3 tables, and a query defined in the entity with outer
 joins between tables:
 Table1 - Table2 - Table3

 I can have the following lines returned by the query:
 Table1Instance1 - Table2Instance1 - Table3Instance1
 Table1Instance1 - Table2Instance1 - Table3Instance2
 Table1Instance1 - Table2Instance2 - Table3Instance3
 Table1Instance2 - Table2Instance3 - Table3Instance4

 I wanted to have a single document per root object instance (in this case
 per Table1 instance) but with the values from the different lines
 returned.

 Is it possible to have this behavior in DataImportHandler? How?

 Thanks in advance,
  Rui Pereira











-- 
--Noble Paul


Re: Solr: ERRORs at Startup

2009-03-13 Thread Giovanni De Stefano
Hello Toby,

thank you for your quick reply.

Even setting everything to INFO through
http://localhost:8080/solr/admin/logging didn't help.

But considering you do not see any bad issue here, at this time I will
ignore those ERROR messages :-)

Cheers,
Giovanni


On Fri, Mar 13, 2009 at 11:16 AM, Toby Cole toby.c...@semantico.com wrote:

 Hi Giovanni,
It looks like logging is configured strangely. Those messages in my
 solr setup (on tomcat 6 or jetty) appear as INFO level messages.
 It could have something to do with your SLF4J setup, but I'm no expert on
 that side of things.
 I wouldn't worry too much, the content of the messages doesn't imply
 anything bad going on.

 Toby.


 On 13 Mar 2009, at 09:57, Giovanni De Stefano wrote:

  Hello everybody,

 I am currently using:
 Solr v1.3.0
 Jboss jboss-5.0.1.GA
 Java jdk 1.5_06

 When I start Solr within JBoss I see a lot of errors in the log, but Solr
 seems to be working (meaning I can see the admin interface, but I cannot index my
 DB... but that is another story :-) ).

 Attached is the log file. Here just some of the error messages I see:
 ...
 10:51:19,976 INFO  [ConnectionFactoryBindingService] Bound
 ConnectionManager 'jboss.jca:service=ConnectionFactoryBinding,name=JmsXA' to
 JNDI name 'java:JmsXA'
 10:51:20,006 INFO  [TomcatDeployment] deploy, ctxPath=/
 10:51:20,126 INFO  [TomcatDeployment] deploy, ctxPath=/jmx-console
 10:51:20,525 INFO  [TomcatDeployment] deploy, ctxPath=/solr
 10:51:20,617 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
 org.apache.solr.servlet.SolrDispatchFilter init
 INFO: SolrDispatchFilter.init()
 10:51:20,631 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
 org.apache.solr.core.SolrResourceLoader locateInstanceDir
 INFO: No /solr/home in JNDI
 10:51:20,631 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
 org.apache.solr.core.SolrResourceLoader locateInstanceDir
 INFO: using system property solr.solr.home:
 /home/giovanni/development/search/solr
 10:51:20,637 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
 org.apache.solr.core.CoreContainer$Initializer initialize
 INFO: looking for solr.xml:
 /home/giovanni/development/search/solr/solr.xml
 10:51:20,637 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
 org.apache.solr.core.SolrResourceLoader init
 INFO: Solr home set to '/home/giovanni/development/search/solr/'
 10:51:20,710 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/home/giovanni/development/search/solr/lib/ojdbc14.jar'
 to Solr classloader
 10:51:20,734 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
 org.apache.solr.core.SolrResourceLoader locateInstanceDir
 INFO: No /solr/home in JNDI
 10:51:20,734 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
 org.apache.solr.core.SolrResourceLoader locateInstanceDir
 INFO: using system property solr.solr.home:
 /home/giovanni/development/search/solr
 10:51:20,735 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
 org.apache.solr.core.SolrResourceLoader init
 INFO: Solr home set to '/home/giovanni/development/search/solr/'
 10:51:20,736 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/home/giovanni/development/search/solr/lib/ojdbc14.jar'
 to Solr classloader
 10:51:21,964 ERROR [STDERR] Mar 13, 2009 10:51:21 AM
 org.apache.solr.core.SolrConfig init
 INFO: Loaded SolrConfig: solrconfig.xml
 10:51:21,977 ERROR [STDERR] Mar 13, 2009 10:51:21 AM
 org.apache.solr.core.SolrCore init
 INFO: Opening new SolrCore at /home/giovanni/development/search/solr/,
 dataDir=./solr/data/
 10:51:21,991 ERROR [STDERR] Mar 13, 2009 10:51:21 AM
 org.apache.solr.schema.IndexSchema readSchema
 INFO: Reading Solr Schema
 10:51:22,027 ERROR [STDERR] Mar 13, 2009 10:51:22 AM
 org.apache.solr.schema.IndexSchema readSchema
 INFO: Schema name=search
 10:51:22,051 ERROR [STDERR] Mar 13, 2009 10:51:22 AM
 org.apache.solr.util.plugin.AbstractPluginLoader load
 INFO: created string: org.apache.solr.schema.StrField
 10:51:22,061 ERROR [STDERR] Mar 13, 2009 10:51:22 AM
 org.apache.solr.util.plugin.AbstractPluginLoader load
 INFO: created boolean: org.apache.solr.schema.BoolField
 10:51:22,067 ERROR [STDERR] Mar 13, 2009 10:51:22 AM
 org.apache.solr.util.plugin.AbstractPluginLoader load
 INFO: created integer: org.apache.solr.schema.IntField
 
 10:51:22,472 ERROR [STDERR] Mar 13, 2009 10:51:22 AM
 org.apache.solr.util.plugin.AbstractPluginLoader load
 INFO: created ignored: org.apache.solr.schema.StrField
 10:51:22,483 ERROR [STDERR] Mar 13, 2009 10:51:22 AM
 org.apache.solr.schema.IndexSchema readSchema
 INFO: default search field is text
 10:51:22,485 ERROR [STDERR] Mar 13, 2009 10:51:22 AM
 org.apache.solr.schema.IndexSchema readSchema
 INFO: query parser default operator is OR
 10:51:22,486 ERROR [STDERR] Mar 13, 2009 10:51:22 AM
 org.apache.solr.schema.IndexSchema readSchema
 INFO: unique key field: uri
 10:51:22,541 ERROR [STDERR] Mar 13, 2009 10:51:22 AM
 org.apache.solr.core.JmxMonitoredMap init
 INFO: JMX monitoring is 

DIH with outer joins

2009-03-13 Thread Rui António da Cruz Pereira
I have queries with outer joins defined in some entities and for the 
same root object I can have two or more lines with different objects, 
for example:


Taking the following 3 tables, and a query defined in the entity with outer 
joins between tables:

Table1 - Table2 - Table3

I can have the following lines returned by the query:
Table1Instance1 - Table2Instance1 - Table3Instance1
Table1Instance1 - Table2Instance1 - Table3Instance2
Table1Instance1 - Table2Instance2 - Table3Instance3
Table1Instance2 - Table2Instance3 - Table3Instance4

I wanted to have a single document per root object instance (in this 
case per Table1 instance) but with the values from the different lines 
returned.


Is it possible to have this behavior in DataImportHandler? How?

Thanks in advance,
  Rui Pereira


Re: input XSLT

2009-03-13 Thread Grant Ingersoll

Have you tried Solr Cell?  http://wiki.apache.org/solr/ExtractingRequestHandler



On Mar 13, 2009, at 2:49 AM, CIF Search wrote:

But these documents have to be converted to a particular format before being
posted. An arbitrary XML document cannot be posted to Solr (XSLT is not
applied by Solr internally on input).
DIH handles any XML format, but it operates in pull mode.


On Fri, Mar 13, 2009 at 11:45 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

On Fri, Mar 13, 2009 at 11:36 AM, CIF Search cifsea...@gmail.com  
wrote:



There is a fundamental problem with using 'pull' approach using DIH.
Normally people want a delta imports which are done using a  
timestamp

field.
Now it may not always be possible for application servers to sync  
their
timestamps (given protocol restrictions due to security reasons).  
Due to
this Solr application is likely to miss a few records  
occasionally. Such

a
problem does not arise if applications themseleves identify their  
records
and post. Should we not have such a feature in Solr, which will  
allow

users
to push data onto the index in whichever format they wish to? This  
will

also
facilitate plugging in solr seamlessly with all kinds of  
applications.




You can of course push your documents to Solr using the XML/CSV  
update (or
using the solrj client). It's just that you can't push documents  
with DIH.


http://wiki.apache.org/solr/#head-98c3ee61c5fc837b09e3dfe3fb420491c9071be3

--
Regards,
Shalin Shekhar Mangar.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Solr: is there a default ClobTransformer?

2009-03-13 Thread Giovanni De Stefano
Hello Paul,

I must have missed that detail :-)

I am currently using Solr 1.3.0.

Thank you very much for your remark: I just downloaded the latest nightly
build, compiled the whole thing, and included the
apache-solr-dataimporthandler-1.4-dev.jar in my $SOLR_HOME/lib folder.

I have just been able to index an Oracle DB with CLOB columns :-)

I hope Solr 1.4.0 will be released soon so that I can have a clean
installation rather than a hacked one (now I am using the Solr 1.3.0 core with
the addition of the mentioned DataImportHandler jar from Solr 1.4.0).

Cheers,
Giovanni
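For anyone landing on this thread later, the working 1.4 configuration is roughly this shape (the entity, query, and field names below are made up for illustration): transformer="ClobTransformer" is declared on the entity, and clob="true" marks which columns to convert from CLOB to String:

```xml
<entity name="article"
        transformer="ClobTransformer"
        query="select id, body from articles">
  <field column="id" name="id"/>
  <!-- clob="true" tells ClobTransformer to read this CLOB as a String -->
  <field column="body" name="body" clob="true"/>
</entity>
```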


On Fri, Mar 13, 2009 at 12:29 PM, Noble Paul നോബിള്‍ नोब्ळ् 
noble.p...@gmail.com wrote:

 ClobTransformer is a Solr 1.4 feature. Which one are you using?

 On Fri, Mar 13, 2009 at 4:39 PM, Giovanni De Stefano
 giovanni.destef...@gmail.com wrote:
  Hello all,
 
  I am trying to index an Oracle DB with some Clob columns.
 
  Following the doc I see that I need to transform my entity with a
  ClobTransformer.
 
  Now, my log says the following:
 
  12:05:52,901 ERROR [STDERR] Mar 13, 2009 12:05:52 PM
  org.apache.solr.handler.dataimport.EntityProcessorBase loadTransformers
  SEVERE: Unable to load Transformer: ClobTransformer
  java.lang.ClassNotFoundException: Unable to load ClobTransformer or
  org.apache.solr.handler.dataimport.ClobTransformer
 at
 
 org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:587)
 at
 
 org.apache.solr.handler.dataimport.EntityProcessorBase.loadTransformers(EntityProcessorBase.java:96)
 at
 
 org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:159)
 at
 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:80)
 at
 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
 at
 
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
 at
 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
 at
 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
 
  this is pretty easy to understand: no ClobTransformer implementation is
  found in the classpath.
 
  The question is: is there any default ClobTransformer shipped with Solr, or
  do I have to implement a custom one?
 
  Thanks,
  Giovanni
 



 --
 --Noble Paul
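For reference, once the 1.4-dev DIH jar is on the classpath, the transformer is declared per entity in data-config.xml. A rough sketch (the entity, table, column, and field names here are placeholders, not from the original post):

```xml
<entity name="doc"
        transformer="ClobTransformer"
        query="select id, body_clob from documents">
  <field column="id" name="id"/>
  <!-- clob="true" tells ClobTransformer to turn the CLOB into a String -->
  <field column="body_clob" name="body" clob="true"/>
</entity>
```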



Re: fl wildcards

2009-03-13 Thread Erik Hatcher


On Mar 12, 2009, at 1:43 PM, Schley Andrew Kutz wrote:
If I wanted to hack Solr so that it has the ability to process  
wildcards for the field list parameter (fl), where would I look?  
(Perhaps I should look on the solr-dev mailing list, but since I am  
already on this one I thought I would start here). Thanks!


One strategy that can be used (and Solr Flare, a RoR plugin, employs  
this) is to make a request to Solr's Luke request handler at client  
startup (or whenever you want to reset) to get a list of the fields  
actually in the index and use that to build the field list and other  
dynamically controlled things, like facet.field parameters.


For example, Flare takes all fields returned from the luke request  
handler, and all that match *_facet become facet.field parameters in  
the search requests.


Wasn't exactly an answer to your question.  Wildcard support for field  
names in Solr is a feature that really deserves broader implementation  
consideration than just hacking one spot for fl.  Other field list  
parameters, like hl.fl could use that capability too.


Erik
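Erik's strategy — fetch the live field list via the Luke request handler, then expand patterns on the client side — can be sketched in a few lines. This is a hypothetical illustration, not Solr API (the class and method names are made up): it expands a simple `*` glob such as `*_facet` against a known field list, which is essentially what Flare does when building facet.field parameters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical client-side glob expansion (not Solr API): given the field
// names reported by the Luke request handler, expand a pattern such as
// "*_facet" into the concrete names to use for fl= or facet.field=.
public class FieldGlob {
    public static List<String> expand(String glob, List<String> knownFields) {
        // Turn the glob into a regex: quote the literal parts, map * to .*
        Pattern p = Pattern.compile(Pattern.quote(glob).replace("*", "\\E.*\\Q"));
        List<String> matches = new ArrayList<String>();
        for (String field : knownFields) {
            if (p.matcher(field).matches()) {
                matches.add(field);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("id", "name", "cat_facet", "price_facet");
        System.out.println(expand("*_facet", fields)); // [cat_facet, price_facet]
    }
}
```

The same helper can feed any field-list-style parameter (fl, facet.field, hl.fl) without server-side changes.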



Re: fl wildcards

2009-03-13 Thread Schley Andrew Kutz
Thanks. If I knew where to begin to implement this, I would. It seems  
to me that the constraining of field lists must occur at the very core  
of Solr because of the reduction in search time when specifying a  
restrictive set of fields to return. For example, when I return 10  
entire documents the search takes a QTime of 170, which I presume is  
milliseconds. However, the time it takes a browser to render the data  
puts the actual time into seconds. When I restrict the field list with  
fl=id,name, the QTime is reduced to 24 -- not a small difference.


So, this leads me to believe that the application of field list  
restrictions is not simply occurring in the response writer. Does  
anyone know where it *is* occurring?


--
-a

Ideally, a code library must be immediately usable by naive  
developers, easily customized by more sophisticated developers, and  
readily extensible by experts. -- L. Stein


On Mar 13, 2009, at 7:21 AM, Erik Hatcher wrote:



On Mar 12, 2009, at 1:43 PM, Schley Andrew Kutz wrote:
If I wanted to hack Solr so that it has the ability to process  
wildcards for the field list parameter (fl), where would I look?  
(Perhaps I should look on the solr-dev mailing list, but since I am  
already on this one I thought I would start here). Thanks!


One strategy that can be used (and Solr Flare, a RoR plugin, employs  
this) is to make a request to Solr's Luke request handler at client  
startup (or whenever you want to reset) to get a list of the  
fields actually in the index and use that to build the field list  
and other dynamically controlled things, like facet.field parameters.


For example, Flare takes all fields returned from the luke request  
handler, and all that match *_facet become facet.field parameters in  
the search requests.


Wasn't exactly an answer to your question.  Wildcard support for  
field names in Solr is a feature that really deserves broader  
implementation consideration than just hacking one spot for fl.   
Other field list parameters, like hl.fl could use that capability too.


Erik





Stemming in Solr

2009-03-13 Thread dabboo

Hi,

Can someone please let me know how to implement stemming in Solr? I am
particularly looking for the changes I might need to make in the config files,
and whether I need to use some already-supplied libraries/factories etc.

It would be a great help.

Thanks,
Amit Garg

-- 
View this message in context: 
http://www.nabble.com/Stemming-in-Solr-tp22495961p22495961.html
Sent from the Solr - User mailing list archive at Nabble.com.
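For reference, stemming in Solr is configured per field type in schema.xml by adding a stemming token filter to the analyzer chain. A minimal sketch (the field type name text_stem is made up; the tokenizer and filter factories are ones shipped with Solr):

```xml
<fieldType name="text_stem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- English Porter stemming; solr.SnowballPorterFilterFactory
         supports other languages via a language attribute -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Fields declared with this type are stemmed both at index time and at query time, so "television" and "televisions" reduce to the same term.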



Re: fl wildcards

2009-03-13 Thread Mark Miller

Erik Hatcher wrote:



Wasn't exactly an answer to your question.  Wildcard support for field 
names in Solr is a feature that really deserves broader implementation 
consideration than just hacking one spot for fl.  Other field list 
parameters, like hl.fl could use that capability too.

I think SOLR-540 added wildcard support for hl.fl


--
- Mark

http://www.lucidimagination.com





Storing map in Field

2009-03-13 Thread Jeff Crowder
All,

I'm working with the sample schema, and have a scenario where I would like
to store multiple prices in a map of some sort.  This would be used for a
scenario where a single product has different prices based on a price
list.  For instance:

<add>
  <doc>
    <field name="id">SKU001</field>
    <field name="name">A Sample Product</field>
    <field name="price[pricelist1]">119.99</field>
    <field name="price[pricelist2]">109.99</field>
  </doc>
</add>

Is something like this possible?

Regards,
-Jeff


Re: Storing map in Field

2009-03-13 Thread Toby Cole
I don't think anything _quite_ like that exists; however, you could use  
dynamic (wildcard) fields to achieve pretty much the same thing.


You could use a post like this:
<add>
 <doc>
   <field name="id">SKU001</field>
   <field name="name">A Sample Product</field>
   <field name="price_pricelist1">119.99</field>
   <field name="price_pricelist2">109.99</field>
 </doc>
</add>

if you have a field definition in your schema.xml like:
<dynamicField name="price_*" type="float" indexed="true" stored="true"/>


Regards, Toby.

On 13 Mar 2009, at 14:01, Jeff Crowder wrote:


All,

I'm working with the sample schema, and have a scenario where I  
would like
to store multiple prices in a map of some sort.  This would be  
used for a
scenario where a single product has different prices based on a  
price

list.  For instance:

<add>
 <doc>
   <field name="id">SKU001</field>
   <field name="name">A Sample Product</field>
   <field name="price[pricelist1]">119.99</field>
   <field name="price[pricelist2]">109.99</field>
 </doc>
</add>

Is something like this possible?

Regards,
-Jeff


Toby Cole
Software Engineer

Semantico
E: toby.c...@semantico.com
W: www.semantico.com
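Following Toby's dynamic-field suggestion, the client-side mapping from a price map to field names is mechanical. A rough sketch under stated assumptions: the class, method, and price-list names are made up, no XML escaping is performed, and a real client would typically build a SolrInputDocument via SolrJ instead of raw strings — this only illustrates the "price_<listId>" naming convention.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turn a map of price-list id -> price into the
// dynamic-field document body expected by a "price_*" dynamicField rule.
public class PriceDoc {
    public static String toAddXml(String id, String name, Map<String, Double> prices) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        sb.append("<field name=\"id\">").append(id).append("</field>");
        sb.append("<field name=\"name\">").append(name).append("</field>");
        // Each price list becomes its own dynamic field on the document.
        for (Map.Entry<String, Double> e : prices.entrySet()) {
            sb.append("<field name=\"price_").append(e.getKey()).append("\">")
              .append(e.getValue()).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, Double> prices = new LinkedHashMap<String, Double>();
        prices.put("pricelist1", 119.99);
        prices.put("pricelist2", 109.99);
        System.out.println(toAddXml("SKU001", "A Sample Product", prices));
    }
}
```

At query time you then sort or filter on whichever price_* field matches the caller's price list.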



Re: Storing map in Field

2009-03-13 Thread Erick Erickson
Hmmm, what do you want to *do* with those multiple prices?
Search? Display? Change all the time? Each of these operations
will generate different suggestions, I daresay.

Best
Erick

On Fri, Mar 13, 2009 at 10:01 AM, Jeff Crowder jcrow...@tellusweb.comwrote:

 All,

 I'm working with the sample schema, and have a scenario where I would like
 to store multiple prices in a map of some sort.  This would be used for a
 scenario where a single product has different prices based on a price
 list.  For instance:

 <add>
  <doc>
    <field name="id">SKU001</field>
    <field name="name">A Sample Product</field>
    <field name="price[pricelist1]">119.99</field>
    <field name="price[pricelist2]">109.99</field>
  </doc>
 </add>

 Is something like this possible?

 Regards,
 -Jeff



Re: DIH with outer joins

2009-03-13 Thread Walter Underwood
It may be easier to make a view in the database and index the view.
Databases have good tools for that.

wunder

On 3/13/09 2:46 AM, Rui António da Cruz Pereira ruipereira...@gmail.com
wrote:

 I have queries with outer joins defined in some entities and for the
 same root object I can have two or more lines with different objects,
 for example:
 
  Taking the following 3 tables, and a query defined in the entity with outer
  joins between tables:
 Table1 - Table2 - Table3
 
 I can have the following lines returned by the query:
 Table1Instance1 - Table2Instance1 - Table3Instance1
 Table1Instance1 - Table2Instance1 - Table3Instance2
 Table1Instance1 - Table2Instance2 - Table3Instance3
 Table1Instance2 - Table2Instance3 - Table3Instance4
 
 I wanted to have a single document per root object instance (in this
 case per Table1 instance) but with the values from the different lines
 returned.
 
 Is it possible to have this behavior in DataImportHandler? How?
 
 Thanks in advance,
Rui Pereira



rsync snappuller slowdown Qtime

2009-03-13 Thread sunnyfr

Hi,

Noticing significant latency during search, I tried to turn off the cronjob and
test it manually.

It was obvious that during a snappull on a slave server, the query time
was a lot longer than the rest of the time.
Even snapinstaller didn't affect the query time.

Without any activity, queries take around 200 msec; with snappuller running, 3-6 sec.

Do you have any idea?

Thanks a lot,

-- 
View this message in context: 
http://www.nabble.com/rsync-snappuller-slowdown-Qtime-tp22497625p22497625.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: fl wildcards

2009-03-13 Thread Schley Andrew Kutz
That makes sense, since hl.fl probably can get away with calculating  
in the writer, and not as part of the core. However, I really need  
wildcard (or globbing) support for field lists as part of the common  
query parameter fl. Again, if someone can just point me to where the  
Solr core is using the contents of the fl param, I am happy to  
implement this, if only locally for my purposes.


Thanks!

--
-a

Ideally, a code library must be immediately usable by naive  
developers, easily customized by more sophisticated developers, and  
readily extensible by experts. -- L. Stein


On Mar 13, 2009, at 8:10 AM, Mark Miller wrote:


Erik Hatcher wrote:



Wasn't exactly an answer to your question.  Wildcard support for  
field names in Solr is a feature that really deserves broader  
implementation consideration than just hacking one spot for fl.   
Other field list parameters, like hl.fl could use that capability  
too.

I think 540 added wildcard support for hl.fl


--
- Mark

http://www.lucidimagination.com







Re: DIH use of the ?command=full-import entity= command option

2009-03-13 Thread Jon Baer
Bear in mind (and correct me if I'm wrong), but a full-import is still  
a full-import no matter what entity you tack onto the param.


Thus I think clean=false should be appended (a friend starting off in  
Solr was really confused by this and could not understand why it did a  
delete on all documents).

I'm not sure if that is clearly stated in the Wiki ...

- Jon

On Mar 13, 2009, at 1:34 AM, Shalin Shekhar Mangar wrote:

On Fri, Mar 13, 2009 at 10:44 AM, Fergus McMenemie  
fer...@twig.me.ukwrote:



If my data-config.xml contains multiple root level entities
what is the expected action if I call full-import without an
entity=XXX sub-command?

Does it process all entities one after the other or only the
first? (It would be useful IMHO if it only did the first.)



 It processes all entities one after the other. If you want to import only
 one, use the entity parameter.

--
Regards,
Shalin Shekhar Mangar.




Re: rsync snappuller slowdown Qtime

2009-03-13 Thread Yonik Seeley
On Fri, Mar 13, 2009 at 10:33 AM, sunnyfr johanna...@gmail.com wrote:
 And it was obvious how during snappuller on a slave server, the query time
 was a lot longer than the rest of the time.

Did the CPU utilization drop?

It could be the writing of the newly pulled files forcing parts of the
current index files out of the OS cache.
iostat could also help show how much data is actually read from
disk for a certain number of queries, with and without a snappull going
on.

-Yonik
http://www.lucidimagination.com


Re: Tomcat holding deleted snapshots until it's restarted - SOLVED!!!

2009-03-13 Thread Marc Sturlese

Hey Yonik,
I tested the last nightly build and it still happens... but I have solved it! I'll
tell you my solution. It seems to be working well, but I just want to be sure
that it doesn't have any bad effects, as for me this is one of the most
complicated parts of the Solr source (dealing with multiple
index searchers in a synchronized way).
I noticed that in the SolrCore.java, there's a part in the function
getSearcher where there is a comment saying:

// we are all done with the old searcher we used
// for warming...

And after that the code is:
if (currSearcherHolderF!=null) currSearcherHolderF.decref();

The problem here is that this old SolrIndexSearcher is never closed and
never removed from _searchers
What I have done:

if (currSearcherHolderF!=null){

currSearcherHolderF.get().close(); //close SolrIndexSearcher proper
currSearcherHolderF.decref();
_searchers.remove(); //remove the
}

Doing that... if I do a lsof | grep tomcat, I see that tomcat is no longer
holding deleted files (as the index searcher was properly closed) and the
_searchers var does not accumulate infinite references...
It sorts out the problem in the stats screen as well... after 5 full-imports it
just shows one IndexSearcher.
What do you think? 
-- 
View this message in context: 
http://www.nabble.com/Tomcat-holding-deleted-snapshots-until-it%27s-restarted---SOLVED%21%21%21-tp22451252p22500372.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Tomcat holding deleted snapshots until it's restarted - SOLVED!!!

2009-03-13 Thread Yonik Seeley
decref() decrements the reference count and closes the searcher when
it reaches 0 (no more users).
Forcing it to close at the point you did is unsafe since other threads
may still be using that searcher.
The real issue lies somewhere else - either a stuck thread, or some
code that is not decrementing the reference when it's done.  It's most
likely the latter.

We need to get to the root cause.  Can you open a JIRA bug for this?

-Yonik
http://www.lucidimagination.com

On Fri, Mar 13, 2009 at 12:39 PM, Marc Sturlese marc.sturl...@gmail.com wrote:

 Hey Yonik,
 I tested the last nightly build and still happens... but I have solved it! I
 tell you my solution, it seems to be working well but just want to be sure
 that it doesn't have any bad effects as for me this is one of the most
 complicated parts of the Solr source (the fact of dealing with multiple
 indexsearchers in a syncronized way).
 I noticed that in the SolrCore.java, there's a part in the function
 getSearcher where there is a comment saying:

 // we are all done with the old searcher we used
 // for warming...

 And after that the code is:
 if (currSearcherHolderF!=null) currSearcherHolderF.decref();

 The problem here is that this old SolrIndexSearcher is never closed and
 never removed from _searchers
 What I have done:

 if (currSearcherHolderF!=null){

        currSearcherHolderF.get().close(); //close SolrIndexSearcher proper
        currSearcherHolderF.decref();
        _searchers.remove(); //remove the
 }

 Doing that... if I do a lsof | grep tomcat will see that tomcat is not
 holding deleted files anymore (as indexsearcher was proper close) and the
 _searchers var will not accumulate infinite references...
 It sorts the problem in the stats screen aswell... after 5 full-imports it
 just shows one IndexSearcher
 What do you think?


Re: Tomcat holding deleted snapshots until it's restarted - SOLVED!!!

2009-03-13 Thread Yonik Seeley
On Fri, Mar 13, 2009 at 1:00 PM, Marc Sturlese marc.sturl...@gmail.com wrote:

 Ok, I will open a bug issue now.

 Forcing it to close at the point you did is unsafe since other threads
 may still be using that searcher.

 Can you give me an example where other threads would be using that searcher?

Any searches that started before the new searcher was registered will
still be using the old searcher.
  Thread A starts executing a search request with Searcher1
  Thread B issues a commit
  - close the writer
  - open Searcher2
  - register Searcher2 (and decrement Searcher1 ref count)
  Thread B finishes
  Thread A finishes (decrement Searcher1 ref count)

-Yonik
http://www.lucidimagination.com
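Yonik's timeline is exactly the pattern reference counting exists to handle: the old searcher is closed only when the last user releases it, not when a newer searcher is registered. A stripped-down sketch of the idea (this is not the actual SolrCore code; the class and method names are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative reference-counted resource, mimicking how Solr keeps an old
// searcher alive until every in-flight request has released it.
public class RefCounted {
    private final AtomicInteger refCount = new AtomicInteger(1); // registry holds one ref
    private volatile boolean closed = false;

    public RefCounted incref() {
        refCount.incrementAndGet();
        return this;
    }

    public void decref() {
        // Only the final release actually closes the underlying resource.
        if (refCount.decrementAndGet() == 0) {
            closed = true; // here Solr would call searcher.close()
        }
    }

    public boolean isClosed() {
        return closed;
    }

    public static void main(String[] args) {
        RefCounted searcher1 = new RefCounted();
        searcher1.incref();  // thread A starts a request on searcher1
        searcher1.decref();  // registering searcher2 drops the registry's ref
        System.out.println(searcher1.isClosed()); // false: thread A still holds it
        searcher1.decref();  // thread A finishes its request
        System.out.println(searcher1.isClosed()); // true: last reference released
    }
}
```

Forcing close() at registration time, as in the proposed patch, skips the "thread A still holds it" state, which is why it is unsafe.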


Re: DIH with outer joins

2009-03-13 Thread Rui António da Cruz Pereira
The two entities resolves the problem, but adds some overhead (the 
queries can be really big). The views doesn't work for me, as the 
queries are dynamically generated, taken in consideration a determinate 
topology.


Noble Paul നോബിള്‍ नोब्ळ् wrote:

have one root entity which just does a select id from Table1.
Then have a child entity which does all the joins and returns all the other
columns for that 'id'.

On Fri, Mar 13, 2009 at 5:10 PM, Rui António da Cruz Pereira
ruipereira...@gmail.com wrote:
  

I thought that I could remove the uniqueKey in Solr and then have more that
one document with the same id, but then I don't know if in delta-imports the
documents outdated or deleted are updated (updated document is added and
then we would have the outdated and the updated document in the index) or
removed.

Noble Paul നോബിള്‍ नोब्ळ् wrote:


it is not very clear to me on how it works

probably you can put in the queries here.

you can do all the joins in the db in one complex query and use that
straightaway in an entity. You do not have to do any joins inside DIH
itself

On Fri, Mar 13, 2009 at 4:47 PM, Rui António da Cruz Pereira
ruipereira...@gmail.com wrote:

  

I have queries with outer joins defined in some entities and for the same
root object I can have two or more lines with different objects, for
example:

Taking the following 3 tables, andquery defined in the entity with outer
joins between tables:
Table1 - Table2 - Table3

I can have the following lines returned by the query:
Table1Instance1 - Table2Instance1 - Table3Instance1
Table1Instance1 - Table2Instance1 - Table3Instance2
Table1Instance1 - Table2Instance2 - Table3Instance3
Table1Instance2 - Table2Instance3 - Table3Instance4

I wanted to have a single document per root object instance (in this case
per Table1 instance) but with the values from the different lines
returned.

Is it possible to have this behavior in DataImportHandler? How?

Thanks in advance,
 Rui Pereira






  





  




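The root-entity/child-entity approach suggested in this thread would look roughly like this in data-config.xml (the table, column, and field names below are placeholders, not from the original posts):

```xml
<document>
  <!-- Root entity: one Solr document per Table1 row -->
  <entity name="root" query="select id from Table1">
    <!-- Child entity: runs once per root id; multiple joined rows
         feed multivalued fields on the same document -->
    <entity name="joined"
            query="select t2.col2, t3.col3
                   from Table2 t2
                   left outer join Table3 t3 on t3.t2_id = t2.id
                   where t2.t1_id = '${root.id}'">
      <field column="col2" name="col2"/>
      <field column="col3" name="col3"/>
    </entity>
  </entity>
</document>
```

The overhead comes from the child query running once per root row; whether that is acceptable depends on the row count and how well the join columns are indexed.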
Wildcard query search

2009-03-13 Thread Narayanan, Karthikeyan
Hi,
   I am trying to perform a wildcard search using the q parameter.  The query results 
are returned.  After getting the results, I am trying to get the highlighting 
using response.getHighlighting().
It returns an empty list, but it works fine for non-wildcard searches.  Any ideas 
please?

Thanks.
  
Karthik




Re: Wildcard query search

2009-03-13 Thread Erick Erickson
Fragments from the user list (search it for the full context, I don't
have the URL for the searchable user list handy, but it's on the Wiki)


***** original post *****
Hi,

i'm using solr 1.3.0 and SolrJ for my java application

I need to highlight my query words even if I use wildcards

for example
q=tele*

i need to highlight words as television, telephone, etc

I found this thread
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200704.mbox/%3cof8c6e2423.f20baa06-onc12572c6.003fc377-c12572c6.00427...@ibs.se%3e

but I have not understood how to solve my problem

could anyone tell me how to solve the problem with SolrJ and with solr
web (by url)?

thanks in advance,
  Revenge

***** reply *****
To do it now, you'd have to switch the query parser to using the old style
wildcard (and/or prefix) query, which is slower on large indexes and has max
clause issues.

I think I can make it work out of the box for the next release again though.
see https://issues.apache.org/jira/browse/SOLR-825

On Fri, Mar 13, 2009 at 3:06 PM, Narayanan, Karthikeyan 
karthikeyan.naraya...@gs.com wrote:

 Hi,
   I am trying to perform wildcard search using q query.  The query
 results are returned.  After getting the results, I trying to get the
 highlighting using ressponse.getHighlighting().
 It returns empty list. But It works fine for non-wildcard searches.  Any
 ideas please?.

 Thanks.

 Karthik





Custom handler that forwards a request to another core

2009-03-13 Thread Pascal Dimassimo

Hi,

I'm writing a custom handler that forwards a request to a handler of another
core. The custom handler is defined in "core0", and the core I try to send
the request to is "core2", which has an mlt handler. Here is the code of my
custom handler (it extends RequestHandlerBase and implements SolrCoreAware):

public void inform(SolrCore core) {
this.core = core;
this.cores = core.getCoreDescriptor().getCoreContainer();
this.multiCoreHandler = cores.getMultiCoreHandler();
}

public void handleRequestBody(SolrQueryRequest request, SolrQueryResponse
response)
throws Exception {

SolrCore coreToRequest = cores.getCore("core2");

ModifiableSolrParams params = new ModifiableSolrParams();
params.set("q", "Lucene");
params.set("mlt.fl", "body");
params.set("debugQuery", true);

request = new LocalSolrQueryRequest(coreToRequest, params);

SolrRequestHandler mlt = coreToRequest.getRequestHandler("/mlt");
coreToRequest.execute(mlt, request, response);

coreToRequest.close();
}


I'm calling this handler from firefox with this url (the path of my custom
handler is /nlt):
http://localhost:8080/solr/core0/nlt

With my debugger, I can see, after the execute() method is executed, this
line in the log:
13-Mar-2009 4:25:59 PM org.apache.solr.core.SolrCore execute
INFO: [core2] webapp=/solr path=/nlt params={} webapp=null path=null
params={q=Lucene&mlt.fl=body&debugQuery=true} status=0 QTime=125

Which seems logical: the core2 is executing the request (though I'm
wondering how core2 knows about the /nlt path)

After, I let the debugger resume the program and I see those lines:
13-Mar-2009 4:25:59 PM org.apache.solr.core.SolrCore execute
INFO: [core0] webapp=/solr path=/nlt params={} webapp=null path=null
params={q=Lucene&mlt.fl=body&debugQuery=true} status=0 QTime=125 status=0
QTime=141 
13-Mar-2009 4:25:59 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: -1
at
org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:259)
at
org.apache.lucene.index.IndexReader.document(IndexReader.java:632)
at
org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:371)
at org.apache.solr.request.XMLWriter$3.writeDocs(XMLWriter.java:479)

It looks like core0 is also trying to handle the request. With the debugger,
I discovered that the code is trying to access a document using a doc id from the
index of core2 within the index of core0, which fails
(SolrIndexSearcher.java:371).

Any idea why there seem to be two cores trying to handle the request?


-- 
View this message in context: 
http://www.nabble.com/Custom-handler-that-forwards-a-request-to-another-core-tp22501470p22501470.html
Sent from the Solr - User mailing list archive at Nabble.com.



Commit is taking very long time

2009-03-13 Thread mahendra mahendra
Hello,
 
I am experiencing strange problems while doing a commit. I am indexing 
every 10 min to update the index with database values. The commit is taking 7 to 10 
min approximately, and my indexing is failing due to a null pointer exception. If 
the first thread has not completed within 10 min, a second thread starts to 
index data.
I changed wait=false for the listener in the solrconfig.xml file. That stopped 
the null pointer exceptions, but the commit still takes 7 to 10 min. I have 
approximately 70 to 90 kb of data every time.
   <listener event="postCommit" class="solr.RunExecutableListener">
     <str name="exe">solr/bin/snapshooter</str>
     <str name="dir">.</str>
     <bool name="wait">false</bool>
     <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
     <arr name="env"> <str>MYVAR=val1</str> </arr>
   </listener>
I kept all default parameter values in solrconfig.xml except ramBufferSizeMB, 
which I set to 512.
Could you please tell me how I can overcome these problems? Also, sometimes I 
see INFO: Failed to unregister mbean: partitioned because it was not registered
Mar 13, 2009 11:49:16 AM org.apache.solr.core.JmxMonitoredMap unregister in my 
log files.

Log file

Mar 13, 2009 1:28:40 PM org.apache.solr.core.SolrCore execute
INFO: [EnglishAuction1-0] webapp=/solr path=/update 
params={wt=javabin&waitFlush=true&commit=true&waitSearcher=true&version=2.2} 
status=0 QTime=247232 
Mar 13, 2009 1:30:32 PM org.apache.solr.update.processor.LogUpdateProcessor 
finish
INFO: {add=[79827482, 79845504, 79850902, 79850913, 79850697, 79850833, 
79850901, 79798207, ...(93 more)]} 0 62578
Mar 13, 2009 1:30:32 PM org.apache.solr.core.SolrCore execute
INFO: [EnglishAuction1-0] webapp=/solr path=/update 
params={wt=javabin&version=2.2} status=0 QTime=62578 
Mar 13, 2009 1:30:32 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher init
INFO: Opening searc...@1ba5edf main
Mar 13, 2009 1:34:38 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@1ba5edf main from searc...@81f25 main
 filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@1ba5edf main
 filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@1ba5edf main from searc...@81f25 main
 queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=3,evictions=0,size=3,warmupTime=63,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@1ba5edf main
 queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=3,evictions=0,size=3,warmupTime=94,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@1ba5edf main from searc...@81f25 main
 documentCache{lookups=0,hits=0,hitratio=0.00,inserts=20,evictions=0,size=20,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@1ba5edf main
 documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Mar 13, 2009 1:34:38 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to searc...@1ba5edf main
Mar 13, 2009 1:34:38 PM org.apache.solr.core.SolrCore execute
INFO: [EnglishAuction1-0] webapp=null path=null params={rows=10&start=0&q=solr} 
hits=0 status=0 QTime=0 
Mar 13, 2009 1:34:38 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Mar 13, 2009 1:34:38 PM org.apache.solr.core.SolrCore execute
INFO: [EnglishAuction1-0] webapp=null path=null 
params={rows=10&start=0&q=rocks} hits=223 status=0 QTime=0 
Mar 13, 2009 1:34:38 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Mar 13, 2009 1:34:38 PM org.apache.solr.core.SolrCore execute
INFO: [EnglishAuction1-0] webapp=null path=null 
params={q=static+newSearcher+warming+query+from+solrconfig.xml} hits=4297 
status=0 QTime=0 
Mar 13, 2009 1:34:38 PM 

Re: Commit is taking very long time

2009-03-13 Thread Yonik Seeley
From your logs, it looks like the time is spent in closing of the index.
There may be some pending deletes buffered, but they shouldn't take too long.
There could also be a merge triggered... but this would only happen
sometimes, not every time you commit.

One more relatively recent change in Lucene is to sync the index files
for safety.
Are you perhaps running on Linux with the ext3 filesystem?

Not sure what's causing the null pointer exception... do you have a stack trace?

-Yonik
http://www.lucidimagination.com


On Fri, Mar 13, 2009 at 9:05 PM, mahendra mahendra
mahendra_featu...@yahoo.com wrote:
 Hello,

 I am experiencing strange problems while doing commit. I am doing indexing 
 for every 10 min to update index with data base values. commit is taking 7 to 
 10 min approximately and my indexing is failing due to null pointer 
 exception. If first thread is not completed in 10 min the second thread will 
 be starting to index data.
 I changed wait=false for the listener from solrconfig.xml file. It stopped 
 getting Null pointer exception but the commit is taking 7 to 10 min. I have 
 approximately 70 to 90 kb of data every time.
     <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">solr/bin/snapshooter</str>
    <str name="dir">.</str>
    <bool name="wait">false</bool>
    <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
    <arr name="env"> <str>MYVAR=val1</str> </arr>
      </listener>
 I kept all default parameter values in solrconfig.xml except the 
 ramBuffersize to 512.
 Could you please tell me how can I overcome these problems, also some times I 
 see INFO: Failed to unregister mbean: partitioned because it was not 
 registered
 Mar 13, 2009 11:49:16 AM org.apache.solr.core.JmxMonitoredMap unregister in 
 my log files.

 Log file

 Mar 13, 2009 1:28:40 PM org.apache.solr.core.SolrCore execute
 INFO: [EnglishAuction1-0] webapp=/solr path=/update 
 params={wt=javabin&waitFlush=true&commit=true&waitSearcher=true&version=2.2} 
 status=0 QTime=247232
 Mar 13, 2009 1:30:32 PM org.apache.solr.update.processor.LogUpdateProcessor 
 finish
 INFO: {add=[79827482, 79845504, 79850902, 79850913, 79850697, 79850833, 
 79850901, 79798207, ...(93 more)]} 0 62578
 Mar 13, 2009 1:30:32 PM org.apache.solr.core.SolrCore execute
 INFO: [EnglishAuction1-0] webapp=/solr path=/update 
 params={wt=javabin&version=2.2} status=0 QTime=62578
 Mar 13, 2009 1:30:32 PM org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
 Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher <init>
 INFO: Opening Searcher@1ba5edf main
 Mar 13, 2009 1:34:38 PM org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: end_commit_flush
 Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm
 INFO: autowarming Searcher@1ba5edf main from Searcher@81f25 main
  filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
 Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm
 INFO: autowarming result for Searcher@1ba5edf main
  filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
 Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm
 INFO: autowarming Searcher@1ba5edf main from Searcher@81f25 main
  queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=3,evictions=0,size=3,warmupTime=63,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
 Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm
 INFO: autowarming result for Searcher@1ba5edf main
  queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=3,evictions=0,size=3,warmupTime=94,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
 Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm
 INFO: autowarming Searcher@1ba5edf main from Searcher@81f25 main
  documentCache{lookups=0,hits=0,hitratio=0.00,inserts=20,evictions=0,size=20,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
 Mar 13, 2009 1:34:38 PM org.apache.solr.search.SolrIndexSearcher warm
 INFO: autowarming result for Searcher@1ba5edf main
  documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
 Mar 13, 2009 1:34:38 PM org.apache.solr.core.QuerySenderListener newSearcher
 INFO: QuerySenderListener sending requests to Searcher@1ba5edf main
 Mar 13, 2009 1:34:38 PM org.apache.solr.core.SolrCore execute
 INFO: [EnglishAuction1-0] webapp=null path=null 
 params={rows=10&start=0&q=solr} hits=0 

Re: DIH with outer joins

2009-03-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
joining entities may have some overhead. Is it prohibitive in absolute terms?

On Sat, Mar 14, 2009 at 12:29 AM, Rui António da Cruz Pereira
ruipereira...@gmail.com wrote:
 The two entities resolve the problem, but add some overhead (the queries
 can be really big). Views don't work for me, as the queries are
 dynamically generated, taking a particular topology into consideration.

 Noble Paul നോബിള്‍ नोब्ळ् wrote:

 have one root entity which just does a "select id from Table1".
 Then have a child entity which does all the joins and returns all the
 other columns for that 'id'.
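A data-config.xml sketch of this root/child-entity layout might look as follows (table names, column names, and join conditions here are invented for illustration; only the overall shape follows the suggestion above):

```xml
<document>
  <!-- root entity: fetches only the ids -->
  <entity name="root" query="select id from Table1">
    <!-- child entity: runs the full outer-join query once per id -->
    <entity name="detail"
            query="select t2.col_a, t3.col_b
                   from Table1 t1
                   left outer join Table2 t2 on t2.t1_id = t1.id
                   left outer join Table3 t3 on t3.t2_id = t2.id
                   where t1.id = '${root.id}'"/>
  </entity>
</document>
```

Because the child entity can return several rows per id, the corresponding Solr fields would need to be multiValued for the values from all rows to land in a single document.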

 On Fri, Mar 13, 2009 at 5:10 PM, Rui António da Cruz Pereira
 ruipereira...@gmail.com wrote:


 I thought that I could remove the uniqueKey in Solr and then have more
 than one document with the same id, but then I don't know whether, in
 delta-imports, outdated or deleted documents are updated (the updated
 document is added, and then we would have both the outdated and the
 updated document in the index) or removed.

 Noble Paul നോബിള്‍ नोब्ळ् wrote:


 it is not very clear to me how it works

 probably you can put in the queries here.

 you can do all the joins in the db in one complex query and use that
 straightaway in an entity. You do not have to do any joins inside DIH
 itself

 On Fri, Mar 13, 2009 at 4:47 PM, Rui António da Cruz Pereira
 ruipereira...@gmail.com wrote:



 I have queries with outer joins defined in some entities and for the
 same
 root object I can have two or more lines with different objects, for
 example:

 Taking the following 3 tables, and a query defined in the entity with
 outer joins between the tables:
 Table1 - Table2 - Table3

 I can have the following lines returned by the query:
 Table1Instance1 - Table2Instance1 - Table3Instance1
 Table1Instance1 - Table2Instance1 - Table3Instance2
 Table1Instance1 - Table2Instance2 - Table3Instance3
 Table1Instance2 - Table2Instance3 - Table3Instance4

 I wanted to have a single document per root object instance (in this
 case
 per Table1 instance) but with the values from the different lines
 returned.

 Is it possible to have this behavior in DataImportHandler? How?

 Thanks in advance,
  Rui Pereira

















-- 
--Noble Paul


Re: DataImportHandler Robustness For Imports That Take A Long Time

2009-03-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
alternately you can do the commit yourself after marking in the db
. Context#getSolrCore().getUpdateHandler().commit()

or as you mentioned you can do an autocommit
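The bookkeeping such a transformer would do can be sketched in plain Java. This is a self-contained illustration, not the real DIH API: the class name, batch size, and marker handling are invented, and in an actual transformer the method would be `transformRow(Map<String,Object> row, Context context)` with the commented lines calling the Solr and JDBC APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Self-contained sketch of checkpoint bookkeeping for a custom DIH
// transformer; the Solr- and DB-specific calls are indicated in comments.
public class CheckpointSketch {
    private int count = 0;
    private final int batchSize;
    Object lastMarker; // last id that would be persisted to the source DB

    CheckpointSketch(int batchSize) { this.batchSize = batchSize; }

    // Real signature: public Object transformRow(Map<String,Object> row, Context ctx)
    Map<String, Object> transformRow(Map<String, Object> row) {
        if (++count % batchSize == 0) {
            // real code: ctx.getSolrCore().getUpdateHandler().commit(...)
            // real code: UPDATE import_marker SET last_id = ? in the source DB
            lastMarker = row.get("id");
        }
        return row; // returning the row unchanged lets indexing continue
    }

    public static void main(String[] args) {
        CheckpointSketch t = new CheckpointSketch(3);
        for (int id = 1; id <= 7; id++) {
            Map<String, Object> row = new HashMap<>();
            row.put("id", id);
            t.transformRow(row);
        }
        // markers were recorded after rows 3 and 6
        System.out.println(t.lastMarker);
    }
}
```

The point of committing before writing the marker is exactly the ordering concern raised later in this thread: the marker must never get ahead of what is durably in the index.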

On Sat, Mar 14, 2009 at 12:31 AM, Chris Harris rygu...@gmail.com wrote:
 Wouldn't this approach get confused if there was an error that caused
 DIH to do a rollback? For example, suppose this happened:

 * 1000 successful document adds
 * The custom transformer saves some marker in the DB to signal that
 the above docs have been successfully indexed
 * The next document add throws an exception
 * DIH, rather than doing a commit, rolls back the 1000 document adds

 At this point my database marker says that the 1000 docs have been
 successfully indexed, but the documents themselves are not actually in
 the Solr index. Because by hypothesis my import query is defined in
 terms of my DB marker, I'll never end up getting these docs into the
 Solr index, even if I resolve the issue that causes the exception and
 re-run the data import.

 It seems like, to do a safe equivalent of your suggestion, I'd have to
 somehow A) prevent DIH from doing any rollbacks, B) get DIH to do
 auto-commits, and C) make my custom transformer update the DB marker
 only immediately after an auto-commit.

 On Mon, Mar 9, 2009 at 9:27 PM, Noble Paul നോബിള്‍  नोब्ळ्
 noble.p...@gmail.com wrote:
 I recommend writing a simple transformer which can write an entry
 into the db after every n documents (say 1000), and modifying your query
 to take that entry into account so that subsequent imports will start
 from there.

 DIH does not write the last_index_time unless the import completes 
 successfully.

 On Tue, Mar 10, 2009 at 1:54 AM, Chris Harris rygu...@gmail.com wrote:
 I have a dataset (7M-ish docs each of which is maybe 1-100K) that,
 with my current indexing process, takes a few days or maybe a week to
 put into Solr.  I'm considering maybe switching to indexing with the
 DataImportHandler, but I'm concerned about the impact of this on
 indexing robustness:

 If I understand DIH properly, then if Solr goes down for whatever
 reason during an import, then DIH loses track of what it has and
 hasn't yet indexed that round, and will thus probably do a lot of
 redundant reimporting the next time you run an import command. (For
 example, if DIH successfully imports row id 100, and then Solr dies
 before the DIH import finishes, and then I restart Solr and start a
 new delta-import, then I think DIH will import row id 100 again.) One
 implication for my dataset seems to be that, unless Solr can actually
 stay up for several days on end, then DIH will never finish importing
 my data, even if I manage to keep Solr at, say, 99% uptime. This would
 be fine if a full import took only a few hours. If full import could
 take a week, though, this is slightly unnerving. (Sometimes you just
 need to restart Solr. Or the machine itself, for that matter.)

 Are there any good ways around this with DIH? One potential option is
 to give each row in the database table not only a
 ModificationTimestamp column but also a DataImportHandlerTimestamp
 column, and try to get DIH to update that column whenever it finishes
 indexing a row. Then you'd modify the WHERE clause in the DIH config
 so that instead of determining which rows to index with something like

  WHERE ModificationTimestamp  dataimporter.last_index_time

 you'd use something like

  WHERE ModificationTimestamp  SolrImportTimestamp

 In this way, hopefully, DIH can always pick up where it left off last time,
 rather than trying to redo any work it might have actually managed
 to do last round.

 (I'm using something along these lines with my current, non-DIH-based
 indexing scheme.)
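Expressed as a DIH entity, the scheme described above might look roughly like this (the table and column names, including SolrImportTimestamp, are hypothetical):

```xml
<!-- sketch only: 'docs' and its columns are invented names -->
<entity name="doc" pk="id"
        query="select id, title, body from docs
               where ModificationTimestamp &gt; SolrImportTimestamp"/>
```

A custom transformer or post-commit hook would then be responsible for updating each row's SolrImportTimestamp only after the row has actually been committed to the index.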

 Am I making sense here?

 Chris




 --
 --Noble Paul





-- 
--Noble Paul


Re: what crawler do you use for Solr indexing?

2009-03-13 Thread ristretto.rb
Hello,

I built my own crawler with Python, as I couldn't find (not
complaining, probably didn't look hard enough) Nutch
documentation.  I use BeautifulSoup, because the site is mostly
based on Python/Django, and we like Python.

Writing one was good for us because we spent most of our time figuring
out what to write ... how to fetch pages, which to choose, what data to
store, etc.  It was an awesome exercise that really narrowed the
definition of our project.  It helped us define our Solr schema and
other parts of the project during development.
If we knew exactly what sort of data to crawl, and exactly what we
intended to save, I'm sure we would have pushed
harder at figuring out Nutch.

If I were to refactor, I would give Heritrix and Nutch a good look now.

cheers
gene

Gene Campbell
http://www.picante.co.nz
gene at picante point co point nz

http://www.travelbeen.com - the social search engine for travel



On Tue, Mar 10, 2009 at 11:14 PM, Andrzej Bialecki a...@getopt.org wrote:
 Sean Timm wrote:

 We too use Heritrix. We tried Nutch first but Nutch was not finding all
 of the documents that it was supposed to. When Nutch and Heritrix were
 both set to crawl our own site to a depth of three, Nutch missed some
 pages that were linked directly from the seed. We ended up with 10%-20%
 fewer pages in the Nutch crawl.

 FWIW, from a private conversation with Sean it seems that this was likely
 related to the default configuration in Nutch, which collects only the first
 1000 outlinks from a page. This is an arbitrary and configurable limit,
 introduced as a way to limit the impact of spam pages and to limit the size
 of LinkDb. If a page hits this limit then indeed the symptoms that you
 observe are missing (dropped) links.
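For reference, the cap being described is, as far as I can tell, the `db.max.outlinks.per.page` property (the exact name and default may differ between Nutch versions), which can be raised or disabled in nutch-site.xml:

```xml
<!-- assumption: property name and semantics as in nutch-default.xml -->
<property>
  <name>db.max.outlinks.per.page</name>
  <!-- a negative value means "process all outlinks" -->
  <value>-1</value>
</property>
```

Disabling the limit trades spam/LinkDb protection for complete link coverage, which is usually the right trade when crawling your own site.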



 --
 Best regards,
 Andrzej Bialecki     
  ___. ___ ___ ___ _ _   __
 [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
 ___|||__||  \|  ||  |  Embedded Unix, System Integration
 http://www.sigram.com  Contact: info at sigram dot com




Re: unique result

2009-03-13 Thread ristretto.rb
FWIW...  We run a hash of the content and other bits of our docs, and
then remove duplicates according to specific algorithms.  (Exactly the
same page content can clearly be hosted on many different URLs and
domains.)  Then, the chosen ones are indexed.  Though we toss the
synonyms into the index too, so we know all its other names.
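A minimal sketch of the signature step is below; the normalization strategy and hash choice are arbitrary illustrations, not what the poster's site actually uses.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Compute a duplicate-detection signature from normalized page content.
public class ContentSignature {
    static String signature(String content) throws NoSuchAlgorithmException {
        // normalize: lower-case and collapse whitespace so trivial
        // formatting differences don't defeat duplicate detection
        String normalized = content.toLowerCase().replaceAll("\\s+", " ").trim();
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(normalized.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b)); // hex-encode each byte
        }
        return hex.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // identical content hosted on two "URLs" yields the same signature
        System.out.println(signature("Hello  World").equals(signature("hello world")));
    }
}
```

Documents sharing a signature can then be collapsed before indexing, keeping one canonical URL and recording the rest as synonyms.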

cheers
gene

Gene Campbell
http://www.picante.co.nz
gene at picante point co point nz

http://www.travelbeen.com - the social search engine for travel

On Fri, Feb 27, 2009 at 5:53 AM, Cheng Zhang zhangyongji...@yahoo.com wrote:
 It's exactly what I'm looking for. Thank you Grant.


 - Original Message 
 From: Grant Ingersoll gsing...@apache.org
 To: solr-user@lucene.apache.org
 Sent: Thursday, February 26, 2009 6:56:22 AM
 Subject: Re: unique result

 I presume these all have different unique ids?

 If you can address it at indexing time, then have a look at 
 https://issues.apache.org/jira/browse/SOLR-799

 Otherwise, you might look at https://issues.apache.org/jira/browse/SOLR-236


 On Feb 25, 2009, at 6:54 PM, Cheng Zhang wrote:

 Is it possible to have Solr remove duplicated query results?

 For example, instead of returning

 <result name="response" numFound="572" start="0">
 <doc> <str name="productGroup_t_i_s_nm">Wireless</str> </doc>
 <doc> <str name="productGroup_t_i_s_nm">Wireless</str> </doc>
 <doc> <str name="productGroup_t_i_s_nm">Wireless</str> </doc>
 <doc> <str name="productGroup_t_i_s_nm">Video Games</str> </doc>
 <doc> <str name="productGroup_t_i_s_nm">Video Games</str> </doc>
 </result>

 return:
  <result name="response" numFound="572" start="0">
   <doc> <str name="productGroup_t_i_s_nm">Wireless</str> </doc>
   <doc> <str name="productGroup_t_i_s_nm">Video Games</str> </doc>
  </result>

 Thanks a lot,
 Kevin


 --
 Grant Ingersoll
 http://www.lucidimagination.com/

 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
 using Solr/Lucene:
 http://www.lucidimagination.com/search



com.ctc.wstx.exc.WstxLazyException exception while passing the text content of a word doc to SOLR

2009-03-13 Thread Suryasnat Das
Hi,

I am using the Apache POI parser to parse a Word doc and extract the text
content. Then I am passing the text content to Solr. The Word document has
many pictures, graphs, and tables. But when I am passing the content to Solr,
it fails. Here is the exception trace.

09:31:04,516 ERROR [STDERR] Mar 14, 2009 9:31:04 AM org.apache.solr.common.SolrException log
SEVERE: [com.ctc.wstx.exc.WstxLazyException] com.ctc.wstx.exc.WstxParsingException: Illegal character entity: expansion character (code 0x7) not a valid XML character
 at [row,col {unknown-source}]: [40,18]
at com.ctc.wstx.exc.WstxLazyException.throwLazily(WstxLazyException.java:45)
at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:729)
at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3659)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327)
at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:195)
at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:235)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:190)
at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:92)
at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.process(SecurityContextEstablishmentValve.java:126)
at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.invoke(SecurityContextEstablishmentValve.java:70)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:158)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:330)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:828)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:601)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:595).
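One common workaround (not from this thread; just a sketch) is to filter out characters that are not legal in XML 1.0 before building the post to Solr, since control characters such as 0x7 extracted from Word content are exactly what the parser is rejecting here:

```java
// Remove characters that are not valid in XML 1.0 documents
// (ranges per the XML 1.0 'Char' production; this simple char-by-char
// version also drops surrogate pairs for supplementary characters).
public class XmlSanitizer {
    static String stripInvalidXmlChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean valid = c == 0x9 || c == 0xA || c == 0xD
                    || (c >= 0x20 && c <= 0xD7FF)
                    || (c >= 0xE000 && c <= 0xFFFD);
            if (valid) sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0x7 (BEL) is the character the parser rejected above
        System.out.println(stripInvalidXmlChars("foo\u0007bar"));
    }
}
```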

Another error trace relating to POI is also throwing up:

09:31:04,828 ERROR [STDERR] java.io.IOException: Unable to read entire header; 130 bytes read; expected 512 bytes
09:31:04,828 ERROR [STDERR] at org.apache.poi.poifs.storage.HeaderBlockReader.alertShortRead(HeaderBlockReader.java:130)
09:31:04,843 ERROR [STDERR] at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:94)
09:31:04,843 ERROR [STDERR] at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151)
09:31:04,843 ERROR [STDERR] at org.apache.poi.hwpf.HWPFDocument.verifyAndBuildPOIFS(HWPFDocument.java:133)
09:31:04,843 ERROR [STDERR] at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:51)
09:31:04,859 ERROR [STDERR] at com.apple.servlet.SearchApplicationServlet.parseWordFile(SearchApplicationServlet.java:963)
09:31:04,859 ERROR [STDERR] at com.apple.servlet.SearchApplicationServlet.indexDirectory(SearchApplicationServlet.java:813)
09:31:04,859 ERROR [STDERR] at com.apple.servlet.SearchApplicationServlet.index(SearchApplicationServlet.java:747)
09:31:04,859 ERROR [STDERR] at com.apple.servlet.SearchApplicationServlet.processAdd(SearchApplicationServlet.java:331)
09:31:04,874 ERROR [STDERR] at com.apple.servlet.SearchApplicationServlet.doGet(SearchApplicationServlet.java:160)
09:31:04,874 ERROR [STDERR] at

Re: Solr: ERRORs at Startup

2009-03-13 Thread Chris Hostetter

: Even setting everything to INFO through
: http://localhost:8080/solr/admin/logging didn't help.
: 
: But considering you do not see any bad issue here, at this time I will
: ignore those ERROR messages :-)

i would read up more on how to configure logging in JBoss.

as far as i can tell, Solr is logging messages, which are getting handled 
by a logger that writes them to STDERR using a fairly standard format 
(date, class, method, level, msg) ... except some other piece of code 
seems to be reading from STDERR, and assuming anything that got written 
there is an ERROR, so it's logging those writes to stderr using a format 
with a date, a level (of ERROR), and a group or some other identifier of 
STDERR

the problem is if you ignore them completely, you're going to miss 
noticing when you really have a problem.

Like i said: figure out how to configure logging in JBoss, you might need 
to change the slf4j adapter jar or something if it can't deal with JUL 
(which is the default).

:  10:51:20,525 INFO  [TomcatDeployment] deploy, ctxPath=/solr
:  10:51:20,617 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
:  org.apache.solr.servlet.SolrDispatchFilter init
:  INFO: SolrDispatchFilter.init()



-Hoss