Re: solr 1.3 and multicore data directory

2009-09-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
the dataDir attribute on core is a Solr 1.4 feature
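
For reference, on Solr 1.4 a per-core data directory can be declared right in solr.xml; a minimal sketch (core names and paths are illustrative, following the question quoted below, with all cores sharing one conf):

  <?xml version="1.0" encoding="UTF-8"?>
  <solr persistent="true" sharedLib="lib">
    <cores adminPath="/admin/cores">
      <!-- each core points at its own index while sharing the instance conf -->
      <core name="resources" instanceDir="." dataDir="data/resources"/>
      <core name="exhibits" instanceDir="." dataDir="data/exhibits"/>
    </cores>
  </solr>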

On Thu, Sep 10, 2009 at 1:57 AM, Paul Rosen p...@performantsoftware.com wrote:
 Hi All,

 I'm trying to set up solr 1.3 to use multicore but I'm getting some puzzling
 results. My solr.xml file is:

 <?xml version="1.0" encoding="UTF-8"?>
 <solr persistent="true" sharedLib="../lib">
   <cores adminPath="/admin/cores">
     <core name="resources" instanceDir="resources"
           dataDir="solr/resources/data/" />
     <core name="exhibits" instanceDir="exhibits"
           dataDir="solr/exhibits/data/" />
     <core name="reindex_resources" instanceDir="reindex_resources"
           dataDir="solr/reindex_resources/data/" />
   </cores>
 </solr>

 When I start up solr, everything looks normal until I get this line in the
 log:

 INFO: [resources] Opening new SolrCore at solr/resources/,
 dataDir=./solr/data/

 And a new folder is created ./solr/data/index with a blank index. And, of
 course, any queries go to that blank index and not to one of my cores.

 Actually, what I'd really like is to have my directory structure look like
 this (some items removed for brevity):

 -
 solr_1.3
    lib
    solr
        solr.xml
        bin
        conf
        data
            resources
                index
            exhibits
                index
            reindex_resources
                index
 start.jar
 -

 And have all the cores share everything except an index.

 How would I set that up?

 Are there differences between 1.3 and 1.4 in this respect?

 Thanks,
 Paul




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: OutOfMemory error on solr 1.3

2009-09-10 Thread Constantijn Visinescu
Just wondering, how much memory are you giving your JVM?

On Thu, Sep 10, 2009 at 7:46 AM, Francis Yakin fya...@liquid.com wrote:


 I am getting an OutOfMemory error on our slave servers. I would like to know if
 someone has seen the same issue and has a solution for it.

 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@96cd2ffc:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
 SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size:
 441216, Num elements: 55150
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@519116e0:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@74dc52fa:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@d0dd3e28:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@b6dfa5bc:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@482b13ef:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@2309438c:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@277bd48c:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
 Exception in thread [ACTIVE] ExecuteThread: '7' for queue:
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 Exception in thread [ACTIVE] ExecuteThread: '8' for queue:
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 Exception in thread [ACTIVE] ExecuteThread: '10' for queue:
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 Exception in thread [ACTIVE] ExecuteThread: '11' for queue:
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@41405463:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 751552, Num elements: 187884
  java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208,
 Num elements: 8192
 java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208,
 Num elements: 8192
 java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5096,
 Num elements: 2539
 java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5400,
 Num elements: 2690
 Sep 7, 2009 7:22:50 PM GMT Warning DeploymentService BEA-290064
 Deployment service servlet encountered an Exception while handling the
 deployment service message for request id -1 from server AdminServer.
 Exception is: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
 size: 4368, Num elements: 2174
 SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size:
 14140768, Num elements: 3535188
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@8dbcc7ab:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
 java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5320,
 Num elements: 2649
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@4d0c6fc5:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 751560, Num elements: 187885
 java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 16400,
 Num elements: 8192
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@fb6bac19:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 14140904, Num elements: 3535222
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@536d7b1b:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 14140904, Num elements: 3535222
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@6a1ef00a:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 751864, Num elements: 187961
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@298f2d9c:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 5398568, Num elements: 1349637
 SEVERE: 

Re: about SOLR-1395 integration with katta

2009-09-10 Thread Zhenyu Zhong
Jason,

Thanks for the reply.

In general, I would like to use katta to handle the management overhead, such
as single point of failure, as well as the distributed index deployment. At
the same time, I still want to use the nice search features provided by Solr.

Basically, I would like to try both approaches on the indexing part:
1. Using Hadoop to launch MR jobs to build the index, then deploying the index to
katta.
2. Using the new patch SOLR-1395.
Based on my understanding, it seems to support index building with
Hadoop. I assume the index would have all the necessary information, such as
the Solr index schema, so that I can still use the nice search features provided
by Solr.

On the search part,
I would like to try distributed search on the Solr index deployed
on katta, if that is possible.

I would really appreciate it if you could share some thoughts with me.

thanks
zhong



On Wed, Sep 9, 2009 at 6:06 PM, Jason Rutherglen jason.rutherg...@gmail.com
 wrote:

 Hi Zhong,

 It's a very new patch. I'll update the issue as we start the
 wiki page.

 I've been working on indexing in Hadoop in conjunction with
 Katta, which (it sounds) is different from your use case, where
 you have prebuilt indexes that you simply want to distribute using
 Katta?

 -J

 On Wed, Sep 9, 2009 at 12:33 PM, Zhenyu Zhong zhongresea...@gmail.com
 wrote:
  Hi,
 
  It is really exciting to see this integration coming out.
  May I ask how I need to make changes to be able to deploy Solr index on
  katta servers?
  Are there any tutorials?
 
  thanks
  zhong
 



RE: OutOfMemory error on solr 1.3

2009-09-10 Thread Francis Yakin
 Xms is 1.5GB, Xmx is 1.5GB and Xns is 128MB. Physical memory is 4GB.

We are running JRockit version 1.5.0_15 on WebLogic 10.

./java -version
java version 1.5.0_15
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_15-b04)
BEA JRockit(R) (build R27.6.0-50_o-100423-1.5.0_15-20080626-2104-linux-x86_64, 
compiled mode)

4 S root  7532  7487  8  75   0 - 804721 184466 05:10 ?   00:07:18 
/opt/bea/jrmc-3.0.3-1.5.0/bin/java -Xms1536m -Xmx1536m -Xns:128m -Xgc:gencon 
-Djavelin.jsp.el.elcache=4096 
-Dsolr.solr.home=/opt/apache-solr-1.3.0/example/solr

Francis
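
Since the failures above occur while auto-warming queryResultCache entries, the cache autowarm settings in solrconfig.xml are worth a look alongside the heap size; a sketch with deliberately small counts (values illustrative, not a tuning recommendation):

  <queryResultCache class="solr.LRUCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="32"/>
  <filterCache class="solr.LRUCache"
               size="512"
               initialSize="512"
               autowarmCount="32"/>

Lowering autowarmCount shrinks the burst of large allocations that happens on each new searcher, at the cost of colder caches after commits.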

-Original Message-
From: Constantijn Visinescu [mailto:baeli...@gmail.com]
Sent: Wednesday, September 09, 2009 11:35 PM
To: solr-user@lucene.apache.org
Subject: Re: OutOfMemory error on solr 1.3

Just wondering, how much memory are you giving your JVM?

On Thu, Sep 10, 2009 at 7:46 AM, Francis Yakin fya...@liquid.com wrote:


 I am getting an OutOfMemory error on our slave servers. I would like to know if
 someone has seen the same issue and has a solution for it.

 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@96cd2ffc:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
 SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size:
 441216, Num elements: 55150
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@519116e0:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@74dc52fa:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@d0dd3e28:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@b6dfa5bc:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@482b13ef:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@2309438c:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@277bd48c:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
 Exception in thread [ACTIVE] ExecuteThread: '7' for queue:
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 Exception in thread [ACTIVE] ExecuteThread: '8' for queue:
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 Exception in thread [ACTIVE] ExecuteThread: '10' for queue:
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 Exception in thread [ACTIVE] ExecuteThread: '11' for queue:
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@41405463:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 751552, Num elements: 187884
  java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208,
 Num elements: 8192
 java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208,
 Num elements: 8192
 java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5096,
 Num elements: 2539
 java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5400,
 Num elements: 2690
 Sep 7, 2009 7:22:50 PM GMT Warning DeploymentService BEA-290064
 Deployment service servlet encountered an Exception while handling the
 deployment service message for request id -1 from server AdminServer.
 Exception is: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
 size: 4368, Num elements: 2174
 SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size:
 14140768, Num elements: 3535188
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@8dbcc7ab:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
 java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5320,
 Num elements: 2649
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@4d0c6fc5:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 751560, Num elements: 187885
 java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 16400,
 Num 

Re: Solr fitting in travel site context?

2009-09-10 Thread Constantijn Visinescu
I'd look into faceting and run a test.

Create a schema, index the data, and then run a query for *:* faceted by
hotel to get a list of all the hotels you want, followed by a query that
returns all documents matching that hotel for your second use case.
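
For instance, with an illustrative hotel field and host, step 1 could be a facet-only query and step 2 a plain query on the chosen hotel:

  http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=hotel&facet.mincount=1
  http://localhost:8983/solr/select?q=hotel:"Hotel Foo Paris"&rows=100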

You're probably still going to want a SQL database to hold the reservations
made, though.

In my experience, implementing Solr is more work than implementing a normal
SQL database, and losing the relational part of a relational database is
something you have to wrap your head around to see how it affects your
application.

That said, Solr on my 4-year-old single-core laptop outperforms our new dual
Xeon database server running IBM DB2 when it comes to running a query on a
10 million record dataset and returning the total number of documents that
match.

Once you get it up and running properly, and you need queries like "give me
the total number of documents that match these criteria, optionally faceted
by this and that", it's amazingly fast.

Note that this advantage only becomes apparent when dealing with large data
sets. Anything under a couple hundred thousand records (a guideline; it
depends heavily on the type of record) and a normal SQL server should also be
able to give you the results you need near instantly.

Hope this helps ;)


On Wed, Sep 9, 2009 at 5:33 PM, Carsten Kraus carsten.kr...@gmail.comwrote:

 Hi all,

 I'm about to develop a travel website and am wondering if Solr might fit
 as the search solution.
 Being quite the opposite of a db guru and new to Solr, it's hard for me to
 judge if, for my use case, a relational db should be used in favor of Solr (or
 a similar indexing server). Maybe some of you guys would share your opinion
 on this?

 The products being searched for would be travel packages. That is: hotel
 room + flight combined into one product.
 I receive the products via a csv file, where each line defines a travel
 package with concrete departure/return, accommodation and price data.

 For example one csv row might represent:
 Hotel Foo in Paris, flight departing 10/10/09 from London, ending 10/20/09,
 mealplan Bar, pricing $300
 ..while another one might look like:
 Hotel Foo in Paris, flight departing 10/10/09 from Amsterdam, ending
 10/30/09, mealplan Eggs :), pricing $400

 Now searches should show results in 2 steps: the first step showing results
 grouped by hotel (so no hotel appears twice), and the second one all
 date-airport-mealplan combinations for the hotel selected by the user in
 step 1.

 From some first little tests, it seems to me as if I would at least need the
 field collapse patch (SOLR-236) for step 1 above?!

 What do you think? Does Solr fit into this scenario? Thoughts?

 Sorry for the lengthy post, and thanks a lot for any pointer!
 Carsten



Does MoreLikeThis support sharding?

2009-09-10 Thread jlist9
Hi,

I tried MoreLikeThis (StandardRequestHandler with mlt arguments)
with a single Solr server and it works fine. However, when I try
the same query with sharded servers, I don't get the moreLikeThis
key in the results.

So my question is: is MoreLikeThis with StandardRequestHandler
supported on shards? If not, is MoreLikeThisHandler supported?
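
For concreteness, a single-server request of the kind described might look like this (host, id and field names are illustrative):

  http://localhost:8983/solr/select?q=id:123&mlt=true&mlt.fl=title,body&mlt.count=5

and the sharded variant that comes back without the moreLikeThis section would just add:

  &shards=host1:8983/solr,host2:8983/solr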

Thanks,
Jack





Re: Misleading log messages while deploying solr

2009-09-10 Thread con

Thanks Hossman

As per my understanding and investigation, if we disable STDERR in the
jboss configs, we will not be able to see any STDERR output coming from any of
the APIs - which could be real error messages.
So if we know the exact reason why this message from solr is showing up, we
can block it at the solr level, or maybe at the jboss level.

Any suggestion that points out a reason for this, or a solution that hides
only these messages, would be really appreciated.


thanks



hossman wrote:
 
 
 : But the log message that is getting print in the server console, in my
 case
 : jboss, is showing status as error.
 : Why is this showing as ERROR, even though things are working fine.
 
 Solr is not declaring that those messages are ERRORs; solr is just logging
 informational messages (hence the INFO lines) using the java logging
 framework.
 
 My guess: since the logs are getting prefixed with ERROR [STDERR],
 something about the way your jboss container is configured is probably
 causing those log messages to be written to STDERR, and then jboss is
 capturing the STDERR and assuming that if it went there it must be an
 ERROR of some kind, and logging it to the console (using its own log
 format, hence the double timestamps per line)
 
 In short: jboss is doing this in response to normal logging from solr.
 You should investigate your options for configuring jboss and how it
 deals with log messages from applications.
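
As one concrete illustration of that advice, JUL output can be routed away from STDERR with a logging.properties file passed to the JVM via -Djava.util.logging.config.file (a sketch; the file path and levels are illustrative):

  handlers = java.util.logging.FileHandler
  .level = INFO
  java.util.logging.FileHandler.pattern = /var/log/solr/solr.log
  java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
  org.apache.solr.level = INFO

With JUL writing to its own file, JBoss no longer sees Solr's INFO lines on STDERR and stops relabeling them as ERROR.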
 
 
 : 11:41:19,030 INFO  [TomcatDeployer] deploy, ctxPath=/solr,
 : warUrl=.../tmp/deploy/tmp43266solr-exp.war/
 : 11:41:19,948 ERROR [STDERR] 8 Sep, 2009 11:41:19 AM
 : org.apache.solr.servlet.SolrDispatchFilter init
 : INFO: SolrDispatchFilter.init()
 : 11:41:19,975 ERROR [STDERR] 8 Sep, 2009 11:41:19 AM
 : org.apache.solr.core.SolrResourceLoader locateInstanceDir
 : INFO: No /solr/home in JNDI
 : 11:41:19,976 ERROR [STDERR] 8 Sep, 2009 11:41:19 AM
 : org.apache.solr.core.SolrResourceLoader locateInstanceDir
 : INFO: using system property solr.solr.home: C:\app\Search
 : 11:41:19,984 ERROR [STDERR] 8 Sep, 2009 11:41:19 AM
 : org.apache.solr.core.CoreContainer$Initializer initialize
 : INFO: looking for solr.xml: C:\app\Search\solr.xml
 : 11:41:20,084 ERROR [STDERR] 8 Sep, 2009 11:41:20 AM
 : org.apache.solr.core.SolrResourceLoader init
 : INFO: Solr home set to 'C:\app\Search' 
 : 11:41:20,142 ERROR [STDERR] 8 Sep, 2009 11:41:20 AM
 : org.apache.solr.core.SolrResourceLoader createClassLoader
 : INFO: Adding
 : 'file:/C:/app/Search/lib/apache-solr-dataimporthandler-1.3.0.jar' to
 Solr
 : classloader
 : 11:41:20,144 ERROR [STDERR] 8 Sep, 2009 11:41:20 AM
 : org.apache.solr.core.SolrResourceLoader createClassLoader
 : INFO: Adding 'file:/C:/app/Search/lib/jsp-2.1/' to Solr classloader
 : 
 : ...
 : INFO: Reusing parent classloader
 : 11:41:21,870 ERROR [STDERR] 8 Sep, 2009 11:41:21 AM
 : org.apache.solr.core.SolrConfig init
 : INFO: Loaded SolrConfig: solrconfig.xml
 : 11:41:21,909 ERROR [STDERR] 8 Sep, 2009 11:41:21 AM
 : org.apache.solr.schema.IndexSchema readSchema
 : INFO: Reading Solr Schema
 : 11:41:22,092 ERROR [STDERR] 8 Sep, 2009 11:41:22 AM
 : org.apache.solr.schema.IndexSchema readSchema
 : INFO: Schema name=contacts schema
 : 11:41:22,121 ERROR [STDERR] 8 Sep, 2009 11:41:22 AM
 : org.apache.solr.util.plugin.AbstractPluginLoader load
 : INFO: created string: org.apache.solr.schema.StrField
 : 
 : .
 
 
 
 -Hoss
 
 
 




Pb using delta import with XPathEntityProcessor

2009-09-10 Thread nourredine khadri
Hi,

I'm a new Solr user and for the moment it suits almost all my needs :)

I use a fresh nightly release (09/2009) and I index a
database table using the DataImportHandler.

I try to parse an XML content field from this table using XPathEntityProcessor
and FieldReaderDataSource. Everything works fine for the full-import.

But when I try to use the delta-import (I need incremental indexing) using
deltaQuery and deltaImportQuery, it does not work and I get a stack trace for
each field:
 
10 sept. 2009 11:12:26
org.apache.solr.handler.dataimport.XPathEntityProcessor initQuery
ATTENTION: Parsing failed for xml, url:null rows processed:0
java.lang.RuntimeException: java.lang.NullPointerException
at 
org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:92)
at
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:282)
at
org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187)
at
org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:365)
at
org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:259)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
at
org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:354)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:395)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.NullPointerException
at
com.ctc.wstx.io.ReaderBootstrapper.initialLoad(ReaderBootstrapper.java:245)
at
com.ctc.wstx.io.ReaderBootstrapper.bootstrapInput(ReaderBootstrapper.java:132)
at
com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:543)
at
com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:604)
at
com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:660)
at
com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:331)
at
org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:88)
... 11 more
 
 
When I remove the delta queries or the XPathEntityProcessor block, it's OK.
 
my data-config.xml : 
 
<dataConfig>
  <dataSource name="database"
              type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://xxx"
              user="xxx"
              password="xxx"/>
  <dataSource type="FieldReaderDataSource" name="fieldReader"/>
  <document>

    <entity name="document"
            dataSource="database"
            processor="SqlEntityProcessor"
            pk="CONTENTID"
            query="SELECT * FROM SEARCH"
            deltaImportQuery="SELECT * FROM SEARCH WHERE CONTENTID=${dataimporter.delta.CONTENTID}"
            deltaQuery="SELECT CONTENTID FROM SEARCH WHERE DATESTATUS &gt;= UNIX_TIMESTAMP('${dataimporter.last_index_time}')">

      <entity name="xml_contenu"
              dataSource="fieldReader"
              processor="XPathEntityProcessor"
              forEach="/Contenu"
              dataField="document.XML"
              onError="continue">
        <field column="SurTitre" xpath="/Contenu/ArtCourt/SurTitre" flatten="true"/>
        <field column="Titre" xpath="/Contenu/ArtCourt/Titre" flatten="true"/>
        <field column="Chapeau" xpath="/Contenu/ArtCourt/Chapeau" flatten="true"/>
        <field column="Auteur" xpath="/Contenu/ArtCourt/AuteurW" flatten="true"/>
        <field column="Accroche" xpath="/Contenu/ArtCourt/Accroche" flatten="true"/>
        <field column="TxtCourt" xpath="/Contenu/ArtCourt/TxtCourt" flatten="true"/>
        <field column="Refs" xpath="/Contenu/ArtCourt/Refs" flatten="true"/>
      </entity>
    </entity>

  </document>
</dataConfig>
 
The server query:
http://localhost:8080/apache-solr-nightly/dataimport?command=delta-import

All fields are declared in the schema.xml.

Can someone help me?

Nourredine


  

Indexing fields dynamically

2009-09-10 Thread nourredine khadri
Hello,

I want to index my fields dynamically.

DynamicFields don't suit my need because I don't know the field names in advance
and the field types must be set dynamically too (I need strong typing).

I think the solution is to handle this programmatically, but what is the best
way to do this? Which custom handler and API should I use?

Nourredine.



  

Re: Pb using delta import with XPathEntityProcessor

2009-09-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
I guess there was a null field and the XML parser blows up.


On Thu, Sep 10, 2009 at 3:06 PM, nourredine khadri
nourredin...@yahoo.com wrote:
 Hi,

 I'm a new Solr user and for the moment it suits almost all my needs :)

 I use a fresh nightly release (09/2009) and I index a
 database table using the DataImportHandler.

 I try to parse an XML content field from this table using XPathEntityProcessor
 and FieldReaderDataSource. Everything works fine for the full-import.

 But when I try to use the delta-import (I need incremental indexing) using
 deltaQuery and deltaImportQuery, it does not work and I get a stack trace for
 each field:

 10 sept. 2009 11:12:26
 org.apache.solr.handler.dataimport.XPathEntityProcessor initQuery
 ATTENTION: Parsing failed for xml, url:null rows processed:0
 java.lang.RuntimeException: java.lang.NullPointerException
        at 
 org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:92)
        at
 org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:282)
        at
 org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187)
        at
 org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164)
        at
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
        at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
        at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:365)
        at
 org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:259)
        at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
        at
 org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:354)
        at
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:395)
        at
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
 Caused by: java.lang.NullPointerException
        at
 com.ctc.wstx.io.ReaderBootstrapper.initialLoad(ReaderBootstrapper.java:245)
        at
 com.ctc.wstx.io.ReaderBootstrapper.bootstrapInput(ReaderBootstrapper.java:132)
        at
 com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:543)
        at
 com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:604)
        at
 com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:660)
        at
 com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:331)
        at
 org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:88)
        ... 11 more


 When I remove the delta queries or the XPathEntityProcessor block, it's OK.

 my data-config.xml :

 <dataConfig>
   <dataSource name="database"
               type="JdbcDataSource"
               driver="com.mysql.jdbc.Driver"
               url="jdbc:mysql://xxx"
               user="xxx"
               password="xxx"/>
   <dataSource type="FieldReaderDataSource" name="fieldReader"/>
   <document>

     <entity name="document"
             dataSource="database"
             processor="SqlEntityProcessor"
             pk="CONTENTID"
             query="SELECT * FROM SEARCH"
             deltaImportQuery="SELECT * FROM SEARCH WHERE CONTENTID=${dataimporter.delta.CONTENTID}"
             deltaQuery="SELECT CONTENTID FROM SEARCH WHERE DATESTATUS &gt;= UNIX_TIMESTAMP('${dataimporter.last_index_time}')">

       <entity name="xml_contenu"
               dataSource="fieldReader"
               processor="XPathEntityProcessor"
               forEach="/Contenu"
               dataField="document.XML"
               onError="continue">
         <field column="SurTitre" xpath="/Contenu/ArtCourt/SurTitre" flatten="true"/>
         <field column="Titre" xpath="/Contenu/ArtCourt/Titre" flatten="true"/>
         <field column="Chapeau" xpath="/Contenu/ArtCourt/Chapeau" flatten="true"/>
         <field column="Auteur" xpath="/Contenu/ArtCourt/AuteurW" flatten="true"/>
         <field column="Accroche" xpath="/Contenu/ArtCourt/Accroche" flatten="true"/>
         <field column="TxtCourt" xpath="/Contenu/ArtCourt/TxtCourt" flatten="true"/>
         <field column="Refs" xpath="/Contenu/ArtCourt/Refs" flatten="true"/>
       </entity>
     </entity>

   </document>
 </dataConfig>

 The server query:
 http://localhost:8080/apache-solr-nightly/dataimport?command=delta-import

 All fields are declared in the schema.xml.

 Can someone help me?

 Nourredine






-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re : Pb using delta import with XPathEntityProcessor

2009-09-10 Thread nourredine khadri
But why does that occur only for delta-import and not for full-import?

I've checked my data: no XML field is null.

Nourredine.

Noble Paul wrote : 

I guess there was a null field and the xml parser blows up


  

Re: Pb using delta import with XPathEntityProcessor

2009-09-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
I just committed the fix https://issues.apache.org/jira/browse/SOLR-1420

But it does not solve your problem; it will just prevent the import
from throwing an exception and failing.

2009/9/10 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com:
 I guess there was a null field and the XML parser blows up.


 On Thu, Sep 10, 2009 at 3:06 PM, nourredine khadri
 nourredin...@yahoo.com wrote:
 Hi,

 I'm a new Solr user and for the moment it suits almost all my needs :)

 I use a fresh nightly release (09/2009) and I index a
 database table using the DataImportHandler.

 I try to parse an XML content field from this table using
 XPathEntityProcessor
 and FieldReaderDataSource. Everything works fine for the full-import.

 But when I try to use the delta-import (I need incremental indexing) using
 deltaQuery
 and deltaImportQuery, it does not work and I get a stack trace for each
 field:

 10 sept. 2009 11:12:26
 org.apache.solr.handler.dataimport.XPathEntityProcessor initQuery
 ATTENTION: Parsing failed for xml, url:null rows processed:0
 java.lang.RuntimeException: java.lang.NullPointerException
        at 
 org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:92)
        at
 org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:282)
        at
 org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187)
        at
 org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164)
        at
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
        at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
        at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:365)
        at
 org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:259)
        at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
        at
 org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:354)
        at
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:395)
        at
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
 Caused by: java.lang.NullPointerException
        at
 com.ctc.wstx.io.ReaderBootstrapper.initialLoad(ReaderBootstrapper.java:245)
        at
 com.ctc.wstx.io.ReaderBootstrapper.bootstrapInput(ReaderBootstrapper.java:132)
        at
 com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:543)
        at
 com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:604)
        at
 com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:660)
        at
 com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:331)
        at
 org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:88)
        ... 11 more


 When I remove the delta queries or the XPathEntityProcessor block, it's
 OK.

 my data-config.xml :

 <dataConfig>
   <dataSource name="database"
               type="JdbcDataSource"
               driver="com.mysql.jdbc.Driver"
               url="jdbc:mysql://xxx"
               user="xxx"
               password="xxx"/>
   <dataSource type="FieldReaderDataSource" name="fieldReader"/>
   <document>

     <entity name="document"
             dataSource="database"
             processor="SqlEntityProcessor"
             pk="CONTENTID"
             query="SELECT * FROM SEARCH"
             deltaImportQuery="SELECT * FROM SEARCH WHERE CONTENTID=${dataimporter.delta.CONTENTID}"
             deltaQuery="SELECT CONTENTID FROM SEARCH WHERE DATESTATUS &gt;= UNIX_TIMESTAMP('${dataimporter.last_index_time}')">

       <entity name="xml_contenu"
               dataSource="fieldReader"
               processor="XPathEntityProcessor"
               forEach="/Contenu"
               dataField="document.XML"
               onError="continue">
         <field column="SurTitre" xpath="/Contenu/ArtCourt/SurTitre" flatten="true"/>
         <field column="Titre" xpath="/Contenu/ArtCourt/Titre" flatten="true"/>
         <field column="Chapeau" xpath="/Contenu/ArtCourt/Chapeau" flatten="true"/>
         <field column="Auteur" xpath="/Contenu/ArtCourt/AuteurW" flatten="true"/>
         <field column="Accroche" xpath="/Contenu/ArtCourt/Accroche" flatten="true"/>
         <field column="TxtCourt" xpath="/Contenu/ArtCourt/TxtCourt" flatten="true"/>
         <field column="Refs" xpath="/Contenu/ArtCourt/Refs" flatten="true"/>
       </entity>
     </entity>

   </document>
 </dataConfig>

 The server query:
 http://localhost:8080/apache-solr-nightly/dataimport?command=delta-import

 All fields are declared in the schema.xml.

 Can someone help me?

 Nourredine






 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Re : Pb using delta import with XPathEntityProcessor

2009-09-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
Can you just confirm that the field is not null by adding a
LogTransformer to the entity "document"?
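
For reference, LogTransformer is declared on the entity itself in data-config.xml; a sketch against the config above (the template text is illustrative, and logLevel "fine" is what produces the FIN lines seen later in this thread):

  <entity name="document"
          dataSource="database"
          processor="SqlEntityProcessor"
          transformer="LogTransformer"
          logTemplate="id : ${document.CONTENTID} - Xml content : ${document.XML}"
          logLevel="fine">
    <!-- pk, query, deltaQuery etc. unchanged from the config above -->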

On Thu, Sep 10, 2009 at 3:54 PM, nourredine khadri
nourredin...@yahoo.com wrote:
 But why does that occur only for delta-import and not for full-import?

 I've checked my data: no XML field is null.

 Nourredine.

 Noble Paul wrote :

I guess there was a null field and the xml parser blows up






-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


How to Convert Lucene index files to XML Format

2009-09-10 Thread busbus

Hello All,
I have a set of files indexed by Lucene. Now I want to use the indexed files
in Solr. The .cfx and .cfs files are not readable by Solr, as it supports only
.fdt and .fdx.

So I decided to add/update the index by just loading an XML file using the
post.jar function.

java -jar post.jar newFile.xml - loads the XML and updates the index.

Now I want to convert all the .cfx files to XML so that I can use them in
Solr.

Advice needed.

Any other suggestions are most welcome.

- Balaji



Connection refused when excecuting the query

2009-09-10 Thread dharhsana

Hi to all,
When I try to execute my query I get Connection refused. Can anyone please
tell me what should be done about this, to make my Solr run?

org.apache.solr.client.solrj.SolrServerException: java.net.ConnectException:
Connection refused: connect
org.apache.solr.client.solrj.SolrServerException: java.net.ConnectException:
Connection refused: connect
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:471)
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
at
org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
at
com.cloud.seviceImpl.InsertToSolrServiceImpl.getMyBlogs(InsertToSolrServiceImpl.java:214)
at
com.cloud.struts.action.MyBlogAction.execute(MyBlogAction.java:42)
at
org.apache.struts.action.RequestProcessor.processActionPerform(RequestProcessor.java:425)
at
org.apache.struts.action.RequestProcessor.process(RequestProcessor.java:228)
at
org.apache.struts.action.ActionServlet.process(ActionServlet.java:1913)
at
org.apache.struts.action.ActionServlet.doGet(ActionServlet.java:449)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:567)
at
org.apache.catalina.authenticator.SingleSignOn.invoke(SingleSignOn.java:394)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
at java.lang.Thread.run(Thread.java:595)
Caused by: java.net.ConnectException: Connection refused: connect
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at
java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:519)
at java.net.Socket.connect(Socket.java:469)
at java.net.Socket.<init>(Socket.java:366)
        at java.net.Socket.<init>(Socket.java:239)
at
org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
at
org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
at
org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
at
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
at
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
at
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:415)


with regards,

rekha




Re: Connection refused when excecuting the query

2009-09-10 Thread Shalin Shekhar Mangar
On Thu, Sep 10, 2009 at 4:52 PM, dharhsana rekha.dharsh...@gmail.comwrote:


 Hi to all,
 when i try to execute my query i get Connection refused ,can any one please
 tell me what should be done for this ,to make my solr run.

 org.apache.solr.client.solrj.SolrServerException:
 java.net.ConnectException:
 Connection refused: connect
 org.apache.solr.client.solrj.SolrServerException:
 java.net.ConnectException:
 Connection refused: connect
at

 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:471)
at


Your Solr server is not running at the url you have given to
CommonsHttpSolrServer. Make sure you have given the correct url and Solr is
actually up and running at that url.
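
A minimal SolrJ connectivity check along these lines can confirm the URL before running real queries (a sketch; the URL is illustrative and must match the host, port and context path where Solr is actually deployed):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class PingSolr {
    public static void main(String[] args) throws Exception {
      // must be the exact base URL Solr is running on
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
      // throws SolrServerException (e.g. Connection refused) if unreachable;
      // status 0 means the server answered the ping
      System.out.println(server.ping().getStatus());
    }
  }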

-- 
Regards,
Shalin Shekhar Mangar.


Re: How to Convert Lucene index files to XML Format

2009-09-10 Thread Grant Ingersoll


On Sep 10, 2009, at 6:41 AM, busbus wrote:



Hello All,
I have a set of files indexed by Lucene. Now I want to use the
indexed files in Solr. The .cfx and .cfs files are not readable by Solr, as it
supports only
.fdt and .fdx.


Solr defers to Lucene on reading the index.  You just need to tell  
Solr whether the index is a compound file or not and make sure the  
versions are compatible.


What error are you getting?




So I decided to add/update the index by just loading an XML file
using the post.jar function.

java -jar post.jar newFile.xml - loads the XML and updates the index.

Now I want to convert all the .cfx files to XML so that I can use
them in Solr.

Advice Needed.


I suppose you could walk the documents and dump them out to XML,  
assuming you have stored all your fields.
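
If all fields were stored, a rough sketch of that walk with the Lucene 2.x API might look like this (untested; XML escaping and binary fields are ignored for brevity, and the output follows the Solr <add><doc> format that post.jar expects):

  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Fieldable;
  import org.apache.lucene.index.IndexReader;

  public class DumpIndexToXml {
    public static void main(String[] args) throws Exception {
      IndexReader reader = IndexReader.open(args[0]); // path to the Lucene index
      System.out.println("<add>");
      for (int i = 0; i < reader.maxDoc(); i++) {
        if (reader.isDeleted(i)) continue;            // skip deleted docs
        Document doc = reader.document(i);            // only stored fields are recoverable
        System.out.println("  <doc>");
        for (Object o : doc.getFields()) {
          Fieldable f = (Fieldable) o;
          System.out.println("    <field name=\"" + f.name() + "\">"
              + f.stringValue() + "</field>");
        }
        System.out.println("  </doc>");
      }
      System.out.println("</add>");
      reader.close();
    }
  }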


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re : Re : Pb using delta import with XPathEntityProcessor

2009-09-10 Thread nourredine khadri
That's the case. The field is not null.

10 sept. 2009 14:10:54 org.apache.solr.handler.dataimport.LogTransformer transformRow
FIN: id : 5040052 - Xml content : <?xml version="1.0" encoding="ISO-8859-1"?>
<Contenu> <ArtCourt Template="Article" Ref="10"> <SurTitre><Parag>Empty
Subtitle - Click Here to edit</Parag></SurTitre> <Titre><Parag>Empty Title -
Click Here to edit</Parag></Titre> <Chapeau><Parag>Empty Chap¶ - Click Here
to edit</Parag></Chapeau> <AuteurW>Empty Autor - Click Here to edit</AuteurW>
<Accroche><Parag>Empty Catchword - Click Here to edit</Parag></Accroche>
<TxtCourt><IntTitre><Parag>Empty InterTitle - Click Here to edit
Text</Parag></IntTitre><Parag>Empty Paragraph - Click Here to edit
Text</Parag></TxtCourt> <Images></Images> <Refs></Refs>
</ArtCourt></Contenu>
10 sept. 2009 14:10:54 org.apache.solr.handler.dataimport.DocBuilder buildDocument
GRAVE: Exception while processing: xml_document document :
SolrInputDocument[{keywords=keywords(1.0)={pub}, fathersId=fathersId(1.0)={},
containerId=containerId(1.0)={}, site=site(1.0)={12308},
archiveState=archiveState(1.0)={false}, offlineAtDate=offlineAtDate(1.0)={0},
onlineAtDate=onlineAtDate(1.0)={1026307864230}, status=status(1.0)={0},
dateStatus=dateStatus(1.0)={1113905585726}, model=model(1.0)={0},
activationState=activationState(1.0)={true},
publicationState=publicationState(1.0)={true}, xml=xml(1.0)={<?xml
version="1.0" encoding="ISO-8859-1"?> <Contenu> <ArtCourt Template="Article"
Ref="10"> <SurTitre><Parag>Empty Subtitle - Click Here to
edit</Parag></SurTitre> <Titre><Parag>Empty Title - Click Here to
edit</Parag></Titre> <Chapeau><Parag>Empty Chap¶ - Click Here to
edit</Parag></Chapeau> <AuteurW>Empty Autor - Click Here to edit</AuteurW>
<Accroche><Parag>Empty Catchword - Click Here to edit</Parag></Accroche>
<TxtCourt><IntTitre><Parag>Empty InterTitle - Click Here to edit
Text</Parag></IntTitre><Parag>Empty Paragraph - Click Here to edit
Text</Parag></TxtCourt> <Images></Images> <Refs></Refs>
</ArtCourt></Contenu>}, identifierversion=identifierversion(1.0)={5040052},
contentid=contentid(1.0)={5040052}}]
org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed 
for xml, url:null rows processed:0 Processing Document # 1
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:292)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:365)
at 
org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:259)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
at 
org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:354)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:395)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
at 
org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:92)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:282)
... 10 more
Caused by: java.lang.NullPointerException
at 
com.ctc.wstx.io.ReaderBootstrapper.initialLoad(ReaderBootstrapper.java:245)
at 
com.ctc.wstx.io.ReaderBootstrapper.bootstrapInput(ReaderBootstrapper.java:132)
at 
com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:543)
at 
com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:604)
at 
com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:660)
at 
com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:331)
at 
org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:88)
... 11 more
10 sept. 2009 14:10:54 org.apache.solr.handler.dataimport.DataImporter 
doDeltaImport
GRAVE: Delta Import Failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed 
for xml, url:null rows processed:0 Processing Document # 1
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:292)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187)
at 

RE: Extract info from parent node during data import

2009-09-10 Thread venn hardy

Hi Paul,
The forEach="/document/category/item | /document/category/name" didn't work (no
categoryname was stored or indexed).
However, forEach="/document/category/item | /document/category" seems to work
well. I am not sure why category on its own works, but not category/name...
But thanks for the tip. It wasn't as painful as I thought it would be.
Venn

 From: noble.p...@corp.aol.com
 Date: Thu, 10 Sep 2009 09:58:21 +0530
 Subject: Re: Extract info from parent node during data import
 To: solr-user@lucene.apache.org
 
 try this
 
 add two xpaths in your forEach

 forEach="/document/category/item | /document/category/name"

 and add a field as follows

 <field column="catgoryname" xpath="/document/category/name"
 commonField="true"/>
 
 Please try it out and let me know.
 
 On Thu, Sep 10, 2009 at 7:30 AM, venn hardy venn.ha...@hotmail.com wrote:
 
  Hello,
 
 
 
  I am using SOLR 1.4 (from a nightly build) and its URLDataSource in 
  conjunction with the XPathEntityProcessor. I have successfully imported XML 
  content, but I think I may have found a limitation when it comes to the 
  commonField attribute in the DataImportHandler.
 
 
 
  Before writing my own parser to read in a whole XML document, I thought I'd 
  post the question here (since I got some great advice last time).
 
 
 
  The bulk of my content is contained within each item tag. However, each 
  item has a parent called category and each category has a name which I 
  would like to import. In my forEach loop I specify the 
  /document/category/item as the collection of items I am interested in. Is 
  there any way to extract an element from underneath a parent node? To be 
  more specific (see the example XML below), I would like to index the following:
 
  - category: Category 1; id: 1; author: Author 1
 
  - category: Category 1; id: 2; author: Author 2
 
  - category: Category 2; id: 3; author: Author 3
 
  - category: Category 2; id: 4; author: Author 4
 
 
 
  Any ideas on how I can get to a parent node from within a child during data 
  import? If it can't be done, what do you suggest would be the best way so I 
  can keep using the DataImportHandler... would XSLT be a good idea to 
  'flatten out' the structure a bit?
 
 
 
  Thanks
 
 
 
  This is what my XML document looks like:
 
  <document>
    <category>
      <name>Category 1</name>
      <item>
        <id>1</id>
        <author>Author 1</author>
      </item>
      <item>
        <id>2</id>
        <author>Author 2</author>
      </item>
    </category>
    <category>
      <name>Category 2</name>
      <item>
        <id>3</id>
        <author>Author 3</author>
      </item>
      <item>
        <id>4</id>
        <author>Author 4</author>
      </item>
    </category>
  </document>
 
 
 
  And this is what my dataConfig looks like:
  <dataConfig>
    <dataSource type="URLDataSource" />
    <document>
      <entity name="archive" pk="id"
              url="http://localhost:9080/data/20090817070752.xml"
              processor="XPathEntityProcessor" forEach="/document/category/item"
              transformer="DateFormatTransformer" stream="true" dataSource="dataSource">
        <field column="category" xpath="/document/category/name" commonField="true" />
        <field column="id" xpath="/document/category/item/id" />
        <field column="author" xpath="/document/category/item/author" />
      </entity>
    </document>
  </dataConfig>
 
 
 
  This is how I have specified my schema
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="author" type="string" indexed="true" stored="true"/>
    <field name="category" type="string" indexed="true" stored="true"/>
  </fields>

  <uniqueKey>id</uniqueKey>
  <defaultSearchField>id</defaultSearchField>
 
 
 
 
 
 
 
 
 
 -- 
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Solr: ERRORs at Startup

2009-09-10 Thread con

Hi Giovanni,

I am facing the same issue. Can you share some info on how you solved this
puzzle?





hossman wrote:
 
 
 : Even setting everything to INFO through
 : http://localhost:8080/solr/admin/logging didn't help.
 : 
 : But considering you do not see any bad issue here, at this time I will
 : ignore those ERROR messages :-)
 
 i would read up more on how to configure logging in JBoss.
 
 as far as i can tell, Solr is logging messages, which are getting handled
 by a logger that writes them to STDERR using a fairly standard format
 (date, class, method, level, msg) ... except some other piece of code
 seems to be reading from STDERR, and assuming anything that got written
 there is an ERROR, so it's logging those writes using a format
 with a date, a level (of ERROR), and a group or some other identifier of
 STDERR
 
 the problem is if you ignore them completely, you're going to miss 
 noticing when you really have a problem.
 
 Like i said: figure out how to configure logging in JBoss, you might need
 to change the slf4j adapter jar or something if it can't deal with JUL
 (which is the default).
 
 :  10:51:20,525 INFO  [TomcatDeployment] deploy, ctxPath=/solr
 :  10:51:20,617 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
 :  org.apache.solr.servlet.SolrDispatchFilter init
 :  INFO: SolrDispatchFilter.init()
 
 
 
 -Hoss
 
 
 




Re: Indexing fields dynamically

2009-09-10 Thread Yonik Seeley
On Thu, Sep 10, 2009 at 5:58 AM, nourredine khadri
nourredin...@yahoo.com wrote:
 I want to index my fields dynamically.

 DynamicFields don't suit my need because I don't know the field names in advance
 and the field types must be set dynamically too (I need strong typing).

This is what dynamic fields are meant for - you pick both the name and
type (from a pre-defined set of types of course) at runtime.  The
suffix of the field name matches one of the dynamic fields and
essentially picks the type.
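
For illustration, the example schema.xml that ships with Solr declares suffix-based dynamic fields along these lines (exact type names vary by version), so a client can coin price_f or author_s at index time and get a float or string field respectively:

  <dynamicField name="*_i"  type="int"    indexed="true" stored="true"/>
  <dynamicField name="*_s"  type="string" indexed="true" stored="true"/>
  <dynamicField name="*_f"  type="float"  indexed="true" stored="true"/>
  <dynamicField name="*_dt" type="date"   indexed="true" stored="true"/>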

-Yonik
http://www.lucidimagination.com


Re: How to Convert Lucene index files to XML Format

2009-09-10 Thread busbus

Thanks for your reply





 On Sep 10, 2009, at 6:41 AM, busbus wrote:
 Solr defers to Lucene on reading the index.  You just need to tell  
 Solr whether the index is a compound file or not and make sure the 
 versions are compatible.
 

This part seems to be the point.
How do I make Solr read the Lucene index files?
There is a tag in solrconfig.xml:
<useCompoundFile>false</useCompoundFile>

Setting it to true does not seem to work.

What else needs to be done?

Should I change the config file or add a new tag?

Also, how do I check the compatibility of Lucene and Solr?

Thanks in advance
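
For reference, in the stock solrconfig.xml this element appears in both the indexDefaults and mainIndex sections (a sketch; note it only controls how new segments are written - Lucene detects compound vs. non-compound segments automatically when reading, so this setting alone should not prevent opening an existing index):

  <indexDefaults>
    <useCompoundFile>true</useCompoundFile>
  </indexDefaults>
  <mainIndex>
    <useCompoundFile>true</useCompoundFile>
  </mainIndex>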




Re: Re : Re : Pb using delta import with XPathEntityProcessor

2009-09-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
What do you see if you keep logTemplate="${document}"? I'm trying
to figure out the contents of the map.


Re : Re : Re : Pb using delta import with XPathEntityProcessor

2009-09-10 Thread nourredine khadri
Some fields are null but not the one parsed by XPathEntityProcessor (named XML)

10 sept. 2009 14:40:34 org.apache.solr.handler.dataimport.LogTransformer transformRow
FIN: Map content : {KEYWORDS=pub, SPECIFIC=null, FATHERSID=, CONTAINERID=,
ARCHIVEDDATE=0, SITE=12308, LANGUAGE=null, ARCHIVESTATE=false, OFFLINEATDATE=0,
ONLINEATDATE=1026307864230, STATUS=0, DATESTATUS=1113905585726, MODEL=0,
ACTIVATIONSTATE=true, MOUNTED_SITE_IDS=null, SPECIFIC_XML=null,
PUBLICATIONSTATE=true, XML=<?xml version="1.0" encoding="ISO-8859-1"?>
<Contenu> <ArtCourt Template="Article" Ref="10"> <SurTitre><Parag>Empty
Subtitle - Click Here to edit</Parag></SurTitre> <Titre><Parag>Empty Title -
Click Here to edit</Parag></Titre> <Chapeau><Parag>Empty Chap¶ - Click Here
to edit</Parag></Chapeau> <AuteurW>Empty Autor - Click Here to edit</AuteurW>
<Accroche><Parag>Empty Catchword - Click Here to edit</Parag></Accroche>
<TxtCourt><IntTitre><Parag>Empty InterTitle - Click Here to edit
Text</Parag></IntTitre><Parag>Empty Paragraph - Click Here to edit
Text</Parag></TxtCourt> <Images></Images> <Refs></Refs>
</ArtCourt></Contenu>, IDENTIFIERVERSION=5040052, CONTENTID=5040052}
10 sept. 2009 14:40:34 org.apache.solr.handler.dataimport.DocBuilder buildDocument
GRAVE: Exception while processing: xml_document document :
SolrInputDocument[{keywords=keywords(1.0)={pub}, fathersId=fathersId(1.0)={},
containerId=containerId(1.0)={}, site=site(1.0)={12308},
archiveState=archiveState(1.0)={false}, offlineAtDate=offlineAtDate(1.0)={0},
onlineAtDate=onlineAtDate(1.0)={1026307864230}, status=status(1.0)={0},
dateStatus=dateStatus(1.0)={1113905585726}, model=model(1.0)={0},
activationState=activationState(1.0)={true},
publicationState=publicationState(1.0)={true}, xml=xml(1.0)={<?xml
version="1.0" encoding="ISO-8859-1"?> <Contenu> <ArtCourt Template="Article"
Ref="10"> <SurTitre><Parag>Empty Subtitle - Click Here to
edit</Parag></SurTitre> <Titre><Parag>Empty Title - Click Here to
edit</Parag></Titre> <Chapeau><Parag>Empty Chap¶ - Click Here to
edit</Parag></Chapeau> <AuteurW>Empty Autor - Click Here to edit</AuteurW>
<Accroche><Parag>Empty Catchword - Click Here to edit</Parag></Accroche>
<TxtCourt><IntTitre><Parag>Empty InterTitle - Click Here to edit
Text</Parag></IntTitre><Parag>Empty Paragraph - Click Here to edit
Text</Parag></TxtCourt> <Images></Images> <Refs></Refs>
</ArtCourt></Contenu>}, identifierversion=identifierversion(1.0)={5040052},
contentid=contentid(1.0)={5040052}}]
org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed 
for xml, url:null rows processed:0 Processing Document # 1
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:292)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:365)
at 
org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:259)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
at 
org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:354)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:395)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
at 
org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:92)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:282)
... 10 more
Caused by: java.lang.NullPointerException
at 
com.ctc.wstx.io.ReaderBootstrapper.initialLoad(ReaderBootstrapper.java:245)
at 
com.ctc.wstx.io.ReaderBootstrapper.bootstrapInput(ReaderBootstrapper.java:132)
at 
com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:543)
at 
com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:604)
at 
com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:660)
at 
com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:331)
at 
org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:88)
... 11 more
10 sept. 2009 14:40:34 org.apache.solr.handler.dataimport.DataImporter 
doDeltaImport
GRAVE: Delta Import Failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed 
for xml, url:null rows processed:0 Processing Document # 1
at 

Re: Extract info from parent node during data import

2009-09-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
in my tests both seem to be working. I had misspelt the column as
catgoryname - is that why?

keep in mind that you get extra docs for each category also
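
For reference, here is the combination the thread converged on, spelled out
as a sketch (the column name is corrected to categoryname here; everything
else is taken from Venn's config quoted below):

    <entity name="archive" pk="id"
            url="http://localhost:9080/data/20090817070752.xml"
            processor="XPathEntityProcessor"
            forEach="/document/category/item | /document/category"
            transformer="DateFormatTransformer" stream="true" dataSource="dataSource">
      <field column="categoryname" xpath="/document/category/name" commonField="true"/>
      <field column="id" xpath="/document/category/item/id"/>
      <field column="author" xpath="/document/category/item/author"/>
    </entity>

As noted above, each /document/category match also emits a row of its own, so
expect extra (mostly empty) documents per category unless you filter them out.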



On Thu, Sep 10, 2009 at 5:53 PM, venn hardy venn.ha...@hotmail.com wrote:

 Hi Paul,
 The forEach="/document/category/item | /document/category/name" didn't work
 (no categoryname was stored or indexed).
 However forEach="/document/category/item | /document/category" seems to work
 well. I am not sure why category on its own works, but not category/name...
 But thanks for the tip. It wasn't as painful as I thought it would be.
 Venn

 From: noble.p...@corp.aol.com
 Date: Thu, 10 Sep 2009 09:58:21 +0530
 Subject: Re: Extract info from parent node during data import
 To: solr-user@lucene.apache.org

 try this

 add two xpaths in your forEach

 forEach="/document/category/item | /document/category/name"

 and add a field as follows

 <field column="catgoryname" xpath="/document/category/name"
 commonField="true"/>

 Please try it out and let me know.

 On Thu, Sep 10, 2009 at 7:30 AM, venn hardy venn.ha...@hotmail.com wrote:
 
  Hello,
 
 
 
  I am using SOLR 1.4 (from nighly build) and its URLDataSource in 
  conjunction with the XPathEntityProcessor. I have successfully imported 
  XML content, but I think I may have found a limitation when it comes to 
  the commonField attribute in the DataImportHandler.
 
 
 
  Before writing my own parser to read in a whole XML document, I thought 
  I'd post the question here (since I got some great advice last time).
 
 
 
  The bulk of my content is contained within each item tag. However, each 
  item has a parent called category and each category has a name which I 
  would like to import. In my forEach loop I specify the 
  /document/category/item as the collection of items I am interested in. Is 
  there anyway to extract an element from underneath a parent node? To be a 
  more more specific (see eg xml below). I would like to index the following:
 
  - category: Category 1; id: 1; author: Author 1
 
  - category: Category 1; id: 2; author: Author 2
 
  - category: Category 2; id: 3; author: Author 3
 
  - category: Category 2; id: 4; author: Author 4
 
 
 
  Any ideas on how I can get to a parent node from within a child during 
  data import? If it cant be done, what do you suggest would be the best way 
  so I can keep using the DataImportHandler... would XSLT be a good idea to 
  'flatten out' the structure a bit?
 
 
 
  Thanks
 
 
 
  This is what my XML document looks like:
 
  <document>
   <category>
    <name>Category 1</name>
    <item>
     <id>1</id>
     <author>Author 1</author>
    </item>
    <item>
     <id>2</id>
     <author>Author 2</author>
    </item>
   </category>
   <category>
    <name>Category 2</name>
    <item>
     <id>3</id>
     <author>Author 3</author>
    </item>
    <item>
     <id>4</id>
     <author>Author 4</author>
    </item>
   </category>
  </document>
 
 
 
  And this is what my dataConfig looks like:
  <dataConfig>
   <dataSource type="URLDataSource" />
   <document>
    <entity name="archive" pk="id"
            url="http://localhost:9080/data/20090817070752.xml"
            processor="XPathEntityProcessor" forEach="/document/category/item"
            transformer="DateFormatTransformer" stream="true" dataSource="dataSource">
      <field column="category" xpath="/document/category/name" commonField="true" />
      <field column="id" xpath="/document/category/item/id" />
      <field column="author" xpath="/document/category/item/author" />
    </entity>
   </document>
  </dataConfig>
 
 
 
  This is how I have specified my schema
  <fields>
    <field name="id" type="string" indexed="true" stored="true"
           required="true" />
    <field name="author" type="string" indexed="true" stored="true"/>
    <field name="category" type="string" indexed="true" stored="true"/>
  </fields>

  <uniqueKey>id</uniqueKey>
  <defaultSearchField>id</defaultSearchField>
 
 
 
 
 
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re : Indexing fields dynamically

2009-09-10 Thread nourredine khadri
Thanks for the quick reply.

OK for dynamicFields, but how can I rename fields during indexing/search to
add a suffix corresponding to the type?

What is the best way to do this?

Nourredine.





From: Yonik Seeley yo...@lucidimagination.com
To: solr-user@lucene.apache.org
Sent: Thursday, 10 September 2009, 14:24:26
Subject: Re: Indexing fields dynamically

On Thu, Sep 10, 2009 at 5:58 AM, nourredine khadri
nourredin...@yahoo.com wrote:
 I want to index my fields dynamically.

 DynamicFields don't suit my need because I don't know field names in advance
 and field types must be set dynamically too (I need strong typing).

This is what dynamic fields are meant for - you pick both the name and
type (from a pre-defined set of types of course) at runtime.  The
suffix of the field name matches one of the dynamic fields and
essentially picks the type.

-Yonik
http://www.lucidimagination.com
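
As an illustration of Yonik's point, a minimal schema.xml sketch of
suffix-typed dynamic fields (the *_s and *_i suffixes below are the usual
convention, chosen here only for illustration):

    <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
    <dynamicField name="*_i" type="sint"   indexed="true" stored="true"/>

At index time the client appends the suffix that encodes the type - a field
discovered at runtime named author is sent as author_s, a numeric price as
price_i - and Solr picks the type from the matching pattern. That appending
step is also the answer to the renaming question above.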



  

Solr http post performance seems slow - help?

2009-09-10 Thread Dan A. Dickey
I'm posting documents to Solr using http (curl) from
C++/C code and am seeing approximately 3.3 - 3.4
documents per second being posted.  Is this to be expected?
Granted - I understand that this depends somewhat on the
machine running Solr.  By the way - I'm running Solr inside JBoss.

I was hoping for maybe 20 or more docs/sec, and 3 or so
is quite a way from that.

Also, I'm posting just a single document at a time.  I once tried
5 processes each posting documents, and that slowed things
down considerably.  Down into the multiple (5-10) seconds per document.

Does anyone have suggestions on what I can try?  I'll soon
have better servers installed and will be splitting the indexing
work from the searching - but at this point in time, I wasn't doing
indexing while searching anyway.  Thanks for any and all help!
-Dan

-- 
Dan A. Dickey | Senior Software Engineer

Savvis
10900 Hampshire Ave. S., Bloomington, MN  55438
Office: 952.852.4803 | Fax: 952.852.4951
E-mail: dan.dic...@savvis.net


RE: Extract info from parent node during data import

2009-09-10 Thread Fergus McMenemie
Hi Paul,
The forEach="/document/category/item | /document/category/name" didn't work
(no categoryname was stored or indexed).
However forEach="/document/category/item | /document/category" seems to work
well. I am not sure why category on its own works, but not category/name...
But thanks for the tip. It wasn't as painful as I thought it would be.
Venn

Hmmm, I had bother with this. Although each occurrence of /document/category/item
causes a new Solr document to be indexed, that document contained all the fields
from the parent element as well.

Did you see this?


 From: noble.p...@corp.aol.com
 Date: Thu, 10 Sep 2009 09:58:21 +0530
 Subject: Re: Extract info from parent node during data import
 To: solr-user@lucene.apache.org
 
 try this
 
 add two xpaths in your forEach
 
 forEach="/document/category/item | /document/category/name"
 
 and add a field as follows
 
 <field column="catgoryname" xpath="/document/category/name"
 commonField="true"/>
 
 Please try it out and let me know.
 
 On Thu, Sep 10, 2009 at 7:30 AM, venn hardy venn.ha...@hotmail.com wrote:
 
  Hello,
 
 
 
  I am using SOLR 1.4 (from nighly build) and its URLDataSource in 
  conjunction with the XPathEntityProcessor. I have successfully imported 
  XML content, but I think I may have found a limitation when it comes to 
  the commonField attribute in the DataImportHandler.
 
 
 
  Before writing my own parser to read in a whole XML document, I thought 
  I'd post the question here (since I got some great advice last time).
 
 
 
  The bulk of my content is contained within each item tag. However, each 
  item has a parent called category and each category has a name which I 
  would like to import. In my forEach loop I specify the 
  /document/category/item as the collection of items I am interested in. Is 
  there anyway to extract an element from underneath a parent node? To be a 
  more more specific (see eg xml below). I would like to index the following:
 
  - category: Category 1; id: 1; author: Author 1
 
  - category: Category 1; id: 2; author: Author 2
 
  - category: Category 2; id: 3; author: Author 3
 
  - category: Category 2; id: 4; author: Author 4
 
 
 
  Any ideas on how I can get to a parent node from within a child during 
  data import? If it cant be done, what do you suggest would be the best way 
  so I can keep using the DataImportHandler... would XSLT be a good idea to 
  'flatten out' the structure a bit?
 
 
 
  Thanks
 
 
 
  This is what my XML document looks like:
 
  <document>
   <category>
    <name>Category 1</name>
    <item>
     <id>1</id>
     <author>Author 1</author>
    </item>
    <item>
     <id>2</id>
     <author>Author 2</author>
    </item>
   </category>
   <category>
    <name>Category 2</name>
    <item>
     <id>3</id>
     <author>Author 3</author>
    </item>
    <item>
     <id>4</id>
     <author>Author 4</author>
    </item>
   </category>
  </document>
 
 
 
  And this is what my dataConfig looks like:
  <dataConfig>
   <dataSource type="URLDataSource" />
   <document>
    <entity name="archive" pk="id"
            url="http://localhost:9080/data/20090817070752.xml"
            processor="XPathEntityProcessor" forEach="/document/category/item"
            transformer="DateFormatTransformer" stream="true" dataSource="dataSource">
      <field column="category" xpath="/document/category/name" commonField="true" />
      <field column="id" xpath="/document/category/item/id" />
      <field column="author" xpath="/document/category/item/author" />
    </entity>
   </document>
  </dataConfig>
 
 
 
  This is how I have specified my schema
  <fields>
    <field name="id" type="string" indexed="true" stored="true"
           required="true" />
    <field name="author" type="string" indexed="true" stored="true"/>
    <field name="category" type="string" indexed="true" stored="true"/>
  </fields>

  <uniqueKey>id</uniqueKey>
  <defaultSearchField>id</defaultSearchField>
 
 
 
 
 
 
 
 
 
 -- 
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com


-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: Solr http post performance seems slow - help?

2009-09-10 Thread Yonik Seeley
On Thu, Sep 10, 2009 at 9:13 AM, Dan A. Dickey dan.dic...@savvis.net wrote:
 I'm posting documents to Solr using http (curl) from
 C++/C code and am seeing approximately 3.3 - 3.4
 documents per second being posted.  Is this to be expected?

No, that's very slow.
Are you using libcurl, or actually forking a new process for every document?
Are you committing on every document?

If you can, using Java would make your life much easier since you
could use the SolrJ client and it's binary protocol for indexing.

-Yonik
http://www.lucidimagination.com
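
A minimal SolrJ sketch of the batch-and-commit-once approach Yonik suggests
(1.4-era API; the URL, field names, and batch sizes are placeholders):

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    // Batch indexer: many documents per add, one commit at the end.
    public class BatchIndexer {
      public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        server.setRequestWriter(new BinaryRequestWriter()); // javabin instead of XML

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 1000; i++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", Integer.toString(i));
          doc.addField("text", "document body " + i);
          batch.add(doc);
          if (batch.size() == 100) {  // send in batches, not one doc per request
            server.add(batch);
            batch.clear();
          }
        }
        if (!batch.isEmpty()) server.add(batch);
        server.commit();              // commit once, not per document
      }
    }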


Re: Solr http post performance seems slow - help?

2009-09-10 Thread Walter Underwood
How big are your documents? Is your index on local disk or
network-mounted disk?


wunder

On Sep 10, 2009, at 6:39 AM, Yonik Seeley wrote:

On Thu, Sep 10, 2009 at 9:13 AM, Dan A. Dickey  
dan.dic...@savvis.net wrote:

I'm posting documents to Solr using http (curl) from
C++/C code and am seeing approximately 3.3 - 3.4
documents per second being posted.  Is this to be expected?


No, that's very slow.
Are you using libcurl, or actually forking a new process for every  
document?

Are you committing on every document?

If you can, using Java would make your life much easier since you
could use the SolrJ client and it's binary protocol for indexing.

-Yonik
http://www.lucidimagination.com





Re: solr 1.3 and multicore data directory

2009-09-10 Thread Paul Rosen
Ok. I have a workaround for now. I've duplicated the conf folder three 
times and changed this line in solrconfig.xml in each folder:


  <dataDir>${solr.data.dir:./solr/exhibits/data}</dataDir>

I can't wait for solr 1.4!

Noble Paul നോബിള്‍ नोब्ळ् wrote:

the dataDir is a Solr1.4 feature

On Thu, Sep 10, 2009 at 1:57 AM, Paul Rosen p...@performantsoftware.com wrote:

Hi All,

I'm trying to set up solr 1.3 to use multicore but I'm getting some puzzling
results. My solr.xml file is:

<?xml version="1.0" encoding="UTF-8"?>
<solr persistent="true" sharedLib="../lib">
 <cores adminPath="/admin/cores">
  <core name="resources" instanceDir="resources"
        dataDir="solr/resources/data/" />
  <core name="exhibits" instanceDir="exhibits" dataDir="solr/exhibits/data/" />
  <core name="reindex_resources" instanceDir="reindex_resources"
        dataDir="solr/reindex_resources/data/" />
 </cores>
</solr>

When I start up solr, everything looks normal until I get this line in the
log:

INFO: [resources] Opening new SolrCore at solr/resources/,
dataDir=./solr/data/

And a new folder is created ./solr/data/index with a blank index. And, of
course, any queries go to that blank index and not to one of my cores.

Actually, what I'd really like is to have my directory structure look like
this (some items removed for brevity):

-
solr_1.3
    lib
    solr
        solr.xml
        bin
        conf
        data
            resources
                index
            exhibits
                index
            reindex_resources
                index
start.jar
-

And have all the cores share everything except an index.

How would I set that up?

Are there differences between 1.3 and 1.4 in this respect?

Thanks,
Paul









Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-10 Thread Uri Boness
All work and progress on this patch is done under the JIRA issue: 
https://issues.apache.org/jira/browse/SOLR-236



R. Tan wrote:

The patch which will be committed soon will add this functionality.




Where can I follow the progress of this patch?


On Mon, Sep 7, 2009 at 3:38 PM, Uri Boness ubon...@gmail.com wrote:

  

Great. Nice site and very similar to my requirements.

  

thanks.

 So, right now, you get all field values by default?

Right now, no field values are returned for the collapsed documents. The

patch which will be committed soon will add this functionality.


R. Tan wrote:



Great. Nice site and very similar to my requirements.



  

There's work on the patch that is being done now which will enable you to
ask for specific field values of the collapsed documents using a
dedicated
request parameter.




So, right now, you get all field values by default?


On Sun, Sep 6, 2009 at 3:58 AM, Uri Boness ubon...@gmail.com wrote:



  

You can check out http://www.ilocal.nl. If you search for a bank in
Amsterdam then you'll see that a lot of the results are collapsed. For this
we used an older version of this patch (which works on 1.3) but a lot has
changed since then. We're currently using this patch on another project,
but it's not live yet.


Uri

R. Tan wrote:





Thanks Uri. Your personal suggestion is appreciated and I think I'll follow
your advice. We're still early in development and 1.4 would be a good
choice. I hope I can get field collapsing to work with my requirements.
Do you know any live site using field collapsing already?

On Sat, Sep 5, 2009 at 5:57 PM, Uri Boness ubon...@gmail.com wrote:





  

There's work on the patch that is being done now which will enable you to
ask for specific field values of the collapsed documents using a dedicated
request parameter. This work is not committed yet to the latest patch, but
will be very soon. There is of course a drawback to that as well: the
collapsed documents set can be very large (depending on your data), in
which case the returned result which includes the field values can be
rather large, which will impact performance. This is why this feature will
be enabled only if you specify this extra parameter - by default no field
values will be returned.

AFAIK, the latest patch should work fine with the latest build. Martijn
(who is the main maintainer of this patch) tries to keep it up to date
with the latest builds. But I guess the safest way is to work with the
nightly build of the same date as the latest patch (though I would give
it a try first with the latest build).

BTW, it's not an official suggestion from the Solr development team, but if
you ask me, if you have to choose now whether to use 1.3 or 1.4-dev, I would
go for the latter. 1.4 is supposed to be released in the upcoming week or
two and it brings loads of bug fixes, enhancements and extra functionality.
But again, this is my personal suggestion.


cheers,
Uri

R. Tan wrote:







Okay. Thanks for giving an insight on how it works in general. Without
trying it myself, are the field values for the collapsed ones also part of
the results data?
What is the latest build that is safe to use on a production environment?
I'd probably go for that and use field collapsing.

Thank you very much.


On Fri, Sep 4, 2009 at 4:49 AM, Uri Boness ubon...@gmail.com wrote:







  

The collapsed documents are represented by one master document which can
be part of the normal search result (the doc list), so pagination just
works as expected, taking only the returned documents into account
(ignoring the collapsed ones). As for the scoring, the master document is
actually the document with the highest score in the collapsed group.

As for Solr 1.3 compatibility... well... it's very hard to tell. All the
latest patches are certainly *not* 1.3 compatible (I think they also depend
on some changes in Lucene which are not available for Solr 1.3). I guess
you'll have to try some of the old patches, but I'm not sure about their
stability.

cheers,
Uri


R. Tan wrote:









Thanks Uri. How does paging and scoring work when using field collapsing?
What patch works with 1.3? Is it production ready?

R


On Thu, Sep 3, 2009 at 3:54 PM, Uri Boness ubon...@gmail.com
wrote:









  

The development on this patch is quite active. It works well for a single
Solr instance, but distributed search (i.e. shards) is not yet supported.
Using this patch you can group search results based on a specific field.
There are two flavors of field collapsing - adjacent and non-adjacent: the
former collapses only documents which happen to be located next to each
other in the otherwise-non-collapsed results set. The latter (the
non-adjacent) one collapses all documents with the same field value
(regardless of their position in the otherwise-non-collapsed results set).
Note, that non-adjacent 

Re: Re : Re : Re : Pb using delta import with XPathEntityProcessor

2009-09-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
everything looks fine and it beats me completely. I guess you will
have to debug this

On Thu, Sep 10, 2009 at 6:17 PM, nourredine khadri
nourredin...@yahoo.com wrote:
 Some fields are null but not the one parsed by XPathEntityProcessor (named 
 XML)

 10 sept. 2009 14:40:34 org.apache.solr.handler.dataimport.LogTransformer 
 transformRow
 FIN: Map content : {KEYWORDS=pub, SPECIFIC=null, FATHERSID=, CONTAINERID=,
 ARCHIVEDDATE=0, SITE=12308, LANGUAGE=null, ARCHIVESTATE=false,
 OFFLINEATDATE=0, ONLINEATDATE=1026307864230, STATUS=0,
 DATESTATUS=1113905585726, MODEL=0, ACTIVATIONSTATE=true,
 MOUNTED_SITE_IDS=null, SPECIFIC_XML=null, PUBLICATIONSTATE=true, XML=<?xml
 version="1.0" encoding="ISO-8859-1"?> <Contenu> <ArtCourt Template="Article"
 Ref="10"> <SurTitre><Parag>Empty Subtitle - Click Here to
 edit</Parag></SurTitre> <Titre><Parag>Empty Title - Click Here to
 edit</Parag></Titre> <Chapeau><Parag>Empty Chapô - Click Here to
 edit</Parag></Chapeau> <AuteurW>Empty Autor - Click Here to edit</AuteurW>
 <Accroche><Parag>Empty Catchword - Click Here to edit</Parag></Accroche>
 <TxtCourt><IntTitre><Parag>Empty InterTitle - Click Here to edit
 Text</Parag></IntTitre><Parag>Empty Paragraph - Click Here to edit
 Text</Parag></TxtCourt> <Images/> <Refs/> </ArtCourt> </Contenu>,
 IDENTIFIERVERSION=5040052, CONTENTID=5040052}
 10 sept. 2009 14:40:34 org.apache.solr.handler.dataimport.DocBuilder 
 buildDocument
 GRAVE: Exception while processing: xml_document document :
 SolrInputDocument[{keywords=keywords(1.0)={pub},
 fathersId=fathersId(1.0)={}, containerId=containerId(1.0)={},
 site=site(1.0)={12308}, archiveState=archiveState(1.0)={false},
 offlineAtDate=offlineAtDate(1.0)={0},
 onlineAtDate=onlineAtDate(1.0)={1026307864230}, status=status(1.0)={0},
 dateStatus=dateStatus(1.0)={1113905585726}, model=model(1.0)={0},
 activationState=activationState(1.0)={true},
 publicationState=publicationState(1.0)={true}, xml=xml(1.0)={<?xml
 version="1.0" encoding="ISO-8859-1"?> <Contenu> <ArtCourt
 Template="Article" Ref="10"> <SurTitre><Parag>Empty Subtitle - Click Here
 to edit</Parag></SurTitre> <Titre><Parag>Empty Title - Click Here to
 edit</Parag></Titre> <Chapeau><Parag>Empty Chapô - Click Here to
 edit</Parag></Chapeau> <AuteurW>Empty Autor - Click Here to edit</AuteurW>
 <Accroche><Parag>Empty Catchword - Click Here to edit</Parag></Accroche>
 <TxtCourt><IntTitre><Parag>Empty InterTitle - Click Here to edit
 Text</Parag></IntTitre><Parag>Empty Paragraph - Click Here to edit
 Text</Parag></TxtCourt> <Images/> <Refs/> </ArtCourt> </Contenu>},
 identifierversion=identifierversion(1.0)={5040052},
 contentid=contentid(1.0)={5040052}}]
 org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed 
 for xml, url:null rows processed:0 Processing Document # 1
        at 
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
        at 
 org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:292)
        at 
 org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187)
        at 
 org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164)
        at 
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
        at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
        at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:365)
        at 
 org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:259)
        at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
        at 
 org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:354)
        at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:395)
        at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
 Caused by: java.lang.RuntimeException: java.lang.NullPointerException
        at 
 org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:92)
        at 
 org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:282)
        ... 10 more
 Caused by: java.lang.NullPointerException
        at 
 com.ctc.wstx.io.ReaderBootstrapper.initialLoad(ReaderBootstrapper.java:245)
        at 
 com.ctc.wstx.io.ReaderBootstrapper.bootstrapInput(ReaderBootstrapper.java:132)
        at 
 com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:543)
        at 
 com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:604)
        at 
 com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:660)
        at 
 com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:331)
        at 
 org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:88)
        ... 11 more
 10 sept. 2009 14:40:34 

Re: solr 1.3 and multicore data directory

2009-09-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
you do not have to make 3 copies of conf dir even in Solr1.3

you can try this

<dataDir>${./solr/${solr.core.name}/data}</dataDir>



On Thu, Sep 10, 2009 at 7:55 PM, Paul Rosen p...@performantsoftware.com wrote:
 Ok. I have a workaround for now. I've duplicated the conf folder three times
 and changed this line in solrconfig.xml in each folder:

  <dataDir>${solr.data.dir:./solr/exhibits/data}</dataDir>

 I can't wait for solr 1.4!

 Noble Paul നോബിള്‍ नोब्ळ् wrote:

 the dataDir is a Solr1.4 feature

 On Thu, Sep 10, 2009 at 1:57 AM, Paul Rosen p...@performantsoftware.com
 wrote:

 Hi All,

 I'm trying to set up solr 1.3 to use multicore but I'm getting some
 puzzling
 results. My solr.xml file is:

  <?xml version="1.0" encoding="UTF-8"?>
  <solr persistent="true" sharedLib="../lib">
   <cores adminPath="/admin/cores">
    <core name="resources" instanceDir="resources"
          dataDir="solr/resources/data/" />
    <core name="exhibits" instanceDir="exhibits"
          dataDir="solr/exhibits/data/" />
    <core name="reindex_resources" instanceDir="reindex_resources"
          dataDir="solr/reindex_resources/data/" />
   </cores>
  </solr>

 When I start up solr, everything looks normal until I get this line in
 the
 log:

 INFO: [resources] Opening new SolrCore at solr/resources/,
 dataDir=./solr/data/

 And a new folder is created ./solr/data/index with a blank index. And, of
 course, any queries go to that blank index and not to one of my cores.

 Actually, what I'd really like is to have my directory structure look
 like
 this (some items removed for brevity):

 -
 solr_1.3
   lib
   solr
       solr.xml
       bin
       conf
       data
           resources
               index
           exhibits
               index
           reindex_resources
               index
 start.jar
 -

 And have all the cores share everything except an index.

 How would I set that up?

 Are there differences between 1.3 and 1.4 in this respect?

 Thanks,
 Paul









-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Facet fields and the DisMax query handler

2009-09-10 Thread Villemos, Gert
I'm trying to understand the DisMax query handler. I originally
configured it to ensure that the query was mapped onto different fields
in the documents and a boost assigned if the fields match. And that
works pretty smoothly.
 
However, when it comes to faceted searches the results perplex me.
Consider the following example:
 
Document A:
<field name="Staff">John Doe</field>
 
Document B:
<field name="ProjectManager">John Doe</field>
 
The following queries do not return anything:
Staff:Doe
Staff:Doe*
Staff:John
Staff:John*
 
The query:
Staff:John
 
Returns Documents A and B, even though document B doesn't even contain the
field 'Staff' (which is optional)! Through the qf parameter dismax has
been configured to search over the field 'ProjectManager', but I expected
the usage of a facet value would exclude the field... Looking at the
score of the documents, document A does score much higher than Document
B (a factor of 20), but I would expect not to see B at all. I have changed
the dismax minimum-match configuration to 1, to ensure that all hits
with a single match are returned, without effect. I have changed the tie
to 0 with no effect.
 
What am I missing here? I would like queries such as 'Staff:Doe' to
return document A, and only A.
 
Cheers,
Gert.
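
A note on what may be happening here (an aside; the sketch below is
illustrative, not from the thread): dismax does not parse field:value syntax
in q, so 'Staff:John' is handled as the plain terms 'Staff' and 'John'
searched across all qf fields - which is why the 'John' in ProjectManager
matches document B. A fielded restriction is usually expressed as a filter
query instead, which does use the standard query parser - e.g. (URL encoding
omitted):

    q=Doe&qt=dismax&fq=Staff:Doe

The fq restricts the result set to documents whose Staff field matches Doe,
which should return document A, and only A.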
 





Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-10 Thread Uri Boness

The current patch definitely supports faceting before and after collapsing.

Stephen Weiss wrote:
I just noticed this and it reminded me of an issue I've had with 
collapsed faceting with an older version of the patch in Solr 1.3.  
Would it be possible, if we can get the terms for all the collapsed 
documents on a field, to then facet each collapsed document on the 
unique terms it has collectively?  What I mean is for example:


Doc 1, 2, 3 collapse together on some other field

Doc 1 is the main document and has the colors blue and red
Doc 2 has red
Doc 3 has green

For the purposes of faceting, it would be ideal in our case for 
faceting on color to count one each for blue, red, and green on this 
document (the user drills down on this value to yet another collapsed 
set).  Right now, when you facet after collapse you just get blue and 
red (green is dropped because it collapses out).  To the user it makes 
the counts seem inaccurate, like they're missing something.  Instead 
we facet before collapsing and get an inflated value (which ticks 2 
for red - but when you drill down, you still only get 1 because Doc 1 
and Doc 2 collapse together again).  Either way it's not ideal.


At the time (many months ago) there was no way to account for this but 
it sounds like this patch could make it possible, maybe.


Thanks!

--
Steve

On Sep 5, 2009, at 5:57 AM, Uri Boness wrote:

There's work on the patch that is being done now which will enable 
you to ask for specific field values of the collapsed documents using 
a dedicated request parameter. This work is not committed yet to the 
latest patch, but will be very soon. There is of course a drawback to 
that as well, the collapsed documents set can be very large (depends 
on your data of course) in which case the returned result which 
includes the fields values can be rather large, which will impact 
performance, this is why this feature will be enabled only if you 
specify this extra parameter - by default no field values will be 
returned.


AFAIK, the latest patch should work fine with the latest build. 
Martijn (who is the main maintainer of this patch) tries to keep it 
up to date with the latest builds. But I guess the safest way is to 
work with the nightly build of the same date as the latest patch 
(though I would give it a try first with the latest build).


BTW, it's not an official suggestion from the Solr development team, 
but if you ask me, if you have to choose now whether to use 1.3 or 
1.4-dev, I would go for the latter. 1.4 is supposed to be released in 
the upcoming week or two and it brings loads of bug fixes, 
enhancements and extra functionality. But again, this is my personal 
suggestion.


cheers,
Uri





Re: Re : Re : Re : Pb using delta import with XPathEntityProcessor

2009-09-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
I guess there is a bug. I shall raise an issue.



2009/9/10 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com:
 everything looks fine and it beats me completely. I guess you will
 have to debug this

 On Thu, Sep 10, 2009 at 6:17 PM, nourredine khadri
 nourredin...@yahoo.com wrote:
 Some fields are null but not the one parsed by XPathEntityProcessor (named 
 XML)

 10 sept. 2009 14:40:34 org.apache.solr.handler.dataimport.LogTransformer 
 transformRow
 FIN: Map content : {KEYWORDS=pub, SPECIFIC=null, FATHERSID=, CONTAINERID=,
 ARCHIVEDDATE=0, SITE=12308, LANGUAGE=null, ARCHIVESTATE=false,
 OFFLINEATDATE=0, ONLINEATDATE=1026307864230, STATUS=0,
 DATESTATUS=1113905585726, MODEL=0, ACTIVATIONSTATE=true,
 MOUNTED_SITE_IDS=null, SPECIFIC_XML=null, PUBLICATIONSTATE=true, XML=<?xml
 version="1.0" encoding="ISO-8859-1"?> <Contenu> <ArtCourt Template="Article"
 Ref="10"> <SurTitre><Parag>Empty Subtitle - Click Here to
 edit</Parag></SurTitre> <Titre><Parag>Empty Title - Click Here to
 edit</Parag></Titre> <Chapeau><Parag>Empty Chapô - Click Here to
 edit</Parag></Chapeau> <AuteurW>Empty Autor - Click Here to edit</AuteurW>
 <Accroche><Parag>Empty Catchword - Click Here to edit</Parag></Accroche>
 <TxtCourt><IntTitre><Parag>Empty InterTitle - Click Here to edit
 Text</Parag></IntTitre><Parag>Empty Paragraph - Click Here to edit
 Text</Parag></TxtCourt> <Images/> <Refs/> </ArtCourt> </Contenu>,
 IDENTIFIERVERSION=5040052, CONTENTID=5040052}
 10 sept. 2009 14:40:34 org.apache.solr.handler.dataimport.DocBuilder 
 buildDocument
 GRAVE: Exception while processing: xml_document document :
 SolrInputDocument[{keywords=keywords(1.0)={pub},
 fathersId=fathersId(1.0)={}, containerId=containerId(1.0)={},
 site=site(1.0)={12308}, archiveState=archiveState(1.0)={false},
 offlineAtDate=offlineAtDate(1.0)={0},
 onlineAtDate=onlineAtDate(1.0)={1026307864230}, status=status(1.0)={0},
 dateStatus=dateStatus(1.0)={1113905585726}, model=model(1.0)={0},
 activationState=activationState(1.0)={true},
 publicationState=publicationState(1.0)={true}, xml=xml(1.0)={<?xml
 version="1.0" encoding="ISO-8859-1"?> <Contenu> <ArtCourt
 Template="Article" Ref="10"> <SurTitre><Parag>Empty Subtitle - Click Here
 to edit</Parag></SurTitre> <Titre><Parag>Empty Title - Click Here to
 edit</Parag></Titre> <Chapeau><Parag>Empty Chapô - Click Here to
 edit</Parag></Chapeau> <AuteurW>Empty Autor - Click Here to edit</AuteurW>
 <Accroche><Parag>Empty Catchword - Click Here to edit</Parag></Accroche>
 <TxtCourt><IntTitre><Parag>Empty InterTitle - Click Here to edit
 Text</Parag></IntTitre><Parag>Empty Paragraph - Click Here to edit
 Text</Parag></TxtCourt> <Images/> <Refs/> </ArtCourt> </Contenu>},
 identifierversion=identifierversion(1.0)={5040052},
 contentid=contentid(1.0)={5040052}}]
 org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing 
 failed for xml, url:null rows processed:0 Processing Document # 1
        at 
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
        at 
 org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:292)
        at 
 org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187)
        at 
 org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164)
        at 
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
        at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
        at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:365)
        at 
 org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:259)
        at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
        at 
 org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:354)
        at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:395)
        at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
 Caused by: java.lang.RuntimeException: java.lang.NullPointerException
        at 
 org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:92)
        at 
 org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:282)
        ... 10 more
 Caused by: java.lang.NullPointerException
        at 
 com.ctc.wstx.io.ReaderBootstrapper.initialLoad(ReaderBootstrapper.java:245)
        at 
 com.ctc.wstx.io.ReaderBootstrapper.bootstrapInput(ReaderBootstrapper.java:132)
        at 
 com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:543)
        at 
 com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:604)
        at 
 com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:660)
        at 
 com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:331)
        at 
 

Re: Re : Re : Re : Pb using delta import with XPathEntityProcessor

2009-09-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
https://issues.apache.org/jira/browse/SOLR-1421

2009/9/10 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com:
 I guess there is a bug. I shall raise an issue.



 2009/9/10 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com:
 everything looks fine and it beats me completely. I guess you will
 have to debug this

 On Thu, Sep 10, 2009 at 6:17 PM, nourredine khadri
 nourredin...@yahoo.com wrote:
 Some fields are null but not the one parsed by XPathEntityProcessor (named 
 XML)

 10 sept. 2009 14:40:34 org.apache.solr.handler.dataimport.LogTransformer 
 transformRow
 FIN: Map content : {KEYWORDS=pub, SPECIFIC=null, FATHERSID=, CONTAINERID=,
 ARCHIVEDDATE=0, SITE=12308, LANGUAGE=null, ARCHIVESTATE=false,
 OFFLINEATDATE=0, ONLINEATDATE=1026307864230, STATUS=0,
 DATESTATUS=1113905585726, MODEL=0, ACTIVATIONSTATE=true,
 MOUNTED_SITE_IDS=null, SPECIFIC_XML=null, PUBLICATIONSTATE=true, XML=<?xml
 version="1.0" encoding="ISO-8859-1"?> <Contenu> <ArtCourt Template="Article"
 Ref="10"> <SurTitre><Parag>Empty Subtitle - Click Here to
 edit</Parag></SurTitre> <Titre><Parag>Empty Title - Click Here to
 edit</Parag></Titre> <Chapeau><Parag>Empty Chapô - Click Here to
 edit</Parag></Chapeau> <AuteurW>Empty Autor - Click Here to edit</AuteurW>
 <Accroche><Parag>Empty Catchword - Click Here to edit</Parag></Accroche>
 <TxtCourt><IntTitre><Parag>Empty InterTitle - Click Here to edit
 Text</Parag></IntTitre><Parag>Empty Paragraph - Click Here to edit
 Text</Parag></TxtCourt> <Images/> <Refs/> </ArtCourt> </Contenu>,
 IDENTIFIERVERSION=5040052, CONTENTID=5040052}
 10 sept. 2009 14:40:34 org.apache.solr.handler.dataimport.DocBuilder 
 buildDocument
 GRAVE: Exception while processing: xml_document document :
 SolrInputDocument[{keywords=keywords(1.0)={pub},
 fathersId=fathersId(1.0)={}, containerId=containerId(1.0)={},
 site=site(1.0)={12308}, archiveState=archiveState(1.0)={false},
 offlineAtDate=offlineAtDate(1.0)={0},
 onlineAtDate=onlineAtDate(1.0)={1026307864230}, status=status(1.0)={0},
 dateStatus=dateStatus(1.0)={1113905585726}, model=model(1.0)={0},
 activationState=activationState(1.0)={true},
 publicationState=publicationState(1.0)={true}, xml=xml(1.0)={<?xml
 version="1.0" encoding="ISO-8859-1"?> <Contenu> <ArtCourt
 Template="Article" Ref="10"> <SurTitre><Parag>Empty Subtitle - Click Here
 to edit</Parag></SurTitre> <Titre><Parag>Empty Title - Click Here to
 edit</Parag></Titre> <Chapeau><Parag>Empty Chapô - Click Here to
 edit</Parag></Chapeau> <AuteurW>Empty Autor - Click Here to edit</AuteurW>
 <Accroche><Parag>Empty Catchword - Click Here to edit</Parag></Accroche>
 <TxtCourt><IntTitre><Parag>Empty InterTitle - Click Here to edit
 Text</Parag></IntTitre><Parag>Empty Paragraph - Click Here to edit
 Text</Parag></TxtCourt> <Images/> <Refs/> </ArtCourt> </Contenu>},
 identifierversion=identifierversion(1.0)={5040052},
 contentid=contentid(1.0)={5040052}}]
 org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing 
 failed for xml, url:null rows processed:0 Processing Document # 1
        at 
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
        at 
 org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:292)
        at 
 org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187)
        at 
 org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164)
        at 
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
        at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
        at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:365)
        at 
 org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:259)
        at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
        at 
 org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:354)
        at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:395)
        at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
 Caused by: java.lang.RuntimeException: java.lang.NullPointerException
        at 
 org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:92)
        at 
 org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:282)
        ... 10 more
 Caused by: java.lang.NullPointerException
        at 
 com.ctc.wstx.io.ReaderBootstrapper.initialLoad(ReaderBootstrapper.java:245)
        at 
 com.ctc.wstx.io.ReaderBootstrapper.bootstrapInput(ReaderBootstrapper.java:132)
        at 
 com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:543)
        at 
 com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:604)
        at 
 com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:660)
        at 
 

Re: Backups using Replication

2009-09-10 Thread wojtekpia

I'm using trunk from July 8, 2009. Do you know if it's more recent than that?


Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
 
 which version of Solr are you using? the backupAfter name was
 introduced recently
 

-- 
View this message in context: 
http://www.nabble.com/Backups-using-Replication-tp25350083p25386886.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Passing FuntionQuery string parameters

2009-09-10 Thread wojtekpia

It looks like parseArg was added on Aug 20, 2009. I'm working with slightly
older code. Thanks!
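
For reference, a minimal sketch of what such a parser can look like once
parseArg is available (LevValueSourceParser and LevenshteinValueSource are
hypothetical names; only ValueSourceParser and FunctionQParser#parseArg come
from the thread):

    import org.apache.lucene.queryParser.ParseException;
    import org.apache.solr.search.FunctionQParser;
    import org.apache.solr.search.ValueSourceParser;
    import org.apache.solr.search.function.ValueSource;

    // Registered in solrconfig.xml as, e.g.:
    //   <valueSourceParser name="lev" class="com.example.LevValueSourceParser"/>
    public class LevValueSourceParser extends ValueSourceParser {
      @Override
      public ValueSource parse(FunctionQParser fp) throws ParseException {
        ValueSource field = fp.parseValueSource(); // myFieldName
        String target = fp.parseArg();             // 'my string to match'
        // LevenshteinValueSource is hypothetical - supply your own ValueSource
        return new LevenshteinValueSource(field, target);
      }
    }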


Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
 
 did you implement your own ValueSourceParser . the
 FunctionQParser#parseArg() method supports strings
 
 On Wed, Sep 9, 2009 at 12:10 AM, wojtekpiawojte...@hotmail.com wrote:

 Hi,

 I'm writing a function query to score documents based on Levenshtein
 distance from a string. I want my function calls to look like:

 lev(myFieldName, 'my string to match')

 I'm running into trouble parsing the string I want to match ('my string
 to
 match' above). It looks like all the built in support is for parsing
 field
 names and numeric values. Am I missing the string parsing support, or is
 it
 not there, and if not, why?

 Thanks,

 Wojtek
 --
 View this message in context:
 http://www.nabble.com/Passing-FuntionQuery-string-parameters-tp25351825p25351825.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 -- 
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com
 
 

-- 
View this message in context: 
http://www.nabble.com/Passing-FuntionQuery-string-parameters-tp25351825p25386910.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Pagination with solr json data

2009-09-10 Thread Jay Hill
All you have to do is use the start and rows parameters to get the
results you want. For example, the query for the first page of results might
look like this:
?q=solr&start=0&rows=10 (other params omitted). So you'll start at the
beginning (0) and get 10 results. The next page would be
?q=solr&start=10&rows=10 - start at the 10th result and display the next 10
rows. Then ?q=solr&start=20&rows=10, and so on.
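
In client code the arithmetic is just start = page * rows - a tiny sketch
(Java; the host and handler path are placeholders):

    int rows = 10;
    int page = 3;                    // zero-based page index (fourth page)
    int start = page * rows;
    String url = "http://localhost:8983/solr/select?q=solr"
               + "&start=" + start + "&rows=" + rows;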

-Jay
http://www.lucidimagination.com


On Wed, Sep 9, 2009 at 12:24 PM, Elaine Li elaine.bing...@gmail.com wrote:

 Hi,

 What is the best way to do pagination?

 I searched around and only found some YUI utilities can do this. But
 their examples don't have very close match to the pattern I have in
 mind. I would like to have pretty plain display, something like the
 search results from google.

 Thanks.

 Elaine



Re: TermsComponent

2009-09-10 Thread Jay Hill
If you need an alternative to using the TermsComponent for auto-suggest,
have a look at this blog on using EdgeNGrams instead of the TermsComponent.

http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

-Jay
http://www.lucidimagination.com
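
Along the lines of what that blog describes, a sketch of an EdgeNGram-based
field type for schema.xml (the type name and gram sizes are illustrative):

    <fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

Because the whole input stays one token before the edge grams are built, a
query like "john d" can match the indexed prefix of "john doe", whitespace
included - which sidesteps the single-term limitation discussed below.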


On Wed, Sep 9, 2009 at 3:35 PM, Todd Benge todd.be...@gmail.com wrote:

 We're using the StandardAnalyzer but I'm fairly certain that's not the
 issue.

 In fact, there doesn't appear to be any issue with Lucene or Solr.  There
 are many instances of data in which users have removed the whitespace so
 they have a high frequency which means they bubble to the top of the sort.
 The result is that a search for a name shows a first and last name without
 the whitespace.

 One thing I've noticed is that since TermsComponent is working on a single
 Term, there doesn't seem to be a way to query against a phrase.  The same
 example as above applies, so if you're querying for name it'd be preferred
 to
 get multi-term responses back if a first name matches.

 Any suggestions?

 Thanks for all the help.  It's much appreciated.

 Todd


 On Wed, Sep 9, 2009 at 12:11 PM, Grant Ingersoll gsing...@apache.org
 wrote:

  And what Analyzer are you using?  I'm guessing that your words are being
  split up during analysis, which is why you aren't seeing whitespace.  If
 you
  want to keep the whitespace, you will need to use the String field type
 or
  possibly the Keyword Analyzer.
 
  -Grant
 
 
  On Sep 9, 2009, at 11:06 AM, Todd Benge wrote:
 
   It's set as Field.Store.YES, Field.Index.ANALYZED.
 
 
 
  On Wed, Sep 9, 2009 at 8:15 AM, Grant Ingersoll gsing...@apache.org
  wrote:
 
   How are you tokenizing/analyzing the field you are accessing?
 
 
  On Sep 9, 2009, at 8:49 AM, Todd Benge wrote:
 
  Hi Rekha,
 
 
  Here's the link to the TermsComponent info:
 
  http://wiki.apache.org/solr/TermsComponent
 
  and another link Matt Weber did on autocompletion:
 
 
 
 
 http://www.mattweber.org/2009/05/02/solr-autosuggest-with-termscomponent-and-jquery/
 
  We had to upgrade to the latest nightly to get the TermsComponent to
  work.
 
  Good Luck!
 
  Todd
 
  On Wed, Sep 9, 2009 at 5:17 AM, dharhsana rekha.dharsh...@gmail.com
  wrote:
 
 
   Hi,
 
  I have a requirement on autocompletion search; I am using Solr 1.4.

  Could you please tell me how you worked on that Terms component using
  Solr 1.4? I couldn't find the terms component in the Solr 1.4 that I have
  downloaded - is there any other configuration that should be done?

  Do you have code for autocompletion? Please share it with me.
 
  Regards
  Rekha
 
 
 
  tbenge wrote:
 
 
  Hi,
 
  I was looking at TermsComponent in Solr 1.4 as a way of building a
  autocomplete function.  I have a prototype working but noticed that
  terms
  that have whitespace in them when indexed are absent the whitespace
  when
  returned from the TermsComponent.
 
  Any ideas on why that may be happening?  Am I just missing a
 
   configuration
 
   option?
 
  Thanks,
 
  Todd
 
 
 
   --
  View this message in context:
  http://www.nabble.com/TermsComponent-tp25302503p25362829.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
   --
  Grant Ingersoll
  http://www.lucidimagination.com/
 
  Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
 using
  Solr/Lucene:
  http://www.lucidimagination.com/search
 
 
 
  --
  Grant Ingersoll
  http://www.lucidimagination.com/
 
  Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
  Solr/Lucene:
  http://www.lucidimagination.com/search
 
 



Re: TermsComponent

2009-09-10 Thread Todd Benge
Thanks for the pointer.  Definitely appreciate the help.

Todd

On Thu, Sep 10, 2009 at 11:10 AM, Jay Hill jayallenh...@gmail.com wrote:

 If you need an alternative to using the TermsComponent for auto-suggest,
 have a look at this blog on using EdgeNGrams instead of the TermsComponent.


 http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

 -Jay
 http://www.lucidimagination.com


 On Wed, Sep 9, 2009 at 3:35 PM, Todd Benge todd.be...@gmail.com wrote:

  We're using the StandardAnalyzer but I'm fairly certain that's not the
  issue.
 
  In fact, there doesn't appear to be any issue with Lucene or Solr.
  There
  are many instances of data in which users have removed the whitespace so
  they have a high frequency which means they bubble to the top of the
 sort.
  The result is that a search for a name shows a first and last name
 without
  the whitespace.
 
  One thing I've noticed is that since TermsComponent is working on a
 single
  Term, there doesn't seem to be a way to query against a phrase.  The same
  example as above applies, so if you're querying for name it'd be preferred
  to
  get multi-term responses back if a first name matches.
 
  Any suggestions?
 
  Thanks for all the help.  It's much appreciated.
 
  Todd
 
 
  On Wed, Sep 9, 2009 at 12:11 PM, Grant Ingersoll gsing...@apache.org
  wrote:
 
   And what Analyzer are you using?  I'm guessing that your words are
 being
   split up during analysis, which is why you aren't seeing whitespace.
  If
  you
   want to keep the whitespace, you will need to use the String field type
  or
   possibly the Keyword Analyzer.
  
   -Grant
  
  
   On Sep 9, 2009, at 11:06 AM, Todd Benge wrote:
  
It's set as Field.Store.YES, Field.Index.ANALYZED.
  
  
  
   On Wed, Sep 9, 2009 at 8:15 AM, Grant Ingersoll gsing...@apache.org
   wrote:
  
How are you tokenizing/analyzing the field you are accessing?
  
  
   On Sep 9, 2009, at 8:49 AM, Todd Benge wrote:
  
   Hi Rekha,
  
  
   Here's the link to the TermsComponent info:
  
   http://wiki.apache.org/solr/TermsComponent
  
   and another link Matt Weber did on autocompletion:
  
  
  
  
 
 http://www.mattweber.org/2009/05/02/solr-autosuggest-with-termscomponent-and-jquery/
  
   We had to upgrade to the latest nightly to get the TermsComponent to
   work.
  
   Good Luck!
  
   Todd
  
   On Wed, Sep 9, 2009 at 5:17 AM, dharhsana 
 rekha.dharsh...@gmail.com
   wrote:
  
  
Hi,
  
    I have a requirement on autocompletion search; I am using Solr 1.4.

    Could you please tell me how you worked on that Terms component using
    Solr 1.4? I couldn't find the terms component in the Solr 1.4 that I
    have downloaded - is there any other configuration that should be done?

    Do you have code for autocompletion? Please share it with me.
  
   Regards
   Rekha
  
  
  
   tbenge wrote:
  
  
   Hi,
  
   I was looking at TermsComponent in Solr 1.4 as a way of building a
   autocomplete function.  I have a prototype working but noticed
 that
   terms
   that have whitespace in them when indexed are absent the
 whitespace
   when
   returned from the TermsComponent.
  
   Any ideas on why that may be happening?  Am I just missing a
  
configuration
  
option?
  
   Thanks,
  
   Todd
  
  
  
--
   View this message in context:
   http://www.nabble.com/TermsComponent-tp25302503p25362829.html
   Sent from the Solr - User mailing list archive at Nabble.com.
  
  
  
--
   Grant Ingersoll
   http://www.lucidimagination.com/
  
   Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
  using
   Solr/Lucene:
   http://www.lucidimagination.com/search
  
  
  
   --
   Grant Ingersoll
   http://www.lucidimagination.com/
  
   Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
 using
   Solr/Lucene:
   http://www.lucidimagination.com/search
  
  
 



query parser question

2009-09-10 Thread Joe Calderon
I have a field called text_stem that has a kstemmer on it; I'm having
trouble matching wildcard searches on a word that got stemmed.

For example, I index the word america's, which according to
analysis.jsp gets indexed after stemming as america.

When matching, I do a query like myfield:(ame*) which matches the
indexed term. This all works fine until the query becomes
myfield:(america's*), at which point it doesn't match; however, if I
remove the wildcard, like myfield:(america's), then it works again.

It's almost like the term doesn't get stemmed when using a wildcard.

I'm using a 1.4 nightly. Is this the correct behaviour? Is there
something I should do differently?

In the meantime I've added americas as a protected word in the
kstemmer, but I'm afraid of more edge cases that will come up.

--joe


RE: OutOfMemory error on solr 1.3

2009-09-10 Thread Francis Yakin
So, do you think increasing the JVM heap will help? We also have
<queryResultMaxDocsCached>500</queryResultMaxDocsCached> in solrconfig.xml.
Originally it was set to <queryResultMaxDocsCached>200</queryResultMaxDocsCached>.

Currently we give Solr 1.5GB for Xms and Xmx; we use JRockit version 1.5.0_15:

4 S root 12543 12495 16  76   0 - 848974 184466 Jul20 ?   8-11:12:03 
/opt/bea/jrmc-3.0.3-1.5.0/bin/java -Xms1536m -Xmx1536m -Xns:128m -Xgc:gencon 
-Djavelin.jsp.el.elcache=4096 
-Dsolr.solr.home=/opt/apache-solr-1.3.0/example/solr

Francis
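
Since the errors occur during auto-warming of the queryResultCache, one
lever besides heap size is the cache's autowarmCount in solrconfig.xml - a
sketch (the sizes shown are illustrative, not a recommendation):

    <queryResultCache
      class="solr.LRUCache"
      size="512"
      initialSize="512"
      autowarmCount="32"/>

A lower autowarmCount means fewer cached queries are re-executed (and fewer
large result arrays allocated) when a new searcher warms up.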

-Original Message-
From: Constantijn Visinescu [mailto:baeli...@gmail.com]
Sent: Wednesday, September 09, 2009 11:35 PM
To: solr-user@lucene.apache.org
Subject: Re: OutOfMemory error on solr 1.3

Just wondering, how much memory are you giving your JVM ?

On Thu, Sep 10, 2009 at 7:46 AM, Francis Yakin fya...@liquid.com wrote:


 I am having OutOfMemory error on our slaves server, I would like to know if
 someone has the same issue and have the solution for this.

 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@96cd2ffc:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
 SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size:
 441216, Num elements: 55150
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@519116e0:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@74dc52fa:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@d0dd3e28:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@b6dfa5bc:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@482b13ef:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@2309438c:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@277bd48c:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
 Exception in thread [ACTIVE] ExecuteThread: '7' for queue:
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 Exception in thread [ACTIVE] ExecuteThread: '8' for queue:
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 Exception in thread [ACTIVE] ExecuteThread: '10' for queue:
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 Exception in thread [ACTIVE] ExecuteThread: '11' for queue:
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@41405463:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 751552, Num elements: 187884
  java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208,
 Num elements: 8192
 java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208,
 Num elements: 8192
 java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5096,
 Num elements: 2539
 java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5400,
 Num elements: 2690
 Sep 7, 2009 7:22:50 PM GMT Warning DeploymentService BEA-290064
 Deployment service servlet encountered an Exception while handling the
 deployment service message for request id -1 from server AdminServer.
 Exception is: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
 size: 4368, Num elements: 2174
 SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size:
 14140768, Num elements: 3535188
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@8dbcc7ab:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
 java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5320,
 Num elements: 2649
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@4d0c6fc5:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 751560, Num elements: 187885
 java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 16400,
 Num elements: 8192
 SEVERE: Error during 

Re: query parser question

2009-09-10 Thread Yonik Seeley
On Thu, Sep 10, 2009 at 1:28 PM, Joe Calderon calderon@gmail.com wrote:
 i have field called text_stem that has a kstemmer on it, im having
 trouble matching wildcard searches on a word that got stemmed

 for example, I index the word america's, which according to
 analysis.jsp gets indexed after stemming as america

 when matching, I do a query like myfield:(ame*), which matches the
 indexed term; this all works fine until the query becomes
 myfield:(america's*), at which point it doesn't match. However, if I
 remove the wildcard, like myfield:(america's), then it works again

 it's almost like the term doesn't get stemmed when using a wildcard

Correct - it's not stemmed.  If it were stemmed, there would be
multiple cases where that wouldn't work either.

For example, with the Porter stemmer, any -> ani and anywhere -> anywher.

So if you had a document with anywhere, a prefix query of any*
wouldn't work if you stemmed it, and would match other things like
animal.

-Yonik
http://www.lucidimagination.com
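
A common workaround (sketched here as a general pattern, not something
prescribed in this thread) is to copyField the text into a second, unstemmed
field and run wildcard queries against that. In schema.xml, with illustrative
type and field names ("text_ws" is the whitespace-tokenized type from the
example schema):

  <field name="text_stem" type="text" indexed="true" stored="false"/>
  <field name="text_exact" type="text_ws" indexed="true" stored="false"/>
  <copyField source="text_stem" dest="text_exact"/>

A query like text_exact:america's* then runs against unstemmed tokens, at the
cost of one extra indexed field (and note that text_ws does no lowercasing).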


Re: Solr http post performance seems slow - help?

2009-09-10 Thread Dan A. Dickey
On Thursday 10 September 2009 08:39:38 am Yonik Seeley wrote:
 On Thu, Sep 10, 2009 at 9:13 AM, Dan A. Dickey dan.dic...@savvis.net wrote:
  I'm posting documents to Solr using http (curl) from
  C++/C code and am seeing approximately 3.3 - 3.4
  documents per second being posted.  Is this to be expected?
 
 No, that's very slow.
 Are you using libcurl, or actually forking a new process for every document?

I'm using libcurl and not forking.

 Are you committing on every document?

No.

 If you can, using Java would make your life much easier since you
 could use the SolrJ client and it's binary protocol for indexing.

As much as I'd like to, I can't.  At this point in time it would take far
too much code restructuring and rewriting.  There is a database involved,
and some senseless portability library being used - though we only run on
Linux at this point in time.  It's just too much work to switch over to using
Java, for now.
-Dan

-- 
Dan A. Dickey | Senior Software Engineer

Savvis
10900 Hampshire Ave. S., Bloomington, MN  55438
Office: 952.852.4803 | Fax: 952.852.4951
E-mail: dan.dic...@savvis.net
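
One technique that usually helps regardless of client language is batching:
send many documents per POST instead of one request per document. A sketch in
curl form, with made-up field values, that the same libcurl code could mirror:

curl http://localhost:8983/solr/update --data-binary \
  '<add><doc><field name="id">1</field></doc><doc><field name="id">2</field></doc></add>' \
  -H 'Content-type:text/xml; charset=utf-8'

Reusing one connection (HTTP keep-alive, which libcurl does by default when
the same handle is reused) also avoids per-document connection setup.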


Re: Solr http post performance seems slow - help?

2009-09-10 Thread Dan A. Dickey
On Thursday 10 September 2009 09:10:27 am Walter Underwood wrote:
 How big are your documents?

For the most part, I'm just indexing metadata that has been pulled from
the documents.  I think I have currently about 40 or so fields that I'm setting.
When the document is an actual document - pdf, doc, etc... I use the DIH
to extract stuff and also set the metadata then.

 Is your index on local disk or network- 
 mounted disk?

I'm basically pulling the metadata info from a database and the documents
themselves are shared via NFS to the Solr indexer.
-Dan

 
 wunder
 
 On Sep 10, 2009, at 6:39 AM, Yonik Seeley wrote:
 
  On Thu, Sep 10, 2009 at 9:13 AM, Dan A. Dickey  
  dan.dic...@savvis.net wrote:
  I'm posting documents to Solr using http (curl) from
  C++/C code and am seeing approximately 3.3 - 3.4
  documents per second being posted.  Is this to be expected?
 
  No, that's very slow.
  Are you using libcurl, or actually forking a new process for every  
  document?
  Are you committing on every document?
 
  If you can, using Java would make your life much easier since you
  could use the SolrJ client and it's binary protocol for indexing.
 
  -Yonik
  http://www.lucidimagination.com
 
 
 

-- 
Dan A. Dickey | Senior Software Engineer

Savvis
10900 Hampshire Ave. S., Bloomington, MN  55438
Office: 952.852.4803 | Fax: 952.852.4951
E-mail: dan.dic...@savvis.net


Re: Default Query Type For Facet Queries

2009-09-10 Thread Stephen Duncan Jr
If using {!type=customparser} is the only way now, should I file an issue to
make the default configurable?

-- 
Stephen Duncan Jr
www.stephenduncanjr.com

On Thu, Sep 3, 2009 at 11:23 AM, Stephen Duncan Jr stephen.dun...@gmail.com
 wrote:

 We have a custom query parser plugin registered as the default for
 searches, and we'd like to have the same parser used for facet.query.

 Is there a way to register it as the default for FacetComponent in
 solrconfig.xml?

 I know I can add {!type=customparser} to each query as a workaround, but
 I'd rather register it in the config than make my code send that and strip
 it off on every facet query.

 --
 Stephen Duncan Jr
 www.stephenduncanjr.com
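
For reference, the local-params workaround mentioned above looks like this on
the request (parser name and query are illustrative):

  facet.query={!type=customparser}category:books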



RE: Solr http post performance seems slow - help?

2009-09-10 Thread Walter Underwood
What kind of storage is used for the Solr index files? When I tested it, NFS
was 100X slower than local disk.

wunder 

-Original Message-
From: Dan A. Dickey [mailto:dan.dic...@savvis.net] 
Sent: Thursday, September 10, 2009 11:15 AM
To: solr-user@lucene.apache.org
Cc: Walter Underwood
Subject: Re: Solr http post performance seems slow - help?

On Thursday 10 September 2009 09:10:27 am Walter Underwood wrote:
 How big are your documents?

For the most part, I'm just indexing metadata that has been pulled from
the documents.  I think I have currently about 40 or so fields that I'm
setting.
When the document is an actual document - pdf, doc, etc... I use the DIH
to extract stuff and also set the metadata then.

 Is your index on local disk or network- 
 mounted disk?

I'm basically pulling the metadata info from a database and the documents
themselves are shared via NFS to the Solr indexer.
-Dan

 
 wunder
 
 On Sep 10, 2009, at 6:39 AM, Yonik Seeley wrote:
 
  On Thu, Sep 10, 2009 at 9:13 AM, Dan A. Dickey  
  dan.dic...@savvis.net wrote:
  I'm posting documents to Solr using http (curl) from
  C++/C code and am seeing approximately 3.3 - 3.4
  documents per second being posted.  Is this to be expected?
 
  No, that's very slow.
  Are you using libcurl, or actually forking a new process for every  
  document?
  Are you committing on every document?
 
  If you can, using Java would make your life much easier since you
  could use the SolrJ client and it's binary protocol for indexing.
 
  -Yonik
  http://www.lucidimagination.com
 
 
 

-- 
Dan A. Dickey | Senior Software Engineer

Savvis
10900 Hampshire Ave. S., Bloomington, MN  55438
Office: 952.852.4803 | Fax: 952.852.4951
E-mail: dan.dic...@savvis.net




Query runs faster without filter queries?

2009-09-10 Thread Jonathan Ariel
Hi all!
I'm trying to measure the query response time when using just a query and
when using some filter queries. From what I read and understand, adding a
filter query should improve the query response time. I used Luke to work out
which fields I should use as filter queries (those that have few unique
terms; in my case 2 fields with 30 and 400 unique terms). I'm using Solr 1.3.
In order to test the query performance I disabled queryCache and
documentCache, so I just have filterCache enabled. I did that because I
wanted to be sure that there is no caching when I measure my queries. I left
filterCache on because filter queries use it.

When I first execute my query without filter queries it runs in 400ms; the
next execution of the same query takes around 20ms.
When I first execute my query with filter queries it runs in 500ms; the next
execution of the same query takes around 50ms.

Why the query with filter query runs slower than the query without filter
query? Shouldn't it be the other way around?

My index is around 12M documents. My filterCache max size is set to 4 (I
think more than enough). The fields that I use as filter queries are integer
and in my query I search over a tokenized text field.

What do you think?

Thanks a lot,

Jonathan
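
(For reference, a filter query is sent as a separate fq parameter rather than
as part of q; field names here are illustrative:

  q=text:solr&fq=site_id:1&fq=country_id:23

Each fq clause is cached independently in the filterCache.)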


Re: Query runs faster without filter queries?

2009-09-10 Thread Yonik Seeley
Try 1.4
http://www.lucidimagination.com/blog/2009/05/27/filtered-query-performance-increases-for-solr-14/

-Yonik
http://www.lucidimagination.com



On Thu, Sep 10, 2009 at 4:35 PM, Jonathan Ariel ionat...@gmail.com wrote:
 Hi all!
 I'm trying to measure the query response time when using just a query and
 when using some filter queries. From what I read and understand adding
 filter query should boost the query response time. I used luke to understand
 over which fields I should use filter query (those that have few unique
 terms, in my case 2 fields of 30 and 400 unique fields). I'm using solr 1.3.
 In order to test the query performance I disabled queryCache and
 documentCache, so I just have filterCache enabled.I did that because I
 wanted to be sure that there is no caching when I measure my queries. I left
 filterCache because it makes sense since filter query uses that.

 When I first execute my query without filter cache it runs in 400ms, next
 execution of the same query around 20ms.
 When I first execute my query with filter cache it runs in 500ms, next
 execution of the same query around 50ms.

 Why the query with filter query runs slower than the query without filter
 query? Shouldn't it be the other way around?

 My index is around 12M documents. My filterCache max size is set to 4 (I
 think more than enough). The fields that I use as filter queries are integer
 and in my query I search over a tokenized text field.

 What do you think?

 Thanks a lot,

 Jonathan



Re: Query runs faster without filter queries?

2009-09-10 Thread Uri Boness
If I recall correctly, in Solr 1.3 there was an issue where filters
didn't really behave as they should have. Basically, if you had a query
and filters defined, the query would execute normally and only
after that would the filters be applied. AFAIK this is fixed in 1.4, where
the documents ruled out by the filters are now skipped during
query execution.


Uri

Jonathan Ariel wrote:

Hi all!
I'm trying to measure the query response time when using just a query and
when using some filter queries. From what I read and understand adding
filter query should boost the query response time. I used luke to understand
over which fields I should use filter query (those that have few unique
terms, in my case 2 fields of 30 and 400 unique fields). I'm using solr 1.3.
In order to test the query performance I disabled queryCache and
documentCache, so I just have filterCache enabled.I did that because I
wanted to be sure that there is no caching when I measure my queries. I left
filterCache because it makes sense since filter query uses that.

When I first execute my query without filter cache it runs in 400ms, next
execution of the same query around 20ms.
When I first execute my query with filter cache it runs in 500ms, next
execution of the same query around 50ms.

Why the query with filter query runs slower than the query without filter
query? Shouldn't it be the other way around?

My index is around 12M documents. My filterCache max size is set to 4 (I
think more than enough). The fields that I use as filter queries are integer
and in my query I search over a tokenized text field.

What do you think?

Thanks a lot,

Jonathan

  


shards and facet_count

2009-09-10 Thread Paul Rosen

Hi again,

I've mostly gotten the multicore working except for one detail.

(I'm using solr 1.3 and solr-ruby 0.0.6 in a rails project.)

I've done a few queries and I appear to be able to get hits from either 
core. (yeah!)


I'm forming my request like this:

req = Solr::Request::Standard.new(
  :start => start,
  :rows => max,
  :sort => sort_param,
  :query => query,
  :filter_queries => filter_queries,
  :field_list => @field_list,
  :facets => {:fields => @facet_fields, :mincount => 1, :missing => true,
              :limit => -1},
  :highlighting => {:field_list => ['text'], :fragment_size => 600},
  :shards => @cores)

If I leave :shards => @cores out, then the response includes:

'facet_counts' => {
  'facet_dates' => {},
  'facet_queries' => {},
  'facet_fields' => { 'myfacet' => [ etc...], etc... }

which is what I expect.

If I add the :shards => @cores back in (so that I'm doing the exact
request above), I get:


'facet_counts' => {
  'facet_dates' => {},
  'facet_queries' => {},
  'facet_fields' => {}

so I've lost my facet information.

Why would it correctly find my documents, but not report the facet info?

Thanks,
Paul


Highlighting in SolrJ?

2009-09-10 Thread Paul Tomblin
Can somebody point me to some sample code for using highlighting in
SolrJ?  I understand the highlighted versions of the field comes in a
separate NamedList?  How does that work?

-- 
http://www.linkedin.com/in/paultomblin


Re: about SOLR-1395 integration with katta

2009-09-10 Thread Jason Rutherglen
Hi Zhong,

For #2 the existing patch SOLR-1395 is a good start.  It should be
fairly simple to deploy indexes and distribute them to Solr Katta
nodes/servers.

-J

On Wed, Sep 9, 2009 at 11:41 PM, Zhenyu Zhong zhongresea...@gmail.com wrote:
 Jason,

 Thanks for the reply.

 In general, I would like to use katta to handle the management overhead such
 as single point of failure as well as the distributed index deployment. In
 the same time, I still want to use nice search features provided by solr.

 Basically, I would like to try both on the indexing part
 1. Using Hadoop to lauch MR jobs to build index. Then deploy the index to
 katta
 2. Using the new patch SOLR-1395
    Based on my understanding, it seems to support index building with
 Hadoop. I assume the index would have all the necessary information such as
 solr index schema so that I can still use the nice search features provided
 by solr.

 On the search part,
 I would like to try the distributed search on solr-index which is deployed
 on katta if that is possible.

 I would be very appreciated if you could share some thoughts with me.

 thanks
 zhong



 On Wed, Sep 9, 2009 at 6:06 PM, Jason Rutherglen jason.rutherg...@gmail.com
 wrote:

 Hi Zhong,

 It's a very new patch. I'll update the issue as we start the
 wiki page.

 I've been working on indexing in Hadoop in conjunction with
 Katta, which is different (it sounds) than your use case where
 you have prebuilt indexes you simply want to distributed using
 Katta?

 -J

 On Wed, Sep 9, 2009 at 12:33 PM, Zhenyu Zhong zhongresea...@gmail.com
 wrote:
  Hi,
 
  It is really exciting to see this integration coming out.
  May I ask how I need to make changes to be able to deploy Solr index on
  katta servers?
  Are there any tutorials?
 
  thanks
  zhong
 




Re: Highlighting in SolrJ?

2009-09-10 Thread Jay Hill
Set up the query like this to highlight a field named "content":

SolrQuery query = new SolrQuery();
query.setQuery("foo");

query.setHighlight(true).setHighlightSnippets(1); //set other params as
needed
query.setParam("hl.fl", "content");

QueryResponse queryResponse = getSolrServer().query(query);

Then to get back the highlight results you need something like this:

Iterator<SolrDocument> iter = queryResponse.getResults().iterator();

while (iter.hasNext()) {
  SolrDocument resultDoc = iter.next();

  String content = (String) resultDoc.getFieldValue("content");
  String id = (String) resultDoc.getFieldValue("id"); //id is the
uniqueKey field

  if (queryResponse.getHighlighting().get(id) != null) {
    List<String> highlightSnippets =
queryResponse.getHighlighting().get(id).get("content");
  }
}

Hope that gets you what you need.

-Jay
http://www.lucidimagination.com

On Thu, Sep 10, 2009 at 3:19 PM, Paul Tomblin ptomb...@xcski.com wrote:

 Can somebody point me to some sample code for using highlighting in
 SolrJ?  I understand the highlighted versions of the field comes in a
 separate NamedList?  How does that work?

 --
 http://www.linkedin.com/in/paultomblin



Re: Highlighting in SolrJ?

2009-09-10 Thread Paul Tomblin
If I set snippets to 9 and mergeContinuous to true, will I get
the entire contents of the field with all the search terms replaced?
I don't see what good it would be just getting one line out of the
whole field as a snippet.

On Thu, Sep 10, 2009 at 7:45 PM, Jay Hill jayallenh...@gmail.com wrote:
 Set up the query like this to highlight a field named content:

    SolrQuery query = new SolrQuery();
    query.setQuery(foo);

    query.setHighlight(true).setHighlightSnippets(1); //set other params as
 needed
    query.setParam(hl.fl, content);

    QueryResponse queryResponse =getSolrServer().query(query);

 Then to get back the highlight results you need something like this:

    IteratorSolrDocument iter = queryResponse.getResults();

    while (iter.hasNext()) {
      SolrDocument resultDoc = iter.next();

      String content = (String) resultDoc.getFieldValue(content));
      String id = (String) resultDoc.getFieldValue(id); //id is the
 uniqueKey field

      if (queryResponse.getHighlighting().get(id) != null) {
        ListString highightSnippets =
 queryResponse.getHighlighting().get(id).get(content);
      }
    }

 Hope that gets you what you need.

 -Jay
 http://www.lucidimagination.com

 On Thu, Sep 10, 2009 at 3:19 PM, Paul Tomblin ptomb...@xcski.com wrote:

 Can somebody point me to some sample code for using highlighting in
 SolrJ?  I understand the highlighted versions of the field comes in a
 separate NamedList?  How does that work?

 --
 http://www.linkedin.com/in/paultomblin





-- 
http://www.linkedin.com/in/paultomblin
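
For reference (this is general parameter usage, not a reply from the thread):
both settings can be passed on the SolrQuery via setParam, e.g.

  query.setParam("hl.snippets", "9999");
  query.setParam("hl.mergeContiguous", "true");

Setting hl.fragsize to 0 is another option: it treats the whole field value
as one fragment, so the entire field comes back highlighted.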


Re: Single Core or Multiple Core?

2009-09-10 Thread Jonathan Ariel
Yes, it seems like I don't need to split. I could use different commit
times. In my use case committing happens too often, and I could have a
different commit time on a per-country basis. Your questions made me rethink
the need to split into cores.

Thanks

On Fri, Sep 4, 2009 at 5:38 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Fri, Sep 4, 2009 at 4:35 AM, Jonathan Ariel ionat...@gmail.com wrote:

  It seems like it is really hard to decide when the Multiple Core solution
  is
  more appropriate.As I could understand from this list and wiki the
 Multiple
  Core feature was designed to address the need of handling different sets
 of
  data within the same solr instance, where the sets of data don't need to
 be
  joined.
 

 Correct. It is also useful when you don't want to setup multiple boxes or
 tomcats for each Solr.


  In my case the documents are of a specific site and country. So document
 A
  can be of Site 1 / Country 1, B of Site 2 / Country 1, C of Site 1 /
  Country
  2, and so on.
  For the use cases of my application I will never query across countries
 or
  sites. I will always have to provide to the query the country id and the
  site id.
  Would you suggest to split my data into cores? I have few sites (around
 20)
  and more countries (around 90).
  Should I split my data into sites (around 20 cores) and within a core
  filter
  by site? Should I split by Site and Country (around 1800 cores)?
  What should I consider when splitting my data into multiple cores?
 
 
 The first question is why do you want to split at all? Is the schema or
 solrconfig different? Are the different sites or countries updated at
 different times? Is the combined index very big that the response times
 jump
 wildly when all the caches are thrown out if documents related to one site
 or country are updated? Does warmup or optimize or replication take too
 much
 time with one big index?

 Each core will have its own configuration files (maintenance) and you need
 to setup replication separately for each core (which is a pain with the
 script based replication). Also note that by keeping all cores in one
 tomcat
 (one JVM), a stop-the-world GC will stop all cores which is not the case
 when using separate JVMs for each index/core.

 --
 Regards,
 Shalin Shekhar Mangar.



Re: Query runs faster without filter queries?

2009-09-10 Thread Jonathan Ariel
Thanks! I don't think I can use an unreleased version of Solr even if it's
stable enough (crazy infrastructure guys), but I might be able to apply the 2
patches mentioned in the link you sent. I will try it in my local copy of
Solr, see if it improves things, and let you know.
Thanks!

On Thu, Sep 10, 2009 at 5:43 PM, Uri Boness ubon...@gmail.com wrote:

 If I recall correctly, in solr 1.3 there was an issue where filters didn't
 really behaved as they should have. Basically, if you had a query and
 filters defined, the query would have executed normally and only after that
 the filter would be applied. AFAIK this is fixed in 1.4 where now the
 documents which are defined by the filters are skipped during the query
 execution.

 Uri


 Jonathan Ariel wrote:

 Hi all!
 I'm trying to measure the query response time when using just a query and
 when using some filter queries. From what I read and understand adding
 filter query should boost the query response time. I used luke to
 understand
 over which fields I should use filter query (those that have few unique
 terms, in my case 2 fields of 30 and 400 unique fields). I'm using solr
 1.3.
 In order to test the query performance I disabled queryCache and
 documentCache, so I just have filterCache enabled.I did that because I
 wanted to be sure that there is no caching when I measure my queries. I
 left
 filterCache because it makes sense since filter query uses that.

 When I first execute my query without filter cache it runs in 400ms, next
 execution of the same query around 20ms.
 When I first execute my query with filter cache it runs in 500ms, next
 execution of the same query around 50ms.

 Why the query with filter query runs slower than the query without filter
 query? Shouldn't it be the other way around?

 My index is around 12M documents. My filterCache max size is set to 4
 (I
 think more than enough). The fields that I use as filter queries are
 integer
 and in my query I search over a tokenized text field.

 What do you think?

 Thanks a lot,

 Jonathan






Re: SnowballPorterFilterFactory stemming word question

2009-09-10 Thread darniz

Thanks Yonik.
I have a task where my user is giving me 20 words of the English dictionary,
and I have to run a program and generate a report with all the stemmed words.

I have to use EnglishPorterFilterFactory and SnowballPorterFilterFactory to
check which one is faster and gets the best results.

Should I write a Java module and use the library which comes with Solr?
Is there any code snippet which I can use?

My faint idea of how to do it is to create an EnglishPorterFilter
from EnglishPorterFilterFactory by passing in a tokenizer etc...

I will appreciate it if someone can give me a hint on this.

thanks
darniz









Yonik Seeley-2 wrote:
 
 On Mon, Sep 7, 2009 at 2:49 AM, darnizrnizamud...@edmunds.com wrote:
 Does solr provide any implementation for dictionary stemmer, please let
 me
 know
 
 The Krovetz stemmer is dictionary based (english only):
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem
 
 But from your original question, maybe you are concerned when the
 stemmer doesn't return real words? For normal search, don't be.
 During index time, words are stemmed, and then later the query is
 stemmed.  If the results match up, you're good.  For example, a
 document containing the word machines may stem to machin and then
 a query of machined will stem to machin and thus match the
 document.
 
 
 -Yonik
 http://www.lucidimagination.com
 
 

-- 
View this message in context: 
http://www.nabble.com/SnowballPorterFilterFactory-stemming-word-question-tp25180310p25393323.html
Sent from the Solr - User mailing list archive at Nabble.com.



Using EnglishPorterFilterFactory in code

2009-09-10 Thread darniz

Hello,
I have a task where my user is giving me 20 words of the English dictionary,
and I have to run a program and generate a report with all the stemmed words.

I have to use EnglishPorterFilterFactory and SnowballPorterFilterFactory to
check which one is faster and gets the best results.

Should I write a Java module and use the library which comes with Solr?
Is there any code snippet which I can use?

Is there any utility which Solr provides?

My faint idea of how to do it is to create an EnglishPorterFilter
from EnglishPorterFilterFactory by passing in a tokenizer etc...

I will appreciate it if someone can give me a hint on this.

thanks
darniz

-- 
View this message in context: 
http://www.nabble.com/Using-EnglishPorterFilterFactory-in-code-tp25393325p25393325.html
Sent from the Solr - User mailing list archive at Nabble.com.
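
A minimal sketch of that idea, assuming the Solr 1.3-era analysis API (Lucene
2.4; newer versions replaced this TokenStream style with attribute-based
methods). Swapping in SnowballPorterFilterFactory with a language="English"
init argument would let you compare the two stemmers:

import java.io.StringReader;
import java.util.HashMap;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.solr.analysis.EnglishPorterFilterFactory;

public class StemReport {
  public static void main(String[] args) throws Exception {
    EnglishPorterFilterFactory factory = new EnglishPorterFilterFactory();
    factory.init(new HashMap<String, String>()); // no protected-words file
    // run a whitespace-tokenized word list through the stemming filter
    TokenStream ts = factory.create(
        new WhitespaceTokenizer(new StringReader("machines machined running")));
    Token t;
    while ((t = ts.next()) != null) { // pre-2.9 TokenStream iteration
      System.out.println(t.termText()); // prints the stemmed form
    }
  }
}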



Very slow first query

2009-09-10 Thread Jonathan Ariel
Hi! Why would the first query that I execute take almost 60 seconds to
run, and after that no more than 50ms? I disabled all my caching to check if
it is the reason for the subsequent fast responses, but the same happens.
I'm using Solr 1.3.
Something really strange is that it doesn't happen with all the queries. It
is happening with a query that filters some integer and string fields joined
by an AND operator. Something like A:1 AND B:2 AND (C:3 AND D:"CA") (exact
match).
My index is around 1200M documents.

Thanks,

Jonathan


Re: An issue with commit/ using Solr Cell and multiple files

2009-09-10 Thread Lance Norskog
It is a windows problem (or curl, whatever).  This works with double-quotes.

C:\Users\work\Downloads\cygwin\home\work\curl-7.19.4\curl.exe
http://localhost:8983/solr/update --data-binary "<commit/>" -H
"Content-type:text/xml; charset=utf-8"
Single-quotes inside double-quotes should work: "<commit
waitFlush='false'/>"


On Tue, Sep 8, 2009 at 11:59 AM, caman aboxfortheotherst...@gmail.comwrote:


 seems to be an error with curl




 Kevin Miller-17 wrote:
 
  I am getting the same error message.  I am running Solr on a Windows
  machine.  Is the commit command a curl command or is it a Solr command?
 
 
  Kevin Miller
  Web Services
 
  -Original Message-
  From: Grant Ingersoll [mailto:gsing...@apache.org]
  Sent: Tuesday, September 08, 2009 12:52 PM
  To: solr-user@lucene.apache.org
  Subject: Re: An issue with commit/ using Solr Cell and multiple files
 
  solr/examples/exampledocs/post.sh does:
  curl $URL --data-binary '<commit/>' -H 'Content-type:text/xml;
  charset=utf-8'
 
  Not sure if that helps or how it compares to the book.
 
  On Sep 8, 2009, at 1:48 PM, Kevin Miller wrote:
 
  I am using the Solr nightly build from 8/11/2009.  I am able to index
  my documents using the Solr Cell but when I attempt to send the commit
 
  command I get an error.  I am using the example found in the Solr 1.4
  Enterprise Search Server book (recently released) found on page 84.
  It
  shows to commit the changes as follows (I am showing where my files
  are located not the example in the book):
 
  c:\curl\bin\curl http://echo12:8983/solr/update/ -H "Content-Type:
  text/xml" --data-binary '<commit waitFlush="false"/>'

  this gives me this error: The system cannot find the file specified.

  I get the same error when I modify it to look like the following:

  c:\curl\bin\curl http://echo12:8983/solr/update/ '<commit
  waitFlush="false"/>'
  c:\curl\bin\curl "http://echo12:8983/solr/update/" -H "Content-Type:
  text/xml" --data-binary '<commit waitFlush="false"/>'
  c:\curl\bin\curl http://echo12:8983/solr/update/ '<commit />'
  c:\curl\bin\curl "http://echo12:8983/solr/update/" '<commit />'
 
  I am using the example configuration in Solr so my documents are found
 
  in the exampledocs folder also my curl program in located in the root
  directory which is the reason for the way the curl command is being
  executed.
 
  I would appreciate any information on where to look or how to get the
  commit command to execute after indexing multiple files.
 
  Kevin Miller
  Oklahoma Tax Commission
  Web Services
 
  --
  Grant Ingersoll
  http://www.lucidimagination.com/
 
  Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
  using Solr/Lucene:
  http://www.lucidimagination.com/search
 
 
 

 --
 View this message in context:
 http://www.nabble.com/An-issue-with-%3Ccommit-%3E-using-Solr-Cell-and-multiple-files-tp25350995p25352122.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Lance Norskog
goks...@gmail.com


Re: Very slow first query

2009-09-10 Thread Yonik Seeley
At 12M documents, operating system cache can be significant.
Also, the first time you sort or facet on a field, a field cache
instance is populated which can take a lot of time.  You can prevent
slow first queries by configuring a static warming query in
solrconfig.xml that includes the common sorts and facets.

-Yonik
http://www.lucidimagination.com

On Thu, Sep 10, 2009 at 8:55 PM, Jonathan Ariel ionat...@gmail.com wrote:
 Hi!Why would it take for the first query that I execute almost 60 seconds to
 run and after that no more than 50ms? I disabled all my caching to check if
 it is the reason for the subsequent fast responses, but the same happens.
 I'm using solr 1.3.
 Something really strange is that it doesn't happen with all the queries. It
 is happening with a query that filters some integer and string fields joined
 by an AND operator. Something like A:1 AND B:2 AND (C:3 AND D:CA) (exact
 match).
 My index is around 1200M documents.

 Thanks,

 Jonathan
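
A sketch of such a static warming entry in solrconfig.xml (the query shown is
just the example filter from this thread, values illustrative):

  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">A:1 AND B:2 AND (C:3 AND D:"CA")</str>
      </lst>
    </arr>
  </listener>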



Re: Dynamically building the value of a field upon indexing

2009-09-10 Thread Lance Norskog
This has to be done by an UpdateRequestProcessor

http://wiki.apache.org/solr/UpdateRequestProcessor
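
A minimal sketch of such a processor (class, chain, and field names are made
up for illustration; the packages shown are the 1.3/1.4-era ones and moved in
later versions):

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class AggregateKeyProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        // build the aggregated uniqueKey value from the two source fields
        doc.setField("aggregated",
            doc.getFieldValue("field1") + ":" + doc.getFieldValue("field2"));
        super.processAdd(cmd); // hand off to the rest of the chain
      }
    };
  }
}

The factory would then be registered in an updateRequestProcessorChain in
solrconfig.xml and referenced from the update handler.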




On Tue, Sep 8, 2009 at 3:34 PM, Villemos, Gert gert.ville...@logica.comwrote:

 I would like to build the value of a field based on the value of multiple
 other fields at submission time. I.e. I would like to submit a document such
 as;

 field name=field1foo/field
 field name=field2baa/field

 And would like SOLR to store the document as

 field name=field1foo/field
 field name=field2baa/field
 field name=aggregatedfoo:baa/field

 Just to complicate matters I would like the aggregated field to be the
 unique key.

 Is this possible?

 Thanks,
 Gert.






-- 
Lance Norskog
goks...@gmail.com


RE: Extract info from parent node during data import

2009-09-10 Thread venn hardy

Hi Fergus,

When I debugged in the development console 
http://localhost:9080/solr/admin/dataimport.jsp?handler=/dataimport

I had no problems. Each category/item seems to be only indexed once, and no 
parent fields are available (except the category name).

I am not entirely sure how the forEach statement works, but my interpretation
of forEach="/document/category/item | /document/category" is something like
this:

1. Whenever DIH encounters a document/category it will extract the
/document/category/name field as a common field.
2. Whenever DIH encounters a document/category/item it will extract all of
the item fields.
3. When all fields have been encountered, save the document in Solr and go to
the next category/item.

 
 Date: Thu, 10 Sep 2009 14:19:31 +0100
 To: solr-user@lucene.apache.org
 From: fer...@twig.me.uk
 Subject: RE: Extract info from parent node during data import
 
 Hi Paul,
 The forEach=/document/category/item | /document/category/name didn't work 
 (no categoryname was stored or indexed).
 However forEach=/document/category/item | /document/category seems to work 
 well. I am not sure why category on its own works, but not category/name...
 But thanks for tip. It wasn't as painful as I thought it would be.
 Venn
 
 Hmmm, I had bother with this. Although each occurrence of
 /document/category/item
 causes a new Solr document to be indexed, that document contained all the
 fields from
 the parent element as well.
 
 Did you see this?
 
 
  From: noble.p...@corp.aol.com
  Date: Thu, 10 Sep 2009 09:58:21 +0530
  Subject: Re: Extract info from parent node during data import
  To: solr-user@lucene.apache.org
  
  try this
  
  add two xpaths in your forEach
  
  forEach=/document/category/item | /document/category/name
  
  and add a field as follows
  
  field column=catgoryname xpath =/document/category/name
  commonField=true/
  
  Please try it out and let me know.
  
  On Thu, Sep 10, 2009 at 7:30 AM, venn hardy venn.ha...@hotmail.com wrote:
  
   Hello,
  
  
  
   I am using SOLR 1.4 (from nighly build) and its URLDataSource in 
   conjunction with the XPathEntityProcessor. I have successfully imported 
   XML content, but I think I may have found a limitation when it comes to 
   the commonField attribute in the DataImportHandler.
  
  
  
   Before writing my own parser to read in a whole XML document, I thought 
   I'd post the question here (since I got some great advice last time).
  
  
  
   The bulk of my content is contained within each item tag. However, 
   each item has a parent called category and each category has a name 
   which I would like to import. In my forEach loop I specify the 
   /document/category/item as the collection of items I am interested in. 
   Is there anyway to extract an element from underneath a parent node? To 
   be a more more specific (see eg xml below). I would like to index the 
   following:
  
   - category: Category 1; id: 1; author: Author 1
  
   - category: Category 1; id: 2; author: Author 2
  
   - category: Category 2; id: 3; author: Author 3
  
   - category: Category 2; id: 4; author: Author 4
  
  
  
   Any ideas on how I can get to a parent node from within a child during 
   data import? If it cant be done, what do you suggest would be the best 
   way so I can keep using the DataImportHandler... would XSLT be a good 
   idea to 'flatten out' the structure a bit?
  
  
  
   Thanks
  
  
  
   This is what my XML document looks like:
  
    <document>
      <category>
        <name>Category 1</name>
        <item>
          <id>1</id>
          <author>Author 1</author>
        </item>
        <item>
          <id>2</id>
          <author>Author 2</author>
        </item>
      </category>
      <category>
        <name>Category 2</name>
        <item>
          <id>3</id>
          <author>Author 3</author>
        </item>
        <item>
          <id>4</id>
          <author>Author 4</author>
        </item>
      </category>
    </document>
  
  
  
   And this is what my dataConfig looks like:
    <dataConfig>
      <dataSource type="URLDataSource" />
      <document>
        <entity name="archive" pk="id"
            url="http://localhost:9080/data/20090817070752.xml"
            processor="XPathEntityProcessor" forEach="/document/category/item"
            transformer="DateFormatTransformer" stream="true"
            dataSource="dataSource">
          <field column="category" xpath="/document/category/name"
              commonField="true" />
          <field column="id" xpath="/document/category/item/id" />
          <field column="author" xpath="/document/category/item/author" />
        </entity>
      </document>
    </dataConfig>
  
  
  
   This is how I have specified my schema
    <fields>
      <field name="id" type="string" indexed="true" stored="true"
          required="true" />
      <field name="author" type="string" indexed="true" stored="true"/>
      <field name="category" type="string" indexed="true" stored="true"/>
    </fields>

    <uniqueKey>id</uniqueKey>
    <defaultSearchField>id</defaultSearchField>
  
  
  
  
  
  
   _
   Need a place to rent, buy or share? Let us find your next place for you!
   http://clk.atdmt.com/NMN/go/157631292/direct/01/
  
  
  
  -- 
  

Re: Date Faceting and Double Counting

2009-09-10 Thread Lance Norskog
datefield:[X TO* Y] for X to Y-0....1

This would be backwards-compatible. {} are used for other things and lexing
is a dying art. Using a * causes mistakes to trigger wildcard syntaxes,
which will fail loudly.

On Tue, Sep 8, 2009 at 5:20 PM, Chris Hostetter hossman_luc...@fucit.orgwrote:


 : I ran into that problem as well but the solution was provided to me by
 : this very list :) See
 : http://www.nabble.com/Range-queries-td24057317.html It's not the
 : cleanest solution, but as long as you know what you're doing it's not
 : that bad.

 Hmmm... yeah, that's a total hack.  one of these days we really need to
 fix the lucene query parser grammer so inclusive/exclusive can be
 different for hte upper/lower bounds...

datefield:[NOW/DAY TO NOW/DAY+1DAY}


 -Hoss




-- 
Lance Norskog
goks...@gmail.com


Re: Why dismax isn't the default with 1.4 and why it doesn't support fuzzy search ?

2009-09-10 Thread Lance Norskog
A QueryParser is a Lucene class that parses a string into a tree of query
objects.

A request handler in solrconfig.xml describes a Solr RequestHandler object.
This object binds a name to a set of HTTP parameter defaults. If a request
handler name is /abc then it is invoked via
http://localhost:8983/solr/abc, but if there is no slash, the name abc is
available when some other request handler is called. "Available" means that
other code can look the name up. In qt=dismax, the code that resolves qt
knows that dismax is a request handler.

(It all made sense when I started typing ...)


On 9/9/09, Villemos, Gert gert.ville...@logica.com wrote:

 Sorry for being a bit dim, I don't understand this:

 Looking at my default configuration for SOLR, I have a request handler
 named 'dismax' and request handler named 'standard' with the default=true.
 I understand that I can configure the usage of this in the query using the
 qt=dismax or qt=standard (... Or no qt as standard is set to default). And
 if I set the 'defType=dismax' flag in the standard requesthandler then I
 will use the dismax queryparser by default. So far, so good.

 What I don't understand is whether a requesthandler and a queryparser are the
 same thing, i.e. the configuration contains a REQUESTHANDLER with the name
 'dismax', but does not contain a QUERYPARSER with the name 'dismax'. Where
 does the 'dismax' queryparser come from? Do I have to configure this extra?
 Or is it there per default? Or does it come from the 'dismax'
 requesthandler?

 Gert.






 -Original Message-
 From: kaoul@gmail.com [mailto:kaoul@gmail.com] On Behalf Of Erwin
 Sent: Wednesday, September 09, 2009 10:55 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Why dismax isn't the default with 1.4 and why it doesn't
 support fuzzy search ?

 Hi Gert,

 qt=dismax in URL works with Solr 1.3 and 1.4 without further
 configuration. You are right, you should find a dismax query parser in
 solrconfig.xml by default.

 Erwin

 On Wed, Sep 9, 2009 at 7:49 AM, Villemos, Gertgert.ville...@logica.com
 wrote:
  On question to this;
 
  Do you need to explicitly configure a 'dismax' queryparser in the
  solrconfig.xml to enable this, or is a queryparser named 'dismax'
  available per default?
 
  Cheers,
  Gert.
 
 
 
 
  -Original Message-
  From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
  Sent: Wednesday, September 02, 2009 2:44 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Why dismax isn't the default with 1.4 and why it doesn't
  support fuzzy search ?
 
  : The wiki says As of Solr 1.3, the DisMaxRequestHandler is simply
  the
  : standard request handler with the default query parser set to the
  : DisMax Query Parser (defType=dismax).. I just made a checkout of
  svn
  : and dismax doesn't seems to be the default as :
 
  that paragraph doesn't say that dismax is the default handler ... it
  says that using qt=dismax is the same as using qt=standard with the 
  query parser set to be the DisMaxQueryParser (using defType=dismax)
 
 
  so doing this replacement on any URL...
 
 qt=dismax   =>   qt=standard&defType=dismax
 
  ...should produce identical results.
 
  : Secondly, I've patched solr with
  : http://issues.apache.org/jira/browse/SOLR-629 as I would like to
  have
  : fuzzy with dismax. I built it with ant example. Now, behavior is
  : still the same, no fuzzy search with dismax (using the qt=dismax
  : parameter in GET URL).
 
  questions/discussion of uncommitted patches is best done in the Jira
  issue wherey ou found the patch ... that way it helps other people
  evaluate the patch, and the author of the patch is more likelye to see
  your feedback.
 
 
  -Hoss
 
 
 
 
 
 



Re: Very Urjent

2009-09-10 Thread Lance Norskog
Another, slower way is to create a spell checking dictionary and do spelling
requests on the first few characters the user types.
http://wiki.apache.org/solr/SpellCheckerRequestHandler?highlight=%28spell%29%7C%28checker%29

Another way is to search against facet values with the facet.prefix feature:
http://wiki.apache.org/solr/SimpleFacetParameters?highlight=%28facet%29%7C%28%2A%29#head-021d583a1430f6485c6e929930fceec3e15e1e8a

All of these have the same problem: programmers are all perfect spellers,
while normal people are not. None of these techniques assist normal people
to find homonyms.
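
A sketch of the facet.prefix approach, with the field name and prefix purely
illustrative:

  /solr/select?q=*:*&rows=0&facet=true&facet.field=name&facet.prefix=ip&facet.limit=10

The returned facet values are the completions for what the user has typed so
far.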


On 9/9/09, dharhsana rekha.dharsh...@gmail.com wrote:


 Hi Shalin Shekhar Mangar,

 I got some code from this site:

 http://www.mattweber.org/2009/05/02/solr-autosuggest-with-termscomponent-and-jquery/

 When I used that code in my project, only then did I come to know that there
 is no TermsComponent jar or plugin.

 Is there any other way of doing autocompletion search without the terms
 component?

 If so please tell me how to implement it.

 waiting for your reply

 Regards,

 Rekha.



 --
 View this message in context:
 http://www.nabble.com/Very-Urjent-tp25359244p25360892.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Lance Norskog
goks...@gmail.com


Re: Very slow first query

2009-09-10 Thread Jonathan Ariel
Yes, but in this case the query that I'm executing doesn't have any facets. I
mean, for this query I'm not using the filter cache. What does it mean that
"operating system cache can be significant"? That my first query loads a
big chunk of the index into memory (maybe even the entire index)?

On Thu, Sep 10, 2009 at 10:07 PM, Yonik Seeley
yo...@lucidimagination.comwrote:

 At 12M documents, operating system cache can be significant.
 Also, the first time you sort or facet on a field, a field cache
 instance is populated which can take a lot of time.  You can prevent
 slow first queries by configuring a static warming query in
 solrconfig.xml that includes the common sorts and facets.

 -Yonik
 http://www.lucidimagination.com

 On Thu, Sep 10, 2009 at 8:55 PM, Jonathan Ariel ionat...@gmail.com
 wrote:
  Hi!Why would it take for the first query that I execute almost 60 seconds
 to
  run and after that no more than 50ms? I disabled all my caching to check
 if
  it is the reason for the subsequent fast responses, but the same happens.
  I'm using solr 1.3.
  Something really strange is that it doesn't happen with all the queries.
 It
  is happening with a query that filters some integer and string fields
 joined
  by an AND operator. Something like A:1 AND B:2 AND (C:3 AND D:CA)
 (exact
  match).
  My index is around 1200M documents.
 
  Thanks,
 
  Jonathan
 



Re: Query regarding incremental index replication

2009-09-10 Thread Lance Norskog
There is only one index. The index has newer segments which represent new
records and deletes to old records (sort of). Incremental replication copies
new segments; putting the new segments together with the previous index
makes the new index.

Incremental replication under rsync does work; perhaps it did not work for
you.

If you do not want to store the full index on the indexer, that is a
problem. You will not be able to optimize the index on the indexer and ship
the new index to the slaves.

This has more on large-volume Solr installation design:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr

On 9/9/09, Silent Surfer silentsurfe...@yahoo.com wrote:

 Hi ,

 Currently we are using Solr 1.3 and we have the following requirement.

 As we need to process very high volumes of documents (of the order of 400
 GB per day), we are planning to separate indexer(s) and searcher(s), so that
 there won't be performance hit.

 Our idea is to have have a set of servers which is used only for indexers
 for index creation and then every 5 mins or so, the index will be copied to
 the searchers (a set of Solr servers only for querying). For this we tried to
 use the snapshooter, rsync, etc.

 But the problem with this approach is, the same index is present on both
 the indexer and searcher, and hence occupying large FS.

 What we need is a mechanism wherein the indexer contains only the index
 for the past 5 mins (the last indexing cycle before the snapshooter is run) and
 the searcher has the accumulated (total) index, i.e. every 5 mins we
 should be able to move the entire index from indexer to searcher and so on.

 The above scenario is slightly different from master/slave implementation,
 as on master we want only the latest(WIP) index and the slave should contain
 the entire index.

 Appreciate if anyone can throw some light on how to achieve this.

 Thanks,
 sS







-- 
Lance Norskog
goks...@gmail.com


Re: How to Convert Lucene index files to XML Format

2009-09-10 Thread Lance Norskog
It is best to start off with Solr by playing around with the example in the
example/ directory. Index the data in the example/exampledocs directory, do
some searches, look at the index with the admin/luke page. After that, this
will be much easier.

 To bring your Lucene index under Solr, you have to examine the design of the
Lucene index and create a matching Solr schema in solr/conf/schema.xml.


On 9/10/09, busbus balaji.send...@tcs.com wrote:


 Thanks for your reply





  On Sep 10, 2009, at 6:41 AM, busbus wrote:
  Solr defers to Lucene on reading the index.  You just need to tell
  Solr whether the index is a compound file or not and make sure the
  versions are compatible.
 

This part seems to be the point:
how to make Solr read Lucene index files.
There is a tag in solrconfig.xml:
<useCompoundFile>false</useCompoundFile>

Setting it to true does not seem to work.

What else needs to be done?

Should I change the config file or add a new tag?

Also, how do I check the compatibility of Lucene and Solr?

 Thanks in advance

 --
 View this message in context:
 http://www.nabble.com/How-to-Convert-Lucene-index-files-to-XML-Format-tp25381017p25382367.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Lance Norskog
goks...@gmail.com


Re: An issue with commit/ using Solr Cell and multiple files

2009-09-10 Thread caman

You are right.
I ran into the same thing. Windows curl gave me an error but cygwin ran
without any issues.

thanks


Lance Norskog-2 wrote:
 
 It is a windows problem (or curl, whatever).  This works with
 double-quotes.
 
 C:\Users\work\Downloads\cygwin\home\work\curl-7.19.4\curl.exe
 http://localhost:8983/solr/update --data-binary commit/ -H
 Content-type:text/xml; charset=utf-8
 Single-quotes inside double-quotes should work: commit
 waitFlush='false'/
 
 
 On Tue, Sep 8, 2009 at 11:59 AM, caman
 aboxfortheotherst...@gmail.comwrote:
 

 seems to be an error with curl




 Kevin Miller-17 wrote:
 
  I am getting the same error message.  I am running Solr on a Windows
  machine.  Is the commit command a curl command or is it a Solr command?
 
 
  Kevin Miller
  Web Services
 
  -Original Message-
  From: Grant Ingersoll [mailto:gsing...@apache.org]
  Sent: Tuesday, September 08, 2009 12:52 PM
  To: solr-user@lucene.apache.org
  Subject: Re: An issue with commit/ using Solr Cell and multiple files
 
  solr/examples/exampledocs/post.sh does:
  curl $URL --data-binary 'commit/' -H 'Content-type:text/xml;
  charset=utf-8'
 
  Not sure if that helps or how it compares to the book.
 
  On Sep 8, 2009, at 1:48 PM, Kevin Miller wrote:
 
  I am using the Solr nightly build from 8/11/2009.  I am able to index
  my documents using the Solr Cell but when I attempt to send the commit
 
  command I get an error.  I am using the example found in the Solr 1.4
  Enterprise Search Server book (recently released) found on page 84.
  It
  shows to commit the changes as follows (I am showing where my files
  are located not the example in the book):
 
  c:\curl\bin\curl http://echo12:8983/solr/update/ -H Content-Type:
  text/xml --data-binary 'commit waitFlush=false/'
 
  this give me this error: The system cannot find the file specified.
 
  I get the same error when I modify it to look like the following:
 
  c:\curl\bin\curl http://echo12:8983/solr/update/ 'commit
  waitFlush=false/'
  c:\curl\bin\curl http://echo12:8983/solr/update/; -H Content-Type:
  text/xml --data-binary 'commit waitFlush=false/'
  c:\curl\bin\curl http://echo12:8983/solr/update/ 'commit /'
  c:\curl\bin\curl http://echo12:8983/solr/update/; 'commit /'
 
  I am using the example configuration in Solr so my documents are found
 
  in the exampledocs folder also my curl program in located in the root
  directory which is the reason for the way the curl command is being
  executed.
 
  I would appreciate any information on where to look or how to get the
  commit command to execute after indexing multiple files.
 
  Kevin Miller
  Oklahoma Tax Commission
  Web Services
 
  --
  Grant Ingersoll
  http://www.lucidimagination.com/
 
  Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
  using Solr/Lucene:
  http://www.lucidimagination.com/search
 
 
 

 --
 View this message in context:
 http://www.nabble.com/An-issue-with-%3Ccommit-%3E-using-Solr-Cell-and-multiple-files-tp25350995p25352122.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 -- 
 Lance Norskog
 goks...@gmail.com
 
 

-- 
View this message in context: 
http://www.nabble.com/An-issue-with-%3Ccommit-%3E-using-Solr-Cell-and-multiple-files-tp25350995p25394203.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Re : Indexing fields dynamically

2009-09-10 Thread Lance Norskog
In the schema.xml file, *_i is defined as a wildcard type for integer.
If a name-value pair is an integer, use: name_i as the field name.



On 9/10/09, nourredine khadri nourredin...@yahoo.com wrote:

 Thanks for the quick reply.

 OK for dynamicFields, but how can I rename fields during indexing/search
 to add the suffix corresponding to the type?

 What is the best way to do this?

 Nourredine.




 
 De : Yonik Seeley yo...@lucidimagination.com
 À : solr-user@lucene.apache.org
 Envoyé le : Jeudi, 10 Septembre 2009, 14h24mn 26s
 Objet : Re: Indexing fields dynamically

 On Thu, Sep 10, 2009 at 5:58 AM, nourredine khadri
 nourredin...@yahoo.com wrote:
  I want to index my fields dynamically.
 
  DynamicFields don't suit my need because I don't know fields name in
 advance and fields type must be set  dynamically too (need strong typage).

 This is what dynamic fields are meant for - you pick both the name and
 type (from a pre-defined set of types of course) at runtime.  The
 suffix of the field name matches one of the dynamic fields and
 essentially picks the type.

 -Yonik
 http://www.lucidimagination.com








-- 
Lance Norskog
goks...@gmail.com


Re: Facet fields and the DisMax query handler

2009-09-10 Thread Lance Norskog
Facets are not involved here. These are only simple searches.

The DisMax parser does not use field names in the query. DisMax creates a
nice simple syntax for people to type into a web browser search field. The
various parameters let you sculpt the relevance in order to tune the user
experience.

There are ways to intermix dismax parsing in the standard query parser
syntax, but I am no expert. You can also use these field queries as filter
queries; this is a hack but does work. Also, using wildcards interferes with
upper/lower case handling.
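
A sketch of the filter-query workaround Lance mentions, with the field value
illustrative:

  q=Doe&qt=dismax&fq=Staff:Doe

The fq clause restricts results to documents whose Staff field matches, while
the dismax q still ranks across the configured qf fields.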

On 9/10/09, Villemos, Gert gert.ville...@logica.com wrote:

 I'm trying to understand the DisMax query handler. I orginally
 configured it to ensure that the query was mapped onto different fields
 in the documents and a boost assigned if the fields match. And that
 works pretty smoothly.

 However when it comes to facetted searches the results perplexes me.
 Consider the following example;

 Document A:
    <field name="Staff">John Doe</field>

 Document B:
    <field name="ProjectManager">John Doe</field>

 The following queries do not return anything:
Staff:Doe
Staff:Doe*
Staff:John
Staff:John*

 The query:
Staff:John

 returns Document A and B, even though document B doesn't even contain the
 field 'Staff' (which is optional)! Through the qf field dismax has
 been configured to search over the field 'ProjectManager', but I expected
 the usage of a facet value would exclude the field... Looking at the
 score of the documents, document A does score much higher than Document
 B (a factor of 20), but I would expect not to see B at all. I have changed
 the dismax configuration minimum match to 1, to ensure that all hits
 with a single match are returned, without effect. I have changed the tie
 to 0 with no effect.

 What am I missing here? I would like queries such as 'Staff:Doe' to
 return document A, and only A.

 Cheers,
 Gert.







-- 
Lance Norskog
goks...@gmail.com


Re: Extract info from parent node during data import

2009-09-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Fri, Sep 11, 2009 at 6:48 AM, venn hardy venn.ha...@hotmail.com wrote:

 Hi Fergus,

 When I debugged in the development console 
 http://localhost:9080/solr/admin/dataimport.jsp?handler=/dataimport

 I had no problems. Each category/item seems to be only indexed once, and no 
 parent fields are available (except the category name).

 I am not entirely sure how the forEach statement works, but my interpretation
 of forEach="/document/category/item | /document/category" is something like
 this:

 1. Whenever DIH encounters a /document/category, it will extract the
    /document/category/name field as a common field.
 2. Whenever DIH encounters a /document/category/item, it will extract all of
    the item fields.
 3. When all fields have been encountered, save the document in Solr and go to
    the next category/item.

/document/category/item | /document/category

means there are two xpaths, either of which triggers a new doc (it is
possible to have more). Whenever the processor encounters the closing tag of
one of those xpaths, it emits all the fields it has collected since the
opening of the same tag. After that, it clears all the fields collected
since that opening tag.

If there are fields it collected before the opening of the same tag, it
retains them.
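Worked through on the sample XML further down this thread (assuming
commonField="true" on the category name, and forEach="/document/category/item
| /document/category"), those rules play out roughly like this:

  <name>Category 1</name> is read -> categoryname collected; as a common
                                     field it is retained across emits
  first </item> closes            -> doc emitted: categoryname, id 1,
                                     author "Author 1"; item fields cleared
  second </item> closes           -> doc emitted: categoryname, id 2,
                                     author "Author 2"
  </category> closes              -> the /document/category path closes:
                                     whatever remains collected since
                                     <category> opened (here just the
                                     category name) is emitted and cleared,
                                     so Category 2 starts clean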





 Date: Thu, 10 Sep 2009 14:19:31 +0100
 To: solr-user@lucene.apache.org
 From: fer...@twig.me.uk
 Subject: RE: Extract info from parent node during data import

 Hi Paul,
 The forEach="/document/category/item | /document/category/name" didn't work
 (no categoryname was stored or indexed).
 However forEach="/document/category/item | /document/category" seems to
 work well. I am not sure why category on its own works, but not
 category/name...
 But thanks for the tip. It wasn't as painful as I thought it would be.
 Venn

 Hmmm, I had some bother with this. Although each occurrence of
 /document/category/item causes a new Solr document to be indexed, that
 document contained all the fields from the parent element as well.

 Did you see this?

 
  From: noble.p...@corp.aol.com
  Date: Thu, 10 Sep 2009 09:58:21 +0530
  Subject: Re: Extract info from parent node during data import
  To: solr-user@lucene.apache.org
 
  try this
 
  add two xpaths in your forEach
 
  forEach="/document/category/item | /document/category/name"
 
  and add a field as follows
 
  <field column="categoryname" xpath="/document/category/name" commonField="true"/>
 
  Please try it out and let me know.
 
  On Thu, Sep 10, 2009 at 7:30 AM, venn hardy venn.ha...@hotmail.com 
  wrote:
  
   Hello,
  
  
  
    I am using Solr 1.4 (from a nightly build) and its URLDataSource in
    conjunction with the XPathEntityProcessor. I have successfully imported
    XML content, but I think I may have found a limitation when it comes to
    the commonField attribute in the DataImportHandler.
  
  
  
   Before writing my own parser to read in a whole XML document, I thought 
   I'd post the question here (since I got some great advice last time).
  
  
  
    The bulk of my content is contained within each item tag. However,
    each item has a parent called category, and each category has a name
    which I would like to import. In my forEach loop I specify
    /document/category/item as the collection of items I am interested in.
    Is there any way to extract an element from underneath a parent node? To
    be more specific (see the example XML below), I would like to index the
    following:
  
   - category: Category 1; id: 1; author: Author 1
  
   - category: Category 1; id: 2; author: Author 2
  
   - category: Category 2; id: 3; author: Author 3
  
   - category: Category 2; id: 4; author: Author 4
  
  
  
    Any ideas on how I can get to a parent node from within a child during
    data import? If it can't be done, what do you suggest would be the best
    way to keep using the DataImportHandler... would XSLT be a good
    idea to 'flatten out' the structure a bit?
  
  
  
   Thanks
  
  
  
   This is what my XML document looks like:
  
    <document>
      <category>
        <name>Category 1</name>
        <item>
          <id>1</id>
          <author>Author 1</author>
        </item>
        <item>
          <id>2</id>
          <author>Author 2</author>
        </item>
      </category>
      <category>
        <name>Category 2</name>
        <item>
          <id>3</id>
          <author>Author 3</author>
        </item>
        <item>
          <id>4</id>
          <author>Author 4</author>
        </item>
      </category>
    </document>
  
  
  
   And this is what my dataConfig looks like:
    <dataConfig>
      <dataSource type="URLDataSource" />
      <document>
        <entity name="archive" pk="id"
            url="http://localhost:9080/data/20090817070752.xml"
            processor="XPathEntityProcessor" forEach="/document/category/item"
            transformer="DateFormatTransformer" stream="true"
            dataSource="dataSource">
          <field column="category" xpath="/document/category/name" commonField="true" />
          <field column="id" xpath="/document/category/item/id" />
          <field column="author" xpath="/document/category/item/author" />
        </entity>
      </document>
    </dataConfig>
  
  
  
    This is how I have specified my schema:
    <fields>
      <field name="id" type="string" indexed="true"
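For reference, the variant Venn reports working earlier in the thread (the
two-path forEach, with the category name marked as a common field) would make
the entity look roughly like this:

<entity name="archive" pk="id"
    url="http://localhost:9080/data/20090817070752.xml"
    processor="XPathEntityProcessor"
    forEach="/document/category/item | /document/category"
    transformer="DateFormatTransformer" stream="true"
    dataSource="dataSource">
  <!-- commonField="true": the category name is retained and copied into
       each item document emitted while the category is open -->
  <field column="category" xpath="/document/category/name" commonField="true" />
  <field column="id" xpath="/document/category/item/id" />
  <field column="author" xpath="/document/category/item/author" />
</entity>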

Re: Default Query Type For Facet Queries

2009-09-10 Thread Lance Norskog
Changing basic defaults like this makes it very confusing to work with
successive solr releases, to read the wiki, etc.

You can make custom search request handlers - an example:

 <requestHandler name="/custom" class="solr.SearchHandler">
   <lst name="invariants">
     <str name="defType">customparser</str>
   </lst>
 </requestHandler>

http://localhost:8983/solr/custom?q=string_in_my_custom_language

On 9/10/09, Stephen Duncan Jr stephen.dun...@gmail.com wrote:
 If using {!type=customparser} is the only way now, should I file an issue to
 make the default configurable?

 --
 Stephen Duncan Jr
 www.stephenduncanjr.com

 On Thu, Sep 3, 2009 at 11:23 AM, Stephen Duncan Jr stephen.dun...@gmail.com
  wrote:

  We have a custom query parser plugin registered as the default for
  searches, and we'd like to have the same parser used for facet.query.
 
  Is there a way to register it as the default for FacetComponent in
  solrconfig.xml?
 
  I know I can add {!type=customparser} to each query as a workaround, but
  I'd rather register it in the config than make my code send that and strip
  it off on every facet query.
 
  --
  Stephen Duncan Jr
  www.stephenduncanjr.com
 




-- 
Lance Norskog
goks...@gmail.com


What Tokenizerfactory/TokenFilterFactory can/should I use so a search for wal mart matches walmart(quotes not included in search or index)?

2009-09-10 Thread Christian Zambrano
There are a lot of company names whose correct spelling people are uncertain
about. A few examples are:

1. best buy, bestbuy
2. walmart, wal mart, wal-mart
3. Holiday Inn, HolidayInn

What TokenizerFactory and/or TokenFilterFactory should I use so that
somebody typing wal mart (quotes not included) will find wal mart and
walmart (again, quotes not included)?


Thanks,

Christian
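
A common way to attack this (a sketch, not the only answer): combine
solr.WordDelimiterFilterFactory, whose catenateWords="1" makes a hyphenated
"wal-mart" also produce the token "walmart", with index-time synonyms for the
space-separated variants. The fieldType name and the synonyms.txt contents
below are illustrative:

<fieldType name="text_company" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- synonyms.txt line (expand="true" indexes all variants):
         walmart, wal mart, wal-mart -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <!-- catenateWords="1" also emits the glued-together form of
         delimited tokens, e.g. wal-mart -> wal, mart, walmart -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The synonyms are applied at index time only: the query parser splits an
unquoted wal mart on whitespace before analysis, so multi-word synonyms do
not fire reliably at query time. The trade-off is that changing synonyms.txt
requires a reindex.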