Re: issue inquiry: unterminated index lock after optimize update command

2009-07-30 Thread Chris Hostetter

: I'm using solr build 2009-06-16_08-06-14, in multicore configuration.
: When I issue the update command optimize to a core, the index files
: are locked and never released.  Calling the coreAdmin unload method on
: the core unloads the core but does not unlock the underlying index files.
: The core has no other alias, the data path is not referenced by any
: other core when a full status is requested.  The end result is that
: optimized cores that have been unloaded cannot be deleted until jetty is
: restarted.
... 
: I have searched jira but did not find anything relevant.  Is this a bug
: that should be reported, or is this an intended behavior?

...this is certainly not intended behavior .. you shouldn't need to 
restart the server (or even reload the core) to unlock the index ... it 
should be unlocked automatically when the optimize completes.
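
(For reference, a minimal way to issue the optimize in question from the 
shell -- the core name "core0" and the port are assumptions about your 
multicore setup:

    curl 'http://localhost:8983/solr/core0/update' -H 'Content-Type: text/xml' --data-binary '<optimize/>'

Once the response comes back, the write lock on that core's index 
directory should already have been released.)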

are you sure there wasn't any sort of serious error in the logs?  like an 
OutOfMemory perhaps?

if you can reproduce this consistently a detailed bug report showing your 
exact config files, describing your OS and filesystem, and describing 
exactly what steps you take to trigger this problem would certainly be 
appreciated.


-Hoss



Re: DocList Pagination

2009-07-30 Thread Chris Hostetter

: Hi, I am trying to get the next DocList page in my custom search component.
: Could I get a code example of this?

you just increase the offset value you pass to 
SolrIndexSearcher.getDocList by whatever your page size is.  (if you use 
the newer QueryCommand versions you just call setOffset with the same 
value).
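
For instance, a minimal sketch against the 1.4-era QueryCommand API -- 
the searcher, query, and pageNumber variables here are assumptions about 
your component:

    int pageSize = 10;
    int offset = pageNumber * pageSize;  // page 0, 1, 2, ...

    SolrIndexSearcher.QueryCommand cmd = new SolrIndexSearcher.QueryCommand();
    cmd.setQuery(query);       // your org.apache.lucene.search.Query
    cmd.setOffset(offset);     // same value you'd pass to getDocList
    cmd.setLen(pageSize);

    SolrIndexSearcher.QueryResult result = new SolrIndexSearcher.QueryResult();
    searcher.search(result, cmd);
    DocList page = result.getDocList();  // the requested page of results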





-Hoss



Re: solr indexing on same set of records with different value of unique field...not working...

2009-07-30 Thread Chris Hostetter

I'm not really understanding how you could get the situation you describe 
... which suggests that one (or both) of us don't understand exactly what 
happened.

if you can post the actual schema.xml file you used and an example of the 
input you indexed perhaps we can spot the discrepancy.

FWIW: using a timestamp as a uniqueKey doesn't make much sense ...

 1) if you have heavy parallelization two docs indexed at the exact same 
time might overwrite each other.
 2) you have no way of ever replacing an existing doc (unless you roll the 
clock back) in which case there's no advantage to using a uniqueKey -- 
so you might as well leave it out of your schema (which makes indexing 
slightly faster) 

: I need to index around 10 million records with solr.
: I have nearly 2 lakh records, so i made a program to loop over them till 10 million.
: Here, i specified 20 fields in schema.xml file. the unique field i set was the
: currentTimeStamp field.
: So, when i run the loader program (which loads xml data into solr) it creates the
: currentTimestamp value...and loads it into solr.
: 
: For this situation,
: i stopped the loader program after 100 records were indexed into solr.
: Then again, i ran the loader program for the SAME 100 records,
: and solr reports 100 results, rather than 200.
: 
: Because i set the currentTimeStamp field as the uniqueField, i expected the result
: to be 200 after running the same 100 records again...
: 
: Any suggestions please...



-Hoss



Re: update some index documents after indexing process is done with DIH

2009-07-30 Thread Chris Hostetter

This thread all sounds really kludgy ... among other things the 
newSearcher listener is going to need to somehow keep track of when it 
was called as a result of a real commit, vs when it was called as the 
result of a commit it itself triggered to make changes.

wouldn't an easier place to implement this logic be in an UpdateProcessor?  
you'll still need the double commit (once so you can see the 
main changes, and once so the rest of the world can see your 
modifications) but you can execute them both directly in your 
processCommit(CommitUpdateCommand) method (so you don't have to worry 
about being able to tell them apart)
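
A rough sketch of that approach -- the class name and the modifyDocuments 
hook are made up, and reusing the incoming command for the second commit 
is a simplification you may want to revisit:

    import java.io.IOException;
    import org.apache.solr.core.SolrCore;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.request.SolrQueryResponse;
    import org.apache.solr.update.CommitUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    public class PostCommitModifyFactory extends UpdateRequestProcessorFactory {
      @Override
      public UpdateRequestProcessor getInstance(final SolrQueryRequest req,
          SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
          @Override
          public void processCommit(CommitUpdateCommand cmd) throws IOException {
            super.processCommit(cmd);       // commit #1: the main changes become visible
            modifyDocuments(req.getCore()); // read the fresh index, rewrite docs as needed
            super.processCommit(cmd);       // commit #2: publish your modifications
          }
        };
      }

      void modifyDocuments(SolrCore core) {
        // application-specific: open a searcher via core.getSearcher(),
        // iterate over the docs, and re-add the modified ones
      }
    }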

: Date: Thu, 30 Jul 2009 10:14:16 +0530
: From: Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
: Reply-To: solr-user@lucene.apache.org, noble.p...@gmail.com
: To: solr-user@lucene.apache.org
: Subject: Re: update some index documents after indexing process is done with 
: DIH
: 
: If you make your EventListener implement SolrCoreAware you can get
: hold of the core on inform. use that to get hold of the
: SolrIndexWriter
: 
: On Wed, Jul 29, 2009 at 9:20 PM, Marc Sturlesemarc.sturl...@gmail.com wrote:
: 
:  From the newSearcher(..) of a CustomEventListener which extends
:  AbstractSolrEventListener I can access the SolrIndexSearcher and all core
:  properties, but I can't get a SolrIndexWriter. Do you know how I can get a
:  SolrIndexWriter from there? This way I would be able to modify the documents (I
:  need to modify them depending on values of other documents, that's why I
:  can't do it with DIH delta-import).
:  Thanks in advance
: 
: 
:  Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
: 
:  On Tue, Jul 28, 2009 at 5:17 PM, Marc Sturlesemarc.sturl...@gmail.com
:  wrote:
: 
:  That really sounds the best way to reach my goal. How could I invoke a
:  listener from the newSearcher? Would it be something like:
:     <listener event="newSearcher" class="solr.QuerySenderListener">
:       <arr name="queries">
:         <lst> <str name="q">solr</str> <str name="start">0</str> <str name="rows">10</str> </lst>
:         <lst> <str name="q">rocks</str> <str name="start">0</str> <str name="rows">10</str> </lst>
:         <lst><str name="q">static newSearcher warming query from
:  solrconfig.xml</str></lst>
:       </arr>
:     </listener>
:     <listener event="newSearcher" class="solr.MyCustomListener"/>
: 
:  And MyCustomListener would be the class that opens the reader:
: 
:         RefCounted<SolrIndexSearcher> searchHolder = null;
:         try {
:           searchHolder = dataImporter.getCore().getSearcher();
:           IndexReader reader = searchHolder.get().getReader();
: 
:           // Here I iterate over the reader doing document modifications
: 
:         } catch (Exception ex) {
:             LOG.info("error", ex);
:         } finally {
:            if (searchHolder != null) searchHolder.decref();
:         }
: 
:  you may not be able to access the DIH API from a newSearcher event.
:  But the API would give you the searcher directly as a method
:  parameter.
: 
:  Finally, to access documents and add fields to some of them, I have
:  thought of using the SolrDocument classes. Can you please point me to where
:  something similar is done in the solr source (I mean creation of
:  SolrDocuments
:  and conversion of them to proper lucene documents).
: 
:  Does this way of reaching the goal make sense?
: 
:  Thanks in advance
: 
: 
: 
:  Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
: 
:  when a core is reloaded the event fired is firstSearcher. newSearcher
:  is fired when a commit happens
: 
: 
:  On Tue, Jul 28, 2009 at 4:19 PM, Marc Sturlesemarc.sturl...@gmail.com
:  wrote:
: 
:  Ok, but if I handle it in a newSearcher listener it will be executed
:  every
:  time I reload a core, won't it? The thing is that I want to use an
:  IndexReader to load some doc fields of the index into a HashMap and,
:  depending
:  on the values of some docs' fields, modify other docs. It's very memory
:  consuming (I have tested it with a simple lucene script). That's why I
:  wanted
:  to do it just after the indexing process.
: 
:  My ideal case would be to do it in the commit function of
:  DirectUpdateHandler2.java just before
:  writer.optimize(cmd.maxOptimizeSegments); is executed. But I don't want
:  to
:  mess with that code... so I'm trying to find the best way to do it as a
:  plugin
:  rather than a hack, if possible.
: 
:  Thanks in advance
: 
: 
:  Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
: 
:  It is best handled as a 'newSearcher' listener in solrconfig.xml.
:  onImportEnd is invoked before committing
: 
:  On Tue, Jul 28, 2009 at 3:13 PM, Marc
:  Sturlesemarc.sturl...@gmail.com
:  wrote:
: 
:  Hey there,
:  I would like to be able to do something like: After the indexing
:  process
:  is
:  done with DIH I would like to open an indexreader, iterate over all
:  docs,
:  modify some of them depending on others and delete some others. I can
:  easily
:  do this directly coding with lucene but would like to know if 

Re: search suggest

2009-07-30 Thread Shalin Shekhar Mangar
On Thu, Jul 30, 2009 at 4:52 AM, Jason Rutherglen 
jason.rutherg...@gmail.com wrote:

 I created an issue and have added some notes
 https://issues.apache.org/jira/browse/SOLR-1316


Also see https://issues.apache.org/jira/browse/SOLR-706

-- 
Regards,
Shalin Shekhar Mangar.


FATAL ERROR

2009-07-30 Thread Jörg Agatz
Good Morning SolR :-) its morning in Germany!

i have a problem with the indexing...

I often get an error.

I think it is because the '&' character appears in the XML.
I need the character, what can I do?


SimplePostTool: FATAL: Solr returned an error:
com.ctc.wstx.exc.WstxLazyException: Unexpected character ' ' (code 32; missing name?)
 at [row,col {unknown-source}]: [1,465]
com.ctc.wstx.exc.WstxLazyException: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character ' ' (code 32; missing name?)
 at [row,col {unknown-source}]: [1,465]
 at com.ctc.wstx.exc.WstxLazyException.throwLazily(WstxLazyException.java:45)
 at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:729)
 at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3659)
 at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
 at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:278)
 at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
 at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
 at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
 at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
 at org.mortbay.jetty.Server.handle(Server.java:285)
 at org.mortbay.jetty.Ht... (trace truncated in the original message)

_


Re: solr indexing on same set of records with different value of unique field, not working fine.

2009-07-30 Thread noor

FYI
 Attached schema.xml file.
 And the add doc xml snippets are,
<add>
  <doc>
    <field name="evid">501</field>
    <field name="ssid">ESQ.VISION.A72</field>
    <field name="evnum">201</field>
    <field name="evtext">CpuLoopEnd Process=$Z4B1 CpuPin=0,992 
Program=\VEGAS.$SYSTEM.SYS00.MEASFH Terminal=\VEGAS.$TSPM.#TERM 
CpuBusy=0 MemPage=24 User=50,10</field>
    <field name="proc">\VEGAS.$QQDS</field>
    <field name="layer">PLGOVNPM</field>
    <field name="evtime">2008-10-07T03:00:30.0Z</field>
    <field name="logtime">2008-10-07T10:02:27.95Z</field>
    <field name="curts">1247905648000</field>
  </doc>
  ...
</add>

i just put the currentTimeStamp's long value into the add doc xml and 
load it into solr.



Chris Hostetter wrote:
I'm not really understanding how you could get the situation you describe 
... which suggests that one (or both) of us don't understand exactly what 
happened.


if you can post the actual schema.xml file you used and an example of the 
input you indexed perhaps we can spot the discrepancy.


FWIW: using a timestamp as a uniqueKey doesn't make much sense ...

 1) if you have heavy parallelization two docs indexed at the exact same 
time might overwrite each other.
 2) you have no way of ever replacing an existing doc (unless you roll the 
clock back) in which case there's no advantage to using a uniqueKey -- 
so you might as well leave it out of your schema (which makes indexing 
slightly faster) 


: I need to index around 10 million records with solr.
: I have nearly 2 lakh records, so i made a program to loop over them till 10 million.
: Here, i specified 20 fields in schema.xml file. the unique field i set was the
: currentTimeStamp field.
: So, when i run the loader program (which loads xml data into solr) it creates the
: currentTimestamp value...and loads it into solr.
: 
: For this situation,
: i stopped the loader program after 100 records were indexed into solr.
: Then again, i ran the loader program for the SAME 100 records,
: and solr reports 100 results, rather than 200.
: 
: Because i set the currentTimeStamp field as the uniqueField, i expected the result
: to be 200 after running the same 100 records again...
: 
: Any suggestions please...




-Hoss


  




Re: solr indexing on same set of records with different value of unique field, not working fine.

2009-07-30 Thread noor

Sorry, schema.xml file is here in this mail...

noor wrote:

FYI
 Attached schema.xml file.
 And the add doc xml snippets are,
<add>
  <doc>
    <field name="evid">501</field>
    <field name="ssid">ESQ.VISION.A72</field>
    <field name="evnum">201</field>
    <field name="evtext">CpuLoopEnd Process=$Z4B1 CpuPin=0,992 
Program=\VEGAS.$SYSTEM.SYS00.MEASFH Terminal=\VEGAS.$TSPM.#TERM 
CpuBusy=0 MemPage=24 User=50,10</field>
    <field name="proc">\VEGAS.$QQDS</field>
    <field name="layer">PLGOVNPM</field>
    <field name="evtime">2008-10-07T03:00:30.0Z</field>
    <field name="logtime">2008-10-07T10:02:27.95Z</field>
    <field name="curts">1247905648000</field>
  </doc>
  ...
</add>

i just put the currentTimeStamp's long value into the add doc xml and 
load it into solr.



Chris Hostetter wrote:
I'm not really understanding how you could get the situation you 
describe ... which suggests that one (or both) of us don't understand 
exactly what happened.


if you can post the actual schema.xml file you used and an example of 
the input you indexed perhaps we can spot the discrepancy.


FWIW: using a timestamp as a uniqueKey doesn't make much sense ...

 1) if you have heavy parallelization two docs indexed at the exact 
same time might overwrite each other.
 2) you have no way of ever replacing an existing doc (unless you 
roll the clock back) in which case there's no advantage to using 
a uniqueKey -- so you might as well leave it out of your schema (which 
makes indexing slightly faster)

: I need to index around 10 million records with solr.
: I have nearly 2 lakh records, so i made a program to loop over them till 
10 million.
: Here, i specified 20 fields in schema.xml file. the unique field i set 
was the
: currentTimeStamp field.
: So, when i run the loader program (which loads xml data into solr) it 
creates the
: currentTimestamp value...and loads it into solr.
: 
: For this situation,
: i stopped the loader program after 100 records were indexed into solr.
: Then again, i ran the loader program for the SAME 100 records,
: and solr reports 100 results, rather than 200.
: 
: Because i set the currentTimeStamp field as the uniqueField, i expected 
the result
: to be 200 after running the same 100 records again...
: 
: Any suggestions please...



-Hoss


  





<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<!--  
 This is the Solr schema file. This file should be named "schema.xml" and
 should be in the conf directory under the solr home
 (i.e. ./solr/conf/schema.xml by default) 
 or located where the classloader for the Solr webapp can find it.

 This example schema is the recommended starting point for users.
 It should be kept correct and concise, usable out-of-the-box.

 For more information, on how to customize this file, please see
 http://wiki.apache.org/solr/SchemaXml
-->

<schema name="example" version="1.1">
  <!-- attribute "name" is the name of this schema and is only used for display purposes.
   Applications should change this to reflect the nature of the search collection.
   version="1.1" is Solr's version number for the schema syntax and semantics.  It should
   not normally be changed by applications.
   1.0: multiValued attribute did not exist, all fields are multiValued by nature
   1.1: multiValued attribute introduced, false by default -->

 <types>
  <!-- field type definitions. The "name" attribute is
     just a label to be used by field definitions.  The "class"
     attribute and any other attributes determine the real
     behavior of the fieldType.
   Class names starting with "solr" refer to java classes in the
     org.apache.solr.analysis package.
  -->

  <!-- The StrField type is not analyzed, but indexed/stored verbatim.  
     - StrField and TextField support an optional compressThreshold which
     limits compression (if enabled in the derived fields) to values which
     exceed a certain size (in characters).
  -->
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

  <!-- boolean type: "true" or "false" -->
  <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>

  <!-- The optional sortMissingLast and sortMissingFirst attributes are
   currently 

Skipping fields from XML

2009-07-30 Thread Edwin Stauthamer
Hi,

I want to index a perfectly good solr XML-file into a Solr/Lucene instance.
The problem is that the XML has many fields that I don't want to be indexed.

I tried to index the file but Solr gives me an error because the XML
contains fields that I have not declared in my schema.xml

How can I tell Solr to skip unwanted fields and only index the fields that I
have declared in my schema.xml?

I know it must be something with a catchall setting and / or copyFields but
I can not get the configuration right. To be clear. I want Solr to index /
store only a few fields from the XML-file to be indexed and skip all the
other fields.

An answer or a link to a good reference would help.


Re: Skipping fields from XML

2009-07-30 Thread Noble Paul നോബിള്‍ नोब्ळ्
I don't think there is a way to do that.


On Thu, Jul 30, 2009 at 1:39 PM, Edwin
Stauthamerestautha...@emidconsult.com wrote:
 Hi,

 I want to index a perfectly good solr XML-file into a Solr/Lucene instance.
 The problem is that the XML has many fields that I don't want to be indexed.

 I tried to index the file but Solr gives me an error because the XML
 contains fields that I have not declared in my schema.xml

 How can I tell Solr to skip unwanted fields and only index the fields that I
 have declared in my schema.xml?

 I know it must be something with a catchall setting and / or copyFields but
 I can not get the configuration right. To be clear. I want Solr to index /
 store only a few fields from the XML-file to be indexed and skip all the
 other fields.

 An answer or a link to a good reference would help.




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Skipping fields from XML

2009-07-30 Thread AHMET ARSLAN

: I want Solr to index /  store only a few fields from the XML-file to be 
: indexed and  skip all the other fields. 

I think Dynamic fields [1] can help you.

<dynamicField name="some regex to capture all unwanted fields" type="ignored"/>

[1] 
http://wiki.apache.org/solr/SchemaXml#head-82dba16404c8e3318021320638b669b3a6d780d0


  


Re: FATAL ERROR

2009-07-30 Thread Toby Cole

Any chance of getting that stack trace as more than one line? :)
Also, where are you posting your documents from? (e.g. Java, PHP,  
command line etc).


It sounds like you're not using 'entities' for your '&' characters  
(ampersands) in your XML.
These should be converted to &amp;. This should look familiar if  
you've ever written any HTML.



On 30 Jul 2009, at 09:44, Jörg Agatz wrote:


Good Morning SolR :-) its morning in Germany!

i have a problem with the indexing...

I often get an error.

I think it is because the '&' character appears in the XML.
I need the character, what can I do?


SimplePostTool: FATAL: Solr returned an error:
com.ctc.wstx.exc.WstxLazyException: Unexpected character ' ' (code 32; missing name?)
 at [row,col {unknown-source}]: [1,465]
 ... (full stack trace as in the original message)


_


--

Toby Cole
Software Engineer, Semantico Limited
toby.c...@semantico.com tel:+44 1273 358 238
Registered in England and Wales no. 03841410, VAT no. GB-744614334.
Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.

Check out all our latest news and thinking on the Discovery blog
http://blogs.semantico.com/discovery-blog/



Re: Skipping fields from XML

2009-07-30 Thread Koji Sekiguchi

Edwin Stauthamer wrote:

Hi,

I want to index a perfectly good solr XML-file into a Solr/Lucene instance.
The problem is that the XML has many fields that I don't want to be indexed.

I tried to index the file but Solr gives me an error because the XML
contains fields that I have not declared in my schema.xml

How can I tell Solr to skip unwanted fields and only index the fields that I
have declared in my schema.xml?

  
How about using ignored type for the fields which you don't want to be 
indexed:


<fieldtype name="ignored" stored="false" indexed="false" 
class="solr.StrField" />


<field name="unwanted-field-1" type="ignored" multiValued="true"/>
<field name="unwanted-field-2" type="ignored" multiValued="true"/>
<field name="unwanted-field-3" type="ignored" multiValued="true"/>
   :

Koji


I know it must be something with a catchall setting and / or copyFields but
I can not get the configuration right. To be clear. I want Solr to index /
store only a few fields from the XML-file to be indexed and skip all the
other fields.

An answer or a link to a good reference would help.

  




Re: FATAL ERROR

2009-07-30 Thread Markus Jelsma - Buyways B.V.
Indeed, or enclose the text in CDATA tags which should work as well.
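
For example, with a made-up title field, either form keeps the XML 
parser happy:

    <field name="title">Harold &amp; Maude</field>
    <field name="title"><![CDATA[Harold & Maude]]></field>

Inside a CDATA section the '&' needs no escaping.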






On Thu, 2009-07-30 at 09:52 +0100, Toby Cole wrote:

 Any chance of getting that stack trace as more than one line? :)
 Also, where are you posting your documents from? (e.g. Java, PHP,  
 command line etc).
 
 It sounds like you're not using 'entities' for your '&' characters  
 (ampersands) in your XML.
 These should be converted to &amp;. This should look familiar if  
 you've ever written any HTML.
 
 
 On 30 Jul 2009, at 09:44, Jörg Agatz wrote:
 
  Good Morning SolR :-) its morning in Germany!
 
  i have a problem with the indexing...
 
  I often get an error.
 
  I think it is because the '&' character appears in the XML.
  I need the character, what can I do?
  
 
  SimplePostTool: FATAL: Solr returned an error:
  com.ctc.wstx.exc.WstxLazyException: Unexpected character ' ' (code 32; missing name?)
   at [row,col {unknown-source}]: [1,465]
   ... (full stack trace as in the original message)
 
  _
 
 --
 
 Toby Cole
 Software Engineer, Semantico Limited
 toby.c...@semantico.com tel:+44 1273 358 238
 Registered in England and Wales no. 03841410, VAT no. GB-744614334.
 Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.
 
 Check out all our latest news and thinking on the Discovery blog
 http://blogs.semantico.com/discovery-blog/
 


Re: Skipping fields from XML

2009-07-30 Thread AHMET ARSLAN

 How can I tell Solr to skip unwanted fields and only index
 the fields that I have declared in my schema.xml?

More precisely: (taken from schema.xml)

<!-- uncomment the following to ignore any fields that don't already match an 
existing field name or dynamic field, rather than reporting them as an error. 
alternately, change the type="ignored" to some other type e.g. "text" if you 
want unknown fields indexed and/or stored by default --> 
   <!--dynamicField name="*" type="ignored" multiValued="true" /-->


  


Re: Skipping fields from XML

2009-07-30 Thread Edwin Stauthamer
perfect!
That resolved my issue.

BTW. This was my first posting on this list.
I must say that the responses were quick and to the point!!! Good community
help!

On Thu, Jul 30, 2009 at 10:58 AM, AHMET ARSLAN iori...@yahoo.com wrote:


  How can I tell Solr to skip unwanted fields and only index
  the fields that I have declared in my schema.xml?

 More precisely: (taken from schema.xml)

 <!-- uncomment the following to ignore any fields that don't already match
 an existing field name or dynamic field, rather than reporting them as an
 error. alternately, change the type="ignored" to some other type e.g. "text"
 if you want unknown fields indexed and/or stored by default -->
   <!--dynamicField name="*" type="ignored" multiValued="true" /-->






-- 
Met vriendelijke groet / Kind regards,

Edwin Stauthamer
Adviser Search & Collaboration
Emid Consult
T: +31 (0) 70 8870700
M: +31 (0) 6 4555 4994
E: estautha...@emidconsult.com
I: http://www.emidconsult.com


Re: FATAL ERROR

2009-07-30 Thread Jörg Agatz
Also, i use the command line tool: java -jar post.jar xyz.xml

i don't know what you mean by

 It sounds like you're not using 'entities' for your '&' characters
 (ampersands) in your XML.
 These should be converted to &amp;. This should look familiar if you've
 ever written any HTML.

I don't understand this.

must i change every & to &amp; ?


Re: FATAL ERROR

2009-07-30 Thread Erik Hatcher


On Jul 30, 2009, at 6:17 AM, Jörg Agatz wrote:


Also, i use the command line tool: java -jar post.jar xyz.xml

i don't know what you mean by

It sounds like you're not using 'entities' for your '&' characters
(ampersands) in your XML.
These should be converted to &amp;. This should look familiar if  
you've

ever written any HTML.
I don't understand this.

must i change every & to &amp; ?


Yes, if you need an ampersand in an XML element, it must be escaped:

   <field name="title">Harold &amp; Maude</field>
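
A hedged Java illustration of the same point: if you build the posted 
XML with an XML API instead of string concatenation, the escaping 
happens automatically (the field name and value are just the example 
above):

    import java.io.StringWriter;
    import javax.xml.stream.XMLOutputFactory;
    import javax.xml.stream.XMLStreamWriter;

    public class EscapedFieldDemo {
      public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter w =
            XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartElement("field");
        w.writeAttribute("name", "title");
        w.writeCharacters("Harold & Maude");  // '&' is written as &amp;
        w.writeEndElement();
        w.close();
        // prints: <field name="title">Harold &amp; Maude</field>
        System.out.println(out);
      }
    }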

Erik



Re: FATAL ERROR

2009-07-30 Thread Toby Cole

On 30 Jul 2009, at 11:17, Jörg Agatz wrote:

It sounds like you're not using 'entities' for your '&' characters
(ampersands) in your XML.
These should be converted to &amp;. This should look familiar if  
you've

ever written any HTML.
I don't understand this.

must i change every & to &amp; ?



Yes, '&' characters aren't allowed in XML unless they are either in a  
CDATA section or part of an 'entity'.

A good place to read up on this is: 
http://www.xml.com/pub/a/2001/01/31/qanda.html

In short, replace all your & with &amp;

--
Toby Cole
Software Engineer, Semantico Limited
Registered in England and Wales no. 03841410, VAT no. GB-744614334.
Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.

Check out all our latest news and thinking on the Discovery blog
http://blogs.semantico.com/discovery-blog/



Re: Multi select faceting

2009-07-30 Thread Grant Ingersoll


On Jul 29, 2009, at 2:38 PM, Mike wrote:


Hi,

We're using Lucid Imagination's LucidWorks Solr 1.3 and we have a  
requirement to implement multiple-select faceting where the facet  
cells show up as checkboxes and despite checked options, all of the  
options continue to persist with counts. The best example I found is  
the search on Lucid Imagination's site: http://www.lucidimagination.com/search/


It appears the Solr 1.4 release has support for doing this with  
filter tagging (http://wiki.apache.org/solr/SimpleFacetParameters#head-f277d409b221b407d9c5430f552bf40ee6185c4c 
), but I was wondering if there was another way to accomplish this  
in 1.3?



The only way I can think to do this is to backport the patch to 1.3.   
FWIW, we are running 1.4-dev at /search, which is where that  
functionality comes from.
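
For reference, the 1.4 syntax described on that wiki page looks roughly 
like this (field and tag names made up):

    q=*:*&facet=true
      &fq={!tag=colorFilter}color:red
      &facet.field={!ex=colorFilter}color

{!tag=...} labels the filter query, and {!ex=...} excludes that filter 
when computing the facet counts, so the color facet keeps showing counts 
for every color even while "red" is checked.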


-Grant


Re: Question about formatting the results returned from Solr

2009-07-30 Thread Noble Paul നോബിള്‍ नोब्ळ्
apparently all the data is going to one field 'author'

instead they should be sent to separate fields
author_fname
author_lname
author_email

so you would get details like

 <str name="author_fname">John</str>
 <str name="author_lname">Doe</str>
 <str name="author_email">j...@doe.com</str>



On Wed, Jul 29, 2009 at 7:39 PM, ahammadahmed.ham...@gmail.com wrote:

 Hi all,

 Not sure how good my title is, but here is a (hopefully) better explanation
 on what I mean.

 I am indexing a set of articles from a DB. Each article has an author. The
  author is saved in the DB as an author ID, which is a number.

 There is another table in the DB with more relevant information about the
 author. Basically it has columns like:

 id, firstname, lastname, email, userid

 I set up the DIH so that it returns the userid, and it works fine:

 <arr name="author">
   <str>jdoe</str>
   <str>msmith</str>
 </arr>

 Would it be possible to return all of the information about the author
 (first name, ...) as a subset of the results above?

 Here is what I mean:

 <arr name="author">
   <arr name="jdoe">
      <str name="firstName">John</str>
      <str name="lastName">Doe</str>
      <str name="email">j...@doe.com</str>
   </arr>
   ...
 </arr>

 Something similar to that at least...

 Not sure how descriptive I was, but any pointers would be highly
 appreciated.

 Cheers

 --
 View this message in context: 
 http://www.nabble.com/Question-about-formatting-the-results-returned-from-Solr-tp24719831p24719831.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Range Query question

2009-07-30 Thread Matt Beaumont

Hi,

I have a set of XML data that holds Minimum and Maximum values and I need to
be able to do specific range queries against them.

(Note that this is a contrived example, and that in reality the garage would
probably hold all the individual prices of all its cars, but this is
analogous to the problem we have which is couched in terms that would
obscure the problem.)

For example, the following XML fragment is indexed so that each car
element becomes a Solr document:

<cars>
  <car>
    <manufacturer>Ford</manufacturer>
    <model>Ka</model>
    <garage>
       <name>garage1</name>
       <min>2000</min>
       <max>4000</max>
    </garage>
    <garage>
       <name>garage2</name>
       <min>8000</min>
       <max>1</max>
    </garage>
  </car>
</cars>

I want to be able to do a range query where
  search min value = 2500
  search max value = 3500

This should return garage1 as potentially having cars in my price range as
the range of prices for the garage contains the range I have input.  It's
also worth noting that we can't simply look for min prices that fall inside
our range or max prices that fall inside our range, as in the case outlined
above, none of the individual values fall inside our range, but there is
overlap.

The problem is that the indexed form of this XML is flattened so the car
entity has 2 garage names, 2 min values and 2 max values, but the grouping
between the garage name and its min and max values is lost.  The danger is
that we end up doing a comparison of the min-of-the-mins and the
max-of-the-maxes, which tells us that a car is available in the price range
which may not be true if garage1 has all cars below our search range and
garage2 has all cars above our search range, e.g. if our search range is
5000-6000 then we should get no match.

We wanted to include the garage name as an attribute of the min/max values
to maintain this link, but couldn't find a way to do this.

Finally, it would be extremely difficult for us to modify the XML presented
to our system, hence our approach to date.

Has anyone had a similar problem and if so how did you overcome it?

Thanks for taking the time to look.

-
Matt Beaumont
mibe...@yahoo.co.uk

-- 
View this message in context: 
http://www.nabble.com/Range-Query-question-tp24737656p24737656.html
Sent from the Solr - User mailing list archive at Nabble.com.



How can i get lucene index format version information?

2009-07-30 Thread Licinio Fernández Maurelo
 i want to get the lucene index format version from the solr web app (as
luke does). i've tried looking for the info in the luke handler response,
but i haven't found this info

-- 
Lici


SOLR deleted almost everything?

2009-07-30 Thread Reece
Hello everyone :)

I was trying to purge out older things.. in this case of a certain
type of document that had an ID lower than 200.  So I posted this:

<delete><query>id:[0 TO 200] AND type:I</query></delete>

Now, I have only 49 type I items total in my index (shown by
/solr/select?q=type:I), when there should be numbers still up to about
2165000, which is far far more than 49.  I'm curious why this would
be, as I'm trying to build in automatic purging of older things, but
this obviously didn't work the way I thought.

I'm on version 1.1, and my schema information for the fields is below:

   <field name="id" type="string" indexed="true" stored="true" required="true" />
   <field name="title" type="text" indexed="true" stored="true" multiValued="true"/>
   <field name="body" type="text" indexed="true" stored="false" multiValued="true"/>
   <field name="product" type="text" indexed="true" stored="false" multiValued="true"/>
   <field name="status" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="resolution" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="owner" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="touches" type="text" indexed="true" stored="false" multiValued="true"/>
   <field name="type" type="string" indexed="true" stored="true" multiValued="true"/>
   <field name="url" type="string" indexed="true" stored="true" multiValued="true"/>

Thanks for any insight into why I broke it!
-Reece


Re: Multi select faceting

2009-07-30 Thread Mike
Grant, thanks for the reply. We tested our requirement against 1.4-dev and 
were able to achieve what we wanted. The site we're rebuilding has low 
traffic, so we're going to run with 1.4-dev.


Cheers.

- Original Message - 
From: Grant Ingersoll gsing...@apache.org

To: solr-user@lucene.apache.org
Sent: Thursday, July 30, 2009 8:05 AM
Subject: Re: Multi select faceting




On Jul 29, 2009, at 2:38 PM, Mike wrote:


Hi,

We're using Lucid Imagination's LucidWorks Solr 1.3 and we have a 
requirement to implement multiple-select faceting where the facet  cells 
show up as checkboxes and despite checked options, all of the  options 
continue to persist with counts. The best example I found is  the search 
on Lucid Imagination's site: http://www.lucidimagination.com/search/


It appears the Solr 1.4 release has support for doing this with  filter 
tagging 
(http://wiki.apache.org/solr/SimpleFacetParameters#head-f277d409b221b407d9c5430f552bf40ee6185c4c 
 ), but I was wondering if there was another way to accomplish this  in 
1.3?



The only way I can think to do this is to backport the patch to 1.3. 
FWIW, we are running 1.4-dev at /search, which is where that 
functionality comes from.


-Grant






RE: Boosting ('bq') on multi-valued fields

2009-07-30 Thread Ensdorf Ken

 Hey Ken,
 Thanks for your reply.
 When I wrote '5|6' I meant that this is a multiValued field with two
 values
 '5' and '6', rather than the literal string '5|6' (and any Tokenizer).
 Does
 your reply still hold? That is, are multiValued fields dependent on
 the
 notion of tokenization to such a degree that I can't use the str type
 with
 them meaningfully? if so, it seems weird to me that I should be able to
 define a str multiValued field to begin with..

I'm pretty sure you can use multiValued string fields in the way you are 
describing.  If you just do a query without the boost do documents with 
multiple values come back?  That would at least tell you whether the problem 
was matching on the term itself or something to do with your use of boosts.
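
One quick way to check (field name made up): hit the select handler with 
just the term query and debugging enabled, e.g.

    http://localhost:8983/solr/select?q=myfield:5&debugQuery=on

If docs with multiple values come back there, the matching itself is fine 
and the problem is somewhere in the bq handling.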

-Ken


RE: Range Query question

2009-07-30 Thread Ensdorf Ken
 The problem is that the indexed form of this XML is flattened so the
 car
 entity has 2 garage names, 2 min values and 2 max values, but the
 grouping
  between the garage name and its min and max values is lost.  The
 danger is
 that we end up doing a comparison of the min-of-the-mins and the
 max-of-the-maxes, which tells us that a car is available in the price
 range
 which may not be true if garage1 has all cars below our search range
 and
 garage2 has all cars above our search range, e.g. if our search range
 is
 5000-6000 then we should get no match.

You could index each garage-car pairing as a separate document, embedding all 
the necessary information you need for searching.

e.g.-

<garage_car>
   <car_manufacturer>Ford</car_manufacturer>
   <car_model>Ka</car_model>
   <garage_name>garage1</garage_name>
   <min>2000</min>
   <max>4000</max>
</garage_car>
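
With one document per garage-car pairing, the overlap test becomes a 
single query: ranges [min,max] and [2500,3500] overlap exactly when 
min <= 3500 and max >= 2500. Assuming min and max are indexed with a 
sortable numeric type (e.g. sint), that's roughly:

    q=car_model:Ka&fq=min:[* TO 3500]&fq=max:[2500 TO *]

which matches the garage1 document (2000-4000) even though neither of its 
endpoints falls inside 2500-3500, and correctly rejects it for a 
5000-6000 search.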


Re: SOLR deleted almost everything?

2009-07-30 Thread Erik Hatcher


On Jul 30, 2009, at 9:44 AM, Reece wrote:


Hello everyone :)

I was trying to purge out older things.. in this case of a certain
type of document that had an ID lower than 200.  So I posted this:

<delete><query>id:[0 TO 200] AND type:I</query></delete>

Now, I have only 49 type I items total in my index (shown by
/solr/select?q=type:I), when there should be numbers still up to about
2165000, which is far far more than 49.  I'm curious why this would
be, as I'm trying to build in automatic purging of older things, but
this obviously didn't work the way I thought.

I'm on version 1.1, and my schema information for the fields is below:

  <field name="id" type="string" indexed="true" stored="true" required="true" />


Use one of the sortable numeric types for your id field if you need to  
perform range queries on them.  A string is sorted lexicographically:  
1, 10, 11, 2, 3, 4, 5... and thus a range query won't work the way you  
might expect.
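
A hedged example of the fix -- the "sint" type name assumes the stock 
example schema, where it maps to solr.SortableIntField, which encodes 
ints so they sort and range numerically:

   <field name="id" type="sint" indexed="true" stored="true" required="true" />

You'd need to reindex after the change before a delete like 
id:[0 TO 200] behaves numerically.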


Erik



NullPointerException in DataImportHandler

2009-07-30 Thread Andrew Clegg

First of all, apologies if you get this twice. I posted it by email an hour
ago but it hasn't appeared in any of the archives, so I'm worried it's got
junked somewhere.

I'm trying to use a DataImportHandler to merge some data from a database
with some other fields from a collection of XML files, rather like the
example in the Architecture section here:

http://wiki.apache.org/solr/DataImportHandler

... so a given document is built from some fields from the database and some
from the XML.

My dataconfig.xml looks like this:


<dataConfig>
    <dataSource name="database" driver="org.postgresql.Driver"
        url="jdbc:postgresql://cathdb.info/cathdb_v3_3_0" user="cathreader"
        password="cathreader" />

    <dataSource name="filesystem" type="FileDataSource"
        basePath="/cath/people/cathdata/v3_3_0/pdb-XML-noatom/" encoding="UTF-8"
        connectionTimeout="5000" readTimeout="1"/>

    <document name="domain">

        <entity name="domain" dataSource="database" query="select domain_id
            as id, 'PDB code ' || pdb_code || ', chain ' || chain_code || ', domain ' ||
            domain_code as title, 'some keywords go here' as
            keywords, pdb_code || ' ' || chain_id as related_ids, 'domain' as doc_type,
            pdb_code from domain">

            <entity dataSource="filesystem" name="domain_pdb"
                url="${domain.pdb_code}-noatom.xml">
                <field column="content"
                    xpath="//*[local-name()='structCategory']/*[local-name()='struct']/*[local-name()='title']" />
            </entity>

        </entity>

    </document>
</dataConfig>


This works if I comment out the inner entity, but when I uncomment it, I get
this error:


30-Jul-2009 14:32:50 org.apache.solr.handler.dataimport.DocBuilder
buildDocument
SEVERE: Exception while processing: domain document :
SolrInputDocument[{id=id(1.0)={1s32D00}, title=title(1.0)={PDB code
1s32, chain D, domain 00}, keywords=keywords(1.0)={some ke
ywords go here}, pdb_code=pdb_code(1.0)={1s32},
doc_type=doc_type(1.0)={domain}, related_ids=related_ids(1.0)={1s32
1s32D}}]
org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.NullPointerException
   at
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:64)
   at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
   at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
   at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:344)
   at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:372)
   at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:225)
   at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
   at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
   at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
   at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.NullPointerException
   at java.io.File.<init>(File.java:222)
   at
org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.java:75)
   at
org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.java:44)
   at
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
   ... 9 more


I have checked that the file
/cath/people/cathdata/v3_3_0/pdb-XML-noatom/1s32-noatom.xml is readable, so
maybe the full path to the file isn't being constructed properly or
something?

I also tried with the full path template for the file in the entity url
attribute, instead of using a basePath in the dataSource, but I get exactly
the same exception.

This is with the 2009-07-30 nightly build. See attached for schema. 
http://www.nabble.com/file/p24739580/schema.xml schema.xml 

Any ideas? Thanks in advance!

Andrew.


--
:: http://biotext.org.uk/ ::
-- 
View this message in context: 
http://www.nabble.com/NullPointerException-in-DataImportHandler-tp24739580p24739580.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Question about formatting the results returned from Solr

2009-07-30 Thread Avlesh Singh

 instead they should be sent to separate fields
 author_fname
 author_lname
 author_email


or, a dynamic field called author_* (I am assuming all of the author fields
to be of the same type).

And if you use SolrJ, you can transform this info into a data structure like
Map<String, String> authorInfo, where the keys would be firstName,
lastName, email etc. Look for more here -
http://issues.apache.org/jira/browse/SOLR-1129
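
A rough sketch of that transformation in SolrJ -- the "author_" prefix 
and the queryResponse variable are assumptions:

    SolrDocument doc = queryResponse.getResults().get(0);
    Map<String, String> authorInfo = new HashMap<String, String>();
    for (String field : doc.getFieldNames()) {
      if (field.startsWith("author_")) {
        // e.g. author_fname -> fname
        authorInfo.put(field.substring("author_".length()),
                       String.valueOf(doc.getFirstValue(field)));
      }
    }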

Cheers
Avlesh

2009/7/30 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 apparently all the dat ais going to one field 'author'

 instead they should be sent to separate fields
 author_fname
 author_lname
 author_email

 so you would get details like

  <str name="author_fname">John</str>
  <str name="author_lname">Doe</str>
  <str name="author_email">j...@doe.com</str>



 On Wed, Jul 29, 2009 at 7:39 PM, ahammadahmed.ham...@gmail.com wrote:
 
  Hi all,
 
  Not sure how good my title is, but here is a (hopefully) better
 explanation
  on what I mean.
 
  I am indexing a set of articles from a DB. Each article has an author.
 The
   author is saved in the DB as an author ID, which is a number.
 
  There is another table in the DB with more relevant information about the
  author. Basically it has columns like:
 
  id, firstname, lastname, email, userid
 
  I set up the DIH so that it returns the userid, and it works fine:
 
   <arr name="author">
     <str>jdoe</str>
     <str>msmith</str>
   </arr>
 
  Would it be possible to return all of the information about the author
  (first name, ...) as a subset of the results above?
 
  Here is what I mean:
 
   <arr name="author">
     <arr name="jdoe">
        <str name="firstName">John</str>
        <str name="lastName">Doe</str>
        <str name="email">j...@doe.com</str>
     </arr>
     ...
   </arr>
 
  Something similar to that at least...
 
  Not sure how descriptive I was, but any pointers would be highly
  appreciated.
 
  Cheers
 
  --
  View this message in context:
 http://www.nabble.com/Question-about-formatting-the-results-returned-from-Solr-tp24719831p24719831.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com



Posting data in JSON

2009-07-30 Thread Jérôme Etévé
Hi All,

 I'm wondering if it's possible to post documents to solr in JSON format.

JSON is much faster than XML for getting query results, so I think
it'd be great to be able to post data in JSON to speed up the indexing
and lower the network load.

All the best !

Jerome Eteve.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: NullPointerException in DataImportHandler

2009-07-30 Thread Chantal Ackermann

Hi Andrew,

your inner entity uses an XML type datasource. The default entity 
processor is the SQL one, however.


For your inner entity, you have to specify the correct entity processor 
explicitly. You do that by adding the attribute processor, and the 
value is the classname of the processor you want to use.


e.g. <entity dataSource="filesystem" name="domain_pdb"
processor="XPathEntityProcessor" ...>

(See the wikipedia example on the DataImportHandler wiki page.)

Cheers,
Chantal

Andrew Clegg schrieb:

First of all, apologies if you get this twice. I posted it by email an hour
ago but it hasn't appeared in any of the archives, so I'm worried it's got
junked somewhere.

I'm trying to use a DataImportHandler to merge some data from a database
with some other fields from a collection of XML files, rather like the
example in the Architecture section here:

http://wiki.apache.org/solr/DataImportHandler

... so a given document is built from some fields from the database and some
from the XML.

My dataconfig.xml looks like this:


<dataConfig>
    <dataSource name="database" driver="org.postgresql.Driver"
        url="jdbc:postgresql://cathdb.info/cathdb_v3_3_0" user="cathreader"
        password="cathreader" />

    <dataSource name="filesystem" type="FileDataSource"
        basePath="/cath/people/cathdata/v3_3_0/pdb-XML-noatom/" encoding="UTF-8"
        connectionTimeout="5000" readTimeout="1"/>

    <document name="domain">

        <entity name="domain" dataSource="database" query="select domain_id
            as id, 'PDB code ' || pdb_code || ', chain ' || chain_code || ', domain ' ||
            domain_code as title, 'some keywords go here' as
            keywords, pdb_code || ' ' || chain_id as related_ids, 'domain' as doc_type,
            pdb_code from domain">

            <entity dataSource="filesystem" name="domain_pdb"
                url="${domain.pdb_code}-noatom.xml">
                <field column="content"
                    xpath="//*[local-name()='structCategory']/*[local-name()='struct']/*[local-name()='title']" />
            </entity>

        </entity>

    </document>
</dataConfig>


This works if I comment out the inner entity, but when I uncomment it, I get
this error:


30-Jul-2009 14:32:50 org.apache.solr.handler.dataimport.DocBuilder
buildDocument
SEVERE: Exception while processing: domain document :
SolrInputDocument[{id=id(1.0)={1s32D00}, title=title(1.0)={PDB code
1s32, chain D, domain 00}, keywords=keywords(1.0)={some ke
ywords go here}, pdb_code=pdb_code(1.0)={1s32},
doc_type=doc_type(1.0)={domain}, related_ids=related_ids(1.0)={1s32
1s32D}}]
org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.NullPointerException
   at
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:64)
   at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
   at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
   at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:344)
   at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:372)
   at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:225)
   at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
   at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
   at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
   at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.NullPointerException
   at java.io.File.<init>(File.java:222)
   at
org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.java:75)
   at
org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.java:44)
   at
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
   ... 9 more


I have checked that the file
/cath/people/cathdata/v3_3_0/pdb-XML-noatom/1s32-noatom.xml is readable, so
maybe the full path to the file isn't being constructed properly or
something?

I also tried with the full path template for the file in the entity url
attribute, instead of using a basePath in the dataSource, but I get exactly
the same exception.

This is with the 2009-07-30 nightly build. See attached for schema.
http://www.nabble.com/file/p24739580/schema.xml schema.xml

Any ideas? Thanks in advance!

Andrew.


--
:: http://biotext.org.uk/ ::
--
View this message in context: 
http://www.nabble.com/NullPointerException-in-DataImportHandler-tp24739580p24739580.html
Sent from the Solr - User mailing list archive at Nabble.com.



Posting Word documents

2009-07-30 Thread Kevin Miller
I am trying to post a Word document using the Solr post.jar file.  When
I attempt this, using a command line interface, I get a fatal error.

I have looked at the following resources:

Solr.com: Tutorial, Docs, FAQ, & ExtractingRequestHandler.

As near as I can tell, I have all the files in the proper place.

Following is a portion of the error displayed in the cmd window:

C:\Solr\Apache~1\example\exampledocs>java -jar post.jar *.doc
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in
UTF-8, other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file BadNews.doc
SimplePostTool: FATAL: Solr returned an error:
Unexpected character (code 65533 / 0xfffd) in prolog; expected '<'
 at [row,col {unknown-source}]: [1,1]
java.io.IOException: Unexpected character (code 65533 / 0xfffd) in prolog; expected '<'
 at [row,col {unknown-source}]: [1,1]
 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:73)
 at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
 ...

There is more and if needed I will be happy to post all of it.

Here is the information that posted into the log file:

127.0.0.1 -  -  [30/07/2009:15:20:09 +] POST /solr/update HTTP/1.1
500 4011 

Kevin Miller
Web Services


Re: Posting Word documents

2009-07-30 Thread Mark Miller

Look again at ExtractingRequestHandler.

I haven't looked at what post.jar does internally, but it probably 
doesn't work with ExtractingRequestHandler unless you can send other 
params as well. I would use curl as the examples in the doc for 
ExtractingRequestHandler do. Or figure out whether post.jar will work for 
you and use it correctly. What handler is 'update' mapped to? If it's 
not mapped to ExtractingRequestHandler then you have no hope of this 
working in any case. Looks to me like it's trying to process the file as 
SolrXml - which means you are not submitting it to ExtractingRequestHandler.
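
For instance, something along the lines of the wiki's curl example -- the 
/update/extract path and the literal.id parameter assume the extraction 
handler is configured as in the example solrconfig.xml:

    curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true' -F myfile=@BadNews.doc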


--
- Mark

http://www.lucidimagination.com



Kevin Miller wrote:

I am trying to post a Word document using the Solr post.jar file.  When
I attempt this, using a command line interface, I get a fatal error.

I have looked at the following resources:

Solr.com: Tutorial, Docs, FAQ, & ExtractingRequestHandler.

As near as I can tell, I have all the files in the proper place.

Following is a portion of the error displayed in the cmd window:

C:\Solr\Apache~1\example\exampledocs>java -jar post.jar *.doc
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in
UTF-8, other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file BadNews.doc
SimplePostTool: FATAL: Solr returned an error:
Unexpected character (code 65533 / 0xfffd) in prolog; expected '<'
 at [row,col {unknown-source}]: [1,1]
java.io.IOException: Unexpected character (code 65533 / 0xfffd) in prolog; expected '<'
 at [row,col {unknown-source}]: [1,1]
 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:73)
 at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
 ...

There is more and if needed I will be happy to post all of it.

Here is the information that posted into the log file:

127.0.0.1 -  -  [30/07/2009:15:20:09 +] POST /solr/update HTTP/1.1
500 4011 


Kevin Miller
Web Services
  







Re: How can i get lucene index format version information?

2009-07-30 Thread Erik Hatcher


On Jul 30, 2009, at 9:19 AM, Licinio Fernández Maurelo wrote:


i want to get the lucene index format version from the solr web app (as
luke does). i've tried looking for the info in the luke handler response,
but i haven't found this info


the Luke request handler writes it out:

   indexInfo.add("version", reader.getVersion());

It appears in the index section near the top of the response.
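
E.g. (port and handler path assumed from the example solrconfig.xml):

    http://localhost:8983/solr/admin/luke?numTerms=0

Note that reader.getVersion() is the index's change version (it 
increments as the index is modified), which is not necessarily the same 
thing as the on-disk file-format version.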

Erik



Re: NullPointerException in DataImportHandler

2009-07-30 Thread Andrew Clegg


Chantal Ackermann wrote:
 
 Hi Andrew,
 
 your inner entity uses an XML type datasource. The default entity 
 processor is the SQL one, however.
 
 For your inner entity, you have to specify the correct entity processor 
 explicitly. You do that by adding the attribute processor, and the 
 value is the classname of the processor you want to use.
 
 e.g. <entity dataSource="filesystem" name="domain_pdb" 
 processor="XPathEntityProcessor" ...>
 

Thanks -- I was also missing a forEach expression -- in my case, just /
since each XML file contains the information for no more than one document.

However, I'm now getting a different exception:


30-Jul-2009 16:48:52 org.apache.solr.handler.dataimport.DocBuilder
buildDocument
SEVERE: Exception while processing: domain document :
SolrInputDocument[{id=id(1.0)={1udaA02}, title=title(1.0)={PDB code 1uda,
chain A, domain 02}, pdb_code=pdb_code(1.0)={1uda}, 
doc_type=doc_type(1.0)={domain}, related_ids=related_ids(1.0)={1uda,1udaA}}]
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception
while reading xpaths for fields Processing Document # 1
at
org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:135)
at
org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:76)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:307)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:372)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:225)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.LinkedList.entry(LinkedList.java:365)
at java.util.LinkedList.get(LinkedList.java:315)
at
org.apache.solr.handler.dataimport.XPathRecordReader.addField0(XPathRecordReader.java:71)
at
org.apache.solr.handler.dataimport.XPathRecordReader.init(XPathRecordReader.java:50)
at
org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:121)
... 9 more


My data config now looks like this:


<dataConfig>

<!-- TODO  change this back to v3.3.0 when the appropriate mapping
tables are available there -->

<dataSource name="database" driver="org.postgresql.Driver"
url="jdbc:postgresql://cathdb.info/cathdb_v3_2_0" user="***" password="***"
/>

<dataSource name="filesystem" type="FileDataSource"
basePath="/cath/people/cathdata/v3_3_0/pdb-XML-noatom/" encoding="UTF-8"
connectionTimeout="5000" readTimeout="1"/>

<document name="domain">

<entity name="domain" dataSource="database" query="select domain_id
as id, 'PDB code ' || pdb_code || ', chain ' || chain_code || ', domain ' ||
domain_code as title, pdb_code || ',' || chain_id as related_ids, 'domain'
as doc_type, pdb_code from domain">

<entity dataSource="filesystem" name="domain_pdb"
url="${domain.pdb_code}-noatom.xml" processor="XPathEntityProcessor"
forEach="/">
<field column="content"
xpath="//*[local-name()='structCategory']/*[local-name()='struct']/*[local-name()='title']"
/>
</entity>


</entity>

</document>

</dataConfig>


Thanks in advance, again :-)

Andrew.

-- 
View this message in context: 
http://www.nabble.com/NullPointerException-in-DataImportHandler-tp24739580p24741292.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: NullPointerException in DataImportHandler

2009-07-30 Thread Erik Hatcher


On Jul 30, 2009, at 11:54 AM, Andrew Clegg wrote:

   <entity dataSource="filesystem" name="domain_pdb"
url="${domain.pdb_code}-noatom.xml" processor="XPathEntityProcessor"
forEach="/">
   <field column="content"
xpath="//*[local-name()='structCategory']/*[local-name()='struct']/*[local-name()='title']"
/>


The XPathEntityProcessor doesn't support that fancy of an xpath - it  
supports only a limited subset.  Try /structCategory/struct/title  
perhaps?


Erik



Re: NullPointerException in DataImportHandler

2009-07-30 Thread Chantal Ackermann

Hi Andrew,

my experience with XPathEntityProcessor is non-existent. ;-)

Just after a quick look at the method that throws the exception:

  private void addField0(String xpath, String name, boolean multiValued,
 boolean isRecord) {
    List<String> paths = new
        LinkedList<String>(Arrays.asList(xpath.split("/")));

    if ("".equals(paths.get(0).trim()))
      paths.remove(0);
    rootNode.build(paths, name, multiValued, isRecord);
  }

and your foreach attribute value in combination with the xpath:
 forEach="/"
 <field column="content"
 xpath="//*[local-name()='structCategory']/*[local-name()='struct']/*[local-name()='title']"
 />

I would guess that the double slash at the beginning is not working with 
your forEach expression. I don't know whether this is something the processor 
should expect and handle correctly or whether you have to take care of 
it in your configuration.


Cheers,
Chantal

Andrew Clegg schrieb:


Chantal Ackermann wrote:

Hi Andrew,

your inner entity uses an XML type datasource. The default entity
processor is the SQL one, however.

For your inner entity, you have to specify the correct entity processor
explicitly. You do that by adding the attribute processor, and the
value is the classname of the processor you want to use.

e.g. <entity dataSource="filesystem" name="domain_pdb"
processor="XPathEntityProcessor" ...>



Thanks -- I was also missing a forEach expression -- in my case, just /
since each XML file contains the information for no more than one document.

However, I'm now getting a different exception:


30-Jul-2009 16:48:52 org.apache.solr.handler.dataimport.DocBuilder
buildDocument
SEVERE: Exception while processing: domain document :
SolrInputDocument[{id=id(1.0)={1udaA02}, title=title(1.0)={PDB code 1uda,
chain A, domain 02}, pdb_code=pdb_code(1.0)={1uda},
doc_type=doc_type(1.0)={domain}, related_ids=related_ids(1.0)={1uda,1udaA}}]
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception
while reading xpaths for fields Processing Document # 1
at
org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:135)
at
org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:76)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:307)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:372)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:225)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.LinkedList.entry(LinkedList.java:365)
at java.util.LinkedList.get(LinkedList.java:315)
at
org.apache.solr.handler.dataimport.XPathRecordReader.addField0(XPathRecordReader.java:71)
at
org.apache.solr.handler.dataimport.XPathRecordReader.init(XPathRecordReader.java:50)
at
org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:121)
... 9 more


My data config now looks like this:


<dataConfig>

<!-- TODO  change this back to v3.3.0 when the appropriate mapping
tables are available there -->

<dataSource name="database" driver="org.postgresql.Driver"
url="jdbc:postgresql://cathdb.info/cathdb_v3_2_0" user="***" password="***"
/>

<dataSource name="filesystem" type="FileDataSource"
basePath="/cath/people/cathdata/v3_3_0/pdb-XML-noatom/" encoding="UTF-8"
connectionTimeout="5000" readTimeout="1"/>

<document name="domain">

<entity name="domain" dataSource="database" query="select domain_id
as id, 'PDB code ' || pdb_code || ', chain ' || chain_code || ', domain ' ||
domain_code as title, pdb_code || ',' || chain_id as related_ids, 'domain'
as doc_type, pdb_code from domain">

<entity dataSource="filesystem" name="domain_pdb"
url="${domain.pdb_code}-noatom.xml" processor="XPathEntityProcessor"
forEach="/">
<field column="content"
xpath="//*[local-name()='structCategory']/*[local-name()='struct']/*[local-name()='title']"
/>
</entity>


</entity>

</document>

</dataConfig>


Thanks in advance, again :-)

Andrew.

--
View this message in context: 
http://www.nabble.com/NullPointerException-in-DataImportHandler-tp24739580p24741292.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Chantal Ackermann


Re: NullPointerException in DataImportHandler

2009-07-30 Thread Andrew Clegg


Erik Hatcher wrote:
 
 
 On Jul 30, 2009, at 11:54 AM, Andrew Clegg wrote:
   <entity dataSource="filesystem" name="domain_pdb"
 url="${domain.pdb_code}-noatom.xml" processor="XPathEntityProcessor"
 forEach="/">
   <field column="content"
 xpath="//*[local-name()='structCategory']/*[local-name()='struct']/*[local-name()='title']"
 />
 
 The XPathEntityProcessor doesn't support that fancy of an xpath - it  
 supports only a limited subset.  Try /structCategory/struct/title  
 perhaps?
 
 

Sadly not...

I tried with:

<field column="content"
xpath="/datablock/structCategory/struct/title" />

(full path from root)

and

<field column="content"
xpath="//structCategory/struct/title" />

Same ArrayIndex error each time.

Doesn't it use javax.xml then? I was using the complex local-name
expressions to make it namespace-agnostic -- is it agnostic anyway?

Thanks,

Andrew.

-- 
View this message in context: 
http://www.nabble.com/NullPointerException-in-DataImportHandler-tp24739580p24741696.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: NullPointerException in DataImportHandler

2009-07-30 Thread Andrew Clegg



Chantal Ackermann wrote:
 
 
 my experience with XPathEntityProcessor is non-existent. ;-)
 
 

Don't worry -- your hints put me on the right track :-)

I got it working with:

<entity dataSource="filesystem" name="domain_pdb"
url="${domain.pdb_code}-noatom.xml" processor="XPathEntityProcessor"
forEach="/datablock">
<field column="content"
xpath="/datablock/structCategory/struct/title" />
</entity>

Now, to get it to ignore missing files without an error... Hmm...

Cheers,

Andrew.

-- 
View this message in context: 
http://www.nabble.com/NullPointerException-in-DataImportHandler-tp24739580p24741772.html
Sent from the Solr - User mailing list archive at Nabble.com.



Minimum facet length?

2009-07-30 Thread darren
Hi,
  I am exploring the faceted search results of Solr. My query is like this.

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick

If I don't use the prefix, I get back totals for words like 1,a,of,2,3,4.
1 letter/number occurrences in my documents. It's not really useful since
all the documents have some free floating single-digit numbers.

Is there a way to restrict the word frequency results for a facet based on
the length so I can set it to > 3 or is there a better way?

thanks,
Darren


Re: NullPointerException in DataImportHandler

2009-07-30 Thread Erik Hatcher


On Jul 30, 2009, at 12:19 PM, Andrew Clegg wrote:

Don't worry -- your hints put me on the right track :-)

I got it working with:

    <entity dataSource="filesystem" name="domain_pdb"
url="${domain.pdb_code}-noatom.xml" processor="XPathEntityProcessor"
forEach="/datablock">
    <field column="content"
xpath="/datablock/structCategory/struct/title" />
    </entity>

Now, to get it to ignore missing files without an error... Hmm...


onError="skip"  (or "abort", or "continue")

Erik




Re: NullPointerException in DataImportHandler

2009-07-30 Thread Chantal Ackermann
It's very easy to write your own entity processor. At least, that is my 
experience with extending the SQLEntityProcessor to my needs. So, maybe 
you'd be better off subclassing the xpath processor and handling the 
xpath in a way that keeps your configuration straightforward.



Andrew Clegg schrieb:



Chantal Ackermann wrote:


my experience with XPathEntityProcessor is non-existent. ;-)




Don't worry -- your hints put me on the right track :-)

I got it working with:

<entity dataSource="filesystem" name="domain_pdb"
url="${domain.pdb_code}-noatom.xml" processor="XPathEntityProcessor"
forEach="/datablock">
<field column="content"
xpath="/datablock/structCategory/struct/title" />
</entity>

Now, to get it to ignore missing files without an error... Hmm...

Cheers,

Andrew.

--
View this message in context: 
http://www.nabble.com/NullPointerException-in-DataImportHandler-tp24739580p24741772.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Chantal Ackermann
Consultant

mobil +49 (176) 10 00 09 45
email chantal.ackerm...@btelligent.de



b.telligent GmbH  Co. KG
Lichtenbergstraße 8
D-85748 Garching / München

fon   +49 (89) 54 84 25 60
fax+49 (89) 54 84 25 69
web  www.btelligent.de

Registered in Munich: HRA 84393
Managing Director: b.telligent Verwaltungs GmbH, HRB 153164 represented 
by Sebastian Amtage and Klaus Blaschek

USt.Id.-Nr. DE814054803



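For the record, a minimal sketch of the kind of subclassing Chantal
describes, assuming the DIH 1.3-era EntityProcessor API; the class name and
the content column are made up:

    import java.util.Map;
    import org.apache.solr.handler.dataimport.SqlEntityProcessor;

    public class TrimmingEntityProcessor extends SqlEntityProcessor {
      @Override
      public Map<String, Object> nextRow() {
        Map<String, Object> row = super.nextRow();
        if (row != null && row.get("content") != null) {
          // massage a value before it reaches the document
          row.put("content", row.get("content").toString().trim());
        }
        return row;
      }
    }

The processor is then named in the entity's processor attribute, just like
the built-in ones.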


Re: update some index documents after indexing process is done with DIH

2009-07-30 Thread Marc Sturlese

Hoss, I see what you mean. I am trying to implement a CustomUpdateProcessor,
checking out here:
http://wiki.apache.org/solr/UpdateRequestProcessor
What is confusing me now is that I have to implement my logic in
processCommit, as you said:

you'll still need the double commit (once so you can see the 
main changes, and once so the rest of the world can see your 
modifications) but you can execute them both directly in your 
processCommit(CommitUpdateCommand)

I have noticed that in the processAdd you have access to the concrete
SolrInputDocument you are going to add:
SolrInputDocument doc = cmd.getSolrInputDocument();

But in processCommit, having access to the core I can get the IndexReader
but I still don't know how to get the IndexWriter and SolrInputDocuments in
there.
My idea is to do something like:

   @Override
public void processCommit(CommitUpdateCommand cmd) throws IOException {
   //first commit that shows me modifications
   //open and iterate over the reader and create a solrDocuments list
   //close reader
   //open writer and update the docs in the list
   //close writer and second commit that shows my changes to the world
  
  if (next != null)
next.processCommit(cmd);

}

As I understood the process, the commitCommand will be sent to the
DirectUpdateHandler2, which will properly do the commit via the
UpdateRequestProcessor.
Am I on the right track?  I haven't dealt with a CustomUpdateProcessor for
doing something after a commit is executed, so I am a bit confused...

Thanks in advance.




hossman wrote:
 
 
 This thread all sounds really kludgy ... among other things the 
 newSearcher listener is going to need to some how keep track of when it 
 was called as a result of a real commit, vs when it was called as the 
 result of a commit it itself triggered to make changes.
 
 wouldn't an easier place to implement this logic be in an UpdateProcessor?  
 you'll still need the double commit (once so you can see the 
 main changes, and once so the rest of the world can see your 
 modifications) but you can execute them both directly in your 
 processCommit(CommitUpdateCommand) method (so you don't have to worry 
 about being able to tell them apart)
 
 : Date: Thu, 30 Jul 2009 10:14:16 +0530
: From: Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
 : Reply-To: solr-user@lucene.apache.org, noble.p...@gmail.com
 : To: solr-user@lucene.apache.org
 : Subject: Re: update some index documents after indexing process is done
 with 
 : DIH
 : 
 : If you make your EventListener implements SolrCoreAware you can get
 : hold of the core on inform. use that to get hold of the
 : SolrIndexWriter
 : 
 : On Wed, Jul 29, 2009 at 9:20 PM, Marc Sturlesemarc.sturl...@gmail.com
 wrote:
 : 
 :  From the newSearcher(..) of a CustomEventListener which extends of
 :  AbstractSolrEventListener  can access to SolrIndexSearcher and all
 core
 :  properties but can't get a SolrIndexWriter. Do you now how can I get
 from
 :  there a SolrIndexWriter? This way I would be able to modify the
 documents (I
 :  need to modify them depending on values of other documents, that's why
 I
 :  can't do it with DIH delta-import).
 :  Thanks in advance
 : 
 : 
 :  Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
 : 
 :  On Tue, Jul 28, 2009 at 5:17 PM, Marc
 Sturlesemarc.sturl...@gmail.com
 :  wrote:
 : 
 :  That really sounds the best way to reach my goal. How could I
 invoque a
 :  listener from the newSearcher?Would be something like:
:     <listener event="newSearcher" class="solr.QuerySenderListener">
:       <arr name="queries">
:         <lst> <str name="q">solr</str> <str name="start">0</str> <str
: name="rows">10</str> </lst>
:         <lst> <str name="q">rocks</str> <str name="start">0</str> <str
: name="rows">10</str> </lst>
:         <lst><str name="q">static newSearcher warming query from
: solrconfig.xml</str></lst>
:       </arr>
:     </listener>
:     <listener event="newSearcher" class="solr.MyCustomListener" />
 : 
 :  And MyCustomListener would be the class who open the reader:
 : 
:         RefCounted<SolrIndexSearcher> searchHolder = null;
:         try {
:           searchHolder = dataImporter.getCore().getSearcher();
:           IndexReader reader = searchHolder.get().getReader();
: 
:           //Here I iterate over the reader doing document
:  modifications
: 
:         } finally {
:            if (searchHolder != null) searchHolder.decref();
:         }
:         } catch (Exception ex) {
:             LOG.info("error");
:         }
 : 
 :  you may not be able to access the DIH API from a newSearcher event .
 :  But the API would give you the searcher directly as a method
 :  parameter.
 : 
 :  Finally, to access to documents and add fields to some of them, I
 have
:  thought of using SolrDocument classes. Can you please point me to where
:  something similar is done in the solr source (I mean creation of
:  SolrDocuments
:  and conversion of them to proper lucene documents).
 : 
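
To make the shape of this concrete, here is a minimal sketch of the
double-commit UpdateProcessor Hoss suggests, assuming Solr 1.4 trunk APIs;
the class names are made up and the reader-iteration step is left as a
comment:

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.request.SolrQueryResponse;
    import org.apache.solr.search.SolrIndexSearcher;
    import org.apache.solr.update.CommitUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
    import org.apache.solr.util.RefCounted;

    public class DocFixupProcessorFactory extends UpdateRequestProcessorFactory {
      @Override
      public UpdateRequestProcessor getInstance(SolrQueryRequest req,
          SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new DocFixupProcessor(req, next);
      }

      static class DocFixupProcessor extends UpdateRequestProcessor {
        private final SolrQueryRequest req;

        DocFixupProcessor(SolrQueryRequest req, UpdateRequestProcessor next) {
          super(next);
          this.req = req;
        }

        @Override
        public void processCommit(CommitUpdateCommand cmd) throws IOException {
          // first commit: let the chain run so the main changes become visible
          super.processCommit(cmd);

          RefCounted<SolrIndexSearcher> holder = req.getCore().getSearcher();
          try {
            IndexReader reader = holder.get().getReader();
            // iterate the reader here, build the modified documents, and
            // re-add them via req.getCore().getUpdateHandler().addDoc(...)
          } finally {
            holder.decref();
          }

          // second commit: make the modifications visible to the world
          req.getCore().getUpdateHandler().commit(new CommitUpdateCommand(false));
        }
      }
    }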

RE: Range Query question

2009-07-30 Thread Matt Beaumont

Thanks for the reply; 
I had thought the solution would be altering the XML.



Ensdorf Ken wrote:
 
 The problem is that the indexed form of this XML is flattened so the
 car
 entity has 2 garage names, 2 min values and 2 max values, but the
 grouping
between the garage name and its min and max values is lost.  The
 danger is
 that we end up doing a comparison of the min-of-the-mins and the
 max-of-the-maxes, which tells us that a car is available in the price
 range
 which may not be true if garage1 has all cars below our search range
 and
 garage2 has all cars above our search range, e.g. if our search range
 is
 5000-6000 then we should get no match.
 
 You could index each garage-car pairing as a separate document, embedding
 all the necessary information you need for searching.
 
 e.g.-
 
 <garage_car>
 <car_manufacturer>Ford</car_manufacturer>
 <car_model>Ka</car_model>
 <garage_name>garage1</garage_name>
 <min>2000</min>
 <max>4000</max>
 </garage_car>
 
 


-
Matt Beaumont
mibe...@yahoo.co.uk

-- 
View this message in context: 
http://www.nabble.com/Range-Query-question-tp24737656p24742062.html
Sent from the Solr - User mailing list archive at Nabble.com.
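
A hedged SolrJ sketch of the flattening Ken suggests -- one document per
garage-car pairing; the id value is a made-up uniqueKey:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class GarageCarIndexer {
      public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "garage1-ford-ka"); // hypothetical uniqueKey
        doc.addField("car_manufacturer", "Ford");
        doc.addField("car_model", "Ka");
        doc.addField("garage_name", "garage1");
        doc.addField("min", 2000);
        doc.addField("max", 4000);
        server.add(doc);
        server.commit();
      }
    }

With min and max as sortable numeric fields, a search range of 5000-6000 then
becomes an overlap query like min:[* TO 6000] AND max:[5000 TO *], which no
longer matches a garage whose whole range sits below or above it.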



Re: Posting data in JSON

2009-07-30 Thread Shalin Shekhar Mangar
On Thu, Jul 30, 2009 at 8:31 PM, Jérôme Etévé jerome.et...@gmail.com wrote:

 Hi All,

  I'm wondering if it's possible to post documents to solr in JSON format.

 JSON is much faster than XML for getting query results, so I think
 it'd be great to be able to post data in JSON to speed up the indexing
 and lower the network load.


If you are using Java/Solrj on 1.4 (trunk), you can use the binary format
which is extremely compact and efficient. Note that with Solr/Solrj 1.3,
binary became the default response format for Solrj clients.

-- 
Regards,
Shalin Shekhar Mangar.
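
For completeness, a minimal sketch of switching Solrj to the binary update
format on 1.4, assuming the CommonsHttpSolrServer.setRequestWriter API:

    import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BinaryIndexing {
      public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        server.setRequestWriter(new BinaryRequestWriter()); // javabin updates instead of XML
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        server.add(doc);
        server.commit();
      }
    }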


Re: Minimum facet length?

2009-07-30 Thread Shalin Shekhar Mangar
On Thu, Jul 30, 2009 at 9:53 PM, dar...@ontrenet.com wrote:

 Hi,
  I am exploring the faceted search results of Solr. My query is like this.


 http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick

 If I don't use the prefix, I get back totals for words like 1,a,of,2,3,4.
 1 letter/number occurrences in my documents. It's not really useful since
 all the documents have some free floating single-digit numbers.

 Is there a way to restrict the word frequency results for a facet based on
 the length so I can set it to > 3 or is there a better way?


Yes, you can specify facet.mincount=3 to return only those terms present in
at least 3 documents. On a related note, a tokenized field (such as the text
type in the example schema) will create a large number of unique terms.
Faceting on such a field may not be very useful and/or efficient. Typically
faceting is done on untokenized fields (such as the string type).

-- 
Regards,
Shalin Shekhar Mangar.


Re: Minimum facet length?

2009-07-30 Thread Erik Hatcher


On Jul 30, 2009, at 1:00 PM, Shalin Shekhar Mangar wrote:


On Thu, Jul 30, 2009 at 9:53 PM, dar...@ontrenet.com wrote:


Hi,
I am exploring the faceted search results of Solr. My query is like  
this.



http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick

If I don't use the prefix, I get back totals for words like 1,a,of, 
2,3,4.
1 letter/number occurrences in my documents. It's not really useful  
since

all the documents have some free floating single-digit numbers.

Is there a way to restrict the word frequency results for a facet  
based on

the length so I can set it to > 3 or is there a better way?



Yes, you can specify facet.mincount=3 to return only those terms  
present in
at least 3 documents. On a related note, a tokenized field (such as  
text
type in the example schema) will create a large number of unique  
terms.
Faceting on such a field may not be very useful and/or efficient.  
Typically

faceting is done on untokenized fields (such as string type).


I think what was meant by > 3 was if faceting only returned terms of  
length greater than 3, not count.


You could copyField your text field to another field, set the analyzer  
to include a LengthFilterFactory with a minimum length specified, and  
also have other analysis tweaks to have numbers and other stop words  
removed.


Erik
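
A small sketch of the filter behind LengthFilterFactory, assuming the
Lucene 2.4-era analysis API; the minimum length of 4 and the sample text
are arbitrary:

    import java.io.StringReader;
    import org.apache.lucene.analysis.LengthFilter;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceTokenizer;

    public class LengthDemo {
      public static void main(String[] args) throws Exception {
        // keep only tokens whose length is between 4 and 100 characters
        TokenStream ts = new LengthFilter(
            new WhitespaceTokenizer(new StringReader("1 a of wick wicked candlewick")),
            4, 100);
        for (Token t = ts.next(new Token()); t != null; t = ts.next(t)) {
          System.out.println(t.term()); // prints: wick, wicked, candlewick
        }
      }
    }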



Re: Minimum facet length?

2009-07-30 Thread Shalin Shekhar Mangar
On Thu, Jul 30, 2009 at 10:35 PM, Erik Hatcher
e...@ehatchersolutions.com wrote:


 On Jul 30, 2009, at 1:00 PM, Shalin Shekhar Mangar wrote:

  On Thu, Jul 30, 2009 at 9:53 PM, dar...@ontrenet.com wrote:

  Hi,
 I am exploring the faceted search results of Solr. My query is like this.



 http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick

 If I don't use the prefix, I get back totals for words like 1,a,of,2,3,4.
 1 letter/number occurrences in my documents. Its not really useful since
 all the documents have some free floating single-digit numbers.

 Is there a way to restrict the word frequency results for a facet based
 on
 the length so I can set it to > 3 or is there a better way?


 Yes, you can specify facet.mincount=3 to return only those terms present
 in
 at least 3 documents. On a related note, a tokenized field (such as text
 type in the example schema) will create a large number of unique terms.
 Faceting on such a field may not be very useful and/or efficient.
 Typically
 faceting is done on untokenized fields (such as string type).


 I think what was meant by > 3 was if faceting only returned terms of length
 greater than 3, not count.


Ah, sorry. I was too fast to reply.

-- 
Regards,
Shalin Shekhar Mangar.


Re: How can i get lucene index format version information?

2009-07-30 Thread Chris Hostetter

:  i want to get the lucene index format version from solr web app (as

: the Luke request handler writes it out:
: 
:    indexInfo.add("version", reader.getVersion());

that's the index version (as in: "i have added docs to the index, so the 
version number has changed") the question is about the format version (as 
in: "i have upgraded Lucene from 2.1 to 2.3, so the index format has 
changed")

I'm not sure how Luke gets that ... it's not exposed via a public API on 
an IndexReader.

Hmm...  SegmentInfos.readCurrentVersion(Directory) seems like it would do 
the trick; but i'm not sure how that would interact with customized 
IndexReader implementations.  i suppose we could always make it non-fatal 
if it throws an exception (just print the exception message in place of the 
number)

anybody want to submit a patch to add this to the LukeRequestHandler?


-Hoss
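
In case someone picks this up, a minimal sketch of the non-fatal readout
Hoss describes, assuming Lucene 2.x APIs; the formatVersion key and the
helper class are made up:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.SegmentInfos;
    import org.apache.solr.common.util.NamedList;

    class IndexVersionInfo {
      static void addVersionInfo(NamedList<Object> indexInfo, IndexReader reader) {
        try {
          indexInfo.add("formatVersion",
              Long.toString(SegmentInfos.readCurrentVersion(reader.directory())));
        } catch (Exception e) {
          // non-fatal: print the exception message in place of the number
          indexInfo.add("formatVersion", e.toString());
        }
      }
    }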



Re: How can i get lucene index format version information?

2009-07-30 Thread Walter Underwood
I think the properties page in the admin UI lists the Lucene version,  
but I don't have a live server to check that on at this instant.


wunder

On Jul 30, 2009, at 10:26 AM, Chris Hostetter wrote:



:  i want to get the lucene index format version from solr web app  
(as


: the Luke request handler writes it out:
:
:    indexInfo.add("version", reader.getVersion());

that's the index version (as in: "i have added docs to the index, so  
the
version number has changed") the question is about the format  
version (as

in: "i have upgraded Lucene from 2.1 to 2.3, so the index format has
changed")

I'm not sure how Luke gets that ... it's not exposed via a public  
API on

an IndexReader.

Hmm...  SegmentInfos.readCurrentVersion(Directory) seems like it  
would do

the trick; but i'm not sure how that would interact with customized
IndexReader implementations.  i suppose we could always make it non- 
fatal
if it throws an exception (just print the exception message in place of  
the

number)

anybody want to submit a patch to add this to the LukeRequestHandler?


-Hoss




Re: Posting data in JSON

2009-07-30 Thread Jérôme Etévé
Hi,

  Nope, I'm not using solrj (my client code is in Perl), and I'm with solr 1.3.

J.

2009/7/30 Shalin Shekhar Mangar shalinman...@gmail.com:
 On Thu, Jul 30, 2009 at 8:31 PM, Jérôme Etévé jerome.et...@gmail.com
 wrote:

 Hi All,

  I'm wondering if it's possible to post documents to solr in JSON format.

 JSON is much faster than XML for getting query results, so I think
 it'd be great to be able to post data in JSON to speed up the indexing
 and lower the network load.

 If you are using Java,Solrj on 1.4 (trunk), you can use the binary format
 which is extremely compact and efficient. Note that with Solr/Solrj 1.3,
 binary became the default response format for Solrj clients.

 --
 Regards,
 Shalin Shekhar Mangar.




-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Mailing list: Change the reply too ?

2009-07-30 Thread Jérôme Etévé
Hi all,

 I don't know if it does the same for everyone, but when I use the
reply function of my mail agent, it sets the recipient to the user who
sent the message, and not the mailing list.

So it's quite annoying cause I have to change the recipient each time
I reply to someone on the list.

Can the list admins fix this issue ?

Cheers !

J.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: Mailing list: Change the reply too ?

2009-07-30 Thread Erik Hatcher


On Jul 30, 2009, at 1:44 PM, Jérôme Etévé wrote:


Hi all,

I don't know if it does the same for everyone, but when I use the
reply function of my mail agent, it sets the recipient to the user who
sent the message, and not the mailing list.

So it's quite annoying cause I have to change the recipient each time
I reply to someone on the list.

Can the list admins fix this issue ?


All my replies go to the list.

From your message, the header says:

 Reply-To: solr-user@lucene.apache.org

Erik



Re: Mailing list: Change the reply too ?

2009-07-30 Thread Jérôme Etévé
2009/7/30 Erik Hatcher e...@ehatchersolutions.com:

 On Jul 30, 2009, at 1:44 PM, Jérôme Etévé wrote:

 Hi all,

 I don't know if it does the same for everyone, but when I use the
 reply function of my mail agent, it sets the recipient to the user who
 sent the message, and not the mailing list.

 So it's quite annoying cause I have to change the recipient each time
 I reply to someone on the list.

 Can the list admins fix this issue ?

 All my replies go to the list.

 From your message, the header says:

  Reply-To: solr-user@lucene.apache.org

Erik

It works with your messages. It might depend on mail agents.

Jerome.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: Posting data in JSON

2009-07-30 Thread Ryan McKinley

check:
https://issues.apache.org/jira/browse/SOLR-945

this will not likely make it into 1.4



On Jul 30, 2009, at 1:41 PM, Jérôme Etévé wrote:


Hi,

 Nope, I'm not using solrj (my client code is in Perl), and I'm with  
solr 1.3.


J.

2009/7/30 Shalin Shekhar Mangar shalinman...@gmail.com:
On Thu, Jul 30, 2009 at 8:31 PM, Jérôme Etévé  
jerome.et...@gmail.com

wrote:


Hi All,

I'm wondering if it's possible to post documents to solr in JSON  
format.


JSON is much faster than XML for getting query results, so I think
it'd be great to be able to post data in JSON to speed up the  
indexing

and lower the network load.


If you are using Java,Solrj on 1.4 (trunk), you can use the binary  
format
which is extremely compact and efficient. Note that with Solr/Solrj  
1.3,

binary became the default response format for Solrj clients.

--
Regards,
Shalin Shekhar Mangar.





--
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net




Re: Minimum facet length?

2009-07-30 Thread Darren Govoni
Hi Erik,
  Thanks for the tip. H, well that's a good point, or maybe I will
just do the word filtering upfront and store it separately now that I
think about it more.

Darren

On Thu, 2009-07-30 at 13:05 -0400, Erik Hatcher wrote:
 On Jul 30, 2009, at 1:00 PM, Shalin Shekhar Mangar wrote:
 
  On Thu, Jul 30, 2009 at 9:53 PM, dar...@ontrenet.com wrote:
 
  Hi,
  I am exploring the faceted search results of Solr. My query is like  
  this.
 
 
  http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick
 
  If I don't use the prefix, I get back totals for words like 1,a,of, 
  2,3,4.
  1 letter/number occurrences in my documents. It's not really useful  
  since
  all the documents have some free floating single-digit numbers.
 
  Is there a way to restrict the word frequency results for a facet  
  based on
  the length so I can set it to > 3 or is there a better way?
 
 
  Yes, you can specify facet.mincount=3 to return only those terms  
  present in
  at least 3 documents. On a related note, a tokenized field (such as  
  text
  type in the example schema) will create a large number of unique  
  terms.
  Faceting on such a field may not be very useful and/or efficient.  
  Typically
  faceting is done on untokenized fields (such as string type).
 
 I think what was meant by > 3 was if faceting only returned terms of  
 length greater than 3, not count.
 
 You could copyField your text field to another field, set the analyzer  
 to include a LengthFilterFactory with a minimum length specified, and  
 also have other analysis tweaks to have numbers and other stop words  
 removed.
 
   Erik
 



Re: Mailing list: Change the reply too ?

2009-07-30 Thread Chris Hostetter

:  I don't know if it does the same from everyone, but when I use the
: reply function of my mail agent, it sets the recipient to the user who
: sent the message, and not the mailing list.
: 
: So it's quite annoying cause I have to change the recipient each time
: I reply to someone on the list.
: 
: Can the list admins fix this issue ?

The list software always adds a Reply-To header indicating that replies 
should be sent to the list.  It does *not* remove any existing Reply-To 
headers that the original sender may have included -- it does this 
because it trusts that the original sender had a reason for putting it 
there (ie: when someone off list, like the apachecon coordinators, sends 
an announcement and the moderators let it through)

It's mail client dependent as to what to do when you reply to a message 
like that -- yours apparently just picks one (and sometimes it's not the 
list); most either reply to both, or ask the user whether they want to reply 
to all.


-Hoss



Reasonable number of maxWarming searchers

2009-07-30 Thread Jérôme Etévé
Hi All,

 I'm planning to have a certain number of processes posting
independently in a solr instance.
 This instance will solely act as a master instance. No client queries on it.

 Is there a problem if i set maxWarmingSearchers to something like 30 or 40?
 Also, how do I disable the cache warming? Is setting the autowarmCount values
to 0 enough?


 Regards,

 Jerome.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: Reasonable number of maxWarming searchers

2009-07-30 Thread Erik Hatcher
I recommend, in this case, that you use Solr's autocommit feature (see  
solrconfig.xml) rather than having your indexing clients issue their  
own commits.  Overlapped searcher warming is just going to be too much  
of a hit on RAM, and generally unnecessary with autocommit.


Erik

On Jul 30, 2009, at 2:28 PM, Jérôme Etévé wrote:


Hi All,

I'm planning to have a certain number of processes posting
independently in a solr instance.
This instance will solely act as a master instance. No client  
queries on it.


Is there a problem if i set maxWarmingSearchers to something like 30  
or 40?

Also, how do I disable the cache warming? Is setting the autowarmCount values
to 0 enough?


Regards,

Jerome.

--
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net




What does the showItems config on fieldValueCache mean?

2009-07-30 Thread Stephen Duncan Jr
What's the effect of showItems attribute on the fieldValueCache in Solr 1.4?

-- 
Stephen Duncan Jr
www.stephenduncanjr.com


How to get a stack trace

2009-07-30 Thread Nicolae Mihalache
Hello,

I'm a new user of solr but I have worked a bit with Lucene before. I get some 
out of memory exception when optimizing the index through Solr and I would like 
to find out why.
However, the only message I get on standard output is:
Jul 30, 2009 9:20:22 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space

Is there a way to get a stack trace for this exception? I had a look into the 
java.util.logging options and didn't find anything. 

My solr runs in some standard configuration inside jetty.
Any suggestion would be appreciated.

Thanks,
nicolae


  


Problem with retrieving field from database using DIH

2009-07-30 Thread ahammad

Hello all,

I've been having this issue for a while now. I am indexing a Sybase
database. Everything is fantastic, except that there is 1 column that I can
never get back. I don't have direct database access via Sybase client, but I
was able to extract the data using some Java code.

The field is essentially a Last Modified field. In the DB I believe that it
is of type long. In the Java program that I have, I am able to retrieve the
data that is in that column and put it in a variable of type Long. This is
not the case in Solr, however.

I set the variable in the schema as required to see why the data is never
stored:
<field name="lastModified" type="long" indexed="true" stored="true"
required="true"/>

This is what I get in the Tomcat logs:

org.apache.solr.common.SolrException: Document [00069391] missing required
field: lastModified
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:292)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
at 
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:67)
at
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:276)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:373)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:224)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:316)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:374)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)

From what I can gather, it is not finding the data and/or column, and thus
cannot populate the required field. However, the data is there, which I was
able to prove outside of Solr.

Is there a way to generate more descriptive logs for this? I am completely
lost. I hit this problem a few months ago but I was never able to resolve
it. Any help on this will be much appreciated.

BTW, Solr was successful in retrieving data from other columns in the same
table...

Thanks
-- 
View this message in context: 
http://www.nabble.com/Problem-with-retrieving-field-from-database-using-DIH-tp24746530p24746530.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: What does the showItems config on fieldValueCache mean?

2009-07-30 Thread Erik Hatcher


On Jul 30, 2009, at 3:32 PM, Stephen Duncan Jr wrote:

What's the effect of showItems attribute on the fieldValueCache in  
Solr 1.4?


Just outputs details of the last accessed items from the cache in the  
stats display.


Erik

if (showItems != 0) {
  Map items = cache.getLatestAccessedItems( showItems == -1 ?  
Integer.MAX_VALUE : showItems );

  for (Map.Entry e : (Set<Map.Entry>)items.entrySet()) {
    Object k = e.getKey();
    Object v = e.getValue();

    String ks = "item_" + k;
    String vs = v.toString();
    lst.add(ks,vs);
  }

}



Solr/Lucene performance differences on Mac OS X running Tiger vs. Leopard ?

2009-07-30 Thread Mark Bennett
As far as our NOC guys know the machines are approximately the same, aside
from the OS.  The Leopard machine is running the default 1.5 JVM.

And it's possible that some other application or config issues is to blame.
Nobody's blaming the OS or Lucene, we're just asking around.

Searches on Google haven't turned up any reports, so I'm suspecting the
issue lies elsewhere.  Also I've run on Leopard for months without any
performance issues, though I really don't tax anything on my workstation.

--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


Re: What does the showItems config on fieldValueCache mean?

2009-07-30 Thread Stephen Duncan Jr
On Thu, Jul 30, 2009 at 4:18 PM, Erik Hatcher e...@ehatchersolutions.com wrote:


 On Jul 30, 2009, at 3:32 PM, Stephen Duncan Jr wrote:

  What's the effect of showItems attribute on the fieldValueCache in Solr
 1.4?


 Just outputs details of the last accessed items from the cache in the stats
 display.

Erik

 if (showItems != 0) {
   Map items = cache.getLatestAccessedItems( showItems == -1 ?
  Integer.MAX_VALUE : showItems );
   for (Map.Entry e : (Set<Map.Entry>)items.entrySet()) {
 Object k = e.getKey();
 Object v = e.getValue();

 String ks = "item_" + k;
 String vs = v.toString();
 lst.add(ks,vs);
   }

 }


Makes sense.  Thanks!

-- 
Stephen Duncan Jr
www.stephenduncanjr.com


Re: Problem with retrieving field from database using DIH

2009-07-30 Thread Shalin Shekhar Mangar
On Fri, Jul 31, 2009 at 1:43 AM, ahammad ahmed.ham...@gmail.com wrote:

 From what I can gather, it is not finding the data and/or column, and thus
 cannot populate the required field. However, the data is there, which I was
 able to prove outside of Solr.

 Is there a way to generate more descriptive logs for this? I am completely
 lost. I hit this problem a few months ago but I was never able to resolve
 it. Any help on this will be much appreciated.


Can you try using the debug mode and see what your sql query is returning?
You can either use the /admin/dataimport.jsp or add a debug=on&verbose=true
parameter to the import. You should probably limit the number of documents
to be indexed by adding rows=X to the full-import command otherwise the
response would be huge.
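
For example, something along these lines (URL illustrative, assuming the
handler is registered at /dataimport):

    http://localhost:8983/solr/dataimport?command=full-import&debug=on&verbose=true&rows=10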

-- 
Regards,
Shalin Shekhar Mangar.


Re: µTorrent indexed as µTorrent

2009-07-30 Thread Bill Au
Thanks, Robert.  That's exactly what my problem was.  Things work fine after
I make sure that all my processing (index and query) are using UTF-8.  FYI,
it took me a while to discover that SolrJ by default uses a GET request for
query, which uses ISO-8859-1.  I had to explicitly use a POST to do query in
SolrJ in order to get it to use UTF-8.

Bill

On Tue, Jul 28, 2009 at 5:27 PM, Robert Muir rcm...@gmail.com wrote:

 Bill, somewhere in the process I think you might be treating your
 UTF-8 text as ISO-8859-1.

 Your character: 00B5 (µ)
 Bits: 10110101

 UTF8-encoded: 11000010 10110101

 If you were to treat these bytes as ISO-8859-1 (i.e. reading from a
 file or wrong url encoding) then it looks like:
 0xC2 (Â) followed by 0xB5 (µ)


 On Tue, Jul 28, 2009 at 3:26 PM, Bill Aubill.w...@gmail.com wrote:
  I am using SolrJ to index the word µTorrent.  After a commit I was not
 able
  to query for it.  It turns out that the document in my Solr index
 contains
  the word µTorrent instead of µTorrent.  Anyone have any idea what's
 going
  on???
 
  Bill
 



 --
 Robert Muir
 rcm...@gmail.com



Re: µTorrent indexed as µTorrent

2009-07-30 Thread Yonik Seeley
On Thu, Jul 30, 2009 at 6:34 PM, Bill Aubill.w...@gmail.com wrote:
  FYI, it took me a while to discover that SolrJ by default uses a GET request 
 for
 query, which uses ISO-8859-1.

That depends on the servlet container.  SolrJ GET requests are sent in
UTF-8.  Some servlet containers such as Tomcat need extra
configuration to treat URLs as UTF-8 instead of latin-1, but the
standard http://www.ietf.org/rfc/rfc3986.txt clearly specifies UTF-8.

To test the servlet container configuration, check out
example/exampledocs/test_utf8.sh

-Yonik
http://www.lucidimagination.com

  I had to explicitly use a POST to do query in
 SolrJ in order to get it to use UTF-8.

 Bill

 On Tue, Jul 28, 2009 at 5:27 PM, Robert Muir rcm...@gmail.com wrote:

 Bill, somewhere in the process I think you might be treating your
 UTF-8 text as ISO-8859-1.

 Your character: 00B5 (µ)
 Bits: 10110101

 UTF8-encoded: 11000010 10110101

 If you were to treat these bytes as ISO-8859-1 (i.e. reading from a
 file or wrong url encoding) then it looks like:
 0xC2 (Â) followed by 0xB5 (µ)


 On Tue, Jul 28, 2009 at 3:26 PM, Bill Aubill.w...@gmail.com wrote:
  I am using SolrJ to index the word µTorrent.  After a commit I was not
 able
  to query for it.  It turns out that the document in my Solr index
 contains
   the word µTorrent instead of µTorrent.  Anyone have any idea what's
 going
  on???
 
  Bill
 



 --
 Robert Muir
 rcm...@gmail.com
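
For reference, a minimal SolrJ sketch of forcing the query onto a POST,
assuming the 1.3/1.4 QueryRequest API:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class PostQuery {
      public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("µTorrent");
        // send the query as a POST so the container's URI charset doesn't matter
        QueryRequest req = new QueryRequest(q, SolrRequest.METHOD.POST);
        QueryResponse rsp = req.process(server);
        System.out.println(rsp.getResults().getNumFound());
      }
    }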




facet sorting by index on sint fields

2009-07-30 Thread Simon Stanlake
Hi,
I have a field in my schema specified using

<field name="wordCount" type="sint"/>

Where sint is specified as follows (the default from schema.xml)

<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" 
omitNorms="true"/>

When I do a facet on this field using sort=index I always get the values back 
in lexicographic order. Eg: adding this to a query string...

facet=true&facet.field=wordCount&f.wordCount.facet.sort=index

gives me
<lst name="wordCount">
<int name="1">5</int>
<int name="10">2</int>
<int name="2">6</int>
...

Is this a current limitation of solr faceting or am I missing a configuration 
step somewhere? I couldn't find any notes in the docs about this.

Cheers,
Simon



Re: How can i get lucene index format version information?

2009-07-30 Thread Jay Hill
Check the system request handler: http://localhost:8983/solr/admin/system

Should look something like this:
<lst name="lucene">
<str name="solr-spec-version">1.3.0.2009.07.28.10.39.42</str>
<str name="solr-impl-version">1.4-dev 797693M - jayhill - 2009-07-28
10:39:42</str>
<str name="lucene-spec-version">2.9-dev</str>
<str name="lucene-impl-version">2.9-dev 794238 - 2009-07-15 18:05:08</str>
</lst>

-Jay


On Thu, Jul 30, 2009 at 10:32 AM, Walter Underwood wun...@wunderwood.org wrote:

 I think the properties page in the admin UI lists the Lucene version, but I
 don't have a live server to check that on at this instant.

 wunder


 On Jul 30, 2009, at 10:26 AM, Chris Hostetter wrote:


 :  i want to get the lucene index format version from solr web app (as

 : the Luke request handler writes it out:
 :
:    indexInfo.add("version", reader.getVersion());

 that's the index version (as in: "i have added docs to the index, so the
 version number has changed") the question is about the format version (as
 in: "i have upgraded Lucene from 2.1 to 2.3, so the index format has
 changed")

 I'm not sure how Luke gets that ... it's not exposed via a public API on
 an IndexReader.

 Hmm...  SegmentInfos.readCurrentVersion(Directory) seems like it would do
 the trick; but i'm not sure how that would interact with customized
 IndexReader implementations.  i suppose we could always make it non-fatal
 if it throws an exception (just print the exception message in place of the
 number)

 anybody want to submit a patch to add this to the LukeRequestHandler?


 -Hoss





Re: query in solr lucene

2009-07-30 Thread Sushan Rungta

I tried this but it didn't work...

Regards,
Sushan

At 12:37 AM 7/30/2009, Avlesh Singh wrote:

You may index your data using a delimiter, like $my-field-content$. While
searching, perform a phrase query with the leading and trailing $ appended
to the query string.

Cheers
Avlesh

On Wed, Jul 29, 2009 at 12:04 PM, Sushan Rungta s...@clickindia.com wrote:

 I tried using AND, but it even provided me doc 3 which was not required.

 Hence my problem still persists...

 regards,
 Sushan


 At 06:59 AM 7/29/2009, Avlesh Singh wrote:

 
  No, phrase query would match docs 2 and 3. Sushan only wants doc 2 as I
 read
  it.
 
 Sorry, my bad. I did not read properly before replying.

 Cheers
 Avlesh

 On Wed, Jul 29, 2009 at 3:23 AM, Erick Erickson erickerick...@gmail.com
 wrote:

  No, phrase query would match docs 2 and 3. Sushan only wants doc 2 as I
 read
  it.
 
  You might have some joy with KeywordAnalyzer, which does
  not break the incoming stream up into tokens. You have to be
  careful, though, because it also won't fold the case, so 'Hello'
  would not match 'hello'.
 
  Best
  Erick
 
  On Tue, Jul 28, 2009 at 11:11 AM, Avlesh Singh avl...@gmail.com
 wrote:
 
   You should perform a PhraseQuery on the required field.
   Meaning, http://your-solr-host:port/your-core-path/select?q=fieldName:"Hello
   how are you sushan" would work for you.
  
   Cheers
   Avlesh
  
   2009/7/28 Gérard Dupont ger.dup...@gmail.com
  
Hi Sushan,
   
I'm not an expert of Solr, just beginner, but it appears to me that
 you
 may
 have the default 'OR' combination of keywords, which would explain this
 behavior. Try to modify the configuration for an 'AND' combination.
   
cheers
   
On Tue, Jul 28, 2009 at 16:49, Sushan Rungta s...@clickindia.com
  wrote:
   
 I am extremely sorry for responding late as I was ill for the past
 few
   days.

 My problem is explained below with an example:

 I am having three documents with following list:

 1. Hello how are you
 2. Hello how are you sushan
 3. Hello how are you sushan. I am fine.

 When I search for a query Hello how are you sushan, I should
 only
  get
 document 2 in my result.

 I hope this will give you all a better insight in my problem.

 regards,

 Sushan Rungta

   
   
   
--
Gérard Dupont
Information Processing Control and Cognition (IPCC) - EADS DS
http://weblab-project.org
   
Document  Learning team - LITIS Laboratory
   
  
 









Using DIH for parallel indexing

2009-07-30 Thread Avlesh Singh
I am using Solr 1.3 and have a few questions regarding DIH:

   1. Can I pass parameters to DIH and be able to use them inside the
   query attribute of an entity inside the data-config file?
   2. I am indexing some 2 million database records using DIH with 4-5
    nested entities (just one level). These subqueries are highly optimized and
    cannot be avoided. Since DIH processes records sequentially, it takes a lot
   of time (approximately 3 hours) to re-build the indexes. My question is -
   Can I use DIH in someway so that indexing can be carried out in parallel?
   3. What happens if I register multiple DIH's (like dih1, dih2, dih3
   ...) with different data-config files inside the same core and run
   full-import on each of them at the same time? Are the indexes created by
   each of these (inside the same data directory) merged?

Due to my lack of knowledge on Lucene/Solr internals, some of these
questions might be funny.

Cheers
Avlesh


Re: query in solr lucene

2009-07-30 Thread Avlesh Singh
What field type are you using? What kind of filters have you applied on the
field?
The easiest way to make it work is to use a string field.

Cheers
Avlesh

On Fri, Jul 31, 2009 at 11:09 AM, Sushan Rungta s...@clickindia.com wrote:

 I tried this but it didn't work...

 Regards,
 Sushan

 At 12:37 AM 7/30/2009, Avlesh Singh wrote:

 You may index your data using a delimiter, like $my-field-content$. While
 searching, perform a phrase query with the leading and trailing $
 appended
 to the query string.

 Cheers
 Avlesh

 On Wed, Jul 29, 2009 at 12:04 PM, Sushan Rungta s...@clickindia.com
 wrote:

  I tried using AND, but it even provided me doc 3 which was not required.
 
  Hence my problem still persists...
 
  regards,
  Sushan
 
 
  At 06:59 AM 7/29/2009, Avlesh Singh wrote:
 
  
    No, phrase query would match docs 2 and 3. Sushan only wants doc 2 as
 I
  read
   it.
  
  Sorry, my bad. I did not read properly before replying.
 
  Cheers
  Avlesh
 
  On Wed, Jul 29, 2009 at 3:23 AM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
 
    No, phrase query would match docs 2 and 3. Sushan only wants doc 2 as
 I
  read
   it.
  
   You might have some joy with KeywordAnalyzer, which does
   not break the incoming stream up into tokens. You have to be
   careful, though, because it also won't fold the case, so 'Hello'
   would not match 'hello'.
  
   Best
   Erick
  
   On Tue, Jul 28, 2009 at 11:11 AM, Avlesh Singh avl...@gmail.com
  wrote:
  
You should perform a PhraseQuery on the required field.
Meaning, http://your-solr-host:port:
/your-core-path/select?q=fieldName:Hello
how are you sushan would work for you.
   
Cheers
Avlesh
   
2009/7/28 Gérard Dupont ger.dup...@gmail.com
   
 Hi Sushan,

 I'm not an expert of Solr, just beginner, but it appears to me
 that
  you
  may
  have the default 'OR' combination of keywords, which would explain
  this
  behavior. Try to modify the configuration for an 'AND'
  combination.

 cheers

 On Tue, Jul 28, 2009 at 16:49, Sushan Rungta s...@clickindia.com
   wrote:

  I am extremely sorry for responding late as I was ill for the past
  few
days.
 
  My problem is explained below with an example:
 
  I am having three documents with following list:
 
  1. Hello how are you
  2. Hello how are you sushan
  3. Hello how are you sushan. I am fine.
 
  When I search for a query Hello how are you sushan, I should
  only
   get
  document 2 in my result.
 
  I hope this will give you all a better insight in my problem.
 
  regards,
 
  Sushan Rungta
 



 --
 Gérard Dupont
 Information Processing Control and Cognition (IPCC) - EADS DS
 http://weblab-project.org

 Document  Learning team - LITIS Laboratory