Re: dynamic changes to schema
huh? I think I lost you :) You want to use a multivalued field to list what dynamic fields you have in your document? Also, if you program your application correctly you should be able to restrict your users from doing anything you please (or don't please, in this case). On Tue, Aug 18, 2009 at 11:38 PM, Marco Westermann m...@intersales.de wrote: hi, thanks for the advice, but the problem with dynamic fields is that I cannot restrict how the user names the field in the application. So there isn't a pattern I can use. But I thought about using multivalued fields for the dynamically added fields. Good idea? thanks, Marco Constantijn Visinescu wrote: use a dynamic field? On Tue, Aug 18, 2009 at 5:09 PM, Marco Westermann m...@intersales.de wrote: Hi there, is there a possibility to change the solr schema from PHP dynamically? The web application I want to index at the moment has a feature to add fields to entities, and you can mark these fields as searchable. To realize this with solr, the schema has to change when a searchable field is added or removed. Any suggestions? Thanks a lot, Marco Westermann -- ++ Business-Software aus einer Hand ++ ++ Internet, Warenwirtschaft, Linux, Virtualisierung ++ http://www.intersales.de http://www.eisxen.org http://www.tarantella-partner.de http://www.medisales.de http://www.eisfair.net interSales AG Internet Commerce Subbelrather Str. 247 50825 Köln Tel 02 21 - 27 90 50 Fax 02 21 - 27 90 517 Mail i...@intersales.de Mail m...@intersales.de Web www.intersales.de Handelsregister Köln HR B 30904 Ust.-Id.: DE199672015 Finanzamt Köln-Nord. UstID: nicht vergeben Aufsichtsratsvorsitzender: Michael Morgenstern Vorstand: Andrej Radonic, Peter Zander
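For reference, a dynamic field is declared in schema.xml with a glob pattern, which is why it only helps when user-created field names can be forced to follow a naming convention. A minimal sketch of both options discussed above (field names and types are illustrative, not from the thread):

```xml
<!-- Any field whose name ends in "_t" is accepted and indexed as text;
     the application would have to append the suffix to user-chosen names. -->
<dynamicField name="*_t" type="text" indexed="true" stored="true"/>

<!-- Alternative from the thread: collapse all user-defined values into one
     multivalued field, losing per-field search but needing no schema change. -->
<field name="userContent" type="text" indexed="true" stored="true" multiValued="true"/>
```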
RE: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS
DO NOT RELY on your hosting provider. Their automated tools create a complete mess out of "approved for production on CentOS" versions of Lucene, the Servlet API, the java.util.* package, etc.; look at this: Here is my classpath entry when Tomcat starts up java.library.path: /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0/jre/lib/i386/client:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0/jre/lib/i386:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib Who is the vendor of this openjdk-1.6.0.0? Who is the vendor of the JVM this JDK runs on? Do you use 'client' when you really need a 'server' environment? Is it HotSpot? Is your platform really i386? As I mentioned in a previous post, such Java installs are a total mess; you may have an incompatible Servlet API loaded by the bootstrap classloader before the Tomcat classes, etc. Install everything from scratch. INFO: Adding 'file:/usr/share/tomcat5/solr/lib/jetty-util-6.1.3.jar' to Solr classloader -----Original Message----- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: August-19-09 1:43 AM To: solr-user@lucene.apache.org Subject: RE: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS -Dsolr.solr.home='/some/path' CORRECT: -Dsolr.data.dir=.. It should be in the Java startup parameters; for instance, JAVA_OPTS=-server -Xms32768M -Xmx32768M -Dsolr.data.dir=/some/path inside catalina.sh as a first statement... According to the logs you posted, the mistake is probably in solr.xml, which is the multicore definition; you should post its content here. Java 1.4/5/6 supports nested exceptions. The root cause of your problem: java.lang.NoClassDefFoundError: org.apache.solr.core.Config This exception causes another one: javax.xml.xpath.XPathFactoryConfigurationException: No XPathFactory implementation found for the object model: http://java.sun.com/jaxp/xpath/dom at javax.xml.xpath.XPathFactory.newInstance(Unknown Source) at org.apache.solr.core.Config.<clinit>(Config.java:41) etc. etc. etc. 
NoClassDefFoundError means: the classloader had no problem finding the class definition, but it couldn't define the class — due, for instance, to a dependency on another library and/or on a configuration file such as solr.xml. XPath should be called on a DOM (after Config is properly initialized). It's difficult to explain what is wrong with your mess of config files (you are obviously using dual-core) - you should do the following: 1. Install Tomcat 2. Copy the SOLR war file to the webapps folder 3. Start Tomcat and verify the logs; ensure that you have some clear messages there (SOLR should use the default home? Verify!) 4. Configure SOLR-home with the sample solrconfig.xml and schema.xml, restart, verify ... ... ... Don't go to multicore until you have played enough with the simplest SOLR installation -----Original Message----- From: Aaron Aberg [mailto:aaronab...@gmail.com] Sent: August-19-09 12:28 AM To: solr-user@lucene.apache.org Subject: Re: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS Tomcat is running fine. It's solr that is having the issue. I keep seeing people talk about this: -Dsolr.solr.home='/some/path' Should I be putting that somewhere? Or is that already taken care of when I edited the web.xml file in my solr.war file? On Tue, Aug 18, 2009 at 7:29 PM, Fuad Efendi f...@efendi.ca wrote: I forgot to add: the compiler is inside tools.jar in some cases, if I am correct... doesn't matter really... try to access the Tomcat default homepage before trying to use SOLR! The only difference between the JRE and the JDK (from Tomcat's viewpoint) is the absence of the javac compiler for JSPs. But it will complain only if you try to use JSPs (via the admin console). Have you tried to install SOLR on your local box and play with the samples described on the many wiki pages? -----Original Message----- From: Aaron Aberg [mailto:aaronab...@gmail.com] Sent: August-18-09 9:04 PM To: solr-user@lucene.apache.org Subject: Re: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS Marco might be right about the JRE thing. 
Here is my classpath entry when Tomcat starts up java.library.path: /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0/jre/lib/i386/client:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0/jre/lib/i386:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib

Constantijn, here is my solr home file list with permissions:

-bash-3.2$ ll /usr/share/solr/*
-rw-r--r-- 1 tomcat tomcat 2150 Aug 17 22:51 /usr/share/solr/README.txt

/usr/share/solr/bin:
total 160
-rwxr-xr-x 1 tomcat tomcat 4896 Aug 17 22:51 abc
-rwxr-xr-x 1 tomcat tomcat 4919 Aug 17 22:51 abo
-rwxr-xr-x 1 tomcat tomcat 2915 Aug 17 22:51 backup
-rwxr-xr-x 1 tomcat tomcat 3435 Aug 17 22:51 backupcleaner
-rwxr-xr-x 1 tomcat tomcat 3312 Aug 17 22:51 commit
-rwxr-xr-x 1 tomcat tomcat 3306 Aug 17 22:51 optimize
-rwxr-xr-x 1 tomcat tomcat 3163 Aug 17 22:51 readercycle
-rwxr-xr-x 1 tomcat tomcat 1752
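Putting Fuad's advice together: the system property goes into JAVA_OPTS near the top of catalina.sh, using whichever property applies (-Dsolr.solr.home for the home directory, -Dsolr.data.dir for the data directory). A minimal sketch — the heap sizes and the /usr/share/solr path are illustrative, not the poster's real values:

```shell
# Added near the top of catalina.sh: heap settings plus the solr home property.
# -Xms/-Xmx set the JVM heap; -Dsolr.solr.home tells Solr where its home dir is.
JAVA_OPTS="-server -Xms512m -Xmx512m -Dsolr.solr.home=/usr/share/solr"
export JAVA_OPTS
echo "$JAVA_OPTS"
```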
Re: DataImportHandler ignoring most rows
This line says that the query fetched only 7 rows: <str name="Total Rows Fetched">7</str>. If possible, open a tool and just run the same query and see how many rows are returned. On Wed, Aug 19, 2009 at 3:46 AM, Erik Earle erikea...@yahoo.com wrote: Using: - apache-solr-1.3.0 - java 1.6 - tomcat 6 - sql server 2005 w/ JSQLConnect 4.0 driver I have a group table with 3007 rows. I have confirmed the key is unique with select distinct id from group and it returns 3007. When I re-index using http://host:port/solr/dataimport?command=full-import I only get 7 records indexed. Any insight into what is going on would be really great. A partial response:

<lst name="statusMessages">
  <str name="Total Requests made to DataSource">1</str>
  <str name="Total Rows Fetched">7</str>
  <str name="Total Documents Skipped">0</str>

I have other entities that index all the rows without issue. There are no errors in the logs. I am not using any Transformers (and most of my config is not changed from install). My schema.xml contains:

<uniqueKey>key</uniqueKey>

and field defs (not a full list of fields):

<field name="key" type="string" indexed="true" stored="true" />
<field name="class" type="string" indexed="true" stored="true" required="true" />
<field name="id" type="string" indexed="true" stored="true" />
<field name="description" type="text" indexed="true" stored="true" />
<field name="created" type="date" indexed="true" stored="true" />
<field name="updated" type="date" indexed="true" stored="true" />

data-config.xml:

<dataConfig>
  <!-- jdbc:JSQLConnect://se-eriearle-lt1/database=SocialSite2/user=SocialSite2/logfile=DB_TRACE.log -->
  <dataSource type="JdbcDataSource" driver="com.jnetdirect.jsql.JSQLDriver" url="jdbc:JSQLConnect://se-eriearle-lt1/database=SocialSite2/user=SocialSite2" user="SocialSite2" password="SocialSite2" />
  <document>
    <entity name="Group" pk="key" query="select 'group.'+id as 'key', 'group' as 'class', name, handle, description, created, updated from group order by created asc">
    </entity>
    <entity name="Message" pk="key" query="...redacted...">
    </entity>
  </document>
</dataConfig>

-- - Noble Paul | Principal Engineer | AOL | http://aol.com
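When re-running the query in a tool, as suggested above, one thing worth checking is that GROUP is a reserved word in SQL Server, so some drivers or tools may need the table name quoted. A purely diagnostic sketch, assuming the same table as in the data-config above (this is a thing to rule out, not a confirmed cause):

```sql
-- Row count with the table name bracket-quoted, since GROUP is reserved in T-SQL.
select count(*) from [group];

-- The same key expression the entity query builds, to check it works for every row.
select 'group.' + id as 'key' from [group] order by created asc;
```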
Re: Replication over multi-core solr
On Wed, Aug 19, 2009 at 2:27 AM, vivek sar vivex...@gmail.com wrote: Hi, We use a multi-core setup for Solr, where new cores are added dynamically to solr.xml. Only one core is active at a time. My question is how can replication be done for multi-core - so every core is replicated on the slave? Replication does not handle new core creation. You will have to issue the core creation command to each slave separately. I went over the wiki, http://wiki.apache.org/solr/SolrReplication, and have a few questions related to that: 1) How do we replicate solr.xml, where we have the list of cores? The wiki says "Only files in the 'conf' dir of the solr instance are replicated" - since solr.xml is in the home directory, how do we replicate it? solr.xml cannot be replicated; even if you did, it is not reloaded. 2) solrconfig.xml in the slave takes a static core url: <str name="masterUrl">http://localhost:port/solr/corename/replication</str> Put a placeholder like <str name="masterUrl">http://localhost:port/solr/${solr.core.name}/replication</str> so the core name is automatically replaced. As in our case cores are created dynamically (a new core is created after the active one reaches some capacity), how can we define the master core dynamically for replication? The only way I see is using the fetchIndex command and passing the new core info there - is that right? If so, the slave application would have to write code to poll the master periodically and fire the fetchIndex command, but how would the slave know the master core name - as they are created dynamically on the master? Thanks, -vivek -- - Noble Paul | Principal Engineer | AOL | http://aol.com
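Since replication does not create cores, each slave needs the core created through the CoreAdmin API before it can start polling the master. A sketch of what that command looks like — the host, port, core name, and instanceDir are placeholders, not values from the thread:

```shell
# Run once per slave for every core created on the master (hypothetical names).
curl "http://slave-host:8983/solr/admin/cores?action=CREATE&name=core1&instanceDir=core1"
```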
Re: Is negative boost possible?
: the only way to negative boost is to positively boost the inverse... : : (*:* -field1:value_to_penalize)^10 This will do the job as well, since bq supports pure negative queries (at least in trunk): bq=-field1:value_to_penalize^10 http://wiki.apache.org/solr/SolrRelevancyFAQ#head-76e53db8c5fd31133dc3566318d1aad2bb23e07e hossman wrote: : Use a decimal figure less than 1, e.g. 0.5, to express less importance. but that's still a positive boost ... it still increases the scores of documents that match. the only way to negative boost is to positively boost the inverse... (*:* -field1:value_to_penalize)^10 : I am looking for a way to assign a negative boost to a term in a Solr query. : Our use scenario is that we want to boost matching documents that are : updated recently and penalize those that have not been updated for a long : time. There are other terms in the query that would affect the scores as : well. For example we construct a query similar to this: : : *:* field1:value1^2 field2:value2^2 lastUpdateTime:[NOW/DAY-90DAYS TO *]^5 : lastUpdateTime:[* TO NOW/DAY-365DAYS]^-3 : : I notice it's not possible to simply use a negative boosting factor in the : query. Is there any way to achieve such a result? : : Regards, : Shi Quan He : : -Hoss -- View this message in context: http://www.nabble.com/Is-negative-boost-possible--tp25025775p25039059.html Sent from the Solr - User mailing list archive at Nabble.com.
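Hoss's "boost the inverse" trick is easy to generate programmatically. A small sketch in Python — the helper name is made up for illustration:

```python
def penalize(field: str, value: str, boost: int) -> str:
    """Build a clause that boosts every document EXCEPT those matching
    field:value -- the only way to 'negative boost' in classic Lucene syntax."""
    return "(*:* -%s:%s)^%d" % (field, value, boost)

# Recreates the exact clause from the thread.
print(penalize("field1", "value_to_penalize", 10))  # (*:* -field1:value_to_penalize)^10
```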
Re: Replication over multi-core solr
Hi Vivek, currently we want to add cores dynamically when the active one reaches some capacity; can you give me some hints on how to achieve such functionality? (Just wondering whether you used shell scripting or coded some 100% Java based solution) Thx 2009/8/19 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: [...] -- Lici
Problems importing HTML content contained within XML document
Hello, I have just started trying out SOLR to index some XML documents that I receive. I am using SOLR 1.3 and its HttpDataSource in conjunction with the XPathEntityProcessor. I am finding the data import really useful so far, but I am having a few problems when I try to import HTML contained within one of the XML tags, BODY. The data import just seems to ignore textContent silently, but it imports everything else. When I do a query through the SOLR admin interface, only the id and author fields are displayed. Any ideas what I am doing wrong? Thanks This is what my dataConfig looks like:

<dataConfig>
  <dataSource type="HttpDataSource" />
  <document>
    <entity name="archive" pk="id" url="http://localhost:9080/data/20090817070752.xml" processor="XPathEntityProcessor" forEach="/document/category" transformer="DateFormatTransformer" stream="true" dataSource="dataSource">
      <field column="id" xpath="/document/category/reference" />
      <field column="textContent" xpath="/document/category/BODY" />
      <field column="author" xpath="/document/category/author" />
    </entity>
  </document>
</dataConfig>

This is how I have specified my schema:

<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="author" type="string" indexed="true" stored="true"/>
  <field name="textContent" type="text" indexed="true" stored="true" />
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>id</defaultSearchField>

And this is what my XML document looks like:

<document>
  <category>
    <reference>123456</reference>
    <author>Authori name</author>
    <BODY>
      <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem elit, lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut vestibulum</P>
      <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem elit, lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut vestibulum</P>
      <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem elit, lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut vestibulum</P>
    </BODY>
  </category>
</document>
Re: CorruptIndexException: Unknown format version
It looks like your Solr lucene-core version doesn't match the Lucene version used to generate the index; as Yonik said, it looks like there is a Lucene library conflict. 2009/8/19 Chris Hostetter hossman_luc...@fucit.org: : how can that happen, it is a new index, and it is already corrupt? : : Did anybody else see something like this? "Unknown format version" doesn't mean your index is corrupt .. it means the version of Lucene parsing the index doesn't recognize the index format version ... typically it means you are trying to open an index generated by a newer version of Lucene than the one you are using. -Hoss -- Lici
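One way to look for the conflict described above is to list every lucene-core jar visible to the webapp and compare the versions embedded in their file names. A sketch — the Tomcat path is an assumption, not taken from the thread:

```shell
# List every lucene-core jar under a Tomcat install and print the version
# in each file name; seeing two different versions is a classic cause of
# "Unknown format version". TOMCAT_HOME is an assumed path.
TOMCAT_HOME=${TOMCAT_HOME:-/usr/share/tomcat5}
find "$TOMCAT_HOME" -name 'lucene-core*.jar' 2>/dev/null \
  | sed 's/.*lucene-core-//; s/\.jar$//' \
  | sort -u
```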
Re: Replication over multi-core solr
Licinio, Please open a separate thread - as it's a different issue - and I can respond there. -vivek 2009/8/19 Licinio Fernández Maurelo licinio.fernan...@gmail.com: Hi Vivek, currently we want to add cores dynamically when the active one reaches some capacity; can you give me some hints on how to achieve such functionality? (Just wondering whether you used shell scripting or coded some 100% Java based solution) Thx [...] -- Lici
Adding cores dynamically
Hi there, currently we want to add cores dynamically when the active one reaches some capacity; can anyone give me some hints on how to achieve such functionality? (Just wondering whether you have used shell scripting or coded some 100% Java based solution) Thx -- Lici
Re: Replication over multi-core solr
Ok 2009/8/19 vivek sar vivex...@gmail.com: Licinio, Please open a separate thread - as it's a different issue - and I can respond there. -vivek [...] -- Lici
Re: Spanish Stemmer
Hi, take a look at this:

<!-- Field type for text (with Spanish stemming) -->
<fieldtype name="textTypeWithStemming" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
  </analyzer>
</fieldtype>

Regards 2009/8/19 Robert Muir rcm...@gmail.com: hi, it looks like you might just have a simple typo: <filter class="solr.SnowballPorterFilterFactory" languange="Spanish"/> if you change it to language="Spanish" it should work. -- Robert Muir rcm...@gmail.com -- Lici
Re: Problems importing HTML content contained within XML document
Hi Venn, I think what is happening when the BODY element is processed by the xpath expression (/document/category/BODY) is that it does not retrieve the text content from the P elements inside the BODY element. The expression will only retrieve text content that is directly a child of the BODY element. I do not know which xpath function(s) the DataImportHandler currently supports to return the text content of a node and all its child nodes. Maybe the expression /document/category/BODY/* will work. Cheers, Martijn 2009/8/19 venn hardy venn.ha...@hotmail.com: Hello, I have just started trying out SOLR to index some XML documents that I receive. I am using SOLR 1.3 and its HttpDataSource in conjunction with the XPathEntityProcessor. I am finding the data import really useful so far, but I am having a few problems when I try to import HTML contained within one of the XML tags, BODY. The data import just seems to ignore textContent silently, but it imports everything else. When I do a query through the SOLR admin interface, only the id and author fields are displayed. Any ideas what I am doing wrong? [...]
Re: Problems importing HTML content contained within XML document
try this <field column="textContent" xpath="/document/category/BODY" faltten="true"/> this should slurp all the tags under BODY On Wed, Aug 19, 2009 at 1:44 PM, venn hardy venn.ha...@hotmail.com wrote: Hello, I have just started trying out SOLR to index some XML documents that I receive. I am using SOLR 1.3 and its HttpDataSource in conjunction with the XPathEntityProcessor. I am finding the data import really useful so far, but I am having a few problems when I try to import HTML contained within one of the XML tags, BODY. The data import just seems to ignore textContent silently, but it imports everything else. When I do a query through the SOLR admin interface, only the id and author fields are displayed. Any ideas what I am doing wrong? [...] -- - Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Problems importing HTML content contained within XML document
sorry, that should read <field column="textContent" xpath="/document/category/BODY" flatten="true"/> 2009/8/19 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: try this <field column="textContent" xpath="/document/category/BODY" faltten="true"/> this should slurp all the tags under BODY On Wed, Aug 19, 2009 at 1:44 PM, venn hardy venn.ha...@hotmail.com wrote: Hello, I have just started trying out SOLR to index some XML documents that I receive. I am using SOLR 1.3 and its HttpDataSource in conjunction with the XPathEntityProcessor. I am finding the data import really useful so far, but I am having a few problems when I try to import HTML contained within one of the XML tags, BODY. The data import just seems to ignore textContent silently, but it imports everything else. When I do a query through the SOLR admin interface, only the id and author fields are displayed. Any ideas what I am doing wrong? [...] -- - Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Relevant results with DisMaxRequestHandler
Wow, it's as if the 'mm' parameter just appeared for the first time... Yes, I read the doc a few times, but never understood that documents which don't match any of the expressions will not be returned... my apologies, everything seems clearer now thanks to the minimum-number parameter. Thank you, Vincent hossman wrote: : The 'qf' parameter used in dismax seems to work with an 'AND' separator. : I have many more results without dismax. Is there any way to keep the same : amount of documents and process the 'qf'? did you read any of the docs on dismax? http://wiki.apache.org/solr/DisMaxRequestHandler did you look at the mm param? http://wiki.apache.org/solr/DisMaxRequestHandler#mm -Hoss -- View this message in context: http://www.nabble.com/Relevant-results-with-DisMaxRequestHandler-tp24716870p25041314.html Sent from the Solr - User mailing list archive at Nabble.com.
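For anyone hitting the same surprise: with dismax, mm controls how many of the optional query clauses must match, so lowering it restores the OR-like behaviour of the standard handler. A small illustrative sketch of the request parameters — the field names and query values here are made up:

```python
from urllib.parse import urlencode

# mm=1 means a document matching ANY single clause is returned,
# instead of the stricter default that dropped results above.
params = {
    "q": "some query",
    "defType": "dismax",
    "qf": "title^2 body",
    "mm": "1",
}
query_string = urlencode(params)
print(query_string)
```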
Re: JVM Heap utilization Memory leaks with Solr
Fuad, We have around 5 million documents and around 3700 fields. All documents will not have values for all the fields. JRockit is not approved for use within my organization, but thanks for the info anyway. Regards Rahul On Tue, Aug 18, 2009 at 9:41 AM, Funtick f...@efendi.ca wrote: BTW, you should really prefer JRockit which really rocks!!! Mission Control has the necessary tooling; and JRockit produces a _nice_ exception stacktrace (explaining almost everything) even in case of OOM, which the Sun JVM still fails to produce. SolrServlet still catches Throwable: } catch (Throwable e) { SolrException.log(log,e); sendErr(500, SolrException.toStr(e), request, response); } finally { Rahul R wrote: Otis, Thank you for your response. I know there are a few variables here but the difference in memory utilization with and without shards somehow leads me to believe that the leak could be within Solr. I tried using a profiling tool - Yourkit. The trial version was free for 15 days, but I couldn't find anything of significance. Regards Rahul On Tue, Aug 4, 2009 at 7:35 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Rahul, A) There are no known (to me) memory leaks. I think there are too many variables for a person to tell you what exactly is happening, plus you are dealing with the JVM here. :) Try jmap -histo:live PID-HERE | less and see what's using your memory. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Rahul R rahul.s...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, August 4, 2009 1:09:06 AM Subject: JVM Heap utilization Memory leaks with Solr I am trying to track memory utilization with my application that uses Solr. Details of the setup: - 3rd party Software: Solaris 10, Weblogic 10, jdk_150_14, Solr 1.3.0 - Hardware: 12 CPU, 24 GB RAM For testing during PSR I am using a smaller subset of the actual data that I want to work with.
Details of this smaller sub-set: - 5 million records, 4.5 GB index size. Observations during PSR: A) I have allocated 3.2 GB for the JVM(s) that I used. After all users log out and doing a force GC, only 60 % of the heap is reclaimed. As part of the logout process I am invalidating the HttpSession and doing a close() on CoreContainer. From my application's side, I don't believe I am holding on to any resource. I wanted to know if there are known issues surrounding memory leaks with Solr? B) To further test this, I tried deploying with shards. 3.2 GB was allocated to each JVM. All JVMs had 96 % free heap space after start up. I got varying results with this. Case 1: Used 6 weblogic domains. My application was deployed on 1 domain. I split the 5 million index into 5 parts of 1 million each and used them as shards. After multiple users used the system and doing a force GC, around 94 - 96 % of heap was reclaimed in all the JVMs. Case 2: Used 2 weblogic domains. My application was deployed on 1 domain. On the other, I deployed the entire 5 million part index as one shard. After multiple users used the system and doing a force GC, around 76 % of the heap was reclaimed in the shard JVM. And 96 % was reclaimed in the JVM where my application was running. This result further convinces me that my application can be absolved of holding on to memory resources. I am not sure how to interpret these results? For searching, I am using: Without Shards: EmbeddedSolrServer With Shards: CommonsHttpSolrServer In terms of Solr objects this is what differs in my code between normal search and shards search (distributed search). After looking at Case 1, I thought that the CommonsHttpSolrServer was more memory efficient, but Case 2 proved me wrong. Or could there still be memory leaks in my application? Any thoughts, suggestions would be welcome.
Regards Rahul -- View this message in context: http://www.nabble.com/JVM-Heap-utilization---Memory-leaks-with-Solr-tp24802380p25018165.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr-773 (GEO Module) question
Hi, we're glancing at the GEO search module known from the jira issue 773 (http://issues.apache.org/jira/browse/SOLR-773). It seems to us that the issue is still open and not yet included in the nightly builds. Is there a release plan for the nightly builds, and is this module considered core or contrib? Regards, Johan
Re: MultiCore Queries? are they possible
On Tue, Aug 18, 2009 at 5:47 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi, Can we create a join query between two indexes on two cores? Is this possible in Solr? I have an index which stores author profiles and another index which stores content and an author id as a reference. Can I query as: select Content, AuthorName from Core0, Core1 where core0.authorid = core1.authorid and authorid = A123 No, but you can always make two calls and join it yourself. However, Solr supports multi-valued fields, so it is best to de-normalize the data if you need to show both kinds of information in one query. -- Regards, Shalin Shekhar Mangar.
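Shalin's "make two calls and join it yourself" can be sketched in a few lines. Here the two result sets are represented as plain lists of dicts (the field names authorid/AuthorName/Content follow the question; this is an illustrative client-side merge, not a Solr API):

```python
def join_results(author_docs, content_docs, key="authorid"):
    """Merge two Solr result sets on a shared key, like a SQL inner join."""
    # Index the author docs by the join key for O(1) lookup.
    authors_by_id = {doc[key]: doc for doc in author_docs}
    # Attach the author's name to each content doc that has a matching author.
    return [
        {**content, "AuthorName": authors_by_id[content[key]]["AuthorName"]}
        for content in content_docs
        if content[key] in authors_by_id
    ]
```

In practice the two lists would come from one query against each core (e.g. via SolrJ or an HTTP client) before being merged like this.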
Re: Strange error with shards
On Tue, Aug 18, 2009 at 9:01 PM, ahammad ahmed.ham...@gmail.com wrote: HTTP Status 500 - null java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:437) at The way I created this shard was to copy an existing one, erasing all the data files/folders, and modifying my schema/data-config files. So the core settings are pretty much the same. What did you modify in the schema? All the shards should have the same schema. That exception can come if the uniqueKey is missing/null. -- Regards, Shalin Shekhar Mangar.
Re: Passing a Cookie in SolrJ
On Tue, Aug 18, 2009 at 10:18 PM, Ramirez, Paul M (388J) paul.m.rami...@jpl.nasa.gov wrote: Hi All, The project I am working on is using Solr and OpenSSO (Sun's single sign on service). I need to write some sample code for our users that shows them how to query Solr and I would just like to point them to the SolrJ documentation but I can't see an easy way to be able to pass a cookie with the request. The cookie is needed to be able to get through the SSO layer but will just be ignored by Solr. I see that you are using Apache Commons Http Client and with that I would be able to write the cookie if I had access to the HttpMethod being used (GetMethod or PostMethod). However, I can not find an easy way to get access to this with SolrJ and thought I would ask before rewriting a simple example using only an ApacheHttpClient without the SolJ library. Thanks in advance for any pointers you may have. There's no easy way I think. You can extend CommonsHttpSolrServer and override the request method. Copy/paste the code from CommonsHttpSolrServer#request and make the changes. It is not an elegant way but it will work. -- Regards, Shalin Shekhar Mangar.
Re: How to boost fields with many terms against single-term?
On Wed, Aug 19, 2009 at 12:32 AM, Fuad Efendi f...@efendi.ca wrote: I don't want single-term docs such as home to appear in top for simple search for a home; I need home improvement made easy in top... How to implement it at query time? If you always want home improvement made easy on top for home, see if the QueryElevationComponent can help. -- Regards, Shalin Shekhar Mangar.
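For reference, the QueryElevationComponent is driven by an elevate.xml file that pins chosen documents to the top of the results for a given query. A minimal sketch (the document id here is hypothetical, and the component must also be registered in solrconfig.xml):

```xml
<elevate>
  <!-- For the query "home", force this document to the top of the results -->
  <query text="home">
    <doc id="home-improvement-made-easy" />
  </query>
</elevate>
```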
Re: Strange error with shards
Each core has a different database as a datasource, which means that they have different DB structures and fields. That is why the schemas are different. I figured out the cause of this problem. You were right, it was the uniqueKey field. All of my cores have that field set to id but for this new core, it is set to threadID. Changing that to id fixed the problem. Shalin Shekhar Mangar wrote: On Tue, Aug 18, 2009 at 9:01 PM, ahammad ahmed.ham...@gmail.com wrote: HTTP Status 500 - null java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:437) at The way I created this shard was to copy an existing one, erasing all the data files/folders, and modifying my schema/data-config files. So the core settings are pretty much the same. What did you modify in the schema? All the shards should have the same schema. That exception can come if the uniqueKey is missing/null. If all the shards should have the same schema, then what is the point of sharding in the first place? I thought that it was used to combine different cores with different index structures...Right now, every core I have is unique, and every schema is different... -- Regards, Shalin Shekhar Mangar. -- View this message in context: http://www.nabble.com/Strange-error-with-shards-tp25027486p25043859.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Strange error with shards
On Wed, Aug 19, 2009 at 6:44 PM, ahammad ahmed.ham...@gmail.com wrote: Each core has a different database as a datasource, which means that they have different DB structures and fields. That is why the schemas are different. If all the shards should have the same schema, then what is the point of sharding in the first place? I thought that it was used to combine different cores with different index structures...Right now, every core I have is unique, and every schema is different... Index is sharded when it becomes too much for one box to keep the whole index. Distributed Search in Solr can merge these multiple indexes running on different boxes into one result set. It is not meant for combining different cores or different schemas. If many shards have a document with the same uniqueKey value, any one can be returned. Typically, shards have the same schema, with each having a disjoint subset of the complete set of documents. -- Regards, Shalin Shekhar Mangar.
RE: JVM Heap utilization Memory leaks with Solr
Hi Rahul, JRockit could be used at least in a test environment to monitor the JVM (and troubleshoot SOLR; licensed for-free for developers!); they even have an Eclipse plugin now, and it is licensed by Oracle (BEA)... But, of course, in large companies the test environment is in the hands of testers :) But... 3700 fields will create (over time) 3700 arrays, each of size 5,000,000!!! Even if most of the fields are empty for most of the documents... Applicable to non-tokenized single-valued non-boolean fields only, Lucene internals, FieldCache... and it won't be GC-collected after user log-off... prefer a dedicated box for SOLR. -Fuad -Original Message- From: Rahul R [mailto:rahul.s...@gmail.com] Sent: August-19-09 6:19 AM To: solr-user@lucene.apache.org Subject: Re: JVM Heap utilization Memory leaks with Solr Fuad, We have around 5 million documents and around 3700 fields. All documents will not have values for all the fields. JRockit is not approved for use within my organization, but thanks for the info anyway. Regards Rahul On Tue, Aug 18, 2009 at 9:41 AM, Funtick f...@efendi.ca wrote: BTW, you should really prefer JRockit which really rocks!!! Mission Control has the necessary tooling; and JRockit produces a _nice_ exception stacktrace (explaining almost everything) even in case of OOM, which the Sun JVM still fails to produce. SolrServlet still catches Throwable: } catch (Throwable e) { SolrException.log(log,e); sendErr(500, SolrException.toStr(e), request, response); } finally { Rahul R wrote: Otis, Thank you for your response. I know there are a few variables here but the difference in memory utilization with and without shards somehow leads me to believe that the leak could be within Solr. I tried using a profiling tool - Yourkit. The trial version was free for 15 days, but I couldn't find anything of significance. Regards Rahul On Tue, Aug 4, 2009 at 7:35 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Rahul, A) There are no known (to me) memory leaks.
I think there are too many variables for a person to tell you what exactly is happening, plus you are dealing with the JVM here. :) Try jmap -histo:live PID-HERE | less and see what's using your memory. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Rahul R rahul.s...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, August 4, 2009 1:09:06 AM Subject: JVM Heap utilization Memory leaks with Solr I am trying to track memory utilization with my application that uses Solr. Details of the setup: - 3rd party Software: Solaris 10, Weblogic 10, jdk_150_14, Solr 1.3.0 - Hardware: 12 CPU, 24 GB RAM For testing during PSR I am using a smaller subset of the actual data that I want to work with. Details of this smaller sub-set: - 5 million records, 4.5 GB index size. Observations during PSR: A) I have allocated 3.2 GB for the JVM(s) that I used. After all users log out and doing a force GC, only 60 % of the heap is reclaimed. As part of the logout process I am invalidating the HttpSession and doing a close() on CoreContainer. From my application's side, I don't believe I am holding on to any resource. I wanted to know if there are known issues surrounding memory leaks with Solr? B) To further test this, I tried deploying with shards. 3.2 GB was allocated to each JVM. All JVMs had 96 % free heap space after start up. I got varying results with this. Case 1: Used 6 weblogic domains. My application was deployed on 1 domain. I split the 5 million index into 5 parts of 1 million each and used them as shards. After multiple users used the system and doing a force GC, around 94 - 96 % of heap was reclaimed in all the JVMs. Case 2: Used 2 weblogic domains. My application was deployed on 1 domain. On the other, I deployed the entire 5 million part index as one shard.
After multiple users used the system and doing a force GC, around 76 % of the heap was reclaimed in the shard JVM. And 96 % was reclaimed in the JVM where my application was running. This result further convinces me that my application can be absolved of holding on to memory resources. I am not sure how to interpret these results? For searching, I am using: Without Shards: EmbeddedSolrServer With Shards: CommonsHttpSolrServer In terms of Solr objects this is what differs in my code between normal search and shards search (distributed search). After looking at Case 1, I thought that the CommonsHttpSolrServer was more memory efficient, but Case 2 proved me wrong. Or could there still be memory leaks in my application? Any thoughts, suggestions would be welcome. Regards Rahul --
multi words synonyms
Hi, I would like to make "internal medicine" a synonym for "physician" or "doctor", but it is not working properly. Can anyone help me? synonyms.index.txt: internal medicine => physician synonyms.query.txt: physician, internal medicine => physician, doctor In the Analysis tool, I can see clearly that "internal medicine" is converted to "physician" and "doctor" at index and query time, but in an actual query it is not converted (with the debugQuery=true parameter):

<lst name="debug">
  <str name="rawquerystring">internal medicine</str>
  <str name="querystring">internal medicine</str>
  <str name="parsedquery">job:intern job:medicin</str>
  <str name="parsedquery_toString">job:intern job:medicin</str>

It returns:

<doc>
  <float name="score">1.3963256</float>
  <str name="job">874878_INTERNATIONAL CONSULTANTS</str>
</doc>

Here is what I have in schema.xml:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.index.txt" ignoreCase="true" expand="false"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.index.txt" ignoreCase="true" expand="false"/>
</analyzer>
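Judging from the pasted schema, one likely culprit is that both analyzers point at synonyms.index.txt with expand="false", so the query-side "physician => physician, doctor" expansion never runs. A hedged sketch of the presumably intended split (note that multi-word synonyms at query time are fragile regardless, because the query parser splits on whitespace before the filter ever sees "internal medicine" as a unit, which is why index-time mapping is usually recommended):

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- collapse "internal medicine" to "physician" at index time -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.index.txt"
          ignoreCase="true" expand="false"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- expand "physician" to "physician, doctor" at query time -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.query.txt"
          ignoreCase="true" expand="true"/>
</analyzer>
```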
Shutdown Solr
Does anyone know a graceful way to shutdown Solr? (other than killing the process with Ctrl-C)
Re: Shutdown Solr
it catches the kill signal and shuts down as it should, I guess :) because it writes stuff to the log after pressing ^c 2009/8/19 Miller, Michael P. m.mil...@radium.ncsc.mil Does anyone know a graceful way to shutdown Solr? (other than killing the process with Ctrl-C)
Re: Data Modeling
This is the sort of Solr fundamentals question my book (chapter 2) will help you with. Think about what your user interface is. What are users searching for? That is, what exactly comes back from search results? It's not clear from your description what your search scenario is. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server On 8/19/09 10:31 AM, Vladimir Landman v...@northernautoparts.com wrote: Hi, I am trying to create a schema for Solr. Here is a relational model of what our data might look like: Inventory: Sku, Price, Weight. Attributes: AttributeName, AttributeValue. Applications: Id (Auto-Incrementing), Sku, VehicleYear, VehicleMake, VehicleModel, VehicleEngine. There can be multiple Application(s) records. Also, Attributes can also have duplicates. Basically I want to store basic information about our inventory, attributes, and applications. If I didn't have the applications, I would simply have: <field name="id" ... /> <field name="sku" ... /> <field name="price" ... /> <field name="weight" ... /> <!-- Attributes --> <field name="OilPumpVolume" ... /> <field name="FuelType" ... /> Since one part might have 3 or 4 attributes, but 100 applications, I want to try to avoid having 400 records, but maybe that is just what I will have to do. I appreciate any help. -- Vladimir Landman Northern Auto Parts
Re: Solr-773 (GEO Module) question
On Aug 19, 2009, at 6:45 AM, johan.sjob...@findwise.se wrote: Hi, we're glancing at the GEO search module known from the jira issue 773 (http://issues.apache.org/jira/browse/SOLR-773). It seems to us that the issue is still open and not yet included in the nightly builds. correct Is there a release plan for the nightly builds, and is this module considered core or contrib? activity on the nightly builds is winding down as we gear up for the 1.4 release. After 1.4 is out, I expect progress on the geo stuff. It will be in contrib (not core) and will likely be marked experimental for a while. That is, stuff will be added without the expectation that the interfaces will be set in stone. best ryan
RE: Shutdown Solr
catalina.sh stop But SolrServlet catches everything and forgets to implement destroy()! I am absolutely unsure about Ctrl-C and even have many concerns regarding catalina.sh stop... J2EE/JEE does not specify any support for threads other than container-managed ones... I hope SolrServlet closes the Lucene index (and other resources) and everything follows the Servlet specs... but I can't find the dummies' method _destroy()_ in SolrServlet!!! It should gracefully close the Lucene index and other resources. WHY? -Original Message- From: Tobias Brennecke [mailto:t.bu...@gmail.com] Sent: August-19-09 11:39 AM To: solr-user@lucene.apache.org Subject: Re: Shutdown Solr it catches the kill signal and shuts down as it should, I guess :) because it writes stuff to the log after pressing ^c 2009/8/19 Miller, Michael P. m.mil...@radium.ncsc.mil Does anyone know a graceful way to shutdown Solr? (other than killing the process with Ctrl-C)
RE: Shutdown Solr
Most probably Ctrl-C is graceful for Tomcat, and kill -9 too... Tomcat is smart... I prefer /etc/init.d/my_tomcat wrapper around catalina.sh (su tomcat, /var/lock etc...) - ok then, Graceful Shutdown depends on how you started Tomcat.
strange sorting results: each word in field is sorted
I'm trying to sort, but I am not always getting the correct results and I'm not sure where to start tracking down the problem. You can see the problem here (at least until it's fixed!): http://nines.performantsoftware.com/search/saved?user=paulname=poem If you sort by Title/Ascending, you get partially sorted results, but it seems to be using a random word to sort on instead of sorting on the entire title. Page one starts well with: (blank); Adieu; Advertisement; Afterwards; etc. but by page 6 it starts to break down: Elizabeth Barrett Browning; Albert and Elweena; Emerson and Bacon; etc... Errata; Anne Bannerman: Biographical Essay; Aboringines (Estonia); etc... I notice in the above list that there is SOME word that is sorted, just not the first one. (In fact, it seems to be the word that appears greatest in the sort order.) Then at the end, for instance page 336, it sorts some titles with diacritical marks: Roman à Clef; The Forgotten Reaping-Hook: Sex in My Ántonia; Social (Re)Visioning in the Fields of My Ántonia; etc... I'm not sure what info would be useful to help debug. In my schema.xml file, I've clipped what seems to be the relevant part:

<fieldtype name="text_lu" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
<field name="title" type="text_lu" indexed="true" stored="true" multiValued="true"/>

Thanks, Paul
Re: Shutdown Solr
On Wed, Aug 19, 2009 at 2:43 PM, Fuad Efendif...@efendi.ca wrote: Most probably Ctrl-C is graceful for Tomcat, and kill -9 too... Tomcat is smart... I prefer /etc/init.d/my_tomcat wrapper around catalina.sh (su tomcat, /var/lock etc...) - ok then, Graceful Shutdown depends on how you started Tomcat. *No* application is graceful for kill -9. The whole point of kill -9 is that it's uncatchable. -- http://www.linkedin.com/in/paultomblin
Re: strange sorting results: each word in field is sorted
On Aug 19, 2009, at 2:45 PM, Paul Rosen wrote: You can see the problem here (at least until it's fixed!): http://nines.performantsoftware.com/search/saved?user=paulname=poem Hi Paul - that project looks familiar! :) If you sort by Title/Ascending, you get partially sorted results, but it seems to be using a random word to sort on instead of sorting on the entire title. I'm not sure what info would be useful to help debug. In my schema.xml file, I've clipped what seems to be the relevant part:

<fieldtype name="text_lu" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
<field name="title" type="text_lu" indexed="true" stored="true" multiValued="true"/>

I'm surprised you're not seeing an exception when trying to sort on title given this configuration. Sorting must be done on single valued indexed fields that have at most a single term indexed per document. I recommend you use copyField to copy title to title_sort, and configure a title_sort field as a string or a field type that analyzes only to a single term (like simply keyword tokenizing + lower case filter). Erik
RE: Shutdown Solr
Thanks... kill should be / can be graceful; kill -9 should kill immediately... no any hang, whole point... http://www.nabble.com/Is-kill--9-safe-or-not--td24866506.html -Original Message- From: ptomb...@gmail.com [mailto:ptomb...@gmail.com] On Behalf Of Paul Tomblin Sent: August-19-09 2:49 PM To: solr-user@lucene.apache.org Subject: Re: Shutdown Solr On Wed, Aug 19, 2009 at 2:43 PM, Fuad Efendif...@efendi.ca wrote: Most probably Ctrl-C is graceful for Tomcat, and kill -9 too... Tomcat is smart... I prefer /etc/init.d/my_tomcat wrapper around catalina.sh (su tomcat, /var/lock etc...) - ok then, Graceful Shutdown depends on how you started Tomcat. *No* application is graceful for kill -9. The whole point of kill -9 is that it's uncatchable. -- http://www.linkedin.com/in/paultomblin
WordDelimiterFilter = MultiPhraseQuery?
My issue is with the use of WordDelimiterFilter and how the QueryParser (Dismax) converts the query into a MultiPhraseQuery. This is on solr 1.3 / lucene 2.4.1. For example: 1. yuma - 3:10 to Yuma 2. yUma - no results For #2 it gets split into y + uma and becomes a MultiPhraseQuery requiring both terms thus no results vs. requiring either one with a preference on both (or a preference on joining the terms or at least an OR query). 1. joker-man - Joker-Man Goes For Gold 2. joKerman - no results 3. jo-kerman - no results 1. prom night - Prom Night 2. PromNight - Prom Night 3. promnight - no results 4. pRomnIght - no results Is there a way to configure this behavior. I need to support all the above use-cases. I have a brute force solution using a copyField and a non-WordDelimiterFilter analyzer (whitespacetoken, lowercase, patternreplace punctuation, edgengram) and basically drop into solrconfig.xml a 2nd field for this (titleNameSubstring2). Those two combined is pretty much what I need, but that costs a memory hit + performance hit whereas some tuning to avoid MultiPhraseQuery would be a better fit. Here are the schema.xml + solrconfig.xml bits that are not working. 
[schema.xml]

<fieldType name="textSubstring" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="12"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

[solrconfig.xml]

<requestHandler name="stuff_title" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <str name="sort">score desc</str>
    <str name="qf">titleNameSubstring^200.0</str>
    <str name="pf">titleNameSubstring^2.0</str>
    <str name="bf">product(releaseYear,0.1)</str>
    <str name="mm">1</str>
  </lst>
  <lst name="appends">
    <str name="fq">searchable:true</str>
  </lst>
</requestHandler>

Any ideas? -netcam
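One commonly suggested tweak for this symptom is to keep the original token alongside the split parts at query time, so that a lowercased "yUma" can still match "yuma" instead of only the phrase "y uma". A hedged sketch of the query analyzer (note: the preserveOriginal attribute of WordDelimiterFilterFactory was added after the 1.3 release being discussed here, so this assumes a newer Solr; catenateAll="1" additionally produces the joined form so "promNight" can match "promnight"):

```xml
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- preserveOriginal=1 emits "yUma" in addition to the parts "y"/"uma";
       catenateAll=1 also emits the concatenated form ("promnight") -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="1" catenateNumbers="1"
          catenateAll="1" preserveOriginal="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```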
Re: strange sorting results: each word in field is sorted
Erik Hatcher wrote: On Aug 19, 2009, at 2:45 PM, Paul Rosen wrote: You can see the problem here (at least until it's fixed!): http://nines.performantsoftware.com/search/saved?user=paulname=poem Hi Paul - that project looks familiar! :) Hi Erik! I should hope so! And I've gone a year without having to delve into solr much since it has just plain worked. Thanks for the speedy reply. I'm surprised you're not seeing an exception when trying to sort on title given this configuration. Sorting must be done on single valued indexed fields, that have at most a single term indexed per document. I recommend you use copyField to copy title to title_sort and configure a title_sort field as a string or a field type that analyzes only to a single term (like simply keyword tokenizing + lower case filter). Erik I want to double check this (since you probably remember how long it takes to recreate the indexes). I think you're saying to add these two lines, then re-index:

<field name="title_sort" type="string" indexed="true" stored="true"/>
<copyField source="title" dest="title_sort"/>

Now, this is case-sensitive, right? So would this make it case-insensitive?

<fieldtype name="sort_string" class="solr.StrField" sortMissingLast="true">
  <analyzer>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
<field name="title_sort" type="sort_string" indexed="true" stored="true"/>
<copyField source="title" dest="title_sort"/>

Also, I'm guessing from seeing the current results that this wouldn't collate the characters with diacritical marks correctly. Is there a way to indicate that, for instance, A-grave would sort next to A? And, while I'm on the subject, I have to do the same thing with the Author field, but unfortunately, that is sometimes First Last and sometimes Last, First. Is there any way to sort those by last name, or do I just have to encourage the index people to be more consistent?
I can think of a fairly simple algorithm, but am not sure where to implement it:
- if the word "and" appears, just look at the left side of the field (in other words, sort by the first name that appears).
- if there is a comma, but it is part of ", jr." or some other common suffix like that, ignore it.
- otherwise, if there is no comma, sort by the last word, unless it is jr, sr, III, etc., in which case sort by the word before that.
- otherwise, sort by the first word.
That would get most of the cases. Thanks, Paul
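The steps above can be written down directly; this is an illustrative Python version (the suffix list and the choice to key on the first author's surname in the "and" case are assumptions, not from the thread):

```python
# Generational suffixes that should not be treated as a surname.
SUFFIXES = {"jr", "jr.", "sr", "sr.", "ii", "iii", "iv"}

def author_sort_key(name):
    """Derive a surname sort key from 'First Last' or 'Last, First' forms."""
    # Multiple authors joined by "and": key on the first author listed.
    name = name.lower().split(" and ")[0].strip()
    parts = [p.strip() for p in name.split(",")]
    # A trailing ", jr."-style suffix is not a surname; drop it.
    if len(parts) > 1 and parts[-1] in SUFFIXES:
        parts = parts[:-1]
    if len(parts) > 1:
        # "Last, First" already leads with the surname.
        return parts[0]
    words = parts[0].split()
    # "First Last Jr" / "... III": step back past generational suffixes.
    while len(words) > 1 and words[-1] in SUFFIXES:
        words.pop()
    return words[-1] if words else ""
```

As for where to implement it: since a Solr sort field must hold a single precomputed term, the most natural place is the indexing client, writing the computed key into the dedicated sort field.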
FW: Data Modeling
I hit reply and sent this to just David, but I think it should go to the whole list: Hi David, I want to do 2 kinds of things with Solr, maybe 3 in the future. 1. I want to use it on our website so that a customer can filter down products by different attributes. So suppose we have: Inventory: ABC, 10; DEF, 15. Attributes: ABC, Brand, ACME Brand; ABC, Water Pump Style, Short; DEF, Brand, Engine Builders; DEF, Water Pump Style, Long. Vehicle Applications: ABC, 1999, Toyota, Camry, 3.1L; ABC, 2000, Toyota, Camry, 3.1L; DEF, 1997, Ford, Focus, 2.5L; DEF, 1998, Ford, Focus, 2.5L. I would like to be able to handle two things: 1. Give the person a list of all the unique years. When they pick one, show them all the Makes for that year. When they pick that, show all the Models. Alternatively: 1. Give them a list of makes, then models, then engine, etc... Also, it would be nice if I could give Solr a Part# (Sku) and have it get all the attributes for that sku; alternatively, I'd love to be able to drill down by the attributes such as Brand, Water Pump Style, etc. Please let me know if this email is still not clear... -- Vladimir Landman Northern Auto Parts From: Smiley, David W. [mailto:dsmi...@mitre.org] Sent: 2009-08-19 10:42 AM To: solr; Vladimir Landman Subject: Re: Data Modeling This is the sort of Solr fundamentals question my book (chapter 2) will help you with. Think about what your user interface is. What are users searching for? That is, what exactly comes back from search results? It's not clear from your description what your search scenario is. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server On 8/19/09 10:31 AM, Vladimir Landman v...@northernautoparts.com wrote: Hi, I am trying to create a schema for Solr.
Here is a relational model of what our data might look like: Inventory: Sku, Price, Weight. Attributes: AttributeName, AttributeValue. Applications: Id (Auto-Incrementing), Sku, VehicleYear, VehicleMake, VehicleModel, VehicleEngine. There can be multiple Application(s) records. Also, Attributes can also have duplicates. Basically I want to store basic information about our inventory, attributes, and applications. If I didn't have the applications, I would simply have: <field name="id" ... /> <field name="sku" ... /> <field name="price" ... /> <field name="weight" ... /> <!-- Attributes --> <field name="OilPumpVolume" ... /> <field name="FuelType" ... /> Since one part might have 3 or 4 attributes, but 100 applications, I want to try to avoid having 400 records, but maybe that is just what I will have to do. I appreciate any help. -- Vladimir Landman Northern Auto Parts
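One common way to model this is to denormalize to one Solr document per SKU, flattening the applications into multivalued fields. A hedged sketch of what such an add document could look like (field names are invented for illustration and would need matching multiValued="true" definitions in schema.xml):

```xml
<add>
  <doc>
    <field name="sku">ABC</field>
    <field name="price">10</field>
    <field name="attr_brand">ACME Brand</field>
    <field name="attr_waterPumpStyle">Short</field>
    <!-- one value per vehicle application -->
    <field name="applicationYear">1999</field>
    <field name="applicationYear">2000</field>
    <field name="applicationMake">Toyota</field>
    <field name="applicationModel">Camry</field>
    <field name="applicationEngine">3.1L</field>
  </doc>
</add>
```

One caveat of this layout: separate multivalued fields lose the correlation between year, make, and model within one application, so drill-down by full application usually also needs a combined field per application (e.g. a single value like "1999 Toyota Camry 3.1L").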
Re: Adding cores dynamically
Lici, We're doing a similar thing with multi-core - when a core reaches capacity (in our case 200 million records) we start a new core. We are doing this via a web service call (the Create web service), http://wiki.apache.org/solr/CoreAdmin This is all done in Java code - before writing, we check the number of records in the core; if it has reached its capacity we create a new core and then index there. -vivek 2009/8/19 Licinio Fernández Maurelo licinio.fernan...@gmail.com: Hi there, currently we want to add cores dynamically when the active one reaches some capacity. Can anyone give me some hints on how to achieve this functionality? (Just wondering if you have used shell scripting or coded some 100% Java based solution.) Thx -- Lici
Re: strange sorting results: each word in field is sorted
On Aug 19, 2009, at 3:50 PM, Paul Rosen wrote: I'm surprised you're not seeing an exception when trying to sort on title given this configuration. Sorting must be done on single-valued indexed fields that have at most a single term indexed per document. I recommend you use copyField to copy title to title_sort and configure a title_sort field as a string, or a field type that analyzes down to a single term (like simply keyword tokenizing + lower case filter). Erik

I want to double check this (since you probably remember how long it takes to recreate the indexes). I think you're saying to add these two lines, then re-index:

<field name="title_sort" type="string" indexed="true" stored="true"/>
<copyField source="title" dest="title_sort"/>

For the simplest case, yes. You do have to be careful the sort field is not multiValued - and I believe the NINES model allowed for multiple titles. So it might be necessary for your indexing client to specify the single sort field value instead of leveraging copyField.

Now, this is case-sensitive, right? So would this make it case-insensitive?

Yes, the above would be case sensitive.

<fieldtype name="sort_string" class="solr.StrField" sortMissingLast="true">
  <analyzer>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
<field name="title_sort" type="sort_string" indexed="true" stored="true"/>
<copyField source="title" dest="title_sort"/>

That analyzer definition isn't quite right - you must have at least a tokenizer. The KeywordTokenizer tokenizes the entire string into a single token, though.
In Solr's example schema there is a field type like this:

<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer does no actual tokenizing, so the entire
         input string is preserved as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- The LowerCase TokenFilter does what you expect, which can be
         when you want your sorting to be case insensitive -->
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- The TrimFilter removes any leading or trailing whitespace -->
    <filter class="solr.TrimFilterFactory" />
    <!-- The PatternReplaceFilter gives you the flexibility to use
         Java Regular expression to replace any sequence of characters
         matching a pattern with an arbitrary replacement string,
         which may include back references to portions of the original
         string matched by the pattern. See the Java Regular Expression
         documentation for more information on pattern and replacement
         string syntax.
         http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/package-summary.html
    -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
  </analyzer>
</fieldType>

Also, I'm guessing from seeing the current results that this wouldn't collate the characters with diacritical marks correctly. Is there a way to indicate that, for instance, A-grave would sort next to A?

Yes, you can incorporate the diacritic normalizing filter into the analyzer definition above: ASCIIFoldingFilter or the ISO Latin1 one.

And, while I'm on the subject, I have to do the same thing with the Author field, but unfortunately, that is sometimes "First Last" and sometimes "Last, First". Is there any way to sort those by last name, or do I just have to encourage the index people to be more consistent?

Good luck with getting consistency in your domain! :) But it certainly makes sense to request that from the data providers, in at least some form that can be turned into the sortable value.
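Client-side, the effect of that analyzer chain plus the diacritic folding Erik mentions can be approximated like this; it is only an illustration of what the filters do (lowercase, fold accents, trim), not a substitute for configuring them in the schema:

```java
import java.text.Normalizer;

// Sketch of the sort-key normalization the analyzer chain above performs:
// fold diacritics (NFD decomposition + strip combining marks), lowercase, trim.
public class SortKey {
    static String sortKey(String s) {
        String folded = Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", ""); // drop combining diacritical marks
        return folded.toLowerCase().trim();
    }

    public static void main(String[] args) {
        System.out.println(sortKey("  Àgrave Title ")); // agrave title
    }
}
```

With this kind of key, "Àgrave" and "Agrave" land next to each other in a sorted list, which is the collation behavior Paul is after.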
I can think of a fairly simple algorithm, but am not sure where to implement it:
- if the word "and" appears, just look at the left side of the field (in other words, sort by the first name that appears.)
- if there is a comma, but it is part of ", jr." or some other common suffix like that, ignore it.
- otherwise, if there is no comma, sort by the last word, unless it is jr, sr, III, etc., in which case sort by the word before that.
- otherwise, sort by the first word.

Probably best to implement that in the indexing client code, but simple transformations could be implemented using the PatternReplaceFilter like above. Erik
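The steps above could be sketched as indexing-client code roughly like this; the suffix list and the exact tie-breaking are assumptions, and real name data will have more corner cases:

```java
import java.util.Set;

// Heuristic last-name sort key following the steps described above.
// SUFFIXES and the handling of multi-author strings are illustrative guesses.
public class AuthorSort {
    static final Set<String> SUFFIXES = Set.of("jr", "jr.", "sr", "sr.", "ii", "iii", "iv");

    static String lastNameKey(String author) {
        String a = author.trim();
        int and = a.toLowerCase().indexOf(" and ");
        if (and >= 0) a = a.substring(0, and).trim();    // only the first author listed
        // drop a trailing ", jr." style suffix so its comma is not misread
        String[] parts = a.split(",");
        if (parts.length == 2 && SUFFIXES.contains(parts[1].trim().toLowerCase())) {
            a = parts[0].trim();
        }
        int comma = a.indexOf(',');
        if (comma >= 0) {                                // already "Last, First"
            return a.substring(0, comma).trim().toLowerCase();
        }
        String[] words = a.split("\\s+");                // "First Last [suffix]"
        int i = words.length - 1;
        if (i > 0 && SUFFIXES.contains(words[i].toLowerCase())) i--;
        return words[i].toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(lastNameKey("John Smith"));      // smith
        System.out.println(lastNameKey("Smith, John"));     // smith
        System.out.println(lastNameKey("John Smith, Jr.")); // smith
    }
}
```

The computed key would go into the single-valued sort field (author_sort) at indexing time, exactly where Erik suggests putting this logic.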
Re: Passing a Cookie in SolrJ
SolrJ uses the Apache Commons HTTP client. This describes the authentication system: http://hc.apache.org/httpclient-3.x/authentication.html http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/auth/package-frame.html *This has code to use authentication* https://issues.apache.org/jira/browse/SOLR-1238 You might be able to find an OpenSSO implementation for this. Or hack up a simple one. On Wed, Aug 19, 2009 at 5:48 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Tue, Aug 18, 2009 at 10:18 PM, Ramirez, Paul M (388J) paul.m.rami...@jpl.nasa.gov wrote: Hi All, The project I am working on is using Solr and OpenSSO (Sun's single sign on service). I need to write some sample code for our users that shows them how to query Solr, and I would just like to point them to the SolrJ documentation, but I can't see an easy way to pass a cookie with the request. The cookie is needed to get through the SSO layer but will just be ignored by Solr. I see that you are using Apache Commons HttpClient, and with that I would be able to write the cookie if I had access to the HttpMethod being used (GetMethod or PostMethod). However, I cannot find an easy way to get access to this with SolrJ and thought I would ask before rewriting a simple example using only an Apache HttpClient without the SolrJ library. Thanks in advance for any pointers you may have. There's no easy way I think. You can extend CommonsHttpSolrServer and override the request method. Copy/paste the code from CommonsHttpSolrServer#request and make the changes. It is not an elegant way but it will work. -- Regards, Shalin Shekhar Mangar. -- Lance Norskog goks...@gmail.com
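For the documentation-example use case, one way to sidestep SolrJ entirely is a raw HTTP request with the cookie header set by hand. A minimal sketch (the cookie name iPlanetDirectoryPro is OpenSSO's default session cookie, and the URL and token value here are placeholders):

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: attach an SSO cookie to a plain Solr query request without SolrJ.
// The SSO layer reads the cookie; Solr itself ignores it.
public class CookieQuery {
    static HttpURLConnection openWithCookie(String url, String cookie) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestProperty("Cookie", cookie); // must be set before connecting
            return conn;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection c = openWithCookie(
                "http://localhost:8983/solr/select?q=*:*",
                "iPlanetDirectoryPro=placeholder-token");
        // the header is attached to the request before it is sent
        System.out.println(c.getRequestProperty("Cookie"));
    }
}
```

Reading the response body from the connection would then give back the usual Solr XML, to be parsed by whatever means the sample code prefers.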
Re: Shutdown Solr
In production systems I have done a three-stage technique. First, use the container's standard shutdown tool - Tomcat, JBoss and Jetty all have their own. Then sleep for maybe 60 seconds. Then do kill, sleep some more, then 'kill -9'. On Wed, Aug 19, 2009 at 12:21 PM, Fuad Efendi f...@efendi.ca wrote: Thanks... kill should be / can be graceful; kill -9 kills immediately, with no hang at all - that's the whole point... http://www.nabble.com/Is-kill--9-safe-or-not--td24866506.html -Original Message- From: ptomb...@gmail.com [mailto:ptomb...@gmail.com] On Behalf Of Paul Tomblin Sent: August-19-09 2:49 PM To: solr-user@lucene.apache.org Subject: Re: Shutdown Solr On Wed, Aug 19, 2009 at 2:43 PM, Fuad Efendi f...@efendi.ca wrote: Most probably Ctrl-C is graceful for Tomcat, and kill -9 too... Tomcat is smart... I prefer an /etc/init.d/my_tomcat wrapper around catalina.sh (su tomcat, /var/lock etc...) - ok then, graceful shutdown depends on how you started Tomcat. *No* application is graceful for kill -9. The whole point of kill -9 is that it's uncatchable. -- http://www.linkedin.com/in/paultomblin -- Lance Norskog goks...@gmail.com
Re: DataImportHandler ignoring most rows
It usually helps to make a database view of your query, and then load the DIH from that view. There are cases where some query syntaxes are mangled on the way to the DB. 2009/8/18 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: this comment says that <str name="Total Rows Fetched">7</str> the query fetched only 7 rows. If possible, open a tool and just run the same query and see how many rows are returned. On Wed, Aug 19, 2009 at 3:46 AM, Erik Earle erikea...@yahoo.com wrote: Using: - apache-solr-1.3.0 - java 1.6 - tomcat 6 - sql server 2005 w/ JSQLConnect 4.0 driver I have a group table with 3007 rows. I have confirmed the key is unique with "select distinct id from group" and it returns 3007. When I re-index using http://host:port/solr/dataimport?command=full-import I only get 7 records indexed. Any insight into what is going on would be really great. A partial response:

<lst name="statusMessages">
  <str name="Total Requests made to DataSource">1</str>
  <str name="Total Rows Fetched">7</str>
  <str name="Total Documents Skipped">0</str>

I have other entities that index all the rows without issue. There are no errors in the logs.
I am not using any Transformers (and most of my config is unchanged from the install). My schema.xml contains:

<uniqueKey>key</uniqueKey>

and field defs (not a full list of fields):

<field name="key" type="string" indexed="true" stored="true" required="true" />
<field name="class" type="string" indexed="true" stored="true" required="true" />
<field name="id" type="string" indexed="true" stored="true" />
<field name="description" type="text" indexed="true" stored="true" />
<field name="created" type="date" indexed="true" stored="true" />
<field name="updated" type="date" indexed="true" stored="true" />

data-config.xml:

<dataConfig>
  <!-- jdbc:JSQLConnect://se-eriearle-lt1/database=SocialSite2/user=SocialSite2/logfile=DB_TRACE.log -->
  <dataSource type="JdbcDataSource"
              driver="com.jnetdirect.jsql.JSQLDriver"
              url="jdbc:JSQLConnect://se-eriearle-lt1/database=SocialSite2/user=SocialSite2"
              user="SocialSite2" password="SocialSite2" />
  <document>
    <entity name="Group" pk="key"
            query="select 'group.'+id as 'key', 'group' as 'class', name, handle, description, created, updated from group order by created asc">
    </entity>
    <entity name="Message" pk="key" query="...redacted...">
    </entity>
  </document>
</dataConfig>

-- - Noble Paul | Principal Engineer| AOL | http://aol.com -- Lance Norskog goks...@gmail.com
Re: DataImportHandler ignoring most rows
I switched to the MS driver and now all is well. Must be an incompatibility with the JSQLConnect driver. Sent from my iPhone On Aug 18, 2009, at 11:47 PM, Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com wrote: this comment says that <str name="Total Rows Fetched">7</str> the query fetched only 7 rows. If possible, open a tool and just run the same query and see how many rows are returned. On Wed, Aug 19, 2009 at 3:46 AM, Erik Earle erikea...@yahoo.com wrote: Using: - apache-solr-1.3.0 - java 1.6 - tomcat 6 - sql server 2005 w/ JSQLConnect 4.0 driver I have a group table with 3007 rows. I have confirmed the key is unique with "select distinct id from group" and it returns 3007. When I re-index using http://host:port/solr/dataimport?command=full-import I only get 7 records indexed. Any insight into what is going on would be really great. A partial response:

<lst name="statusMessages">
  <str name="Total Requests made to DataSource">1</str>
  <str name="Total Rows Fetched">7</str>
  <str name="Total Documents Skipped">0</str>

I have other entities that index all the rows without issue. There are no errors in the logs.
I am not using any Transformers (and most of my config is unchanged from the install). My schema.xml contains:

<uniqueKey>key</uniqueKey>

and field defs (not a full list of fields):

<field name="key" type="string" indexed="true" stored="true" required="true" />
<field name="class" type="string" indexed="true" stored="true" required="true" />
<field name="id" type="string" indexed="true" stored="true" />
<field name="description" type="text" indexed="true" stored="true" />
<field name="created" type="date" indexed="true" stored="true" />
<field name="updated" type="date" indexed="true" stored="true" />

data-config.xml:

<dataConfig>
  <!-- jdbc:JSQLConnect://se-eriearle-lt1/database=SocialSite2/user=SocialSite2/logfile=DB_TRACE.log -->
  <dataSource type="JdbcDataSource"
              driver="com.jnetdirect.jsql.JSQLDriver"
              url="jdbc:JSQLConnect://se-eriearle-lt1/database=SocialSite2/user=SocialSite2"
              user="SocialSite2" password="SocialSite2" />
  <document>
    <entity name="Group" pk="key"
            query="select 'group.'+id as 'key', 'group' as 'class', name, handle, description, created, updated from group order by created asc">
    </entity>
    <entity name="Message" pk="key" query="...redacted...">
    </entity>
  </document>
</dataConfig>

-- - Noble Paul | Principal Engineer| AOL | http://aol.com
RE: Data Modeling
It's getting clearer, Vladimir. So fundamentally your users are searching for products (apparently auto parts) and the different attributes would become navigation filters. If this is right, then your initial schema (the first email) is a start, although it's a little ambiguous to interpret because id and sku are overloaded. Your schema would contain a part id, the part's sku, and a field for each attribute you mentioned as well. I recommend using Solr's dynamic fields to define those so that you don't have to explicitly define every attribute you'll ever think of for every part in the schema. The word "application" was totally throwing me, but now I believe you mean to say that this is a vehicle, and an auto part is going to work on multiple vehicles. In Solr, you're going to denormalize this related data by inlining the vehicle information (aka application) into each document, which is an auto part. ... I think you have a couple of approaches on that. Firstly, I observe that when I'm shopping for autos or for auto parts, I am guided through a user interface to pick my precise vehicle. THEN I see related products. This is straightforward -- you would not use Solr; put this information in your database and build an easy app to navigate to a specific vehicle to get the vehicle identifier. You *could* use Solr for this, but it'd be in a separate index/core, or you would have to use multiple document types in your schema (my book has more info on these approaches). So once you have the vehicle identifier, you would look up documents in Solr (aka auto parts) that have this vehicle identifier. It'd be a multi-valued untokenized field, and this would be the only vehicle info needed in your schema. The other approach would be necessary to dynamically filter a list of parts by *partial* vehicle choices, like picking Porsche and 2001 would give you parts that will work on a Boxster and a Carrera made in 2001.
Doing this correctly is tricky for Solr and its non-relational schema because there are multiple vehicle attributes and an auto part is associated with multiple vehicles. I'll advise more if you need to do this, but hopefully you won't need to. It's a bit advanced and complicated. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server

From: Vladimir Landman [v...@northernautoparts.com] Sent: Wednesday, August 19, 2009 4:01 PM To: solr-user@lucene.apache.org Subject: FW: Data Modeling

I hit reply and sent this to just David, but I think it should go to the whole list: Hi David, I want to do 2 kinds of things with Solr (maybe 3 in the future).

1. I want to use it on our website so that a customer can filter down products by different attributes. So suppose we have:

Inventory
---------
ABC, 10
DEF, 15

Attributes
----------
ABC, Brand, ACME Brand
ABC, Water Pump Style, Short
DEF, Brand, Engine Builders
DEF, Water Pump Style, Long

Vehicle Applications
--------------------
ABC, 1999, Toyota, Camry, 3.1L
ABC, 2000, Toyota, Camry, 3.1L
DEF, 1997, Ford, Focus, 2.5L
DEF, 1998, Ford, Focus, 2.5L

I would like to be able to handle two things: 1. Give the person a list of all the unique years. When they pick one, show them all the Makes for that year. When they pick that, show all the Models. Alternatively: 1. Give them a list of makes, then models, then engine, etc... Also, it would be nice if I could give Solr a Part# (Sku) and have it get all the attributes for that sku; alternatively, I'd love to be able to drill down by the attributes such as Brand, Water Pump Style, etc. Please let me know if this email is still not clear... -- Vladimir Landman Northern Auto Parts From: Smiley, David W. [mailto:dsmi...@mitre.org] Sent: 2009-08-19 10:42 AM To: solr; Vladimir Landman Subject: Re: Data Modeling This is the sort of Solr fundamentals question my book (chapter 2) will help you with. Think about what your user interface is. What are users searching for?
That is, what exactly comes back from search results? It's not clear from your description what your search scenario is. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server On 8/19/09 10:31 AM, Vladimir Landman v...@northernautoparts.com wrote: Hi, I am trying to create a schema for Solr. Here is a relational model of what our data might look like:

Inventory
---------
Sku
Price
Weight

Attributes
----------
AttributeName
AttributeValue

Applications
------------
Id (Auto-Incrementing)
Sku
VehicleYear
VehicleMake
VehicleModel
VehicleEngine

There can be multiple Application(s) records. Also, Attributes can have duplicates. Basically I want to store basic information about our inventory, attributes, and applications. If I didn't have the applications, I would simply have:

<field name="id" ... />
<field name="sku" ... />
<field name="price" ...
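David's denormalization advice above (inlining each vehicle application into the part document as one value of a multi-valued untokenized field) can be sketched like this; the separator character and field layout are assumptions for illustration, not anything Solr prescribes:

```java
import java.util.List;

// Sketch: flatten each vehicle "application" row into one untokenized token
// for a multivalued field on the auto-part document. The "|" separator is an
// illustrative choice; any character not appearing in the data would do.
public class VehicleToken {
    static String token(int year, String make, String model, String engine) {
        return year + "|" + make + "|" + model + "|" + engine;
    }

    public static void main(String[] args) {
        // the multivalued vehicle field for part ABC from the example data
        List<String> vehicleField = List.of(
                token(1999, "Toyota", "Camry", "3.1L"),
                token(2000, "Toyota", "Camry", "3.1L"));
        System.out.println(vehicleField);
    }
}
```

Once the user's app has resolved an exact vehicle, filtering parts is then a single exact-match filter query on that field.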
Re: dynamic changes to schema
Hi, thanks for your answers; I think I have to go more into detail. We are talking about a shop application which has products I want to search for. These products normally have standard attributes like a sku, a name, a price and so on. But the user can add attributes to a product. So for example, if he sells books, he could add the author as an attribute. Let's say he names this field my_author (but he is free to name it as he wants) and he declares, via the configuration, that this field is searchable. So I need a field in Solr for the author. Since I can't force the user to prefix every field with something like my_, dynamic fields don't work, do they? best, Marco Constantijn Visinescu schrieb: huh? I think I lost you :) You want to use a multivalued field to list what dynamic fields you have in your document? Also, if you program your application correctly you should be able to restrict your users from doing anything you please (or don't please in this case). On Tue, Aug 18, 2009 at 11:38 PM, Marco Westermann m...@intersales.de wrote: hi, thanks for the advice, but the problem with dynamic fields is that I cannot restrict what the user calls the field in the application. So there isn't a pattern I can use. But I thought about using multivalued fields for the dynamically added fields. Good idea? thanks, Marco Constantijn Visinescu schrieb: use a dynamic field? On Tue, Aug 18, 2009 at 5:09 PM, Marco Westermann m...@intersales.de wrote: Hi there, is there a possibility to change the Solr schema from PHP dynamically? The web application I want to index at the moment has the feature to add fields to entities, and you can tell these fields that they are searchable. To realize this with Solr, the schema has to change when a searchable field is added or removed.
Any suggestions? Thanks a lot, Marco Westermann -- ++ Business-Software aus einer Hand ++ ++ Internet, Warenwirtschaft, Linux, Virtualisierung ++ http://www.intersales.de http://www.eisxen.org http://www.tarantella-partner.de http://www.medisales.de http://www.eisfair.net interSales AG Internet Commerce Subbelrather Str. 247 50825 Köln Tel 02 21 - 27 90 50 Fax 02 21 - 27 90 517 Mail i...@intersales.de Mail m...@intersales.de Web www.intersales.de Handelsregister Köln HR B 30904 Ust.-Id.: DE199672015 Finanzamt Köln-Nord. UstID: nicht vergeben Aufsichtsratsvorsitzender: Michael Morgenstern Vorstand: Andrej Radonic, Peter Zander
Re: dynamic changes to schema
However, you can have a dynamic * field mapping that catches all field names that aren't already defined - though all of those fields will be the same field type. Erik On Aug 19, 2009, at 5:48 PM, Marco Westermann wrote: Hi, thanks for your answers; I think I have to go more into detail. We are talking about a shop application which has products I want to search for. These products normally have standard attributes like a sku, a name, a price and so on. But the user can add attributes to a product. So for example, if he sells books, he could add the author as an attribute. Let's say he names this field my_author (but he is free to name it as he wants) and he declares, via the configuration, that this field is searchable. So I need a field in Solr for the author. Since I can't force the user to prefix every field with something like my_, dynamic fields don't work, do they? best, Marco Constantijn Visinescu schrieb: huh? I think I lost you :) You want to use a multivalued field to list what dynamic fields you have in your document? Also, if you program your application correctly you should be able to restrict your users from doing anything you please (or don't please in this case). On Tue, Aug 18, 2009 at 11:38 PM, Marco Westermann m...@intersales.de wrote: hi, thanks for the advice, but the problem with dynamic fields is that I cannot restrict what the user calls the field in the application. So there isn't a pattern I can use. But I thought about using multivalued fields for the dynamically added fields. Good idea? thanks, Marco Constantijn Visinescu schrieb: use a dynamic field? On Tue, Aug 18, 2009 at 5:09 PM, Marco Westermann m...@intersales.de wrote: Hi there, is there a possibility to change the Solr schema from PHP dynamically? The web application I want to index at the moment has the feature to add fields to entities, and you can tell these fields that they are searchable.
To realize this with Solr, the schema has to change when a searchable field is added or removed. Any suggestions? Thanks a lot, Marco Westermann
【solr DIH】A problem about solr delta-imports
Hi all, there is a problem when I use solr delta-imports to update the index. I have added the last_modified column to the table. After I use the full-import command to index the database data, the dataimport.properties file contains nothing, and when I use the delta-import command to update the index, Solr lists all the data in the database, not just the latest data. My db-data-config.xml:

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/funguide" user="root" password="root"/>
  <document name="shopinfo">
    <entity name="shop" pk="shop_id"
            query="select shop_id,title,description,tel,address,longitude,latitude from shop"
            deltaQuery="select shop_id from shop where last_modified > '${dataimporter.last_index_time}'">
      <field column="shop_id" name="id" />
      <field column="title" name="title" />
      <field column="description" name="description" />
      <field column="tel" name="tel" />
      <field column="address" name="address" />
      <field column="longitude" name="longitude" />
      <field column="latitude" name="latitude" />
    </entity>
  </document>
</dataConfig>

Anybody know how to solve the problem? Thanks! enzhao...@gmail.com -- View this message in context: http://www.nabble.com/%E3%80%90solr-DIH%E3%80%91A-problem-about-solr-delta-imports-tp25055788p25055788.html Sent from the Solr - User mailing list archive at Nabble.com.
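For intuition about what the deltaQuery should produce: DIH records the last import time in dataimport.properties and substitutes it for ${dataimporter.last_index_time} before running the SQL. A sketch of that substitution (the timestamp value is a placeholder; if the properties file is empty, as in the problem described above, there is nothing to substitute and the delta cannot be computed):

```java
// Sketch: the string substitution DIH effectively performs on the deltaQuery.
public class DeltaQuery {
    static String substitute(String deltaQuery, String lastIndexTime) {
        return deltaQuery.replace("${dataimporter.last_index_time}", lastIndexTime);
    }

    public static void main(String[] args) {
        System.out.println(substitute(
                "select shop_id from shop where last_modified > '${dataimporter.last_index_time}'",
                "2009-08-20 09:42:00"));
        // select shop_id from shop where last_modified > '2009-08-20 09:42:00'
    }
}
```

The single quotes around the placeholder in the config are what make the substituted timestamp a valid SQL string literal.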
Re: 【solr DIH】A problem about solr delta-imports
Which version of Solr are you using? Solr 1.3 had a bug with this. On Thu, Aug 20, 2009 at 9:42 AM, huenzhao huenz...@126.com wrote: Hi all, there is a problem when I use solr delta-imports to update the index. I have added the last_modified column to the table. After I use the full-import command to index the database data, the dataimport.properties file contains nothing, and when I use the delta-import command to update the index, Solr lists all the data in the database, not just the latest data. My db-data-config.xml:

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/funguide" user="root" password="root"/>
  <document name="shopinfo">
    <entity name="shop" pk="shop_id"
            query="select shop_id,title,description,tel,address,longitude,latitude from shop"
            deltaQuery="select shop_id from shop where last_modified > '${dataimporter.last_index_time}'">
      <field column="shop_id" name="id" />
      <field column="title" name="title" />
      <field column="description" name="description" />
      <field column="tel" name="tel" />
      <field column="address" name="address" />
      <field column="longitude" name="longitude" />
      <field column="latitude" name="latitude" />
    </entity>
  </document>
</dataConfig>

Anybody know how to solve the problem? Thanks! enzhao...@gmail.com -- View this message in context: http://www.nabble.com/%E3%80%90solr-DIH%E3%80%91A-problem-about-solr-delta-imports-tp25055788p25055788.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: 【solr DIH】A problem about solr delta-imports
The version is 1.3. After I used the full-import, the tomcat log shows that Solr did not call the SolrWriter class. Do you know the solution for this bug? Noble Paul നോബിള് नोब्ळ्-2 wrote: Which version of Solr are you using? Solr 1.3 had a bug with this. On Thu, Aug 20, 2009 at 9:42 AM, huenzhao huenz...@126.com wrote: Hi all, there is a problem when I use solr delta-imports to update the index. I have added the last_modified column to the table. After I use the full-import command to index the database data, the dataimport.properties file contains nothing, and when I use the delta-import command to update the index, Solr lists all the data in the database, not just the latest data. My db-data-config.xml:

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/funguide" user="root" password="root"/>
  <document name="shopinfo">
    <entity name="shop" pk="shop_id"
            query="select shop_id,title,description,tel,address,longitude,latitude from shop"
            deltaQuery="select shop_id from shop where last_modified > '${dataimporter.last_index_time}'">
      <field column="shop_id" name="id" />
      <field column="title" name="title" />
      <field column="description" name="description" />
      <field column="tel" name="tel" />
      <field column="address" name="address" />
      <field column="longitude" name="longitude" />
      <field column="latitude" name="latitude" />
    </entity>
  </document>
</dataConfig>

Anybody know how to solve the problem? Thanks! enzhao...@gmail.com -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- View this message in context: http://www.nabble.com/%E3%80%90solr-DIH%E3%80%91A-problem-about-solr-delta-imports-tp25055788p25056379.html Sent from the Solr - User mailing list archive at Nabble.com.