Re: Type converters for DocumentObjectBinder

2009-11-13 Thread paulhyo

Hi Paul,

It's working for Query, but not for Updating (Add Bean). The getter method
returns a Calendar (a GregorianCalendar instance).

On the indexing side, a toString() (or something equivalent) is applied and an
error is thrown:

Caused by: java.text.ParseException: Unparseable date:
java.util.GregorianCalendar:java.util.GregorianCalendar[time=1258100168327,areFieldsSet=true,
areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id=Europe/Berlin,offset=360,dstSavings=360,useDaylight=true,
transitions=143,lastRule=java.util.SimpleTimeZone[id=Europe/Berlin,offset=360,dstSavings=360,useDaylight=true,startYear=0,startMode=2,
startMonth=2,startDay=-1,startDayOfWeek=1,startTime=360,startTimeMode=2,endMode=2,endMonth=9,endDay=-1,endDayOfWeek=1,endTime=360,endTimeMode=2],
firstDayOfWeek=2,minimalDaysInFirstWeek=4,ERA=1,YEAR=2009,MONTH=10,WEEK_OF_YEAR=46,WEEK_OF_MONTH=2,DAY_OF_MONTH=13,DAY_OF_YEAR=317,DAY_OF_WEEK=6,
DAY_OF_WEEK_IN_MONTH=2,AM_PM=0,HOUR=9,HOUR_OF_DAY=9,MINUTE=16,SECOND=8,MILLISECOND=327,ZONE_OFFSET=360,DST_OFFSET=0]


public Calendar getValidFrom() {
    return validFrom;
}

public void setValidFrom(Calendar validFrom) {
    this.validFrom = validFrom;
}

@Field
public void setValidFrom(String validFrom) {
    Calendar cal = Calendar.getInstance();
    try {
        cal.setTime(dateFormat.parse(validFrom));
    } catch (ParseException e) {
        e.printStackTrace();
    }
    this.validFrom = cal;
}
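One possible workaround (a sketch only, not what the thread eventually settled on): keep the @Field-annotated member as a String, so that DocumentObjectBinder never has to serialize the Calendar on the add path, and convert at the bean boundary. The yyyy-MM-dd pattern (taken from the "2009-11-13" value shown below) and the validFromRaw name are assumptions for illustration.

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;

import org.apache.solr.client.solrj.beans.Field;

public class SoNatPersonImpl {

    // Assumed pattern, based on the "2009-11-13" value seen in query responses.
    private static final String DATE_PATTERN = "yyyy-MM-dd";

    // The binder only ever sees a String, on both the add and the query path.
    @Field("validFrom")
    private String validFromRaw;

    // Calendar view for application code.
    public Calendar getValidFrom() {
        if (validFromRaw == null) {
            return null;
        }
        try {
            Calendar cal = Calendar.getInstance();
            cal.setTime(new SimpleDateFormat(DATE_PATTERN).parse(validFromRaw));
            return cal;
        } catch (ParseException e) {
            throw new IllegalStateException("Bad date from index: " + validFromRaw, e);
        }
    }

    public void setValidFrom(Calendar validFrom) {
        this.validFromRaw = (validFrom == null)
                ? null
                : new SimpleDateFormat(DATE_PATTERN).format(validFrom.getTime());
    }
}

With this layout, getBeans() and addBean() both see only the String form, while application code keeps its Calendar view.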






Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
 
 create a setter method for the field which takes a String and apply
 the annotation there
 
 example
 
 
 private Calendar validFrom;
 
 @Field
 public void setvalidFrom(String s){
 //convert to Calendar object and set the field
 }
 
 
 On Fri, Nov 13, 2009 at 12:24 PM, paulhyo st...@ouestil.ch wrote:

 Hi,

 I would like to know if there is a way to add type converters when using
 getBeans. I need conversion when Updating (Calendar -> String) and when
 Searching (String -> Calendar).


 The Bean class defines :
 @Field
 private Calendar validFrom;

 but the received type within Query Response is a String (2009-11-13)...

 Actually I get this error :

 java.lang.RuntimeException: Exception while setting value : 2009-09-16 on
 private java.util.Calendar
 ch.mycompany.access.solr.impl.SoNatPersonImpl.validFrom
        at
 org.apache.solr.client.solrj.beans.DocumentObjectBinder$DocField.set(DocumentObjectBinder.java:360)
        at
 org.apache.solr.client.solrj.beans.DocumentObjectBinder$DocField.inject(DocumentObjectBinder.java:342)
        at
 org.apache.solr.client.solrj.beans.DocumentObjectBinder.getBeans(DocumentObjectBinder.java:55)
        at
 org.apache.solr.client.solrj.response.QueryResponse.getBeans(QueryResponse.java:324)
        at
 ch.mycompany.access.solr.impl.result.NatPersonPartnerResultBuilder.buildBeanListResult(NatPersonPartnerResultBuilder.java:38)
        at
 ch.mycompany.access.solr.impl.SoQueryManagerImpl.searchNatPersons(SoQueryManagerImpl.java:41)
        at
 ch.mycompany.access.solr.impl.SolrQueryManagerTest.testQueryFamilyNameRigg(SolrQueryManagerTest.java:36)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at junit.framework.TestCase.runTest(TestCase.java:164)
        at junit.framework.TestCase.runBare(TestCase.java:130)
        at junit.framework.TestResult$1.protect(TestResult.java:106)
        at junit.framework.TestResult.runProtected(TestResult.java:124)
        at junit.framework.TestResult.run(TestResult.java:109)
        at junit.framework.TestCase.run(TestCase.java:120)
        at junit.framework.TestSuite.runTest(TestSuite.java:230)
        at junit.framework.TestSuite.run(TestSuite.java:225)
        at
 org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
        at
 org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
        at
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
        at
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
        at
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
        at
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
 Caused by: java.lang.IllegalArgumentException: Can not set
 java.util.Calendar field
 ch.mycompany.access.solr.impl.SoNatPersonImpl.validFrom to
 java.lang.String
        at
 sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:146)
        at
 

highlighting issue lst.name is a leaf node

2009-11-13 Thread Chuck Mysak
Hello list,

I'm new to Solr, but from what I've been experimenting with, it's awesome.
I have a small issue regarding the highlighting feature.

It finds stuff (as I see from the query analyzer), but the highlight list
looks something like this:

<lst name="highlighting">
  <lst name="c:\0596520107.pdf"/>
  <lst name="c:\0470511389.pdf"/>
</lst>

(the files were added using ContentStreamUpdateRequest req = new
ContentStreamUpdateRequest("/update/extract"); and I set the literal.id to
the filename)
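For reference, a minimal SolrJ sketch of that kind of upload against the extracting request handler (the method name and the choice of the file path as literal.id are illustrative, not Chuck's actual code):

import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class PdfUploader {

    // The /update/extract path matches the description above; the rest is assumed.
    public void upload(SolrServer server, File pdf) throws Exception {
        ContentStreamUpdateRequest req =
                new ContentStreamUpdateRequest("/update/extract");
        req.addFile(pdf);                                  // stream the file to Solr Cell
        req.setParam("literal.id", pdf.getAbsolutePath()); // unique key for the document
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        server.request(req);
    }
}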

My solrconfig.xml requesthandler looks like:

  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <!-- default values for query parameters -->
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <!--
      <int name="rows">10</int>
      <str name="fl">*</str>
      <str name="version">2.1</str>
      -->
      <bool name="hl">true</bool>
      <int name="hl.snippets">3</int>
      <int name="hl.fragsize">30</int>
      <str name="hl.simple.pre"><![CDATA[<span>]]></str>
      <str name="hl.simple.post"><![CDATA[</span>]]></str>
      <str name="hl.fl">*</str>
      <bool name="hl.requireFieldMatch">true</bool>
      <float name="hl.regex.slop">0.5</float>
      <str name="hl.regex.pattern">[-\w ,/\n\']{20,200}</str>
      <bool name="hl.usePhraseHighlighter">true</bool>
    </lst>
  </requestHandler>

The schema.xml is untouched and downloaded yesterday from the latest stable
build.

At first, I thought it had something to do with the extraction of the pdf,
but I tried the demo xml docs also and got the same result.

I'm new to this, so please help.

Thank you,

Chuck


Re: Stop solr without losing documents

2009-11-13 Thread gwk

Michael wrote:

I've got a process external to Solr that is constantly feeding it new
documents, retrying if Solr is nonresponding.  What's the right way to
stop Solr (running in Tomcat) so no documents are lost?

Currently I'm committing all cores and then running catalina's stop
script, but between my commit and the stop, more documents can come in
that would need *another* commit...

Lots of people must have had this problem already, so I know the
answer is simple; I just can't find it!

Thanks.
Michael
  
I don't know if this is the best solution, or even if it's applicable to 
your situation, but we do incremental updates from a database based on a 
timestamp (from a simple separate SQL table filled by triggers, so 
deletes are measured correctly as well). We store this timestamp in Solr 
as well. Our index script first does a simple Solr request for the 
newest timestamp and then selects the documents to update with 
a SELECT * FROM document_updates WHERE timestamp >= X, where X is the 
timestamp returned from Solr. (We use >= for the hopefully extremely rare 
case where two updates share the same timestamp and the index script runs at 
that same moment and only picks up one of them; this can cause some documents 
to be updated multiple times, but as document updates are idempotent this is 
no real problem.)
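A rough sketch of that flow in SolrJ plus JDBC terms (the timestamp field and document_updates table follow the description above; everything else is assumed for illustration):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.util.Date;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class IncrementalUpdater {

    public void pushChanges(SolrServer solr, Connection db) throws Exception {
        // 1. Ask Solr for the newest timestamp it already knows about.
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(1);
        q.setSortField("timestamp", SolrQuery.ORDER.desc);
        QueryResponse rsp = solr.query(q);
        Date newest = rsp.getResults().isEmpty()
                ? new Date(0L)
                : (Date) rsp.getResults().get(0).getFieldValue("timestamp");

        // 2. Re-read everything that changed at or after that instant (>= rather
        //    than >, so an update sharing the newest timestamp is never missed).
        PreparedStatement ps = db.prepareStatement(
                "SELECT * FROM document_updates WHERE timestamp >= ?");
        ps.setTimestamp(1, new Timestamp(newest.getTime()));
        ResultSet rs = ps.executeQuery();
        try {
            while (rs.next()) {
                // build a SolrInputDocument per row and add/delete it here
            }
        } finally {
            rs.close();
            ps.close();
        }
        solr.commit();
    }
}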


Regards,

gwk


Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Jan-Eirik B . Nævdal
Some extra for the pros list:

- Full control over which content is searchable and which is not.
- Possibility to make pages searchable almost instantly after publication
- Control over when the site is indexed


Friendly

Jan-Eirik

On Fri, Nov 13, 2009 at 10:52 AM, Lukáš Vlček lukas.vl...@gmail.com wrote:

 Hi,

 I am looking for good arguments to justify implementing a search for
 sites
 which are available on the public internet. There are many sites in
 powered
 by Solr section which are indexed by Google and other search engines but
 still they decided to invest resources into building and maintenance of
 their own search functionality and not to go with [user_query site:
 my_site.com] google search. Why?

 By no means am I saying it makes no sense to implement Solr! But I want to
 put together list of reasons and possibly with examples. Your help would be
 much appreciated!

 Let's narrow the scope of this discussion to the following:
 - the search should cover several community sites running open source CMSs,
 JIRAs, Bugzillas ... and the like
 - all documents use open formats (no need to parse Word or Excel)
 (maybe something close to what LucidImagination does for mailing lists of
 Lucene and Solr)

 My initial kick off list would be:

 pros:
 - considering we understand the content (we understand the domain scope) we
 can fine tune the search engine to provide more accurate results
 - Solr can give us facets
 - we have user search logs (valuable for analysis)
 - implementing Solr is fun

 cons:
 - requires resources (but the cost is relatively low depending on the query
 traffic, index size and frequency of updates)

 Regards,
 Lukas

 http://blog.lukas-vlcek.com/




-- 
Jan Eirik B. Nævdal
Solutions Engineer | +47 982 65 347
Iterate AS | www.iterate.no
The Lean Software Development Consultancy


Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Markus Jelsma - Buyways B.V.
In addition to the faceting engine:
- MoreLikeThis
- Highlighting
- Spellchecker

But also more flexible querying using the DisMax handler which is
clearly superior. Solr can also be used to store data which can be
retrieved in an instant! We have used this technique in a site and it is
obviously much faster than multiple large and complex SQL statements.


On Fri, 2009-11-13 at 10:52 +0100, Lukáš Vlček wrote:

 pros:
 - considering we understand the content (we understand the domain scope) we
 can fine tune the search engine to provide more accurate results
 - Solr can give us facets
 - we have user search logs (valuable for analysis)
 - implementing Solr is a fun
 
 cons:
 - requires resources (but the cost is relatively low depending on the query
 traffic, index size and frequency of updates)
 
 Regards,
 Lukas
 
 http://blog.lukas-vlcek.com/


Re: highlighting issue lst.name is a leaf node

2009-11-13 Thread Chuck Mysak
I found the solution.
If somebody runs into the same problem, here is how I solved it.

- while uploading the document:

req.setParam("uprefix", "attr_");
req.setParam("fmap.content", "attr_content");
req.setParam("overwrite", "true");
req.setParam("commit", "true");

- in the query:
http://localhost:8983/solr/select?q=attr_content:%22Django%22&rows=4
- edit the solrconfig.xml in the requesthandler params

   <str name="fl">id,title</str>
so that you won't get the whole text content inside the response.

Regards,
Chuck

On Fri, Nov 13, 2009 at 11:21 AM, Chuck Mysak chuck.my...@gmail.com wrote:

 Hello list,

 I'm new to solr but from what I'm experimenting, it's awesome.
 I have a small issue regarding the highlighting feature.

 It finds stuff (as I see from the query analyzer), but the highlight list
 looks something like this:

 lst name=highlighting
 lst name=c:\0596520107.pdf/
 lst name=c:\0470511389.pdf/
 /lst

 (the files were added using  ContentStreamUpdateRequest req = new
 ContentStreamUpdateRequest(/update/extract); and I set the literal.id
 to the filename)

 My solrconfig.xml requesthandler looks like:

   requestHandler name=standard class=solr.SearchHandler
 default=true
 !-- default values for query parameters --
  lst name=defaults
str name=echoParamsexplicit/str
!--
int name=rows10/int
str name=fl*/str
str name=version2.1/str
 --
bool name=hltrue/bool
int name=hl.snippets3/int
int name=hl.fragsize30/int
str name=hl.simple.pre![CDATA[span]]/str
str name=hl.simple.post![CDATA[/span]]/str
str name=hl.fl*/str
bool name=hl.requireFieldMatchtrue/bool
float name=hl.regex.slop0.5/float
str name=hl.regex.pattern[-\w ,/\n\']{20,200}/str
bool name=hl.usePhraseHighlightertrue/bool
  /lst
   /requestHandler

 The schema.xml is untouched and downloaded yesterday from the latest stable
 build.

 At first, I thought it had something to do with the extraction of the pdf,
 but I tried the demo xml docs also and got the same result.

 I'm new to this, so please help.

 Thank you,

 Chuck








Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Chantal Ackermann



Jan-Eirik B. Nævdal schrieb:

Some extra for the pros list:

- Full control over which content to be searchable and not.
- Posibility to make pages searchable almost instant after publication
- Control over when the site is indexed


+1, especially the last point.
You can also add a robots.txt and prohibit spidering of the site to 
reduce traffic. Google won't index any highly dynamic content, then.





Friendly

Jan-Eirik

On Fri, Nov 13, 2009 at 10:52 AM, Lukáš Vlček lukas.vl...@gmail.com wrote:


Hi,

I am looking for good arguments to justify implementation a search for
sites
which are available on the public internet. There are many sites in
powered
by Solr section which are indexed by Google and other search engines but
still they decided to invest resources into building and maintenance of
their own search functionality and not to go with [user_query site:
my_site.com] google search. Why?

By no mean I am saying it makes not sense to implement Solr! But I want to
put together list of reasons and possibly with examples. Your help would be
much appreciated!

Let's narrow the scope of this discussion to the following:
- the search should cover several community sites running open source CMSs,
JIRAs, Bugillas ... and the like
- all documents use open formats (no need to parse Word or Excel)
(maybe something close to what LucidImagination does for mailing lists of
Lucene and Solr)

My initial kick off list would be:

pros:
- considering we understand the content (we understand the domain scope) we
can fine tune the search engine to provide more accurate results
- Solr can give us facets
- we have user search logs (valuable for analysis)
- implementing Solr is a fun

cons:
- requires resources (but the cost is relatively low depending on the query
traffic, index size and frequency of updates)

Regards,
Lukas

http://blog.lukas-vlcek.com/





--
Jan Eirik B. Nævdal
Solutions Engineer | +47 982 65 347
Iterate AS | www.iterate.no
The Lean Software Development Consultancy


Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Andrew Clegg


Lukáš Vlček wrote:
 
 I am looking for good arguments to justify implementation a search for
 sites
 which are available on the public internet. There are many sites in
 powered
 by Solr section which are indexed by Google and other search engines but
 still they decided to invest resources into building and maintenance of
 their own search functionality and not to go with [user_query site:
 my_site.com] google search. Why?
 

You're assuming that Solr is just used in these cases to index discrete web
pages which Google etc. would be able to access via following navigational
links.

I would imagine that in a lot of cases, Solr is used to index database
entities which are used to build [parts of] pages dynamically, and which
might be viewable in different forms in various different pages.

Plus, with stored fields, you have the option of actually driving a website
off Solr instead of directly off a database, which might make sense from a
speed perspective in some cases.

And further, going back to page-only indexing -- you have no guarantee when
Google will decide to recrawl your site, so there may be a delay before
changes show up in their index. With an in-house search engine you can
reindex as often as you like.

Andrew.

-- 
View this message in context: 
http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334734.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Lukáš Vlček
Hi,

thanks for inputs so far... however, let's put it this way:

When you need to search for something Lucene or Solr related, which one do
you use:
- generic Google
- go to a particular mail list web site and search from here (if there is
any search form at all)
- go to LucidImagination.com and use its search capability

Regards,
Lukas


On Fri, Nov 13, 2009 at 11:50 AM, Andrew Clegg andrew.cl...@gmail.comwrote:



 Lukáš Vlček wrote:
 
  I am looking for good arguments to justify implementation a search for
  sites
  which are available on the public internet. There are many sites in
  powered
  by Solr section which are indexed by Google and other search engines but
  still they decided to invest resources into building and maintenance of
  their own search functionality and not to go with [user_query site:
  my_site.com] google search. Why?
 

 You're assuming that Solr is just used in these cases to index discrete web
 pages which Google etc. would be able to access via following navigational
 links.

 I would imagine that in a lot of cases, Solr is used to index database
 entities which are used to build [parts of] pages dynamically, and which
 might be viewable in different forms in various different pages.

 Plus, with stored fields, you have the option of actually driving a website
 off Solr instead of directly off a database, which might make sense from a
 speed perspective in some cases.

 And further, going back to page-only indexing -- you have no guarantee when
 Google will decide to recrawl your site, so there may be a delay before
 changes show up in their index. With an in-house search engine you can
 reindex as often as you like.

 Andrew.

 --
 View this message in context:
 http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334734.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Data import problem with child entity from different database

2009-11-13 Thread Andrew Clegg

Morning all,

I'm having problems joining a child entity from one database to a
parent from another...

My entity definitions look like this (names changed for brevity):

<entity name="parent" dataSource="db1" query="select a, b, c from parent_table">

  <entity name="child" dataSource="db2" onError="continue"
          query="select c, d from child_table where c = '${parent.c}'" />

</entity>

c is getting indexed fine (it's stored, I can see field 'c' in the search
results) but child.d isn't. I know the child table has data for the
corresponding parent rows, and I've even watched the SQL queries against the
child table appearing in Oracle's sqldeveloper as the DataImportHandler
runs. But no content for child.d gets into the index.

My schema contains a definition for a field called d like so:

<field name="d" type="keywords_ids" indexed="true" stored="true"
       multiValued="true" termVectors="true" />

(keywords_ids is a conservatively-analyzed text type which has worked fine
in other contexts.)

Two things occur to me.

1. db1 is PostgreSQL and db2 is Oracle, although the d field in both tables
is just a char(4), nothing fancy. Could something weird with character
encodings be happening?

2. d isn't a primary key in either parent or child, but this shouldn't
matter should it?

Additional data points -- I also tried using the CachedSqlEntityProcessor to
do in-memory table caching of child, but it didn't work then either. I got a
lot of error messages like this:

No value available for the cache key : d in the entity : child

If anyone knows whether this is a known limitation (if so I can work round
it), or an unexpected case (if so I'll file a bug report), please shout. I'm
using 1.4.

Yet again, many thanks :-)

Andrew.

-- 
View this message in context: 
http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26334948.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Andrew Clegg


Lukáš Vlček wrote:
 
 When you need to search for something Lucene or Solr related, which one do
 you use:
 - generic Google
 - go to a particular mail list web site and search from here (if there is
 any search form at all)
 

Both of these (Nabble in the second case) in case any recent posts have
appeared which Google hasn't picked up.

Andrew.

-- 
View this message in context: 
http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334980.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Jon Baer
For this list I usually end up @ http://solr.markmail.org (which I believe also 
uses Lucene under the hood)

Google is such a black box ... 

Pros:
+ 1 Open Source (enough said :-)

There also seems to always be the notion that crawling lends itself to 
producing the best results, but that is rarely the case.  And unless you are a 
special type of site, Google will not overlay your results w/ some type of 
context in the search (ie news or sports, etc).  

What I think really needs to happen with Solr (and is a bit missing @ the moment) 
is that there needs to be a common interface for reindexing another index (if that 
makes sense) ... something akin to OpenSearch 
(http://www.opensearch.org/Community/OpenSearch_software)

For example, what I would like to do is have my site, have my search index, and 
connect Google to index just my search index (and not crawl the site) ... 
the only current option for something like that is sitemaps, which I think Solr 
(templates) should have a contrib project for (but you would have to generate 
these offline for sure).

- Jon  

On Nov 13, 2009, at 6:00 AM, Lukáš Vlček wrote:

 Hi,
 
 thanks for inputs so far... however, let's put it this way:
 
 When you need to search for something Lucene or Solr related, which one do
 you use:
 - generic Google
 - go to a particular mail list web site and search from here (if there is
 any search form at all)
 - go to LucidImagination.com and use its search capability
 
 Regards,
 Lukas
 
 
 On Fri, Nov 13, 2009 at 11:50 AM, Andrew Clegg andrew.cl...@gmail.comwrote:
 
 
 
 Lukáš Vlček wrote:
 
 I am looking for good arguments to justify implementation a search for
 sites
 which are available on the public internet. There are many sites in
 powered
 by Solr section which are indexed by Google and other search engines but
 still they decided to invest resources into building and maintenance of
 their own search functionality and not to go with [user_query site:
 my_site.com] google search. Why?
 
 
 You're assuming that Solr is just used in these cases to index discrete web
 pages which Google etc. would be able to access via following navigational
 links.
 
 I would imagine that in a lot of cases, Solr is used to index database
 entities which are used to build [parts of] pages dynamically, and which
 might be viewable in different forms in various different pages.
 
 Plus, with stored fields, you have the option of actually driving a website
 off Solr instead of directly off a database, which might make sense from a
 speed perspective in some cases.
 
 And further, going back to page-only indexing -- you have no guarantee when
 Google will decide to recrawl your site, so there may be a delay before
 changes show up in their index. With an in-house search engine you can
 reindex as often as you like.
 
 Andrew.
 
 --
 View this message in context:
 http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334734.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 



Re: Selection of terms for MoreLikeThis

2009-11-13 Thread Andrew Clegg

Any ideas on this? Is it worth sending a bug report?

Those links are live, by the way, in case anyone wants to verify that MLT is
returning suggestions with very low tf.idf.

Cheers,

Andrew.


Andrew Clegg wrote:
 
 Hi,
 
 If I run a MoreLikeThis query like the following:
 
 http://www.cathdb.info/solr/mlt?q=id:3.40.50.720&rows=0&mlt.interestingTerms=list&mlt.match.include=false&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1
 
 one of the hits in the results is "and" (I don't do any stopword removal
 on this field).
 
 However if I look inside that document with the TermVectorComponent:
 
 http://www.cathdb.info/solr/select/?q=id:3.40.50.720&tv=true&tv.all=true&tv.fl=keywords
 
 I see that "and" has a measly tf.idf of 7.46E-4. But there are other terms
 with *much* higher tf.idf scores, e.g.:
 
 lst name=aquaspirillum
 int name=tf1/int
 int name=df10/int
 double name=tf-idf0.1/double
 /lst
 
 that *don't* appear in the MoreLikeThis list. (I tried adding
 mlt.maxwl=999 to the end of the MLT query but it makes no difference.)
 
 What's going on? Surely something with tf.idf = 0.1 is a far better
 candidate for a MoreLikeThis query than something with tf.idf = 1.46E-4?
 Or does MoreLikeThis do some other heuristic magic to select good
 candidates, and sometimes get it wrong?
 
 BTW the keywords field is indexed, stored, multi-valued and term-vectored.
 
 Thanks,
 
 Andrew.
 
 -- 
 :: http://biotext.org.uk/ ::
 
 

-- 
View this message in context: 
http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26335061.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Data import problem with child entity from different database

2009-11-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
no obvious issues.
you may post your entire data-config.xml

do w/o CachedSqlEntityProcessor first and then apply that later


On Fri, Nov 13, 2009 at 4:38 PM, Andrew Clegg andrew.cl...@gmail.com wrote:

 Morning all,

 I'm having problems with joining child a child entity from one database to a
 parent from another...

 My entity definitions look like this (names changed for brevity):

 entity name=parent dataSource=db1 query=select a, b, c from
 parent_table

  entity name=child dataSource=db2 onError=continue query=select c,
 d from child_table where c = '${parent.c}' /

 /entity

 c is getting indexed fine (it's stored, I can see field 'c' in the search
 results) but child.d isn't. I know the child table has data for the
 corresponding parent rows, and I've even watched the SQL queries against the
 child table appearing in Oracle's sqldeveloper as the DataImportHandler
 runs. But no content for child.d gets into the index.

 My schema contains a definition for a field called d like so:

 field name=d type=keywords_ids indexed=true stored=true
 multiValued=true termVectors=true /

 (keywords_ids is a conservatively-analyzed text type which has worked fine
 in other contexts.)

 Two things occur to me.

 1. db1 is PostgreSQL and db2 is Oracle, although the d field in both tables
 is just a char(4), nothing fancy. Could something weird with character
 encodings be happening?

 2. d isn't a primary key in either parent or child, but this shouldn't
 matter should it?

 Additional data points -- I also tried using the CachedSqlEntityProcessor to
 do in-memory table caching of child, but it didn't work then either. I got a
 lot of error messages like this:

 No value available for the cache key : d in the entity : child

 If anyone knows whether this is a known limitation (if so I can work round
 it), or an unexpected case (if so I'll file a bug report), please shout. I'm
 using 1.4.

 Yet again, many thanks :-)

 Andrew.

 --
 View this message in context: 
 http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26334948.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


exclude some fields from copying dynamic fields | schema.xml

2009-11-13 Thread Vicky_Dev

Hi, 
we are using the following entry in schema.xml to make a copy of one type of
dynamic field to another : 
<copyField source="*_s" dest="*_str_s" /> 

Is it possible to exclude some fields from copying?

We are using Solr1.3

~Vikrant

-- 
View this message in context: 
http://old.nabble.com/exclude-some-fields-from-copying-dynamic-fields-%7C-schema.xml-tp26335109p26335109.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Data import problem with child entity from different database

2009-11-13 Thread Andrew Clegg



Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
 
 no obvious issues.
 you may post your entire data-config.xml
 

Here it is, exactly as in the last attempt but with usernames etc. removed.

Ignore the comments and the unused FileDataSource...

http://old.nabble.com/file/p26335171/dataimport.temp.xml dataimport.temp.xml 


Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
 
 do w/o CachedSqlEntityProcessor first and then apply that later
 

Yep, that was just a bit of a wild stab in the dark to see if it made any
difference.

Thanks,

Andrew.

-- 
View this message in context: 
http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26335171.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Michael McCandless
I think we sorely need a Directory impl that down-prioritizes IO
performed by merging.

It would be wonderful if from Java we could simply set a per-thread
IO priority, but, it'll be a looong time until that's possible.

So I think for now we should make a Directory impl that emulates such
behavior, eg Lucene could state the context (merge, flush, search,
nrt-reopen, etc.) whenever it opens an IndexInput / IndexOutput, and
then the Directory could hack in pausing the merge IO whenever
search/nrt-reopen IO is active.

Mike

On Thu, Nov 12, 2009 at 7:18 PM, Mark Miller markrmil...@gmail.com wrote:
 Jerome L Quinn wrote:
 Hi, everyone, this is a problem I've had for quite a while,
 and have basically avoided optimizing because of it.  However,
 eventually we will get to the point where we must delete as
 well as add docs continuously.

 I have a Solr 1.3 index with ~4M docs at around 90G.  This is a single
 instance running inside tomcat 6, so no replication.  Merge factor is the
 default 10.  ramBufferSizeMB is 32.  maxWarmingSearchers=4.
 autoCommit is set at 3 sec.

 We continually push new data into the index, at somewhere between 1-10 docs
 every 10 sec or so.  Solr is running on a quad-core 3.0GHz server.
 under IBM java 1.6.  The index is sitting on a local 15K scsi disk.
 There's nothing
 else of substance running on the box.

 Optimizing the index takes about 65 min.

 As long as I'm not optimizing, search and indexing times are satisfactory.

 When I start the optimize, I see massive problems with timeouts pushing new
 docs
 into the index, and search times balloon.  A typical search while
 optimizing takes
 about 1 min instead of a few seconds.

 Can anyone offer me help with fixing the problem?

 Thanks,
 Jerry Quinn

 Ah, the pains of optimization. It's kind of just how it is. One solution
 is to use two boxes and replication - optimize on the master, and then
 queries only hit the slave. Out of reach for some though, and adds many
 complications.

 Another kind of option is to use the partial optimize feature:

  <optimize maxOptimizeSegments="5"/>

 Using this, you can optimize down to n segments and take a shorter hit
 each time.
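The same partial optimize can be requested from SolrJ as well - a sketch, assuming a 1.4-era client where the three-argument optimize exposes the segment-count option:

import org.apache.solr.client.solrj.SolrServer;

public class PartialOptimize {

    // waitFlush=true, waitSearcher=true, maxSegments=5: merge down to at most
    // five segments rather than doing a full single-segment optimize.
    public void optimizePartially(SolrServer server) throws Exception {
        server.optimize(true, true, 5);
    }
}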

 Also, if optimizing is so painful, you might lower the merge factor to
 amortize that pain better. That's another way to slowly get there - if
 you lower the merge factor, as merging takes place, the new merge factor
 will be respected, and segments will merge down. A merge factor of 2
 (the lowest) will make it so you only ever have 2 segments. Sometimes
 that works reasonably well - you could try 3-6 or something as well.
 Then when you do your partial optimizes (and eventually a full optimize
 perhaps), you won't have so far to go.

 --
 - Mark

 http://www.lucidimagination.com






Re: javabin in .NET?

2009-11-13 Thread Mauricio Scheffer
Nope. It has to be manually ported. Not so much because of the language
itself but because of differences in the libraries.


2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

  Is there any tool to directly port Java to .NET? Then we can extract
  out the client part of the javabin code and convert it.

 On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher erik.hatc...@gmail.com
 wrote:
  Has anyone looked into using the javabin response format from .NET
 (instead
  of SolrJ)?
 
  It's mainly a curiosity.
 
  How much better could performance/bandwidth/throughput be?  How difficult
  would it be to implement some .NET code (C#, I'd guess being the best
  choice) to handle this response format?
 
  Thanks,
 Erik
 
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com



Re: Selection of terms for MoreLikeThis

2009-11-13 Thread Chantal Ackermann

Hi Andrew,

no idea, I'm afraid - but could you send the output of 
interestingTerms=details?
This at least would show what MoreLikeThis uses, in comparison to the 
TermVectorComponent you've already pasted.


Chantal

Andrew Clegg schrieb:

Any ideas on this? Is it worth sending a bug report?

Those links are live, by the way, in case anyone wants to verify that MLT is
returning suggestions with very low tf.idf.

Cheers,

Andrew.


Andrew Clegg wrote:

Hi,

If I run a MoreLikeThis query like the following:

http://www.cathdb.info/solr/mlt?q=id:3.40.50.720rows=0mlt.interestingTerms=listmlt.match.include=falsemlt.fl=keywordsmlt.mintf=1mlt.mindf=1

one of the hits in the results is and (I don't do any stopword removal
on this field).

However if I look inside that document with the TermVectorComponent:

http://www.cathdb.info/solr/select/?q=id:3.40.50.720tv=truetv.all=truetv.fl=keywords

I see that and has a measly tf.idf of 7.46E-4. But there are other terms
with *much* higher tf.idf scores, e.g.:

lst name=aquaspirillum
int name=tf1/int
int name=df10/int
double name=tf-idf0.1/double
/lst

that *don't* appear in the MoreLikeThis list. (I tried adding
mlt.maxwl=999 to the end of the MLT query but it makes no difference.)

What's going on? Surely something with tf.idf = 0.1 is a far better
candidate for a MoreLikeThis query than something with tf.idf = 1.46E-4?
Or does MoreLikeThis do some other heuristic magic to select good
candidates, and sometimes get it wrong?

BTW the keywords field is indexed, stored, multi-valued and term-vectored.

Thanks,

Andrew.

--
:: http://biotext.org.uk/ ::




--
View this message in context: 
http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26335061.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Michael McCandless
Another thing to try, is reducing the maxThreadCount for
ConcurrentMergeScheduler.

It defaults to 3, which I think is too high -- we should change this
default to 1 (I'll open a Lucene issue).

Mike

On Thu, Nov 12, 2009 at 6:30 PM, Jerome L Quinn jlqu...@us.ibm.com wrote:

 Hi, everyone, this is a problem I've had for quite a while,
 and have basically avoided optimizing because of it.  However,
 eventually we will get to the point where we must delete as
 well as add docs continuously.

 I have a Solr 1.3 index with ~4M docs at around 90G.  This is a single
 instance running inside tomcat 6, so no replication.  Merge factor is the
 default 10.  ramBufferSizeMB is 32.  maxWarmingSearchers=4.
 autoCommit is set at 3 sec.

 We continually push new data into the index, at somewhere between 1-10 docs
 every 10 sec or so.  Solr is running on a quad-core 3.0GHz server.
 under IBM java 1.6.  The index is sitting on a local 15K scsi disk.
 There's nothing
 else of substance running on the box.

 Optimizing the index takes about 65 min.

 As long as I'm not optimizing, search and indexing times are satisfactory.

 When I start the optimize, I see massive problems with timeouts pushing new
 docs
 into the index, and search times balloon.  A typical search while
 optimizing takes
 about 1 min instead of a few seconds.

 Can anyone offer me help with fixing the problem?

 Thanks,
 Jerry Quinn


Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Michael McCandless
On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 I think we sorely need a Directory impl that down-prioritizes IO
 performed by merging.

Presumably this prioritizing Directory impl could wrap/decorate any
existing Directory.

Mike


Re: javabin in .NET?

2009-11-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
The javabin format does not have many dependencies. It may have 3-4
classes and that is it.

On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer
mauricioschef...@gmail.com wrote:
 Nope. It has to be manually ported. Not so much because of the language
 itself but because of differences in the libraries.


 2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 Is there any tool to directly port java to .Net? then we can etxract
 out the client part of the javabin code and convert it.

 On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher erik.hatc...@gmail.com
 wrote:
  Has anyone looked into using the javabin response format from .NET
 (instead
  of SolrJ)?
 
  It's mainly a curiosity.
 
  How much better could performance/bandwidth/throughput be?  How difficult
  would it be to implement some .NET code (C#, I'd guess being the best
  choice) to handle this response format?
 
  Thanks,
         Erik
 
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Selection of terms for MoreLikeThis

2009-11-13 Thread Andrew Clegg


Chantal Ackermann wrote:
 
 no idea, I'm afraid - but could you sent the output of 
 interestingTerms=details?
 This at least would show what MoreLikeThis uses, in comparison to the 
 TermVectorComponent you've already pasted.
 

I can, but I'm afraid they're not very illuminating!

http://www.cathdb.info/solr/mlt?q=id:3.40.50.720&rows=0&mlt.interestingTerms=details&mlt.match.include=false&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">59</int>
</lst>
<result name="response" numFound="280227" start="0"/>
<lst name="interestingTerms">
  <float name="keywords:dehydrogenase">1.0</float>
  <float name="keywords:reductase">1.0</float>
  <float name="keywords:metabolism">1.0</float>
  <float name="keywords:activity">1.0</float>
  <float name="keywords:process">1.0</float>
  <float name="keywords:alcohol">1.0</float>
  <float name="keywords:and">1.0</float>
  <float name="keywords:malate">1.0</float>
  <float name="keywords:biosynthesis">1.0</float>
  <float name="keywords:biosynthetic">1.0</float>
  <float name="keywords:degradation">1.0</float>
  <float name="keywords:precursor">1.0</float>
  <float name="keywords:metabolic">1.0</float>
  <float name="keywords:protein">1.0</float>
  <float name="keywords:synthase">1.0</float>
  <float name="keywords:acid">1.0</float>
  <float name="keywords:enzyme">1.0</float>
  <float name="keywords:succinyl-coa">1.0</float>
  <float name="keywords:putative">1.0</float>
  <float name="keywords:(nadp+)">1.0</float>
  <float name="keywords:4,6-dehydratase">1.0</float>
  <float name="keywords:fatty">1.0</float>
  <float name="keywords:chloroplast">1.0</float>
  <float name="keywords:lactobacillus">1.0</float>
  <float name="keywords:glyoxylate">1.0</float>
</lst>
</response>

Cheers,

Andrew.

-- 
View this message in context: 
http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26336558.html
Sent from the Solr - User mailing list archive at Nabble.com.



non english languages

2009-11-13 Thread Chuck Mysak
Hello all,

is there support for non-english language content indexing in Solr?
I'm interested in Bulgarian, Hungarian, Romanian and Russian.

Best regards,

Chuck


Re: non english languages

2009-11-13 Thread Robert Muir
the included snowball filters support hungarian, romanian, and russian.

On Fri, Nov 13, 2009 at 9:03 AM, Chuck Mysak chuck.my...@gmail.com wrote:

 Hello all,

 is there support for non-english language content indexing in Solr?

I'm interested in Bulgarian, Hungarian, Romanian and Russian.

 Best regards,

 Chuck




-- 
Robert Muir
rcm...@gmail.com


Re: Selection of terms for MoreLikeThis

2009-11-13 Thread Chantal Ackermann

Hi Andrew,

your URL does not include the parameter mlt.boost. Setting that to 
true made a noticeable difference for my queries.


If not, there is also the parameter
 mlt.minwl
(minimum word length below which words will be ignored).

All your other terms seem longer than 3, so it might help in this case? 
But that seems a bit like a workaround.


Cheers,
Chantal

Andrew Clegg schrieb:


Chantal Ackermann wrote:

no idea, I'm afraid - but could you sent the output of
interestingTerms=details?
This at least would show what MoreLikeThis uses, in comparison to the
TermVectorComponent you've already pasted.



I can, but I'm afraid they're not very illuminating!

http://www.cathdb.info/solr/mlt?q=id:3.40.50.720rows=0mlt.interestingTerms=detailsmlt.match.include=falsemlt.fl=keywordsmlt.mintf=1mlt.mindf=1

response
lst name=responseHeader
 int name=status0/int
 int name=QTime59/int
/lst
result name=response numFound=280227 start=0/
lst name=interestingTerms
 float name=keywords:dehydrogenase1.0/float
 float name=keywords:reductase1.0/float
 float name=keywords:metabolism1.0/float
 float name=keywords:activity1.0/float
 float name=keywords:process1.0/float
 float name=keywords:alcohol1.0/float
 float name=keywords:and1.0/float
 float name=keywords:malate1.0/float
 float name=keywords:biosynthesis1.0/float
 float name=keywords:biosynthetic1.0/float
 float name=keywords:degradation1.0/float
 float name=keywords:precursor1.0/float
 float name=keywords:metabolic1.0/float
 float name=keywords:protein1.0/float
 float name=keywords:synthase1.0/float
 float name=keywords:acid1.0/float
 float name=keywords:enzyme1.0/float
 float name=keywords:succinyl-coa1.0/float
 float name=keywords:putative1.0/float
 float name=keywords:(nadp+)1.0/float
 float name=keywords:4,6-dehydratase1.0/float
 float name=keywords:fatty1.0/float
 float name=keywords:chloroplast1.0/float
 float name=keywords:lactobacillus1.0/float
 float name=keywords:glyoxylate1.0/float
/lst
/response

Cheers,

Andrew.

--
View this message in context: 
http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26336558.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: javabin in .NET?

2009-11-13 Thread Mauricio Scheffer
I meant the standard IO libraries. They are different enough that the code
has to be manually ported. There were some automated tools back when
Microsoft introduced .Net, but IIRC they never really worked.

Anyway it's not a big deal, it should be a straightforward job. Testing it
thoroughly cross-platform is another thing though.

2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 The javabin format does not have many dependencies. it may have 3-4
 classes an that is it.

 On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer
 mauricioschef...@gmail.com wrote:
  Nope. It has to be manually ported. Not so much because of the language
  itself but because of differences in the libraries.
 
 
  2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
 
  Is there any tool to directly port java to .Net? then we can etxract
  out the client part of the javabin code and convert it.
 
  On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher erik.hatc...@gmail.com
  wrote:
   Has anyone looked into using the javabin response format from .NET
  (instead
   of SolrJ)?
  
   It's mainly a curiosity.
  
   How much better could performance/bandwidth/throughput be?  How
 difficult
   would it be to implement some .NET code (C#, I'd guess being the best
   choice) to handle this response format?
  
   Thanks,
  Erik
  
  
 
 
 
  --
  -
  Noble Paul | Principal Engineer| AOL | http://aol.com
 
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com



Reseting doc boosts

2009-11-13 Thread Jon Baer
Hi,

I'm trying to figure out if there is an easy way to basically reset all of the 
doc boosts which you have made (for analytical purposes) ... for example, if I 
run an index, gather a report, boost docs based on the report, and reset the boosts @ 
the time of the next index ... 

From just knowing how Lucene works, it would seem that I would really need 
to reindex, since the boost is an attribute on the doc itself which would have to be 
modified, but there is no easy way to query for docs which have been boosted 
either.  Any insight?

Thanks.

- Jon

Re: Question about the message Indexing failed. Rolled back all changes.

2009-11-13 Thread yountod

I'm getting the same thing.  The process runs, seemingly successfully, and I
can even go to other Solr pages pointing to the same server and run queries
against the index that return these just-added entries.  But the response to the
original import says "failed" and "rolled back", both through the XML response
and also in the logs.

Why is the process reporting failure and saying it did not commit / rolled
back, when it actually succeeded in importing and indexing?  If it rolled
back, as the logs say, I would expect not to be able to pull those rows out
with new queries against the index.



Avlesh Singh wrote:
 

 But  even after I successfully index data using
  http://host:port/solr-example/dataimport?command=full-import&commit=true&clean=true,
 do solr search which returns meaningful results

 I am not sure what meaningful means. The full-import command starts an
 asynchronous process to start re-indexing. The response that you get in
 return to the above mentioned URL, (always) indicates that a full-import
 has
 been started. It does NOT know about anything that might go wrong with the
 process itself.
 
 and then visit http://host:port/solr-example/dataimport?command=status, I
 can see thefollowing result ...

 The status URL is the one which tells you what is going on with the
 process.
 The message - Indexing failed. Rolled back all changes can come because
 of
 multiple reasons - missing database drivers, incorrect sql queries,
 runtime
 errors in custom transformers etc.
 
 Start the full-import once more. Keep a watch on the Solr server log. If
 you
 can figure out what's going wrong, great; otherwise, copy-paste the
 exception stack-trace from the log file for specific answers.
 
 Cheers
 Avlesh
 
 On Tue, Nov 10, 2009 at 1:32 PM, Bertie Shen bertie.s...@gmail.com
 wrote:
 
 No. I did not check the logs.

 But  even after I successfully index data using
 http://host:port
  /solr-example/dataimport?command=full-import&commit=true&clean=true,
 do solr search which returns meaningful results, and then visit
 http://host:port/solr-example/dataimport?command=status, I can see the
 following result

 <response>
 <lst name="responseHeader">
   <int name="status">0</int>
   <int name="QTime">1</int>
 </lst>
 <lst name="initArgs">
   <lst name="defaults">
     <str name="config">data-config.xml</str>
   </lst>
 </lst>
 <str name="command">status</str>
 <str name="status">idle</str>
 <str name="importResponse"/>
 <lst name="statusMessages">
   <str name="Time Elapsed">0:2:11.426</str>
   <str name="Total Requests made to DataSource">584</str>
   <str name="Total Rows Fetched">1538</str>
   <str name="Total Documents Skipped">0</str>
   <str name="Full Dump Started">2009-11-09 23:54:41</str>
   *<str name="">Indexing failed. Rolled back all changes.</str>*
   <str name="Committed">2009-11-09 23:54:42</str>
   <str name="Optimized">2009-11-09 23:54:42</str>
   <str name="Rolledback">2009-11-09 23:54:42</str>
 </lst>
 <str name="WARNING">
 This response format is experimental.  It is likely to change in the
 future.
 </str>
 </response>

 On Mon, Nov 9, 2009 at 7:39 AM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

  On Sat, Nov 7, 2009 at 1:10 PM, Bertie Shen bertie.s...@gmail.com
 wrote:
 
  
When I use
   http://localhost:8180/solr/admin/dataimport.jsp?handler=/dataimport
 to
   debug
   the indexing config file, I always see the status message on the
 right
  part
   str name=Indexing failed. Rolled back all changes./str, even
 the
   indexing process looks to be successful. I am not sure whether you
 guys
   have
   seen the same phenomenon or not.  BTW, I usually check the checkbox
 Clean
   and sometimes check Commit box, and then click Debug Now button.
  
  
  Do you see any exceptions in the logs?
 
  --
  Regards,
  Shalin Shekhar Mangar.
 

 
 

-- 
View this message in context: 
http://old.nabble.com/Question-about-the-message-%22Indexing-failed.-Rolled-back-all--changes.%22-tp26242714p26338287.html
Sent from the Solr - User mailing list archive at Nabble.com.



scanning folders recursively / Tika

2009-11-13 Thread Peter Gabriel
Hello.

I am working with Tika 0.5 and want to scan a folder tree of about 10 GB.
Is there a comfortable way to scan folders recursively with an existing class,
or do I have to write it myself?

Any tips for best practice?

Greetings, Peter
-- 


Re: scanning folders recursively / Tika

2009-11-13 Thread Glen Newton
Have one thread recurse depth-first down the directories, adding files to
a queue (fixed size).
Have many threads reading off of the queue and doing the work.

-glen
http://zzzoot.blogspot.com/
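A bare-bones sketch of that producer/consumer pattern (plain java.util.concurrent, nothing Tika-specific; process() is a placeholder where the Tika parsing and Solr indexing would go):

import java.io.File;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class FolderScanner {

    private static final File POISON = new File("");   // end-of-work marker
    private final BlockingQueue<File> queue = new ArrayBlockingQueue<File>(100);

    // Producer: one thread walks the tree depth first and feeds the queue.
    private void walk(File dir) throws InterruptedException {
        File[] children = dir.listFiles();
        if (children == null) {
            return;
        }
        for (File f : children) {
            if (f.isDirectory()) {
                walk(f);
            } else {
                queue.put(f);   // blocks when the queue is full
            }
        }
    }

    // Consumer: many of these run in parallel, each handing files to Tika/Solr.
    private class Worker implements Runnable {
        public void run() {
            try {
                for (File f = queue.take(); f != POISON; f = queue.take()) {
                    process(f);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Placeholder: parse with Tika and send to Solr here.
    protected void process(File f) {
        System.out.println("processing " + f);
    }

    public void scan(File root, int workers) throws InterruptedException {
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(new Worker());
            threads[i].start();
        }
        walk(root);
        for (int i = 0; i < workers; i++) {
            queue.put(POISON);  // one end-of-work marker per worker
        }
        for (Thread t : threads) {
            t.join();
        }
    }

    public static void main(String[] args) throws Exception {
        new FolderScanner().scan(new File(args[0]), 4);
    }
}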

2009/11/13 Peter Gabriel zarato...@gmx.net:
 Hello.

 I am on work with Tika 0.5 and want to scan a folder system about 10GB.
 Is there a comfortable way to scan folders recursively with an existing class 
 or have i to write it myself?

 Any tips for best practise?

 Greetings, Peter
 --




-- 

-


Re: Stop solr without losing documents

2009-11-13 Thread Michael
On Fri, Nov 13, 2009 at 4:32 AM, gwk g...@eyefi.nl wrote:
 I don't know if this is the best solution, or even if it's applicable to
 your situation but we do incremental updates from a database based on a
 timestamp, (from a simple seperate sql table filled by triggers so deletes

Thanks, gwk!  This doesn't exactly meet our needs, but helped us get
to a solution.  In short, we are manually committing in our outside
updater process (instead of letting Solr autocommit), and marking
which documents have been updated before a successful commit.  Now
stopping solr is as easy as kill -9.
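One reading of that approach, as a sketch - the fetchPendingBatch/markAsIndexed hooks are hypothetical, and only the ordering of commit and checkpoint matters:

import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

public abstract class Updater {

    // Hypothetical hooks into the external feed.
    protected abstract List<SolrInputDocument> fetchPendingBatch() throws Exception;
    protected abstract void markAsIndexed(List<SolrInputDocument> batch) throws Exception;

    public void runOnce(SolrServer solr) throws Exception {
        List<SolrInputDocument> batch = fetchPendingBatch();
        if (batch.isEmpty()) {
            return;
        }
        solr.add(batch);      // handed to Solr, not yet durable
        solr.commit();        // now the batch is in the index for good
        markAsIndexed(batch); // checkpoint last: a kill before this line just replays the batch
    }
}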

Michael


how to search against multiple attributes in the index

2009-11-13 Thread javaxmlsoapdev

I want to build an AND search query against field1 AND field2 etc. Both these
fields are stored in the index. I am migrating Lucene code to Solr. Following
is my existing Lucene code:

BooleanQuery currentSearchingQuery = new BooleanQuery();

currentSearchingQuery.add(titleDescQuery, Occur.MUST);
highlighter = new Highlighter(new QueryScorer(titleDescQuery));

TermQuery searchTechGroupQuery = new TermQuery(
        new Term("techGroup", searchForm.getTechGroup()));
currentSearchingQuery.add(searchTechGroupQuery, Occur.MUST);

TermQuery searchProgramQuery = new TermQuery(
        new Term("techProgram", searchForm.getTechProgram()));
currentSearchingQuery.add(searchProgramQuery, Occur.MUST);

What's the equivalent Solr code for the above Lucene code? Any samples would be
appreciated.

Thanks,
-- 
View this message in context: 
http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html
Sent from the Solr - User mailing list archive at Nabble.com.
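For what it's worth, a rough SolrJ sketch of the same three MUST clauses (field names copied from the Lucene snippet above; treating titleDesc as a single queryable field, and the lack of query escaping, are simplifying assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MigratedSearch {

    public QueryResponse search(SolrServer server, String text,
                                String techGroup, String techProgram) throws Exception {
        SolrQuery query = new SolrQuery();
        // Main query, standing in for the titleDescQuery MUST clause.
        // A real version should escape the user input.
        query.setQuery("titleDesc:(" + text + ")");
        // The two TermQuery MUST clauses become filter queries.
        query.addFilterQuery("techGroup:\"" + techGroup + "\"");
        query.addFilterQuery("techProgram:\"" + techProgram + "\"");
        // Rough replacement for the Lucene Highlighter.
        query.setHighlight(true);
        query.addHighlightField("titleDesc");
        return server.query(query);
    }
}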



The status of Local/Geo/Spatial/Distance Solr

2009-11-13 Thread Bertie Shen
Hey,

   I am interested in using LocalSolr to do Local/Geo/Spatial/Distance
search. But the wiki of LocalSolr (http://wiki.apache.org/solr/LocalSolr)
points to pretty old documentation. Is there a better document I can refer to
for setting up LocalSolr, and some performance analysis?

   Just sync-ed Solr codebase and found LocalSolr is still NOT in the
contrib package. Do we have a plan to incorporate it? I download a LocalSolr
lib localsolr-1.5.jar from
http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and notice
that the namespace is com.pjaol.search. blah blah, while LocalLucene package
is in Lucene codebase and the package name is org.apache.lucene.spatial blah
blah.

   But localsolr-1.5.jar from
http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ does not
work with the lucene-spatial-3.0-dev.jar I build from the Lucene codebase directly.
After I restart Tomcat, I cannot load the Solr admin page. The error is as
follows. It looks like Solr is still looking for
the old class names.

  Thanks.

HTTP Status 500 - Severe errors in solr configuration. Check your log files
for more detailed information on what may be wrong. If you want solr to
continue after configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError> in null
java.lang.NoClassDefFoundError:
com/pjaol/search/geo/utils/DistanceFilter
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525)
at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:833)
at org.apache.solr.core.SolrCore.init(SolrCore.java:551)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221)
at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302)
at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:78)
at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760)
at org.apache.catalina.core.ContainerBase.access$0(ContainerBase.java:744)
at org.apache.catalina.core.ContainerBase$PrivilegedAddChild.run(ContainerBase.java:144)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:738)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544)
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:626)
at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1022)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:736)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1014)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
at org.apache.catalina.core.StandardService.start(StandardService.java:448)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:700)
at org.apache.catalina.startup.Catalina.start(Catalina.java:552)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:295)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:177)
Caused by: java.lang.ClassNotFoundException: com.pjaol.search.geo.utils.DistanceFilter
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1362)
at

Re: how to search against multiple attributes in the index

2009-11-13 Thread Avlesh Singh
Dive in - http://wiki.apache.org/solr/Solrj

Cheers
Avlesh

On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev vika...@yahoo.com wrote:


 I want to build an AND search query against field1 AND field2, etc. Both these
 fields are stored in an index. I am migrating Lucene code to Solr. Following
 is my existing Lucene code:

 BooleanQuery currentSearchingQuery = new BooleanQuery();

 currentSearchingQuery.add(titleDescQuery,Occur.MUST);
 highlighter = new Highlighter( new QueryScorer(titleDescQuery));

 TermQuery searchTechGroupQyery = new TermQuery(new Term
 (techGroup,searchForm.getTechGroup()));
currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST);
 TermQuery searchProgramQyery = new TermQuery(new
 Term(techProgram,searchForm.getTechProgram()));
currentSearchingQuery.add(searchProgramQyery, Occur.MUST);
 }

 What's the equivalent Solr code for the above Lucene code? Any samples would be
 appreciated.

 Thanks,
 --
 View this message in context:
 http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: The status of Local/Geo/Spatial/Distance Solr

2009-11-13 Thread Ryan McKinley

It looks like solr+spatial will get some attention in 1.5, check:
https://issues.apache.org/jira/browse/SOLR-1561

Depending on your needs, that may be enough.  More robust/scaleable  
solutions will hopefully work their way into 1.5 (any help is always  
appreciated!)



On Nov 13, 2009, at 11:12 AM, Bertie Shen wrote:


Hey,

  I am interested in using LocalSolr to go Local/Geo/Spatial/Distance
search. But the wiki of LocalSolr(http://wiki.apache.org/solr/LocalSolr 
)
points to pretty old documentation. Is there a better document I  
refer to

for the setting up of LocalSolr and some performance analysis?

  Just sync-ed Solr codebase and found LocalSolr is still NOT in the
contrib package. Do we have a plan to incorporate it? I download a  
LocalSolr

lib localsolr-1.5.jar from
http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and  
notice
that the namespace is com.pjaol.search. blah blah, while LocalLucene  
package
is in Lucene codebase and the package name is  
org.apache.lucene.spatial blah

blah.

  But localsolr-1.5.jar from from
http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/   
does not
work with lucene-spatial-3.0-dev.jar I build from Lucene codebase  
directly.
After I restart tomcat, I could not load solr admin page. The error  
is as

follows. It looks like solr is still looking for
the old class names.

 Thanks.

HTTP Status 500 - Severe errors in solr configuration. Check your log files
for more detailed information on what may be wrong. If you want solr to
continue after configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError> in null
-
java.lang.NoClassDefFoundError: com/pjaol/search/geo/utils/DistanceFilter
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525)
at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:833)
at org.apache.solr.core.SolrCore.init(SolrCore.java:551)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221)
at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302)
at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:78)
at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760)
at org.apache.catalina.core.ContainerBase.access$0(ContainerBase.java:744)
at org.apache.catalina.core.ContainerBase$PrivilegedAddChild.run(ContainerBase.java:144)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:738)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544)
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:626)
at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1022)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:736)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1014)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
at org.apache.catalina.core.StandardService.start(StandardService.java:448)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:700)
at org.apache.catalina.startup.Catalina.start(Catalina.java:552)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:295)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect

Re: The status of Local/Geo/Spatial/Distance Solr

2009-11-13 Thread Ryan McKinley

Also:
https://issues.apache.org/jira/browse/SOLR-1302


On Nov 13, 2009, at 11:12 AM, Bertie Shen wrote:


Hey,

  I am interested in using LocalSolr to go Local/Geo/Spatial/Distance
search. But the wiki of LocalSolr(http://wiki.apache.org/solr/LocalSolr 
)
points to pretty old documentation. Is there a better document I  
refer to

for the setting up of LocalSolr and some performance analysis?

  Just sync-ed Solr codebase and found LocalSolr is still NOT in the
contrib package. Do we have a plan to incorporate it? I download a  
LocalSolr

lib localsolr-1.5.jar from
http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and  
notice
that the namespace is com.pjaol.search. blah blah, while LocalLucene  
package
is in Lucene codebase and the package name is  
org.apache.lucene.spatial blah

blah.

  But localsolr-1.5.jar from from
http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/   
does not
work with lucene-spatial-3.0-dev.jar I build from Lucene codebase  
directly.
After I restart tomcat, I could not load solr admin page. The error  
is as

follows. It looks like solr is still looking for
the old class names.

 Thanks.

HTTP Status 500 - Severe errors in solr configuration. Check your log files
for more detailed information on what may be wrong. If you want solr to
continue after configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError> in null
-
java.lang.NoClassDefFoundError: com/pjaol/search/geo/utils/DistanceFilter
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525)
at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:833)
at org.apache.solr.core.SolrCore.init(SolrCore.java:551)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221)
at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302)
at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:78)
at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760)
at org.apache.catalina.core.ContainerBase.access$0(ContainerBase.java:744)
at org.apache.catalina.core.ContainerBase$PrivilegedAddChild.run(ContainerBase.java:144)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:738)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544)
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:626)
at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1022)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:736)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1014)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
at org.apache.catalina.core.StandardService.start(StandardService.java:448)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:700)
at org.apache.catalina.startup.Catalina.start(Catalina.java:552)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:295)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)

Obtaining list of dynamic fields being available in index

2009-11-13 Thread Eugene Dzhurinsky
Hi there!

How can we retrieve the complete list of dynamic fields, which are currently
available in index?

Thank you in advance!
-- 
Eugene N Dzhurinsky




Re: how to search against multiple attributes in the index

2009-11-13 Thread Avlesh Singh
For a starting point, this might be a good read -
http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query

Cheers
Avlesh

On Fri, Nov 13, 2009 at 10:02 PM, javaxmlsoapdev vika...@yahoo.com wrote:


 I already did  dive in before. I am using solrj API and SolrQuery object to
 build query. but its not clear/written how to build booleanQuery ANDing
 bunch of different attributes in the index. Any samples please?

 Avlesh Singh wrote:
 
  Dive in - http://wiki.apache.org/solr/Solrj
 
  Cheers
  Avlesh
 
  On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev vika...@yahoo.com
 wrote:
 
 
  I want to build AND search query against field1 AND field2 etc. Both
  these
  fields are stored in an index. I am migrating lucene code to Solr.
  Following
  is my existing lucene code
 
  BooleanQuery currentSearchingQuery = new BooleanQuery();
 
  currentSearchingQuery.add(titleDescQuery,Occur.MUST);
  highlighter = new Highlighter( new QueryScorer(titleDescQuery));
 
  TermQuery searchTechGroupQyery = new TermQuery(new Term
  (techGroup,searchForm.getTechGroup()));
 currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST);
  TermQuery searchProgramQyery = new TermQuery(new
  Term(techProgram,searchForm.getTechProgram()));
 currentSearchingQuery.add(searchProgramQyery, Occur.MUST);
  }
 
  What's the equivalent Solr code for above Luce code. Any samples would
 be
  appreciated.
 
  Thanks,
  --
  View this message in context:
 
 http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 

 --
 View this message in context:
 http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339402.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Return doc if one or more query keywords occur multiple times

2009-11-13 Thread gistolero
Anyone?

 Original Message 
 Date: Thu, 12 Nov 2009 13:29:20 +0100
 From: gistol...@gmx.de
 To: solr-user@lucene.apache.org
 Subject: Return doc if one or more query keywords occur multiple times

 Hello,
 
 I am using Dismax request handler for queries:
 
 ...select?q=foo bar foo2 bar2&qt=dismax&mm=2...
 
 With parameter mm=2 I configure that at least 2 of the optional clauses
 must match, regardless of how many clauses there are.
 
 But now I want change this to the following:
 
 List all documents that have at least 2 of the optional clauses OR that
 have at least one of the query terms (e.g. foo) more than once.
 
 Is this possible?
 Thanks,
 Gisto
 



Re: Obtaining list of dynamic fields being available in index

2009-11-13 Thread Avlesh Singh
Luke Handler? - http://wiki.apache.org/solr/LukeRequestHandler
/admin/luke?numTerms=0
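
If you are using SolrJ, the same information is reachable programmatically. A
minimal sketch, assuming a SolrJ 1.4 CommonsHttpSolrServer pointed at your core;
the handler reports every field actually present in the index, including the
concrete instances of dynamic fields:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;

public class ListIndexFields {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    LukeRequest luke = new LukeRequest();  // hits /admin/luke
    luke.setNumTerms(0);                   // skip per-term stats, as above
    LukeResponse rsp = luke.process(server);
    for (String fieldName : rsp.getFieldInfo().keySet()) {
      System.out.println(fieldName);       // field names as Luke reports them
    }
  }
}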

Cheers
Avlesh

On Fri, Nov 13, 2009 at 10:05 PM, Eugene Dzhurinsky b...@redwerk.comwrote:

 Hi there!

 How can we retrieve the complete list of dynamic fields, which are
 currently
 available in index?

 Thank you in advance!
 --
 Eugene N Dzhurinsky



Re: how to search against multiple attributes in the index

2009-11-13 Thread javaxmlsoapdev

I think I found the answer. needed to read more API documentation :-)

you can do it using 
solrQuery.setFilterQueries() and build AND queries of multiple parameters.


Avlesh Singh wrote:
 
 For a starting point, this might be a good read -
 http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query
 
 Cheers
 Avlesh
 
 On Fri, Nov 13, 2009 at 10:02 PM, javaxmlsoapdev vika...@yahoo.com
 wrote:
 

 I already did  dive in before. I am using solrj API and SolrQuery object
 to
 build query. but its not clear/written how to build booleanQuery ANDing
 bunch of different attributes in the index. Any samples please?

 Avlesh Singh wrote:
 
  Dive in - http://wiki.apache.org/solr/Solrj
 
  Cheers
  Avlesh
 
  On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev vika...@yahoo.com
 wrote:
 
 
  I want to build AND search query against field1 AND field2 etc. Both
  these
  fields are stored in an index. I am migrating lucene code to Solr.
  Following
  is my existing lucene code
 
  BooleanQuery currentSearchingQuery = new BooleanQuery();
 
  currentSearchingQuery.add(titleDescQuery,Occur.MUST);
  highlighter = new Highlighter( new QueryScorer(titleDescQuery));
 
  TermQuery searchTechGroupQyery = new TermQuery(new Term
  (techGroup,searchForm.getTechGroup()));
 currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST);
  TermQuery searchProgramQyery = new TermQuery(new
  Term(techProgram,searchForm.getTechProgram()));
 currentSearchingQuery.add(searchProgramQyery, Occur.MUST);
  }
 
  What's the equivalent Solr code for above Luce code. Any samples would
 be
  appreciated.
 
  Thanks,
  --
  View this message in context:
 
 http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 

 --
 View this message in context:
 http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339402.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339903.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: how to search against multiple attributes in the index

2009-11-13 Thread Avlesh Singh

 you can do it using
 solrQuery.setFilterQueries() and build AND queries of multiple parameters.

Nope. You would need to read more -
http://wiki.apache.org/solr/FilterQueryGuidance

For your impatience, here's a quick starter -

#and between two fields
solrQuery.setQuery("+field1:foo +field2:bar");

#or between two fields
solrQuery.setQuery("field1:foo field2:bar");

Cheers
Avlesh
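
Putting it together, a minimal runnable SolrJ sketch; the field names, values and
server URL below are placeholders, not anything from your schema:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class AndQueryExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery query = new SolrQuery();
    // Equivalent of a Lucene BooleanQuery with two MUST clauses:
    query.setQuery("+techGroup:someGroup +techProgram:someProgram");
    query.setRows(10);
    QueryResponse rsp = server.query(query);
    System.out.println("hits: " + rsp.getResults().getNumFound());
  }
}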

On Fri, Nov 13, 2009 at 10:35 PM, javaxmlsoapdev vika...@yahoo.com wrote:


 I think I found the answer. needed to read more API documentation :-)

 you can do it using
 solrQuery.setFilterQueries() and build AND queries of multiple parameters.


 Avlesh Singh wrote:
 
  For a starting point, this might be a good read -
 
 http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query
 
  Cheers
  Avlesh
 
  On Fri, Nov 13, 2009 at 10:02 PM, javaxmlsoapdev vika...@yahoo.com
  wrote:
 
 
  I already did  dive in before. I am using solrj API and SolrQuery object
  to
  build query. but its not clear/written how to build booleanQuery ANDing
  bunch of different attributes in the index. Any samples please?
 
  Avlesh Singh wrote:
  
   Dive in - http://wiki.apache.org/solr/Solrj
  
   Cheers
   Avlesh
  
   On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev vika...@yahoo.com
  wrote:
  
  
   I want to build AND search query against field1 AND field2 etc. Both
   these
   fields are stored in an index. I am migrating lucene code to Solr.
   Following
   is my existing lucene code
  
   BooleanQuery currentSearchingQuery = new BooleanQuery();
  
   currentSearchingQuery.add(titleDescQuery,Occur.MUST);
   highlighter = new Highlighter( new QueryScorer(titleDescQuery));
  
   TermQuery searchTechGroupQyery = new TermQuery(new Term
   (techGroup,searchForm.getTechGroup()));
  currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST);
   TermQuery searchProgramQyery = new TermQuery(new
   Term(techProgram,searchForm.getTechProgram()));
  currentSearchingQuery.add(searchProgramQyery, Occur.MUST);
   }
  
   What's the equivalent Solr code for above Luce code. Any samples
 would
  be
   appreciated.
  
   Thanks,
   --
   View this message in context:
  
 
 http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html
   Sent from the Solr - User mailing list archive at Nabble.com.
  
  
  
  
 
  --
  View this message in context:
 
 http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339402.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 

 --
 View this message in context:
 http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339903.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Reseting doc boosts

2009-11-13 Thread Avlesh Singh
AFAIK there is no way to reset the doc boost. You would need to re-index.
Moreover, there is no way to search by boost.

Cheers
Avlesh

On Fri, Nov 13, 2009 at 8:17 PM, Jon Baer jonb...@gmail.com wrote:

 Hi,

 Im trying to figure out if there is an easy way to basically reset all of
 any doc boosts which you have made (for analytical purposes) ... for example
 if I run an index, gather report, doc boost on the report, and reset the
 boosts @ time of next index ...

 It would seem to be from just knowing how Lucene works that I would really
 need to reindex since its a attrib on the doc itself which would have to be
 modified, but there is no easy way to query for docs which have been boosted
 either.  Any insight?

 Thanks.

 - Jon


Re: The status of Local/Geo/Spatial/Distance Solr

2009-11-13 Thread Bertie Shen
Hi Ian and Ryan,

  Thanks for the reply.

  Ian, I checked your pasted config, I am using the same one except the
values of <int name="startTier">4</int> and <int name="endTier">25</int>.
Basically I use the set up specified at http://www.gissearch.com/localsolr.
 But there are still the same error I pasted in previous email.

  Ryan, I just checked out the lib lucene-spatial-2.9.1.jar Grant checked in
today.  Previously I built lucene-spatial-3.0-dev.jar from Lucene java code
base directly. There is still no luck after the lib replacement.  I do not
think other lib matters in this case.





On Fri, Nov 13, 2009 at 8:34 AM, Ian Ibbotson iani...@googlemail.comwrote:

 Heya.. could it be a problem with your solr config files? I seem to
 recall a change from the docs as they were to get this working.. I
 have...

  <updateRequestProcessorChain>
    <processor class="com.pjaol.search.solr.update.LocalUpdateProcessorFactory">
      <str name="latField">lat</str>
      <str name="lngField">lng</str>
      <int name="startTier">4</int>
      <int name="endTier">25</int>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory" />
    <processor class="solr.LogUpdateProcessorFactory" />
  </updateRequestProcessorChain>

  <searchComponent name="localsolr"
    class="com.pjaol.search.solr.component.LocalSolrQueryComponent" />

  <requestHandler name="geo"
    class="org.apache.solr.handler.component.SearchHandler">
    <arr name="components">
      <str>localsolr</str>
      <str>facet</str>
      <str>mlt</str>
      <str>highlight</str>
      <str>debug</str>
    </arr>
  </requestHandler>

 Does that tie up with your config? I'd basically interpreted the current
 packaging as... What used to be locallucene has definitely merged into
 lucene-spatial in this build, no more locallucene. However, you still
 need to build localsolr for now...

 My solr jars are:

 commons-beanutils-1.8.0.jar   commons-logging-1.1.1.jar
 localsolr-1.5.2-rc1.jar  lucene-misc-2.9.1-ki-rc3.jar
serializer-2.7.1.jar   stax-1.2.0.jar
  xml-apis-1.3.04.jar
 commons-codec-1.4.jar commons-pool-1.5.3.jar
 log4j-1.2.13.jar lucene-queries-2.9.1-ki-rc3.jar
slf4j-api-1.5.5.jarstax-api-1.0.jar
  xpp3-1.1.3.4.O.jar
 commons-dbcp-1.2.2.jargeoapi-nogenerics-2.1M2.jar
 lucene-analyzers-2.9.1-ki-rc3.jarlucene-snowball-2.9.1-ki-rc3.jar
slf4j-log4j12-1.5.5.jarstax-utils-20040917.jar
 commons-fileupload-1.2.1.jar  geronimo-stax-api_1.0_spec-1.0.1.jar
 lucene-core-2.9.1-ki-rc3.jar lucene-spatial-2.9.1-ki-rc3.jar
solr-commons-csv-1.4.0-ki-rc1.jar  woodstox-wstx-asl-3.2.7.jar
 commons-httpclient-3.1.jargt2-referencing-2.3.1.jar
 lucene-highlighter-2.9.1-ki-rc3.jar
 lucene-spellchecker-2.9.1-ki-rc3.jar  solr-core-1.4.0-ki-rc1.jar
  xalan-2.7.1.jar
 commons-io-1.3.2.jar  jsr108-0.01.jar
 lucene-memory-2.9.1-ki-rc3.jar
 org.codehaus.woodstox-wstx-asl-3.2.7.jar  solr-solrj-1.4.0-ki-rc1.jar
  xercesImpl-2.9.1.jar

 Sorry for dumping the info at you... hope it helps tho

 Ian.

 2009/11/13 Bertie Shen bertie.s...@gmail.com:
  Hey,
 
I am interested in using LocalSolr to go Local/Geo/Spatial/Distance
  search. But the wiki of LocalSolr(http://wiki.apache.org/solr/LocalSolr)
  points to pretty old documentation. Is there a better document I refer to
  for the setting up of LocalSolr and some performance analysis?
 
Just sync-ed Solr codebase and found LocalSolr is still NOT in the
  contrib package. Do we have a plan to incorporate it? I download a
 LocalSolr
  lib localsolr-1.5.jar from
  http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and
 notice
  that the namespace is com.pjaol.search. blah blah, while LocalLucene
 package
  is in Lucene codebase and the package name is org.apache.lucene.spatial
 blah
  blah.
 
But localsolr-1.5.jar from from
  http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/  does
 not
  work with lucene-spatial-3.0-dev.jar I build from Lucene codebase
 directly.
  After I restart tomcat, I could not load solr admin page. The error is as
  follows. It looks like solr is still looking for
  the old class names.
 
   Thanks.
 
  HTTP Status 500 - Severe errors in solr configuration. Check your log
 files
  for more detailed information on what may be wrong. If you want solr to
  continue after configuration errors, change:
  <abortOnConfigurationError>false</abortOnConfigurationError> in null
  -
  java.lang.NoClassDefFoundError:
  com/pjaol/search/geo/utils/DistanceFilter at
 java.lang.Class.forName0(Native
  Method) at java.lang.Class.forName(Class.java:247) at
 
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357)
  at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413) at
  org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435) at
  org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498) at
  

Re: Question about the message Indexing failed. Rolled back all changes.

2009-11-13 Thread yountod

The process initially completes with:

  <str name="Full Dump Started">2009-11-13 09:40:46</str>
  <str name="">Indexing completed. Added/Updated: 20 documents. Deleted 0 documents.</str>


...but then it fails with:

  <str name="Full Dump Started">2009-11-13 09:40:46</str>
  <str name="">Indexing failed. Rolled back all changes.</str>
  <str name="Committed">2009-11-13 09:41:10</str>
  <str name="Optimized">2009-11-13 09:41:10</str>
  <str name="Rolledback">2009-11-13 09:41:10</str>



I think it may have something to do with this, which I found by using the
DataImport.jsp:

(Thread.java:636) Caused by: java.sql.SQLException: Illegal value for setFetchSize().
at com.mysql.jdbc.Statement.setFetchSize(Statement.java:1864)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:242)
... 28 more</str>
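
As a hedged aside on that exception: MySQL's Connector/J only accepts a
non-negative fetch size or the special streaming value Integer.MIN_VALUE (which
is what DIH's batchSize="-1" is meant to translate to). In plain JDBC terms:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class FetchSizeNote {
  static Statement streamingStatement(Connection conn) throws SQLException {
    Statement stmt = conn.createStatement(
        ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
    stmt.setFetchSize(Integer.MIN_VALUE); // accepted: stream rows one at a time
    // stmt.setFetchSize(-5);             // rejected: "Illegal value for setFetchSize()"
    return stmt;
  }
}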



-- 
View this message in context: 
http://old.nabble.com/Question-about-the-message-%22Indexing-failed.-Rolled-back-all--changes.%22-tp26242714p26340360.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: how to search against multiple attributes in the index

2009-11-13 Thread javaxmlsoapdev

great. thanks. that was helpful

Avlesh Singh wrote:
 

 you can do it using
 solrQuery.setFilterQueries() and build AND queries of multiple
 parameters.

 Nope. You would need to read more -
 http://wiki.apache.org/solr/FilterQueryGuidance
 
 For your impatience, here's a quick starter -
 
 #and between two fields
 solrQuery.setQuery(+field1:foo +field2:bar);
 
 #or between two fields
 solrQuery.setQuery(field1:foo field2:bar);
 
 Cheers
 Avlesh
 
 On Fri, Nov 13, 2009 at 10:35 PM, javaxmlsoapdev vika...@yahoo.com
 wrote:
 

 I think I found the answer. needed to read more API documentation :-)

 you can do it using
 solrQuery.setFilterQueries() and build AND queries of multiple
 parameters.


 Avlesh Singh wrote:
 
  For a starting point, this might be a good read -
 
 http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query
 
  Cheers
  Avlesh
 
  On Fri, Nov 13, 2009 at 10:02 PM, javaxmlsoapdev vika...@yahoo.com
  wrote:
 
 
  I already did  dive in before. I am using solrj API and SolrQuery
 object
  to
  build query. but its not clear/written how to build booleanQuery
 ANDing
  bunch of different attributes in the index. Any samples please?
 
  Avlesh Singh wrote:
  
   Dive in - http://wiki.apache.org/solr/Solrj
  
   Cheers
   Avlesh
  
   On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev vika...@yahoo.com
  wrote:
  
  
   I want to build AND search query against field1 AND field2 etc.
 Both
   these
   fields are stored in an index. I am migrating lucene code to Solr.
   Following
   is my existing lucene code
  
   BooleanQuery currentSearchingQuery = new BooleanQuery();
  
   currentSearchingQuery.add(titleDescQuery,Occur.MUST);
   highlighter = new Highlighter( new QueryScorer(titleDescQuery));
  
   TermQuery searchTechGroupQyery = new TermQuery(new Term
   (techGroup,searchForm.getTechGroup()));
  currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST);
   TermQuery searchProgramQyery = new TermQuery(new
   Term(techProgram,searchForm.getTechProgram()));
  currentSearchingQuery.add(searchProgramQyery, Occur.MUST);
   }
  
   What's the equivalent Solr code for above Luce code. Any samples
 would
  be
   appreciated.
  
   Thanks,
   --
   View this message in context:
  
 
 http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html
   Sent from the Solr - User mailing list archive at Nabble.com.
  
  
  
  
 
  --
  View this message in context:
 
 http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339402.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 

 --
 View this message in context:
 http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339903.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26340776.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: scanning folders recursively / Tika

2009-11-13 Thread Otis Gospodnetic
Peter - if you want, download the code from Lucene in Action 1 or 2, it has 
index traversal and indexing.  2nd edition uses Tika.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: Peter Gabriel zarato...@gmx.net
 To: solr-user@lucene.apache.org
 Sent: Fri, November 13, 2009 10:26:48 AM
 Subject: scanning folders recursively / Tika
 
 Hello.
 
 I am working with Tika 0.5 and want to scan a folder tree of about 10GB.
 Is there a convenient way to scan folders recursively with an existing class,
 or do I have to write it myself?
 
 Any tips for best practise?
 
 Greetings, Peter
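
For what it's worth, a minimal recursive-walk sketch in plain Java; the actual
Tika parsing call is left as a stub, since it depends on which Tika API you use:

import java.io.File;

public class FolderScanner {

  public static void scan(File dir) {
    File[] entries = dir.listFiles();
    if (entries == null) {
      return; // not a directory, or not readable
    }
    for (File entry : entries) {
      if (entry.isDirectory()) {
        scan(entry);        // recurse into sub-folders
      } else {
        handleFile(entry);  // hand the file to Tika here
      }
    }
  }

  private static void handleFile(File file) {
    // Stub: parse with Tika (e.g. AutoDetectParser) and index the result.
    System.out.println("would parse: " + file.getAbsolutePath());
  }

  public static void main(String[] args) {
    scan(new File(args[0])); // e.g. java FolderScanner /data/docs
  }
}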



Re: Customizing Field Score (Multivalued Field)

2009-11-13 Thread Stephen Duncan Jr
On Thu, Nov 12, 2009 at 3:00 PM, Stephen Duncan Jr stephen.dun...@gmail.com
 wrote:

 On Thu, Nov 12, 2009 at 2:54 PM, Chris Hostetter hossman_luc...@fucit.org
  wrote:


 oh man, so you were parsing the Stored field values of every matching doc
 at query time? ouch.

 Assuming i'm understanding your goal, the conventional way to solve this
 type of problem is payloads ... you'll find lots of discussion on it in
 the various Lucene mailing lists, and if you look online Michael Busch has
 various slides that talk about using them.  they let you say things
 like in this document, at this postion of field 'x' the word 'microsoft'
 is worth 37.4, but at this other position (or in this other document)
 'microsoft' is only worth 17.2

 The simplest way to use them in Solr (as i understand it) is to use
 soemthing like the DelimitedPayloadTokenFilterFactory when indexing, and
 then write yourself
 a simple little custom QParser that generates a BoostingTermQuery on your
 field.

 should be a lot simpler to implement then the Query you are describing,
 and much faster.


 -Hoss


 Thanks. I finally got around to looking at this again today and was looking
 at a similar path, so I appreciate the confirmation.


 --
 Stephen Duncan Jr
 www.stephenduncanjr.com


For posterity, here's the rest of what I discovered trying to implement
this:

You'll need to write a PayloadSimilarity as described here:
http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
(here's my updated version, due to deprecation of the method mentioned in
that article):

@Override
public float scorePayload(
int docId,
String fieldName,
int start,
int end,
byte[] payload,
int offset,
int length)
{
// can ignore length here, because we know it is encoded as 4 bytes
return PayloadHelper.decodeFloat(payload, offset);
}

You'll need to register that similarity in your Solr schema.xml (this was hard to
figure out, as I didn't realize that the similarity has to be applied
globally to the writer/searcher used generally, even though I only care about
payloads on one field, so I wasted time trying to figure out how to plug in
the similarity in my query parser).

You'll want to use the "payloads" field type, or something based on it, that's in
the example schema.xml.
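
To illustrate the indexing side, a hedged sketch of what a document might look
like when added through SolrJ, assuming the field type uses
DelimitedPayloadTokenFilterFactory with its default '|' delimiter and float
encoder (the field name and weights here are made up):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class PayloadIndexingExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    // Each token carries its weight after the '|' delimiter; the filter
    // encodes that float as a payload on the term.
    doc.addField("keywords_payload", "microsoft|37.4 oracle|17.2");
    server.add(doc);
    server.commit();
  }
}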

The latest and greatest query type to use is PayloadTermQuery.  I use it in
my custom query parser class, overriding getFieldQuery, checking for my
field name, and then:

 return new PayloadTermQuery(new Term(field, queryText),
new AveragePayloadFunction());

Due to the global nature of the Similarity, I guess you'd have to modify it
to look at the field name and base behavior on that if you wanted different
kinds of payloads on different fields in one schema.

Also, whereas in my original implementation, I controlled the score
completely, and therefore if I set a score of 0.8, the doc came back as
score of 0.8, in this technique the payload is just used as a boost/addition
to the score, so my scores came out higher than before.  Since they're still
in the same relative order, that still satisfied my needs, but did require
updating my test cases.

-- 
Stephen Duncan Jr
www.stephenduncanjr.com


Making search results more stable as index is updated

2009-11-13 Thread Chris Harris
If documents are being added to and removed from an index (and commits
are being issued) while a user is searching, then the experience of
paging through search results using the obvious solr mechanism
(start=100&rows=10) may be disorienting for the user. For one
example, by the time the user clicks next page for the first time, a
document that they saw on page 1 may have been pushed onto page 2.
(This may be especially pronounced if docs are being sorted by date.)

I'm wondering what are the best options available for presenting a
more stable set of search results to users in such cases. The obvious
candidates to me are:

#1: Cache results in the user session of the web tier. (In particular,
maybe just cache the uniqueKey of each matching document.)

  Pro: Simple
  Con: May require capping the # of search results in order to make
the initial query (which now has Solr numRows param > web pageSize)
fast enough. For example, maybe it's only practical to cache the first
500 records.

#2: Create some kind of per-user results cache in Solr. (One simple
implementation idea: You could make your Solr search handler take a
userid parameter, and cache each user's last search in a special
per-user results cache. You then also provide an API that says, give
me records n through m of userid #1334's last search. For your
subsequent queries, you consult the latter API rather than redoing
your search. Because Lucene docids are unstable across commits and
such, I think this means caching the uniqueKey of each maching
document. This in turn means looking up the uniqueKey of each maching
document at search time. It also means you can't use the existing Solr
caches, but need to make a new one.)

  Pro: Maybe faster than #1?? (Saves on data transfer between Solr and
web tier, at least during the initial query.)
  Con: More complicated than #1.

#3: Use filter queries to attempt to make your subsequent queries (for
page 2, page 3, etc.) return results consistent with your original
query. (One idea is to give each document a docAddedTimestamp field,
which would have precision down to the millisecond or something. On
your initial query, you could note the current time, T. Then for the
subsequent queries you add a filter query for docAddedTimestamp=T.
Hopefully with a trie date field this would be fast. This should
hopefully keep any docs newly added after T from showing up in the
user's search results as they page through them. However, it won't
necessarily protect you from docs that were *reindexed* (i.e. re-add a
doc with the same uniqueKey as an existing doc) or docs that were
deleted.)

  Pro: Doesn't require a new cache, and no cap on # of search results
  Con: Maybe doesn't provide total stability.
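
For illustration, a minimal SolrJ sketch of #3; docAddedTimestamp is the
hypothetical field from above (assumed to be indexed as a date field), and T is
whatever you recorded on the first page request:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class StablePagingSketch {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    String snapshot = "2009-11-13T18:00:00Z"; // T, captured with the initial query
    SolrQuery q = new SolrQuery("user query here");
    q.addFilterQuery("docAddedTimestamp:[* TO " + snapshot + "]"); // hide docs added after T
    q.setStart(10); // page 2
    q.setRows(10);
    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getResults().getNumFound());
  }
}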

Any feedback on these options? Are there other ideas to consider?

Thanks,
Chris


Re: having solr generate and execute other related queries automatically

2009-11-13 Thread gdeconto


tpunder wrote:
 
 Maybe I misunderstand what you are trying to do (or the facet.query
 feature).  If I did an initial query on my data-set that left me with the
 following questions:
 ...
 http://localhost:8983/solr/select/?q=*%3A*start=0rows=0facet=onfacet.query=brand_id:1facet.query=brand_id:2facet.query=+%2Bbrand_id:5+%2Bcategory_id:4051
 ...
 

Thanks for the reply Tim.

I can't provide you with an example as I dont have anything prototyped as
yet; I am still trying to work things thru in my head.  The +20 queries
would allow us to suggest other possibilities to users in a facet-like way
(but not returning the exact same info as facets).

With the technique you mention I would have to specify the list of query
params for each facet.query.  That would work for relatively simple queries. 
Unfortunately, the queries I was looking at doing would be fairly long (say
hundreds of AND/OR statements).   Given that, I don't think solr would be able
to handle the query size I would end up with (at least not efficiently),
because the resulting query would consist of thousands of AND/OR statements
(isn't there a limit of sorts in Solr?)

I think that my best bet would be to extend the SearchComponent and perform
the additional query generation and execution in the extension.  That
approach should also allow me to have access to the facet values that the
base query would generate (which would allow me to generate and execute the
other queries).

thx again.
-- 
View this message in context: 
http://old.nabble.com/having-solr-generate-and-execute-other-related-queries-automatically-tp26327032p26343409.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Multicore solr.xml schemaName parameter not being recognized

2009-11-13 Thread Chris Hostetter

: On the CoreAdmin wiki page.  thanks 

FWIW: The only time the string schemaName appears on the CoreAdmin wiki 
page is when it mentions that solr.core.schemaName is a property that is 
available to cores by default.

the documentation for core specificly says...

 The core tag accepts the following attributes:
   ...
  * schema - The schema file name for a given core. The default is 
   ...

So the documentation is correct.


-Hoss



Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Jerome L Quinn

Mark Miller markrmil...@gmail.com wrote on 11/12/2009 07:18:03 PM:
 Ah, the pains of optimization. It's kind of just how it is. One solution
 is to use two boxes and replication - optimize on the master, and then
 queries only hit the slave. Out of reach for some though, and adds many
 complications.

Yes, in my use case 2 boxes isn't a great option.


 Another kind of option is to use the partial optimize feature:

  <optimize maxOptimizeSegments="5"/>

 Using this, you can optimize down to n segments and take a shorter hit
 each time.

Is this a 1.4 feature?  I'm planning to migrate to 1.4, but it'll take a
while since
I have to port custom code forward, including a query parser.
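
(For reference, a hedged sketch of what the partial optimize looks like from
SolrJ 1.4 - I'm assuming the three-argument optimize() overload here, so check
it against the SolrServer javadoc before relying on it:)

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class PartialOptimize {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    // waitFlush=true, waitSearcher=true, merge down to at most 5 segments
    server.optimize(true, true, 5);
  }
}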


 Also, if optimizing is so painful, you might lower the merge factor to
 amortize that pain better. That's another way to slowly get there - if
 you lower the merge factor, as merging takes place, the new merge factor
 will be respected, and segments will merge down. A merge factor of 2
 (the lowest) will make it so you only ever have 2 segments. Sometimes
 that works reasonably well - you could try 3-6 or something as well.
 Then when you do your partial optimizes (and eventually a full optimize
 perhaps), you won't have so far to go.

So this will slow down indexing but speed up optimize somewhat?
Unfortunately,
right now I lose docs I'm indexing, as well as slowing searching to a crawl.
Ugh.

I've got plenty of CPU horsepower.  This is where having the ability to
optimize
on another filesystem would be useful.

Would it perhaps make sense to set up a master/slave on the same machine?
Then
I suppose I can have an index being optimized that might not clobber the
search.
Would new indexed items still be dropped on the floor?

Thanks,
Jerry

Re: Stop solr without losing documents

2009-11-13 Thread Chris Hostetter

: which documents have been updated before a successful commit.  Now
: stopping solr is as easy as kill -9.

please don't kill -9 ... it's grossly overkill, and doesn't give your 
servlet container a fair chance to clean things up.  A lot of work has been 
done to make Lucene indexes robust to hard terminations of the JVM (or 
physical machine) but there's no reason to go out of your way to try and 
stab it in the heart when you could just shut it down cleanly.

that's not to say your approach isn't a good one -- if you only have one 
client sending updates/commits then having it keep track of what was 
indexed prior to the last successful commit is a viable way to deal with 
what happens if solr stops responding (either because you shut it down, or 
because it crashed for some other reason).

Alternately, you could take advantage of the enabled feature from your 
client (just have it test the enabled url every N updates or so) and when 
it sees that you have disabled the port it can send one last commit and 
then stop sending updates until it sees the enabled URL work again -- as 
soon as you see the updates stop, you can safely shut down the port.


-Hoss



Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Jerome L Quinn
ysee...@gmail.com wrote on 11/13/2009 09:06:29 AM:

 On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless
 luc...@mikemccandless.com wrote:
  I think we sorely need a Directory impl that down-prioritizes IO
  performed by merging.

 It's unclear if this case is caused by IO contention, or the OS cache
 of the hot parts of the index being lost by that extra IO activity.
 Of course the latter would lead to the former, but without that OS
 disk cache, the searches may be too slow even w/o the extra IO.

Is there a way to configure things so that search and new data indexing
get cached under the control of solr/lucene?  Then we'd be less reliant
on the OS behavior.

Alternatively if there are OS params I can tweak (RHEL/Centos 5)
to solve the problem, that's an option for me.

Would you know if 1.4 is better behaved than 1.3?

Thanks,
Jerry

Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Jerome L Quinn
ysee...@gmail.com wrote on 11/13/2009 09:06:29 AM:

 On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless
 luc...@mikemccandless.com wrote:
  I think we sorely need a Directory impl that down-prioritizes IO
  performed by merging.

 It's unclear if this case is caused by IO contention, or the OS cache
 of the hot parts of the index being lost by that extra IO activity.
 Of course the latter would lead to the former, but without that OS
 disk cache, the searches may be too slow even w/o the extra IO.

On linux there's the ionice command to try to throttle processes.  Would it
be possible and make sense to have a separate process for optimizing that
had ionice set it to idle?  Can the index be shared this way?

Thanks,
Jerry

Re: NPE when trying to view a specific document via Luke

2009-11-13 Thread Chris Hostetter

: I'm seeing this stack trace when I try to view a specific document, e.g. 
: /admin/luke?id=1 but luke appears to be working correctly when I just 

FWIW: I was able to reproduce this using the example setup (i picked a 
doc id at random).  Suspecting it was a bug in docFreq when using multiple 
segments, i tried optimizing and still got an NPE, but then my entire 
computer crashed (unrelated) before i could look any deeper.

I have to go out now, but i'll try to dig into this more when i get back 
... given where it happens in the code, it seems like a potentially 
serious lucene bug (either that: or LukeRequestHandler is doing something 
it really shouldn't be, but i can't imagine how it could trigger an NPE 
that deep in the lucene code)



: view /admin/luke. Does this look familiar to anyone? Our sysadmin just 
: upgraded us to the 1.4 release, I'm not sure if this occurred before 
: that.
: 
: Thanks,
: Jake
: 
: 1. java.lang.NullPointerException
: 2. at org.apache.lucene.index.TermBuffer.set(TermBuffer.java:95)
: 3. at 
org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:158)
: 4. at 
org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:232)
: 5. at 
org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:179)
: 6. at 
org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:975)
: 7. at 
org.apache.lucene.index.DirectoryReader.docFreq(DirectoryReader.java:627)
: 8. at 
org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:308)
: 9. at 
org.apache.solr.handler.admin.LukeRequestHandler.getDocumentFieldsInfo(LukeRequestHandler.java:248)
: 10.at 
org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:124)
: 11.at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
: 12.at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
: 13.at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
: 14.at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
: 15.at 
com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:76)
: 16.at 
com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:158)
: 17.at 
com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:178)
: 18.at 
com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
: 19.at 
com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:435)
: 20.at com.caucho.server.port.TcpConnection.run(TcpConnection.java:586)
: 21.at com.caucho.util.ThreadPool$Item.runTasks(ThreadPool.java:690)
: 22.at com.caucho.util.ThreadPool$Item.run(ThreadPool.java:612)
: 23.at java.lang.Thread.run(Thread.java:619)
: 24.
: 25. Date: Fri, 13 Nov 2009 02:19:54 GMT
: 26. Server: Apache/2.2.3 (Red Hat)
: 27. Cache-Control: no-cache, no-store
: 28. Pragma: no-cache
: 29. Expires: Sat, 01 Jan 2000 01:00:00 GMT
: 30. Content-Type: text/html; charset=UTF-8
: 31. Vary: Accept-Encoding,User-Agent
: 32. Content-Encoding: gzip
: 33. Content-Length: 1066
: 34. Connection: close
: 35.
: 



-Hoss



Re: Request assistance with distributed search multi shard/core setup and configuration

2009-11-13 Thread Lance Norskog
Distributed search requires a list of shard names in the URL. That's all. Note that
a distributed search does not implicitly use the data of the Solr instance you call.

You can create an entry point for your distributed search by adding a
new requestHandler element in solrconfig.xml. You would add the
shards list parameter to the defaults list. Do not have it point at the
same requestHandler path - you'll get an infinite loop.
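
A minimal SolrJ sketch of the per-request form, with placeholder host names; the
same shards value is what you would put in a handler's defaults section:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DistributedSearchSketch {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://host1:8983/solr");
    SolrQuery q = new SolrQuery("ipod");
    // Cores to search; the receiving core's own data is only included
    // if it is listed here too.
    q.set("shards", "host1:8983/solr,host2:8983/solr");
    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getResults().getNumFound());
  }
}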

On Tue, Nov 10, 2009 at 6:44 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Hm, I don't follow.  You don't need to create a custom (request) handler to 
 make use of Solr's distributed search.

 Otis
 --
 Sematext is hiring -- http://sematext.com/about/jobs.html?mls
 Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



 - Original Message 
 From: Turner, Robbin J robbin.j.tur...@boeing.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Tue, November 10, 2009 6:41:32 PM
 Subject: RE: Request assistance with distributed search multi shard/core   
 setup and configuration

 Thanks, I had already read through this url.  I guess my request was: is there a
 way to set up something that is already part of solr itself to pass the
 URL [shards...], rather than having to create a custom handler.

 thanks

 -Original Message-
 From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
 Sent: Tuesday, November 10, 2009 6:09 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Request assistance with distributed search multi shard/core 
 setup
 and configuration

 Right, that's http://wiki.apache.org/solr/DistributedSearch

 Otis
 --
 Sematext is hiring -- http://sematext.com/about/jobs.html?mls
 Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



 - Original Message 
  From: Turner, Robbin J
  To: solr-user@lucene.apache.org
  Sent: Tue, November 10, 2009 6:05:19 PM
  Subject: RE: Request assistance with distributed search multi
  shard/core  setup and configuration
 
  I've already done the single Solr, that's why my request.  I read on
  some site that there is a way to setup the configuration so I can send
  a query to one solr instance and have it pass it on or distribute it across
 all the instances?
 
  Btw, thanks for the quick reply.
  RJ
 
  -Original Message-
  From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
  Sent: Tuesday, November 10, 2009 6:02 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Request assistance with distributed search multi
  shard/core setup and configuration
 
  RJ,
 
  You may want to take a simpler step - single Solr core (no solr.xml
  needed) per machine.  Then distributed search really only requires
  that you specify shard URLs in the URL of the search requests.  In
  practice/production you rarely benefit from distributed search against
  multiple cores on the same server anyway.
 
  Otis
  --
  Sematext is hiring -- http://sematext.com/about/jobs.html?mls
  Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
 
 
 
 
  
  From: Turner, Robbin J
  To: solr-user@lucene.apache.org
  Sent: Tue, November 10, 2009 5:58:52 PM
  Subject: Request assistance with distributed search multi shard/core
  setup and configuration
 
  I've been looking through all the documentation.  I've set up a single
  solr instance, and one multicore instance.  If someone would be
  willing to share some configuration examples and/or advise for setting
  up solr for distributing the search, I would really appreciate it.
  I've read that there is a way to do it, but most of the current
  documentation doesn't provide enough example on what to do with
  solr.xml, and the solrconfig.xml.  Also, I'm using tomcat 6 for the servlet
 container.  I deployed the solr 1.4.0 released yesterday.
 
  Thanks
  RJ





-- 
Lance Norskog
goks...@gmail.com


Re: NPE when trying to view a specific document via Luke

2009-11-13 Thread Yonik Seeley
On Fri, Nov 13, 2009 at 5:41 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
 : I'm seeing this stack trace when I try to view a specific document, e.g.
 : /admin/luke?id=1 but luke appears to be working correctly when I just

 FWIW: I was able to reproduce this using the example setup (i picked a
 doc id at random)  suspecting it was a bug in docFreq

Probably just a null being passed in the text part of the term.
I bet Luke expects all field values to be strings, but some are binary.

-Yonik
http://www.lucidimagination.com


Fwd: Lucene MMAP Usage with Solr

2009-11-13 Thread ST ST
Folks,

I am trying to get Lucene MMAP to work in solr.

I am assuming that when I configure MMAP the entire index will be loaded
into RAM.
Is that the right assumption?

I have tried the following ways for using MMAP:

Option 1. Using the solr config below for MMAP configuration

-Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory

   With this config, when I start solr with a 30G index, I expected that the
RAM usage should go up, but it did not.

Option 2. By Code Change
I made the following code change :

   Changed org.apache.solr.core.StandardDirectoryFactory to use MMAP instead
of FSDirectory.
   Code snippet pasted below.


Could you help me to understand if these are the right way to use MMAP?

Thanks much
/ST.

Code SNippet for Option 2:

package org.apache.solr.core;
/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the License); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an AS IS BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.File;
import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;

/**
 * Directory provider which mimics original Solr FSDirectory based behavior.
 *
 */
public class StandardDirectoryFactory extends DirectoryFactory {

  public Directory open(String path) throws IOException {
    // Use MMapDirectory instead of the platform-default FSDirectory.
    // Note: MMapDirectory maps the index files into virtual memory; it does
    // not copy them onto the Java heap, so heap usage is not expected to grow.
    return MMapDirectory.open(new File(path));
  }
}


Re: any docs on solr.EdgeNGramFilterFactory?

2009-11-13 Thread Peter Wolanin
Thanks for the link - there doesn't seem to be a fix version specified,
so I guess this will not officially ship with lucene 2.9?

-Peter

On Wed, Nov 11, 2009 at 10:36 PM, Robert Muir rcm...@gmail.com wrote:
 Peter, here is a project that does this:
 http://issues.apache.org/jira/browse/LUCENE-1488


 That's kind of interesting - in general can I build a custom tokenizer
 from existing tokenizers that treats different parts of the input
 differently based on the utf-8 range of the characters?  E.g. use a
 porter stemmer for stretches of Latin text and n-gram or something
 else for CJK?

 -Peter

 On Tue, Nov 10, 2009 at 9:21 PM, Otis Gospodnetic
 otis_gospodne...@yahoo.com wrote:
  Yes, that's the n-gram one.  I believe the existing CJK one in Lucene is
 really just an n-gram tokenizer, so no different than the normal n-gram
 tokenizer.
 
  Otis
  --
  Sematext is hiring -- http://sematext.com/about/jobs.html?mls
  Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
 
 
 
  - Original Message 
  From: Peter Wolanin peter.wola...@acquia.com
  To: solr-user@lucene.apache.org
  Sent: Tue, November 10, 2009 7:34:37 PM
  Subject: Re: any docs on solr.EdgeNGramFilterFactory?
 
  So, this is the normal N-gram one?  NGramTokenizerFactory
 
  Digging deeper - there are actualy CJK and Chinese tokenizers in the
  Solr codebase:
 
 
 http://lucene.apache.org/solr/api/org/apache/solr/analysis/CJKTokenizerFactory.html
 
 http://lucene.apache.org/solr/api/org/apache/solr/analysis/ChineseTokenizerFactory.html
 
  The CJK one uses the lucene CJKTokenizer
 
 http://lucene.apache.org/java/2_9_1/api/contrib-analyzers/org/apache/lucene/analysis/cjk/CJKTokenizer.html
 
  and there seems to be another one even that no one has wrapped into
 Solr:
 
 http://lucene.apache.org/java/2_9_1/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/package-summary.html
 
  So seems like the existing options are a little better than I thought,
  though it would be nice to have some docs on properly configuring
  these.
 
  -Peter
 
  On Tue, Nov 10, 2009 at 6:05 PM, Otis Gospodnetic
  wrote:
   Peter,
  
   For CJK and n-grams, I think you don't want the *Edge* n-grams, but
 just
  n-grams.
   Before you take the n-gram route, you may want to look at the smart
 Chinese
  analyzer in Lucene contrib (I think it works only for Simplified
 Chinese) and
  Sen (on java.net).  I also spotted a Korean analyzer in the wild a few
 months
  back.
  
   Otis
   --
   Sematext is hiring -- http://sematext.com/about/jobs.html?mls
   Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
  
  
  
   - Original Message 
   From: Peter Wolanin
   To: solr-user@lucene.apache.org
   Sent: Tue, November 10, 2009 4:06:52 PM
   Subject: any docs on solr.EdgeNGramFilterFactory?
  
   This fairly recent blog post:
  
  
 
 http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
  
   describes the use of the solr.EdgeNGramFilterFactory as the tokenizer
   for the index.  I don't see any mention of that tokenizer on the Solr
   wiki - is it just waiting to be added, or is there any other
   documentation in addition to the blog post?  In particular, there was
   a thread last year about using an N-gram tokenizer to enable
   reasonable (if not ideal) searching of CJK text, so I'd be curious to
   know how people are configuring their schema (with this tokenizer?)
   for that use case.
  
   Thanks,
  
   Peter
  
   --
   Peter M. Wolanin, Ph.D.
   Momentum Specialist,  Acquia. Inc.
   peter.wola...@acquia.com
  
  
 
 
 
  --
  Peter M. Wolanin, Ph.D.
  Momentum Specialist,  Acquia. Inc.
  peter.wola...@acquia.com
 
 



 --
 Peter M. Wolanin, Ph.D.
 Momentum Specialist,  Acquia. Inc.
 peter.wola...@acquia.com





 --
 Robert Muir
 rcm...@gmail.com




-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com
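
For anyone wondering what edge n-grams actually look like, here is a minimal
sketch (not from the thread) against the Lucene 2.9 contrib-analyzers filter
that backs solr.EdgeNGramFilterFactory; the sample text and gram sizes are
arbitrary:

import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class EdgeNGramDemo {
  public static void main(String[] args) throws Exception {
    // Split on whitespace, then emit front edge n-grams of 1..4 chars per token.
    TokenStream ts = new EdgeNGramTokenFilter(
        new WhitespaceTokenizer(new StringReader("solr search")),
        EdgeNGramTokenFilter.Side.FRONT, 1, 4);
    TermAttribute term = ts.addAttribute(TermAttribute.class);
    while (ts.incrementToken()) {
      System.out.println(term.term());  // s, so, sol, solr, s, se, sea, sear
    }
    ts.close();
  }
}

A schema.xml field type should get the same effect by putting
solr.EdgeNGramFilterFactory (with minGramSize/maxGramSize) in the index-time
analyzer chain, as in the blog post referenced in this thread.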


Re: Reseting doc boosts

2009-11-13 Thread Koji Sekiguchi

I'm not sure this is what you are looking for,
but there is FieldNormModifier tool in Lucene.

Koji

--

http://www.rondhuit.com/en/


Avlesh Singh wrote:

AFAIK there is no way to reset the doc boost. You would need to re-index.
Moreover, there is no way to search by boost.

Cheers
Avlesh

On Fri, Nov 13, 2009 at 8:17 PM, Jon Baer jonb...@gmail.com wrote:

  

Hi,

Im trying to figure out if there is an easy way to basically reset all of
any doc boosts which you have made (for analytical purposes) ... for example
if I run an index, gather report, doc boost on the report, and reset the
boosts @ time of next index ...

It would seem to be from just knowing how Lucene works that I would really
need to reindex since its a attrib on the doc itself which would have to be
modified, but there is no easy way to query for docs which have been boosted
either.  Any insight?

Thanks.

- Jon



  




Re: any docs on solr.EdgeNGramFilterFactory?

2009-11-13 Thread Robert Muir
Ah, thanks. I'll tentatively set one in the future, but definitely not 2.9.x.

More just to show you the idea: you can do different things depending on
different runs of writing systems in the text.
But it doesn't solve everything: you only know it's Latin script, not English,
so you can't safely do anything like stemming automatically.

Say your content is only Chinese and English:

The analyzer won't know from the Unicode whether your Latin-script text is
English versus, say, French, so it won't stem it.
But that analyzer will lowercase it. It won't know whether your ideographs are
Chinese or Japanese, but it will use n-gram tokenization - you get the drift.

In that impl, it puts the script code in the token flags, so downstream you
could do something like stemming if you happen to know more than is evident
from the Unicode.

On Fri, Nov 13, 2009 at 6:23 PM, Peter Wolanin peter.wola...@acquia.comwrote:

 Thanks for the link - there doesn't seem to be a fix version specified,
 so I guess this will not officially ship with Lucene 2.9?

 -Peter

 On Wed, Nov 11, 2009 at 10:36 PM, Robert Muir rcm...@gmail.com wrote:
  Peter, here is a project that does this:
  http://issues.apache.org/jira/browse/LUCENE-1488
 
 
  That's kind of interesting - in general can I build a custom tokenizer
  from existing tokenizers that treats different parts of the input
  differently based on the utf-8 range of the characters?  E.g. use a
  porter stemmer for stretches of Latin text and n-gram or something
  else for CJK?
 
  -Peter
 
  On Tue, Nov 10, 2009 at 9:21 PM, Otis Gospodnetic
  otis_gospodne...@yahoo.com wrote:
   Yes, that's the n-gram one.  I believe the existing CJK one in Lucene
 is
  really just an n-gram tokenizer, so no different than the normal n-gram
  tokenizer.
  
   Otis
   --
   Sematext is hiring -- http://sematext.com/about/jobs.html?mls
   Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
  
  
  
   - Original Message 
   From: Peter Wolanin peter.wola...@acquia.com
   To: solr-user@lucene.apache.org
   Sent: Tue, November 10, 2009 7:34:37 PM
   Subject: Re: any docs on solr.EdgeNGramFilterFactory?
  
   So, this is the normal N-gram one?  NGramTokenizerFactory
  
   Digging deeper - there are actually CJK and Chinese tokenizers in the
   Solr codebase:
  
  
 
 http://lucene.apache.org/solr/api/org/apache/solr/analysis/CJKTokenizerFactory.html
  
 
 http://lucene.apache.org/solr/api/org/apache/solr/analysis/ChineseTokenizerFactory.html
  
   The CJK one uses the lucene CJKTokenizer
  
 
 http://lucene.apache.org/java/2_9_1/api/contrib-analyzers/org/apache/lucene/analysis/cjk/CJKTokenizer.html
  
   and there seems to be another one even that no one has wrapped into
  Solr:
  
 
 http://lucene.apache.org/java/2_9_1/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/package-summary.html
  
   So seems like the existing options are a little better than I
 thought,
   though it would be nice to have some docs on properly configuring
   these.
  
   -Peter
  
   On Tue, Nov 10, 2009 at 6:05 PM, Otis Gospodnetic
   wrote:
Peter,
   
For CJK and n-grams, I think you don't want the *Edge* n-grams, but
  just
   n-grams.
Before you take the n-gram route, you may want to look at the smart
  Chinese
   analyzer in Lucene contrib (I think it works only for Simplified
  Chinese) and
   Sen (on java.net).  I also spotted a Korean analyzer in the wild a
 few
  months
   back.
   
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
   
   
   
- Original Message 
From: Peter Wolanin
To: solr-user@lucene.apache.org
Sent: Tue, November 10, 2009 4:06:52 PM
Subject: any docs on solr.EdgeNGramFilterFactory?
   
This fairly recent blog post:
   
   
  
 
 http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
   
describes the use of the solr.EdgeNGramFilterFactory as the
 tokenizer
for the index.  I don't see any mention of that tokenizer on the
 Solr
wiki - is it just waiting to be added, or is there any other
documentation in addition to the blog post?  In particular, there
 was
a thread last year about using an N-gram tokenizer to enable
reasonable (if not ideal) searching of CJK text, so I'd be curious
 to
know how people are configuring their schema (with this
 tokenizer?)
for that use case.
   
Thanks,
   
Peter
   
--
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com
   
   
  
  
  
   --
   Peter M. Wolanin, Ph.D.
   Momentum Specialist,  Acquia. Inc.
   peter.wola...@acquia.com
  
  
 
 
 
  --
  Peter M. Wolanin, Ph.D.
  Momentum Specialist,  Acquia. Inc.
  peter.wola...@acquia.com
 
 
 
 
 
  --
  Robert Muir
  rcm...@gmail.com
 



 --
 Peter M. Wolanin, Ph.D.
 Momentum Specialist,  Acquia. Inc.
 peter.wola...@acquia.com




-- 
Robert Muir

Re: NPE when trying to view a specific document via Luke

2009-11-13 Thread Chris Hostetter

:  FWIW: I was able to reproduce this using the example setup (I picked a
:  doc id at random) ... suspecting it was a bug in docFreq
: 
: Probably just a null being passed in the text part of the term.
: I bet Luke expects all field values to be strings, but some are binary.

I'm not sure I follow you ... I think you're saying that naive assumptions in 
the LukeRequestHandler could result in it asking for the docFreq of a term 
that has a null string value because some field types are binary, except 
that...

 1) 1.3 didn't have this problem
 2) LukeRequestHandler.getDocumentFieldsInfo didn't change from 1.3 to 1.4

I tried to reproduce this in 1.4 using an index/configs created with 1.3, 
but I got a *different* NPE when loading this URL...

   http://localhost:8983/solr/admin/luke?id=SP2514N

SEVERE: java.lang.NullPointerException
at 
org.apache.solr.util.NumberUtils.SortableStr2int(NumberUtils.java:127)
at 
org.apache.solr.util.NumberUtils.SortableStr2float(NumberUtils.java:83)
at 
org.apache.solr.util.NumberUtils.SortableStr2floatStr(NumberUtils.java:89)
at 
org.apache.solr.schema.SortableFloatField.indexedToReadable(SortableFloatField.java:62)
at 
org.apache.solr.schema.SortableFloatField.toExternal(SortableFloatField.java:53)
at 
org.apache.solr.handler.admin.LukeRequestHandler.getDocumentFieldsInfo(LukeRequestHandler.java:245)

...all three of these stack traces seem to suggest that some impl of 
Fieldable.stringValue in 2.9 is returning null in cases where it returned 
*something* else in the 2.4-dev jar used by Solr 1.3.

That seems like it could have other impacts besides LukeRequestHandler.


-Hoss

Re: NPE when trying to view a specific document via Luke

2009-11-13 Thread Chris Hostetter

: I tried to reproduce this in 1.4 using an index/configs created with 1.3, 
: but I got a *different* NPE when loading this URL...

I should have tried a simpler test ... I get NPEs just trying to execute 
a simple search for *:* when I try to use the example index built 
in 1.3 (with the 1.3 configs) in 1.4.  Same (apparent) cause: code is 
attempting to dereference a string returned by Fieldable.stringValue() which is 
null... 

java.lang.NullPointerException
at 
org.apache.solr.schema.SortableIntField.write(SortableIntField.java:72)
at org.apache.solr.schema.SchemaField.write(SchemaField.java:108)
at org.apache.solr.request.XMLWriter.writeDoc(XMLWriter.java:311)
at org.apache.solr.request.XMLWriter$3.writeDocs(XMLWriter.java:483)
at org.apache.solr.request.XMLWriter.writeDocuments(XMLWriter.java:420)
at org.apache.solr.request.XMLWriter.writeDocList(XMLWriter.java:457)
at org.apache.solr.request.XMLWriter.writeVal(XMLWriter.java:520)
at org.apache.solr.request.XMLWriter.writeResponse(XMLWriter.java:130)
at 
org.apache.solr.request.XMLResponseWriter.write(XMLResponseWriter.java:34)
at 
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:325)

This really does smell like something in Lucene changed behavior 
drastically.  I've been looking at diffs from java/tr...@691741 and 
java/tags/lucene_2_9_1 but nothing jumps out at me that would explain 
this.

If nothing else, I'm opening a Solr issue...



-Hoss



StreamingUpdateSolrServer commit?

2009-11-13 Thread erikea...@yahoo.com

When does StreamingUpdateSolrServer commit?

I know there's a threshold and a thread pool as params, but I don't see a commit
timeout.  Do I have to manage this myself?


  

Re: exclude some fields from copying dynamic fields | schema.xml

2009-11-13 Thread Lance Norskog
There is no direct way.

Let's say you have a nocopy_s and you do not want a copy
nocopy_str_s. This might work: declare nocopy_str_s as a field and
make it not indexed and not stored. I don't know if this will work.

It requires two overrides to work: 1) that declaring a field name that
matches a wildcard will override the default wildcard rule, and 2)
that stored=false indexed=false works.

On Fri, Nov 13, 2009 at 3:23 AM, Vicky_Dev
vikrantv_shirbh...@yahoo.co.in wrote:

 Hi,
 we are using the following entry in schema.xml to make a copy of one type of
 dynamic field to another :
 <copyField source="*_s" dest="*_str_s"/>

 Is it possible to exclude some fields from copying?

 We are using Solr1.3

 ~Vikrant

 --
 View this message in context: 
 http://old.nabble.com/exclude-some-fields-from-copying-dynamic-fields-%7C-schema.xml-tp26335109p26335109.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
Lance Norskog
goks...@gmail.com


Re: Reseting doc boosts

2009-11-13 Thread Jon Baer
This looks exactly like what I was needing ... it would be a great tool /
addition to the Solr web interface, but it looks like it only takes
(Directory d, Similarity s) (vs. a subset collection of documents) ...

Either way, great find - thanks for your help ...

- Jon
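
For reference, a rough sketch of driving that tool from code rather than the
command line - assuming the contrib-misc class
org.apache.lucene.misc.FieldNormModifier and its reSetNorms(String) method;
the index path and field name below are placeholders:

import java.io.File;

import org.apache.lucene.misc.FieldNormModifier;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ResetNorms {
  public static void main(String[] args) throws Exception {
    // Open the Solr index directory directly (placeholder path).
    Directory dir = FSDirectory.open(new File("/path/to/solr/data/index"));
    // Recompute norms from plain length normalization, which discards any
    // index-time document/field boosts that were folded into the norms.
    FieldNormModifier fnm = new FieldNormModifier(dir, new DefaultSimilarity());
    fnm.reSetNorms("title");  // repeat for each field whose boosts should be reset
    dir.close();
  }
}

As noted above, it works per field across the whole index, not per subset of
documents.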

On Nov 13, 2009, at 6:40 PM, Koji Sekiguchi wrote:

 I'm not sure this is what you are looking for,
 but there is FieldNormModifier tool in Lucene.
 
 Koji
 
 -- 
 
 http://www.rondhuit.com/en/
 
 
 Avlesh Singh wrote:
 AFAIK there is no way to reset the doc boost. You would need to re-index.
 Moreover, there is no way to search by boost.
 
 Cheers
 Avlesh
 
 On Fri, Nov 13, 2009 at 8:17 PM, Jon Baer jonb...@gmail.com wrote:
 
  
 Hi,
 
 Im trying to figure out if there is an easy way to basically reset all of
 any doc boosts which you have made (for analytical purposes) ... for example
 if I run an index, gather report, doc boost on the report, and reset the
 boosts @ time of next index ...
 
 It would seem to be from just knowing how Lucene works that I would really
 need to reindex since its a attrib on the doc itself which would have to be
 modified, but there is no easy way to query for docs which have been boosted
 either.  Any insight?
 
 Thanks.
 
 - Jon

 
  
 



Re: Making search results more stable as index is updated

2009-11-13 Thread Lance Norskog
This is one case where permanent caches are interesting. Another case
is highlighting: in some cases highlighting takes a lot of work, and
this work is not cached.

It might be a cleaner architecture to have session-maintaining code in
a separate front-end app, and leave Solr session-free.

On Fri, Nov 13, 2009 at 12:48 PM, Chris Harris rygu...@gmail.com wrote:
 If documents are being added to and removed from an index (and commits
 are being issued) while a user is searching, then the experience of
 paging through search results using the obvious solr mechanism
 (start=100&rows=10) may be disorienting for the user. For one
 example, by the time the user clicks next page for the first time, a
 document that they saw on page 1 may have been pushed onto page 2.
 (This may be especially pronounced if docs are being sorted by date.)

 I'm wondering what are the best options available for presenting a
 more stable set of search results to users in such cases. The obvious
 candidates to me are:

 #1: Cache results in the user session of the web tier. (In particular,
 maybe just cache the uniqueKey of each maching document.)

  Pro: Simple
  Con: May require capping the # of search results in order to make
 the initial query (which now has Solr numRows param >> web pageSize)
 fast enough. For example, maybe it's only practical to cache the first
 500 records.

 #2: Create some kind of per-user results cache in Solr. (One simple
 implementation idea: You could make your Solr search handler take a
 userid parameter, and cache each user's last search in a special
 per-user results cache. You then also provide an API that says, give
 me records n through m of userid #1334's last search. For your
 subsequent queries, you consult the latter API rather than redoing
 your search. Because Lucene docids are unstable across commits and
 such, I think this means caching the uniqueKey of each matching
 document. This in turn means looking up the uniqueKey of each matching
 document at search time. It also means you can't use the existing Solr
 caches, but need to make a new one.)

  Pro: Maybe faster than #1?? (Saves on data transfer between Solr and
 web tier, at least during the initial query.)
  Con: More complicated than #1.

 #3: Use filter queries to attempt to make your subsequent queries (for
 page 2, page 3, etc.) return results consistent with your original
 query. (One idea is to give each document a docAddedTimestamp field,
 which would have precision down to the millisecond or something. On
 your initial query, you could note the current time, T. Then for the
 subsequent queries you add a filter query for docAddedTimestamp<=T.
 Hopefully with a trie date field this would be fast. This should
 hopefully keep any docs newly added after T from showing up in the
 user's search results as they page through them. However, it won't
 necessarily protect you from docs that were *reindexed* (i.e. re-add a
 doc with the same uniqueKey as an existing doc) or docs that were
 deleted.)

  Pro: Doesn't require a new cache, and no cap on # of search results
  Con: Maybe doesn't provide total stability.

 Any feedback on these options? Are there other ideas to consider?

 Thanks,
 Chris




-- 
Lance Norskog
goks...@gmail.com
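
To illustrate option #3 from Chris's list above, a small SolrJ sketch; nothing
here is from the thread besides the docAddedTimestamp idea, and the URL,
query, and timestamp value are placeholders:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class StablePaging {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

    // Capture "now" once, when the user runs the first query, and reuse it
    // as a filter on every later page so newly added docs stay out of the results.
    String firstQueryTime = "2009-11-13T12:00:00Z"; // e.g. pulled from the web session

    SolrQuery q = new SolrQuery("some user query");
    q.addFilterQuery("docAddedTimestamp:[* TO " + firstQueryTime + "]");
    q.setStart(20);   // page 3 with 10 rows per page
    q.setRows(10);

    QueryResponse rsp = solr.query(q);
    System.out.println("hits: " + rsp.getResults().getNumFound());
  }
}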


Re: StreamingUpdateSolrServer commit?

2009-11-13 Thread Otis Gospodnetic
Unless I slept through it, you still need to explicitly commit, even with SUSS.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
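
A minimal SolrJ 1.4 sketch of that pattern, for illustration only; the URL,
queue size, thread count, and batch loop are placeholder values:

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class StreamingIndexer {
  public static void main(String[] args) throws Exception {
    // Queue up to 50 docs and stream them with 4 background threads.
    StreamingUpdateSolrServer server =
        new StreamingUpdateSolrServer("http://localhost:8983/solr", 50, 4);

    for (int i = 0; i < 1000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-" + i);
      server.add(doc);
    }

    // SUSS only streams the add commands; nothing is visible to searchers
    // (or guaranteed durable) until the client commits explicitly.
    server.commit();
  }
}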



- Original Message 
 From: erikea...@yahoo.com erikea...@yahoo.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Fri, November 13, 2009 9:43:53 PM
 Subject: StreamingUpdateSolrServer commit?
 
 
 When does  StreamingUpdateSolrServer commit?
 
 I know there's a threshold and thread pool as params but I don't see a 
 commit 
 timeout.   Do I have to manage this myself?



Re: Fwd: Lucene MMAP Usage with Solr

2009-11-13 Thread Otis Gospodnetic
I thought that was the way to use it (but I've never had to use it myself) and 
that it means memory through the roof, yes.
If you look at the Solr Admin statistics page, does it show you which Directory 
you are using?

For example, on 1 Solr instance I'm looking at I see:

readerDir :  org.apache.lucene.store.NIOFSDirectory@/mnt/


Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: ST ST stst2...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Fri, November 13, 2009 6:03:57 PM
 Subject: Fwd: Lucene MMAP Usage with Solr
 
 Folks,
 
 I am trying to get Lucene MMAP to work in solr.
 
 I am assuming that when I configure MMAP the entire index will be loaded
 into RAM.
 Is that the right assumption ?
 
 I have tried the following ways for using MMAP:
 
 Option 1. Using the solr config below for MMAP configuration
 
 -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory
 
With this config, when I start solr with a 30G index, I expected that the
 RAM usage should go up, but it did not.
 
 Option 2. By Code Change
 I made the following code change :
 
Changed org.apache.solr.core.StandardDirectoryFactory to use MMAP instead
 of FSDirectory.
Code snippet pasted below.
 
 
 Could you help me to understand if these are the right way to use MMAP?
 
 Thanks much
 /ST.
 
 Code SNippet for Option 2:
 
 package org.apache.solr.core;
 /**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the License); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an AS IS BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
 
 import java.io.File;
 import java.io.IOException;
 
 import org.apache.lucene.store.Directory;
 import org.apache.lucene.store.MMapDirectory;
 
 /**
 * Directory provider which mimics original Solr FSDirectory based behavior.
 *
 */
 public class StandardDirectoryFactory extends DirectoryFactory {
 
   public Directory open(String path) throws IOException {
 return MMapDirectory.open(new File(path));
   }
 }



Re: Stop solr without losing documents

2009-11-13 Thread Otis Gospodnetic
So I think the question is really:
If I stop the servlet container, does Solr issue a commit in the shutdown hook 
in order to ensure all buffered docs are persisted to disk before the JVM 
exits.

I don't have the Solr source handy, but if I did, I'd look for Shutdown, 
Hook and finalize in the code.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: Chris Hostetter hossman_luc...@fucit.org
 To: solr-user@lucene.apache.org
 Sent: Fri, November 13, 2009 4:09:00 PM
 Subject: Re: Stop solr without losing documents
 
 
 : which documents have been updated before a successful commit.  Now
 : stopping solr is as easy as kill -9.
 
 please don't kill -9 ... it's grossly overkill, and doesn't give your 
 servlet container a fair chance to clean things up.  A lot of work has been 
 done to make Lucene indexes robust to hard terminations of the JVM (or 
 physical machine) but there's no reason to go out of your way to try and 
 stab it in the heart when you could just shut it down cleanly.

 that's not to say your approach isn't a good one -- if you only have one 
 client sending updates/commits then having it keep track of what was 
 indexed prior to the last successful commit is a viable way to deal with 
 what happens if solr stops responding (either because you shut it down, or 
 because it crashed for some other reason).

 Alternately, you could take advantage of the enabled feature from your 
 client (just have it test the enabled url every N updates or so) and when 
 it sees that you have disabled the port it can send one last commit and 
 then stop sending updates until it sees the enabled URL work again -- as 
 soon as you see the updates stop, you can safely shut down the port.
 
 
 -Hoss
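
A rough sketch of that last suggestion, assuming the stock /admin/ping handler
with a healthcheck file is what sits behind the enabled URL; the names,
interval, and document feed below are made up:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class GracefulIndexer {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

    int sinceLastCheck = 0;
    for (SolrInputDocument doc : fetchPendingDocs()) {
      if (sinceLastCheck++ % 100 == 0 && !enabled(solr)) {
        // Admin flipped the healthcheck off: flush what we have and stop.
        solr.commit();
        return;
      }
      solr.add(doc);
    }
    solr.commit();
  }

  // /admin/ping starts returning an error once the healthcheck file is removed.
  private static boolean enabled(SolrServer solr) {
    try {
      solr.ping();
      return true;
    } catch (SolrServerException e) {
      return false;
    } catch (java.io.IOException e) {
      return false;
    }
  }

  private static Iterable<SolrInputDocument> fetchPendingDocs() {
    // Placeholder for the real feed of documents waiting to be indexed.
    return java.util.Collections.<SolrInputDocument>emptyList();
  }
}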



changes to highlighting config or syntax in 1.4?

2009-11-13 Thread Peter Wolanin
I'm testing out the final release of Solr 1.4 as compared to the build
I have been using from around June.

I'm using the dismax handler for searches.  I'm finding that
highlighting is completely broken as compared to previously.  Much
more text is returned than it should be for each string in <lst
name="highlighting">, but the search words are never highlighted in
that response.  Setting usePhraseHighlighter=false makes no
difference.

Any pointers appreciated.

-Peter

-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Otis Gospodnetic
Let's take a step back.  Why do you need to optimize?  You said: As long as 
I'm not optimizing, search and indexing times are satisfactory. :)

You don't need to optimize just because you are continuously adding and 
deleting documents.  On the contrary!

 Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: Jerome L Quinn jlqu...@us.ibm.com
 To: solr-user@lucene.apache.org
 Sent: Thu, November 12, 2009 6:30:42 PM
 Subject: Solr 1.3 query and index perf tank during optimize
 
 
 Hi, everyone, this is a problem I've had for quite a while,
 and have basically avoided optimizing because of it.  However,
 eventually we will get to the point where we must delete as
 well as add docs continuously.
 
 I have a Solr 1.3 index with ~4M docs at around 90G.  This is a single
 instance running inside tomcat 6, so no replication.  Merge factor is the
 default 10.  ramBufferSizeMB is 32.  maxWarmingSearchers=4.
 autoCommit is set at 3 sec.
 
 We continually push new data into the index, at somewhere between 1-10 docs
 every 10 sec or so.  Solr is running on a quad-core 3.0GHz server.
 under IBM java 1.6.  The index is sitting on a local 15K scsi disk.
 There's nothing
 else of substance running on the box.
 
 Optimizing the index takes about 65 min.
 
 As long as I'm not optimizing, search and indexing times are satisfactory.
 
 When I start the optimize, I see massive problems with timeouts pushing new
 docs
 into the index, and search times balloon.  A typical search while
 optimizing takes
 about 1 min instead of a few seconds.
 
 Can anyone offer me help with fixing the problem?
 
 Thanks,
 Jerry Quinn



Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Lance Norskog
The 'maxSegments' feature is new with 1.4.  I'm not sure that it will
cause any less disk I/O during optimize.

The 'mergeFactor=2' idea is not what you think: in this case the index
is always mostly optimized, so you never need to run optimize.
Indexing is always slower, because you amortize the optimize time into
little continuous chunks during indexing. You never stop indexing. You
should not lose documents.
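
For what it's worth, a bare-bones sketch of sending a partial optimize to a
1.4 server over plain HTTP. It is written with the maxSegments attribute name
used above; the quoted message below spells it maxOptimizeSegments, so check
the syntax against your version's docs. The URL is a placeholder:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class PartialOptimize {
  public static void main(String[] args) throws Exception {
    // Post the optimize command straight to the update handler.
    URL url = new URL("http://localhost:8983/solr/update");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setDoOutput(true);
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
    // Merge down to at most 5 segments rather than all the way to 1.
    byte[] body = "<optimize maxSegments=\"5\"/>".getBytes("UTF-8");
    OutputStream out = conn.getOutputStream();
    out.write(body);
    out.close();
    System.out.println("HTTP " + conn.getResponseCode());
  }
}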

On Fri, Nov 13, 2009 at 1:07 PM, Jerome L Quinn jlqu...@us.ibm.com wrote:

 Mark Miller markrmil...@gmail.com wrote on 11/12/2009 07:18:03 PM:
 Ah, the pains of optimization. Its kind of just how it is. One solution
 is to use two boxes and replication - optimize on the master, and then
 queries only hit the slave. Out of reach for some though, and adds many
 complications.

 Yes, in my use case 2 boxes isn't a great option.


 Another kind of option is to use the partial optimize feature:

  <optimize maxOptimizeSegments="5"/>

 Using this, you can optimize down to n segments and take a shorter hit
 each time.

 Is this a 1.4 feature?  I'm planning to migrate to 1.4, but it'll take a
 while since
 I have to port custom code forward, including a query parser.


 Also, if optimizing is so painful, you might lower the merge factor to
 amortize that pain better. That's another way to slowly get there - if
 you lower the merge factor, as merging takes place, the new merge factor
 will be respected, and segments will merge down. A merge factor of 2
 (the lowest) will make it so you only ever have 2 segments. Sometimes
 that works reasonably well - you could try 3-6 or something as well.
 Then when you do your partial optimizes (and eventually a full optimize
 perhaps), you won't have so far to go.

 So this will slow down indexing but speed up optimize somewhat?
 Unfortunately
 right now I lose docs I'm indexing, as well as slowing searching to a crawl.
 Ugh.

 I've got plenty of CPU horsepower.  This is where having the ability to
 optimize
 on another filesystem would be useful.

 Would it perhaps make sense to set up a master/slave on the same machine?
 Then
 I suppose I can have an index being optimized that might not clobber the
 search.
 Would new indexed items still be dropped on the floor?

 Thanks,
 Jerry



-- 
Lance Norskog
goks...@gmail.com


Re: changes to highlighting config or syntax in 1.4?

2009-11-13 Thread Peter Wolanin
Apparently one of my conf files was broken - odd that I didn't see any
exceptions.  Anyhow - excuse my haste, I don't see the problem now.

-Peter

On Fri, Nov 13, 2009 at 11:06 PM, Peter Wolanin
peter.wola...@acquia.com wrote:
 I'm testing out the final release of Solr 1.4 as compared to the build
 I have been using from around June.

 I'm using the dismax handler for searches.  I'm finding that
 highlighting is completely broken as compared to previously.  Much
 more text is returned than it should be for each string in <lst
 name="highlighting">, but the search words are never highlighted in
 that response.  Setting usePhraseHighlighter=false makes no
 difference.

 Any pointers appreciated.

 -Peter

 --
 Peter M. Wolanin, Ph.D.
 Momentum Specialist,  Acquia. Inc.
 peter.wola...@acquia.com




-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


Re: Data import problem with child entity from different database

2009-11-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
I am unable to get the file
http://old.nabble.com/file/p26335171/dataimport.temp.xml

On Fri, Nov 13, 2009 at 4:57 PM, Andrew Clegg andrew.cl...@gmail.com wrote:



 Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:

 no obvious issues.
 you may post your entire data-config.xml


 Here it is, exactly as last attempt but with usernames etc. removed.

 Ignore the comments and the unused FileDataSource...

 http://old.nabble.com/file/p26335171/dataimport.temp.xml dataimport.temp.xml


 Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:

 do w/o CachedSqlEntityProcessor first and then apply that later


 Yep, that was just a bit of a wild stab in the dark to see if it made any
 difference.

 Thanks,

 Andrew.

 --
 View this message in context: 
 http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26335171.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Stop solr without losing documents

2009-11-13 Thread Lance Norskog
I would go with polling Solr to find what is not yet there. In
production, it is better to assume that things will break, and have
backstop janitors that fix them. And then test those janitors
regularly.

On Fri, Nov 13, 2009 at 8:02 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 So I think the question is really:
 If I stop the servlet container, does Solr issue a commit in the shutdown 
 hook in order to ensure all buffered docs are persisted to disk before the 
 JVM exits.

 I don't have the Solr source handy, but if I did, I'd look for Shutdown, 
 Hook and finalize in the code.

 Otis
 --
 Sematext is hiring -- http://sematext.com/about/jobs.html?mls
 Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



 - Original Message 
 From: Chris Hostetter hossman_luc...@fucit.org
 To: solr-user@lucene.apache.org
 Sent: Fri, November 13, 2009 4:09:00 PM
 Subject: Re: Stop solr without losing documents


 : which documents have been updated before a successful commit.  Now
 : stopping solr is as easy as kill -9.

 please don't kill -9 ... it's grossly overkill, and doesn't give your
 servlet container a fair chance to clean things up.  A lot of work has been
 done to make Lucene indexes robust to hard terminations of the JVM (or
 physical machine) but there's no reason to go out of your way to try and
 stab it in the heart when you could just shut it down cleanly.

 that's not to say your approach isn't a good one -- if you only have one
 client sending updates/commits then having it keep track of what was
 indexed prior to the last successful commit is a viable way to deal with
 what happens if solr stops responding (either because you shut it down, or
 because it crashed for some other reason).

 Alternately, you could take advantage of the enabled feature from your
 client (just have it test the enabled url every N updates or so) and when
 it sees that you have disabled the port it can send one last commit and
 then stop sending updates until it sees the enabled URL work again -- as
 soon as you see the updates stop, you can safely shut down the port.


 -Hoss





-- 
Lance Norskog
goks...@gmail.com


Re: javabin in .NET?

2009-11-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
OK. Is there anyone trying it out? Where is this code? I can try to help ...

On Fri, Nov 13, 2009 at 8:10 PM, Mauricio Scheffer
mauricioschef...@gmail.com wrote:
 I meant the standard IO libraries. They are different enough that the code
 has to be manually ported. There were some automated tools back when
 Microsoft introduced .Net, but IIRC they never really worked.

 Anyway it's not a big deal, it should be a straightforward job. Testing it
 thoroughly cross-platform is another thing though.

 2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 The javabin format does not have many dependencies. It may have 3-4
 classes and that is it.

 On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer
 mauricioschef...@gmail.com wrote:
  Nope. It has to be manually ported. Not so much because of the language
  itself but because of differences in the libraries.
 
 
  2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
 
  Is there any tool to directly port Java to .Net? Then we can extract
  out the client part of the javabin code and convert it.
 
  On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher erik.hatc...@gmail.com
  wrote:
   Has anyone looked into using the javabin response format from .NET
  (instead
   of SolrJ)?
  
   It's mainly a curiosity.
  
   How much better could performance/bandwidth/throughput be?  How
 difficult
   would it be to implement some .NET code (C#, I'd guess being the best
   choice) to handle this response format?
  
   Thanks,
          Erik
  
  
 
 
 
  --
  -
  Noble Paul | Principal Engineer| AOL | http://aol.com
 
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Data import problem with child entity from different database

2009-11-13 Thread Lance Norskog
<dataConfig>

  <dataSource name="caffdubya" driver="org.postgresql.Driver"
    url="jdbc:postgresql://db1/cathdb_v3_3_0" user="USER" password="PASS"
    />

  <dataSource name="sinatra" driver="oracle.jdbc.OracleDriver"
    url="jdbc:oracle:thin:@db2:1521:biomapwh" user="USER" password="PASS"
    />

  <!-- The following path is on bsmcmp11's local disk for speed. -->
  <!-- The master copy (compressed) lives at
       /cath/data/current/pdb-XML-noatom -->
  <!-- For convenience, there's a script at
       bsmcmp11:/export/local/refresh_pdb to copy and unpack it. -->

  <dataSource name="filesystem" type="FileDataSource"
    basePath="/export/local/pdb-XML-noatom/" encoding="UTF-8"
    connectionTimeout="5000" readTimeout="1"/>

  <document>

    <entity name="domain" dataSource="caffdubya" query="select *
        from domain_text">

      <!-- Subquery for related PubMed IDs (we could pull the
           actual text in later...) ... NOT WORKING! :-( -->

      <entity
        name="domain_pubmed_ids"
        dataSource="sinatra"
        onError="continue"
        query="select id as pdb_code, related_id as
          related_ids from biomap_admin.uniprot_pdb_pubmed_for_solr where id =
          '${domain.pdb_code}'" />

    </entity>

    <!-- REMOVED MOST ENTITIES FOR TEST PURPOSES, RESTORE FROM
         PREVIOUS REVISION -->

  </document>

</dataConfig>



2009/11/13 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com:
 am unable to get the file
 http://old.nabble.com/file/p26335171/dataimport.temp.xml

 On Fri, Nov 13, 2009 at 4:57 PM, Andrew Clegg andrew.cl...@gmail.com wrote:



 Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:

 no obvious issues.
 you may post your entire data-config.xml


 Here it is, exactly as last attempt but with usernames etc. removed.

 Ignore the comments and the unused FileDataSource...

 http://old.nabble.com/file/p26335171/dataimport.temp.xml dataimport.temp.xml


 Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:

 do w/o CachedSqlEntityProcessor first and then apply that later


 Yep, that was just a bit of a wild stab in the dark to see if it made any
 difference.

 Thanks,

 Andrew.

 --
 View this message in context: 
 http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26335171.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com




-- 
Lance Norskog
goks...@gmail.com