Re: Type converters for DocumentObjectBinder
Hi Paul, it's working for Query, but not for Updating (Add Bean). The getter method returns a Calendar (a GregorianCalendar instance). On the indexer side, a toString() or something equivalent is done and an error is thrown:

    Caused by: java.text.ParseException: Unparseable date: java.util.GregorianCalendar:java.util.GregorianCalendar[time=1258100168327,areFieldsSet=true,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id=Europe/Berlin,offset=3600000,dstSavings=3600000,useDaylight=true,transitions=143,lastRule=java.util.SimpleTimeZone[id=Europe/Berlin,offset=3600000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=2,startMonth=2,startDay=-1,startDayOfWeek=1,startTime=3600000,startTimeMode=2,endMode=2,endMonth=9,endDay=-1,endDayOfWeek=1,endTime=3600000,endTimeMode=2]],firstDayOfWeek=2,minimalDaysInFirstWeek=4,ERA=1,YEAR=2009,MONTH=10,WEEK_OF_YEAR=46,WEEK_OF_MONTH=2,DAY_OF_MONTH=13,DAY_OF_YEAR=317,DAY_OF_WEEK=6,DAY_OF_WEEK_IN_MONTH=2,AM_PM=0,HOUR=9,HOUR_OF_DAY=9,MINUTE=16,SECOND=8,MILLISECOND=327,ZONE_OFFSET=3600000,DST_OFFSET=0]

The code:

    public Calendar getValidFrom() {
        return validFrom;
    }

    public void setValidFrom(Calendar validFrom) {
        this.validFrom = validFrom;
    }

    @Field
    public void setValidFrom(String validFrom) {
        Calendar cal = Calendar.getInstance();
        try {
            cal.setTime(dateFormat.parse(validFrom));
        } catch (ParseException e) {
            e.printStackTrace();
        }
        this.validFrom = cal;
    }

Noble Paul നോബിള് नोब्ळ्-2 wrote:

    create a setter method for the field which takes a String and apply the annotation there, example:

        private Calendar validFrom;

        @Field
        public void setValidFrom(String s){
            //convert to Calendar object and set the field
        }

On Fri, Nov 13, 2009 at 12:24 PM, paulhyo <st...@ouestil.ch> wrote:

    Hi, I would like to know if there is a way to add type converters when using getBeans.
    I need conversion when updating (Calendar -> String) and when searching (String -> Calendar). The Bean class defines:

        @Field
        private Calendar validFrom;

    but the received type within the QueryResponse is a String (2009-11-13)... Actually I get this error:

        java.lang.RuntimeException: Exception while setting value : 2009-09-16 on private java.util.Calendar ch.mycompany.access.solr.impl.SoNatPersonImpl.validFrom
            at org.apache.solr.client.solrj.beans.DocumentObjectBinder$DocField.set(DocumentObjectBinder.java:360)
            at org.apache.solr.client.solrj.beans.DocumentObjectBinder$DocField.inject(DocumentObjectBinder.java:342)
            at org.apache.solr.client.solrj.beans.DocumentObjectBinder.getBeans(DocumentObjectBinder.java:55)
            at org.apache.solr.client.solrj.response.QueryResponse.getBeans(QueryResponse.java:324)
            at ch.mycompany.access.solr.impl.result.NatPersonPartnerResultBuilder.buildBeanListResult(NatPersonPartnerResultBuilder.java:38)
            at ch.mycompany.access.solr.impl.SoQueryManagerImpl.searchNatPersons(SoQueryManagerImpl.java:41)
            at ch.mycompany.access.solr.impl.SolrQueryManagerTest.testQueryFamilyNameRigg(SolrQueryManagerTest.java:36)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at junit.framework.TestCase.runTest(TestCase.java:164)
            at junit.framework.TestCase.runBare(TestCase.java:130)
            at junit.framework.TestResult$1.protect(TestResult.java:106)
            at junit.framework.TestResult.runProtected(TestResult.java:124)
            at junit.framework.TestResult.run(TestResult.java:109)
            at junit.framework.TestCase.run(TestCase.java:120)
            at junit.framework.TestSuite.runTest(TestSuite.java:230)
            at junit.framework.TestSuite.run(TestSuite.java:225)
            at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
            at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
            at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
            at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
            at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
            at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
        Caused by: java.lang.IllegalArgumentException: Can not set java.util.Calendar field ch.mycompany.access.solr.impl.SoNatPersonImpl.validFrom to java.lang.String
            at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:146)
            at ...
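For reference, the round trip Paul is after can be sketched with plain JDK classes. This is a hypothetical bean (the names and the yyyy-MM-dd pattern are assumptions, and SolrJ's @Field annotation is shown only as a comment so the sketch compiles standalone): a String setter handles values coming back from a query, and a String getter exposes the value for indexing so the server never receives Calendar.toString().

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;

// Hypothetical bean showing the String-setter workaround. SolrJ's
// DocumentObjectBinder would inject the stored String via the annotated
// setter; the String getter keeps the outgoing value parseable when the
// bean is indexed, so a GregorianCalendar dump never reaches the server.
class PersonBean {
    // Assumed to match the stored date layout, e.g. "2009-09-16".
    private static final String PATTERN = "yyyy-MM-dd";

    private Calendar validFrom;

    // @Field("validFrom") would go here in real SolrJ code.
    public void setValidFrom(String s) {
        Calendar cal = Calendar.getInstance();
        try {
            cal.setTime(new SimpleDateFormat(PATTERN).parse(s));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + s, e);
        }
        this.validFrom = cal;
    }

    // Exposed for indexing: always a plain formatted String.
    public String getValidFrom() {
        return validFrom == null ? null
                : new SimpleDateFormat(PATTERN).format(validFrom.getTime());
    }

    // Calendar view for application code.
    public Calendar getValidFromCalendar() {
        return validFrom;
    }
}
```

With this shape, getBeans() can populate validFrom from the stored "2009-11-13"-style value, and addBean() writes back the same format instead of a calendar dump.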
highlighting issue lst.name is a leaf node
Hello list, I'm new to Solr but from what I've seen while experimenting, it's awesome. I have a small issue regarding the highlighting feature. It finds stuff (as I see from the query analyzer), but the highlight list looks something like this:

    <lst name="highlighting">
      <lst name="c:\0596520107.pdf"/>
      <lst name="c:\0470511389.pdf"/>
    </lst>

(the files were added using ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract"); and I set the literal.id to the filename)

My solrconfig.xml request handler looks like:

    <requestHandler name="standard" class="solr.SearchHandler" default="true">
      <!-- default values for query parameters -->
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <!--
        <int name="rows">10</int>
        <str name="fl">*</str>
        <str name="version">2.1</str>
        -->
        <bool name="hl">true</bool>
        <int name="hl.snippets">3</int>
        <int name="hl.fragsize">30</int>
        <str name="hl.simple.pre"><![CDATA[<span>]]></str>
        <str name="hl.simple.post"><![CDATA[</span>]]></str>
        <str name="hl.fl">*</str>
        <bool name="hl.requireFieldMatch">true</bool>
        <float name="hl.regex.slop">0.5</float>
        <str name="hl.regex.pattern">[-\w ,/\n\']{20,200}</str>
        <bool name="hl.usePhraseHighlighter">true</bool>
      </lst>
    </requestHandler>

The schema.xml is untouched and was downloaded yesterday from the latest stable build. At first I thought it had something to do with the extraction of the PDF, but I tried the demo xml docs as well and got the same result. I'm new to this, so please help.

Thank you,
Chuck
Re: Stop solr without losing documents
Michael wrote:

    I've got a process external to Solr that is constantly feeding it new documents, retrying if Solr is not responding. What's the right way to stop Solr (running in Tomcat) so no documents are lost? Currently I'm committing all cores and then running catalina's stop script, but between my commit and the stop, more documents can come in that would need *another* commit... Lots of people must have had this problem already, so I know the answer is simple; I just can't find it! Thanks, Michael

I don't know if this is the best solution, or even if it's applicable to your situation, but we do incremental updates from a database based on a timestamp (from a simple separate SQL table filled by triggers, so deletes are measured correctly as well). We store this timestamp in Solr too. Our index script first does a simple Solr request to get the newest timestamp, and then selects the documents to update with

    SELECT * FROM document_updates WHERE timestamp >= X

where X is the timestamp returned from Solr. (We use >= for the hopefully extremely rare case where two updates happen at the same time, the index script runs at just that moment, and it only retrieved one of the updates. This will cause some documents to be updated multiple times, but as document updates are idempotent this is no real problem.)

Regards,
gwk
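gwk's watermark scheme can be sketched in miniature. This is a toy in-memory model, not real Solr or SQL code (only the table name document_updates comes from the mail; everything else is illustrative). The point is the >= comparison: the boundary row may be processed twice, which is harmless because an update simply overwrites the same document id.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of timestamp-watermark incremental indexing: the indexer asks
// the index for the newest indexed timestamp, then re-reads every source
// row with timestamp >= that watermark. Re-processing the boundary row is
// safe because indexing a document is an idempotent overwrite by id.
class IncrementalIndex {
    // id -> timestamp of the last indexed version of that document
    final Map<String, Long> index = new HashMap<>();

    long newestIndexedTimestamp() {
        return index.values().stream().max(Long::compare).orElse(0L);
    }

    // rows: id -> timestamp, standing in for "SELECT * FROM document_updates"
    void indexFrom(Map<String, Long> rows) {
        long watermark = newestIndexedTimestamp();
        for (Map.Entry<String, Long> row : rows.entrySet()) {
            if (row.getValue() >= watermark) {       // note: >=, not >
                index.put(row.getKey(), row.getValue()); // idempotent overwrite
            }
        }
    }
}
```

A row that arrives with a timestamp equal to the current watermark (the race gwk describes) is still picked up on the next run, at the cost of re-indexing the boundary document once.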
Re: Arguments for Solr implementation at public web site
Some extras for the pros list:
- Full control over which content is searchable and which is not
- Possibility to make pages searchable almost instantly after publication
- Control over when the site is indexed

Friendly,
Jan-Eirik

On Fri, Nov 13, 2009 at 10:52 AM, Lukáš Vlček <lukas.vl...@gmail.com> wrote:

    Hi, I am looking for good arguments to justify implementing search for sites which are available on the public internet. There are many sites in the "powered by Solr" section which are indexed by Google and other search engines, but still they decided to invest resources into building and maintaining their own search functionality rather than going with a [user_query site:my_site.com] Google search. Why? By no means am I saying it makes no sense to implement Solr! But I want to put together a list of reasons, possibly with examples. Your help would be much appreciated! Let's narrow the scope of this discussion to the following:
    - the search should cover several community sites running open source CMSs, JIRAs, Bugzillas ... and the like
    - all documents use open formats (no need to parse Word or Excel)
    (maybe something close to what LucidImagination does for the mailing lists of Lucene and Solr)
    My initial kick-off list would be:
    pros:
    - considering we understand the content (we understand the domain scope) we can fine-tune the search engine to provide more accurate results
    - Solr can give us facets
    - we have user search logs (valuable for analysis)
    - implementing Solr is fun
    cons:
    - requires resources (but the cost is relatively low depending on the query traffic, index size and frequency of updates)
    Regards,
    Lukas
    http://blog.lukas-vlcek.com/

--
Jan Eirik B. Nævdal
Solutions Engineer | +47 982 65 347
Iterate AS | www.iterate.no
The Lean Software Development Consultancy
Re: Arguments for Solr implementation at public web site
Next to the faceting engine:
- MoreLikeThis
- Highlighting
- Spellchecker

But also more flexible querying using the DisMax handler, which is clearly superior. Solr can also be used to store data which can be retrieved in an instant! We have used this technique on a site and it is obviously much faster than multiple large and complex SQL statements.

On Fri, 2009-11-13 at 10:52 +0100, Lukáš Vlček wrote:

    pros:
    - considering we understand the content (we understand the domain scope) we can fine-tune the search engine to provide more accurate results
    - Solr can give us facets
    - we have user search logs (valuable for analysis)
    - implementing Solr is fun
    cons:
    - requires resources (but the cost is relatively low depending on the query traffic, index size and frequency of updates)
    Regards,
    Lukas
    http://blog.lukas-vlcek.com/
Re: highlighting issue lst.name is a leaf node
I found the solution. If somebody runs into the same problem, here is how I solved it.

- while uploading the document:

    req.setParam("uprefix", "attr_");
    req.setParam("fmap.content", "attr_content");
    req.setParam("overwrite", "true");
    req.setParam("commit", "true");

- in the query:

    http://localhost:8983/solr/select?q=attr_content:%22Django%22&rows=4

- edit solrconfig.xml and add, in the request handler params,

    <str name="fl">id,title</str>

  so that you won't get the whole text content inside the response.

Regards,
Chuck
Re: Arguments for Solr implementation at public web site
Jan-Eirik B. Nævdal schrieb:

    Some extras for the pros list:
    - Full control over which content is searchable and which is not
    - Possibility to make pages searchable almost instantly after publication
    - Control over when the site is indexed

+1, especially the last point. You can also add a robots.txt and prohibit spidering of the site to reduce traffic; Google won't index any highly dynamic content then.
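The robots.txt mentioned above could look like this (the paths are purely illustrative; the real ones depend on the site):

```
User-agent: *
Disallow: /search
Disallow: /dynamic/
```

Any well-behaved crawler then skips those paths, while the in-house Solr index still gets the content directly from the CMS.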
Re: Arguments for Solr implementation at public web site
Lukáš Vlček wrote:

    I am looking for good arguments to justify implementing search for sites which are available on the public internet. There are many sites in the "powered by Solr" section which are indexed by Google and other search engines, but still they decided to invest resources into building and maintaining their own search functionality rather than going with a [user_query site:my_site.com] Google search. Why?

You're assuming that Solr is just used in these cases to index discrete web pages which Google etc. would be able to access by following navigational links. I would imagine that in a lot of cases, Solr is used to index database entities which are used to build [parts of] pages dynamically, and which might be viewable in different forms in various different pages.

Plus, with stored fields, you have the option of actually driving a website off Solr instead of directly off a database, which might make sense from a speed perspective in some cases.

And further, going back to page-only indexing -- you have no guarantee when Google will decide to recrawl your site, so there may be a delay before changes show up in their index. With an in-house search engine you can reindex as often as you like.

Andrew.

--
View this message in context: http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334734.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Arguments for Solr implementation at public web site
Hi, thanks for the inputs so far... however, let's put it this way: when you need to search for something Lucene- or Solr-related, which one do you use?
- generic Google
- go to a particular mailing list web site and search from there (if there is any search form at all)
- go to LucidImagination.com and use its search capability

Regards,
Lukas

On Fri, Nov 13, 2009 at 11:50 AM, Andrew Clegg <andrew.cl...@gmail.com> wrote:

    You're assuming that Solr is just used in these cases to index discrete web pages which Google etc. would be able to access by following navigational links. [...]
Data import problem with child entity from different database
Morning all, I'm having problems with joining a child entity from one database to a parent from another... My entity definitions look like this (names changed for brevity):

    <entity name="parent" dataSource="db1"
            query="select a, b, c from parent_table">
      <entity name="child" dataSource="db2" onError="continue"
              query="select c, d from child_table where c = '${parent.c}'" />
    </entity>

c is getting indexed fine (it's stored, and I can see field 'c' in the search results) but child.d isn't. I know the child table has data for the corresponding parent rows, and I've even watched the SQL queries against the child table appearing in Oracle's sqldeveloper as the DataImportHandler runs. But no content for child.d gets into the index. My schema contains a definition for a field called d like so:

    <field name="d" type="keywords_ids" indexed="true" stored="true" multiValued="true" termVectors="true" />

(keywords_ids is a conservatively-analyzed text type which has worked fine in other contexts.)

Two things occur to me:

1. db1 is PostgreSQL and db2 is Oracle, although the d field in both tables is just a char(4), nothing fancy. Could something weird with character encodings be happening?

2. d isn't a primary key in either parent or child, but this shouldn't matter, should it?

Additional data points -- I also tried using the CachedSqlEntityProcessor to do in-memory table caching of child, but it didn't work then either. I got a lot of error messages like this:

    No value available for the cache key : d in the entity : child

If anyone knows whether this is a known limitation (if so I can work around it) or an unexpected case (if so I'll file a bug report), please shout. I'm using 1.4.

Yet again, many thanks :-)

Andrew.

--
View this message in context: http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26334948.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Arguments for Solr implementation at public web site
Lukáš Vlček wrote: When you need to search for something Lucene or Solr related, which one do you use: - generic Google - go to a particular mail list web site and search from here (if there is any search form at all) Both of these (Nabble in the second case) in case any recent posts have appeared which Google hasn't picked up. Andrew. -- View this message in context: http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334980.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Arguments for Solr implementation at public web site
For this list I usually end up at http://solr.markmail.org (which I believe also uses Lucene under the hood). Google is such a black box...

Pros: +1 Open Source (enough said :-)

There also always seems to be the notion that crawling lends itself to producing the best results, but that is rarely the case. And unless you are a special type of site, Google will not overlay your results with some type of context in the search (i.e. news or sports, etc.).

What I think really needs to happen in Solr (and is a bit missing at the moment) is a common interface for reindexing another index (if that makes sense) ... something akin to OpenSearch (http://www.opensearch.org/Community/OpenSearch_software). For example, what I would like to do is have my site, have my search index, and point Google at just my search index (and not have it crawl the site) ... the only current option for something like that is sitemaps, which I think Solr should have a contrib project for (templates), but you would have to generate these offline for sure.

- Jon

On Nov 13, 2009, at 6:00 AM, Lukáš Vlček wrote:

    Hi, thanks for the inputs so far... however, let's put it this way: when you need to search for something Lucene- or Solr-related, which one do you use?
    - generic Google
    - go to a particular mailing list web site and search from there (if there is any search form at all)
    - go to LucidImagination.com and use its search capability
    Regards,
    Lukas
Re: Selection of terms for MoreLikeThis
Any ideas on this? Is it worth sending a bug report? Those links are live, by the way, in case anyone wants to verify that MLT is returning suggestions with very low tf.idf.

Cheers,
Andrew.

Andrew Clegg wrote:

    Hi, if I run a MoreLikeThis query like the following:

        http://www.cathdb.info/solr/mlt?q=id:3.40.50.720&rows=0&mlt.interestingTerms=list&mlt.match.include=false&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1

    one of the terms in the results is "and" (I don't do any stopword removal on this field). However, if I look inside that document with the TermVectorComponent:

        http://www.cathdb.info/solr/select/?q=id:3.40.50.720&tv=true&tv.all=true&tv.fl=keywords

    I see that "and" has a measly tf.idf of 7.46E-4. But there are other terms with *much* higher tf.idf scores, e.g.:

        <lst name="aquaspirillum">
          <int name="tf">1</int>
          <int name="df">10</int>
          <double name="tf-idf">0.1</double>
        </lst>

    that *don't* appear in the MoreLikeThis list. (I tried adding mlt.maxwl=999 to the end of the MLT query but it makes no difference.) What's going on? Surely something with tf.idf = 0.1 is a far better candidate for a MoreLikeThis query than something with tf.idf = 7.46E-4? Or does MoreLikeThis do some other heuristic magic to select good candidates, and sometimes get it wrong?

    BTW the keywords field is indexed, stored, multi-valued and term-vectored.

    Thanks, Andrew.

    -- :: http://biotext.org.uk/ ::

--
View this message in context: http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26335061.html
Sent from the Solr - User mailing list archive at Nabble.com.
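As a side note, the quoted numbers fit a simple tf * (1/df) calculation; this is an assumption about how the TermVectorComponent derives its tf-idf figure (consistent with aquaspirillum's tf=1, df=10 giving 0.1), not a statement of Solr internals.

```java
// Toy helper reproducing the arithmetic behind the numbers quoted above.
// Assumption: the reported tf-idf is simply tf * (1 / df). Under that
// reading, a very common term like "and" (huge df) ends up with a tiny
// score such as 7.46E-4, while a rare term like aquaspirillum scores 0.1.
class TfIdf {
    static double tfIdf(int tf, int df) {
        return tf * (1.0 / df);
    }
}
```

Which makes the original question sharper: if the component's own numbers rank aquaspirillum far above "and", MoreLikeThis must be selecting terms by some other criterion.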
Re: Data import problem with child entity from different database
No obvious issues. You may post your entire data-config.xml. Do without CachedSqlEntityProcessor first and then apply that later.

On Fri, Nov 13, 2009 at 4:38 PM, Andrew Clegg <andrew.cl...@gmail.com> wrote:

    Morning all, I'm having problems with joining a child entity from one database to a parent from another... [...]
--
- Noble Paul | Principal Engineer| AOL | http://aol.com
exclude some fields from copying dynamic fields | schema.xml
Hi, we are using the following entry in schema.xml to make a copy of one type of dynamic field to another:

    <copyField source="*_s" dest="*_str_s" />

Is it possible to exclude some fields from copying? We are using Solr 1.3.

~Vikrant

--
View this message in context: http://old.nabble.com/exclude-some-fields-from-copying-dynamic-fields-%7C-schema.xml-tp26335109p26335109.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Data import problem with child entity from different database
Noble Paul നോബിള് नोब्ळ्-2 wrote:

    no obvious issues. you may post your entire data-config.xml

Here it is, exactly as the last attempt but with usernames etc. removed. Ignore the comments and the unused FileDataSource...

http://old.nabble.com/file/p26335171/dataimport.temp.xml dataimport.temp.xml

Noble Paul നോബിള് नोब्ळ्-2 wrote:

    do w/o CachedSqlEntityProcessor first and then apply that later

Yep, that was just a bit of a wild stab in the dark to see if it made any difference.

Thanks,
Andrew.

--
View this message in context: http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26335171.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 1.3 query and index perf tank during optimize
I think we sorely need a Directory impl that down-prioritizes IO performed by merging. It would be wonderful if from Java we could simply set a per-thread IO priority, but it'll be a looong time until that's possible. So I think for now we should make a Directory impl that emulates such behavior, e.g. Lucene could state the context (merge, flush, search, nrt-reopen, etc.) whenever it opens an IndexInput / IndexOutput, and then the Directory could hack in pausing the merge IO whenever search/nrt-reopen IO is active.

Mike

On Thu, Nov 12, 2009 at 7:18 PM, Mark Miller <markrmil...@gmail.com> wrote:

    Jerome L Quinn wrote:

        Hi, everyone, this is a problem I've had for quite a while, and have basically avoided optimizing because of it. However, eventually we will get to the point where we must delete as well as add docs continuously.

        I have a Solr 1.3 index with ~4M docs at around 90G. This is a single instance running inside Tomcat 6, so no replication. Merge factor is the default 10. ramBufferSizeMB is 32. maxWarmingSearchers=4. autoCommit is set at 3 sec. We continually push new data into the index, at somewhere between 1-10 docs every 10 sec or so. Solr is running on a quad-core 3.0GHz server under IBM Java 1.6. The index is sitting on a local 15K SCSI disk. There's nothing else of substance running on the box.

        Optimizing the index takes about 65 min. As long as I'm not optimizing, search and indexing times are satisfactory. When I start the optimize, I see massive problems with timeouts pushing new docs into the index, and search times balloon. A typical search while optimizing takes about 1 min instead of a few seconds.

        Can anyone offer me help with fixing the problem?

        Thanks, Jerry Quinn

    Ah, the pains of optimization. It's kind of just how it is. One solution is to use two boxes and replication - optimize on the master, and then queries only hit the slave. Out of reach for some, though, and it adds many complications.
    Another kind of option is to use the partial optimize feature:

        <optimize maxOptimizeSegments="5"/>

    Using this, you can optimize down to n segments and take a shorter hit each time.

    Also, if optimizing is so painful, you might lower the merge factor to amortize that pain better. That's another way to slowly get there - if you lower the merge factor, as merging takes place, the new merge factor will be respected, and segments will merge down. A merge factor of 2 (the lowest) will make it so you only ever have 2 segments. Sometimes that works reasonably well - you could try 3-6 or something as well. Then when you do your partial optimizes (and eventually a full optimize perhaps), you won't have so far to go.

    --
    - Mark
    http://www.lucidimagination.com
Re: javabin in .NET?
Nope. It has to be manually ported. Not so much because of the language itself, but because of differences in the libraries.

2009/11/13 Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>

    Is there any tool to directly port Java to .NET? Then we could extract out the client part of the javabin code and convert it.

    On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:

        Has anyone looked into using the javabin response format from .NET (instead of SolrJ)? It's mainly a curiosity. How much better could performance/bandwidth/throughput be? How difficult would it be to implement some .NET code (C#, I'd guess, being the best choice) to handle this response format?

        Thanks, Erik

    --
    - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Selection of terms for MoreLikeThis
Hi Andrew, no idea, I'm afraid - but could you send the output of interestingTerms=details? This at least would show what MoreLikeThis uses, in comparison to the TermVectorComponent output you've already pasted.

Chantal

Andrew Clegg schrieb:

    Any ideas on this? Is it worth sending a bug report? Those links are live, by the way, in case anyone wants to verify that MLT is returning suggestions with very low tf.idf. [...]
Re: Solr 1.3 query and index perf tank during optimize
Another thing to try, is reducing the maxThreadCount for ConcurrentMergeScheduler. It defaults to 3, which I think is too high -- we should change this default to 1 (I'll open a Lucene issue). Mike On Thu, Nov 12, 2009 at 6:30 PM, Jerome L Quinn jlqu...@us.ibm.com wrote: Hi, everyone, this is a problem I've had for quite a while, and have basically avoided optimizing because of it. However, eventually we will get to the point where we must delete as well as add docs continuously. I have a Solr 1.3 index with ~4M docs at around 90G. This is a single instance running inside tomcat 6, so no replication. Merge factor is the default 10. ramBufferSizeMB is 32. maxWarmingSearchers=4. autoCommit is set at 3 sec. We continually push new data into the index, at somewhere between 1-10 docs every 10 sec or so. Solr is running on a quad-core 3.0GHz server. under IBM java 1.6. The index is sitting on a local 15K scsi disk. There's nothing else of substance running on the box. Optimizing the index takes about 65 min. As long as I'm not optimizing, search and indexing times are satisfactory. When I start the optimize, I see massive problems with timeouts pushing new docs into the index, and search times balloon. A typical search while optimizing takes about 1 min instead of a few seconds. Can anyone offer me help with fixing the problem? Thanks, Jerry Quinn
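The merge scheduler's thread count is configured in solrconfig.xml. This is a hedged sketch only — whether the scheduler accepts init args like this depends on the Solr version (the 1.3-era config may only take the class attribute), so verify against the example solrconfig.xml shipped with your release:

```xml
<!-- Inside the <indexDefaults> (or <mainIndex>) section of solrconfig.xml.
     Syntax varies by Solr version; check your release's example config.
     Limiting the merge scheduler to one concurrent merge thread reduces
     IO contention between merging/optimizing and searching. -->
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxThreadCount">1</int>
</mergeScheduler>
```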
Re: Solr 1.3 query and index perf tank during optimize
On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless luc...@mikemccandless.com wrote: I think we sorely need a Directory impl that down-prioritizes IO performed by merging. Presumably this prioritizing Directory impl could wrap/decorate any existing Directory. Mike
Re: javabin in .NET?
The javabin format does not have many dependencies. It may have 3-4 classes and that is it. On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer mauricioschef...@gmail.com wrote: Nope. It has to be manually ported. Not so much because of the language itself but because of differences in the libraries. 2009/11/13 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com Is there any tool to directly port java to .Net? then we can extract out the client part of the javabin code and convert it. On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Has anyone looked into using the javabin response format from .NET (instead of SolrJ)? It's mainly a curiosity. How much better could performance/bandwidth/throughput be? How difficult would it be to implement some .NET code (C#, I'd guess being the best choice) to handle this response format? Thanks, Erik -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Selection of terms for MoreLikeThis
Chantal Ackermann wrote: no idea, I'm afraid - but could you sent the output of interestingTerms=details? This at least would show what MoreLikeThis uses, in comparison to the TermVectorComponent you've already pasted. I can, but I'm afraid they're not very illuminating! http://www.cathdb.info/solr/mlt?q=id:3.40.50.720rows=0mlt.interestingTerms=detailsmlt.match.include=falsemlt.fl=keywordsmlt.mintf=1mlt.mindf=1 response lst name=responseHeader int name=status0/int int name=QTime59/int /lst result name=response numFound=280227 start=0/ lst name=interestingTerms float name=keywords:dehydrogenase1.0/float float name=keywords:reductase1.0/float float name=keywords:metabolism1.0/float float name=keywords:activity1.0/float float name=keywords:process1.0/float float name=keywords:alcohol1.0/float float name=keywords:and1.0/float float name=keywords:malate1.0/float float name=keywords:biosynthesis1.0/float float name=keywords:biosynthetic1.0/float float name=keywords:degradation1.0/float float name=keywords:precursor1.0/float float name=keywords:metabolic1.0/float float name=keywords:protein1.0/float float name=keywords:synthase1.0/float float name=keywords:acid1.0/float float name=keywords:enzyme1.0/float float name=keywords:succinyl-coa1.0/float float name=keywords:putative1.0/float float name=keywords:(nadp+)1.0/float float name=keywords:4,6-dehydratase1.0/float float name=keywords:fatty1.0/float float name=keywords:chloroplast1.0/float float name=keywords:lactobacillus1.0/float float name=keywords:glyoxylate1.0/float /lst /response Cheers, Andrew. -- View this message in context: http://old.nabble.com/Selection-of-terms-for-MoreLikeThis-tp26286005p26336558.html Sent from the Solr - User mailing list archive at Nabble.com.
non english languages
Hello all, is there support for non-english language content indexing in Solr? I'm interested in Bulgarian, Hungarian, Romanian and Russian. Best regards, Chuck
Re: non english languages
the included snowball filters support hungarian, romanian, and russian. On Fri, Nov 13, 2009 at 9:03 AM, Chuck Mysak chuck.my...@gmail.com wrote: Hello all, is there support for non-english language content indexing in Solr? I'm interested in Bulgarian, Hungarian, Romanian and Russian. Best regards, Chuck -- Robert Muir rcm...@gmail.com
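Those Snowball stemmers are wired up per field type in schema.xml. A minimal sketch (the field type name `text_ru` is made up for illustration; the `language` attribute takes Snowball stemmer names such as "Russian", "Hungarian", "Romanian"):

```xml
<!-- Hypothetical field type in schema.xml; swap the language attribute
     for the Snowball stemmer you need. -->
<fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
  </analyzer>
</fieldType>
```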
Re: Selection of terms for MoreLikeThis
Hi Andrew, your URL does not include the parameter mlt.boost. Setting that to true made a noticeable difference for my queries. If not, there is also the parameter mlt.minwl (minimum word length below which words will be ignored). All your other terms seem longer than 3, so it might help in this case? But it seems a bit like a workaround. Cheers, Chantal Andrew Clegg schrieb: Chantal Ackermann wrote: no idea, I'm afraid - but could you send the output of interestingTerms=details? This at least would show what MoreLikeThis uses, in comparison to the TermVectorComponent you've already pasted. I can, but I'm afraid they're not very illuminating! http://www.cathdb.info/solr/mlt?q=id:3.40.50.720rows=0mlt.interestingTerms=detailsmlt.match.include=falsemlt.fl=keywordsmlt.mintf=1mlt.mindf=1 response lst name=responseHeader int name=status0/int int name=QTime59/int /lst result name=response numFound=280227 start=0/ lst name=interestingTerms float name=keywords:dehydrogenase1.0/float float name=keywords:reductase1.0/float float name=keywords:metabolism1.0/float float name=keywords:activity1.0/float float name=keywords:process1.0/float float name=keywords:alcohol1.0/float float name=keywords:and1.0/float float name=keywords:malate1.0/float float name=keywords:biosynthesis1.0/float float name=keywords:biosynthetic1.0/float float name=keywords:degradation1.0/float float name=keywords:precursor1.0/float float name=keywords:metabolic1.0/float float name=keywords:protein1.0/float float name=keywords:synthase1.0/float float name=keywords:acid1.0/float float name=keywords:enzyme1.0/float float name=keywords:succinyl-coa1.0/float float name=keywords:putative1.0/float float name=keywords:(nadp+)1.0/float float name=keywords:4,6-dehydratase1.0/float float name=keywords:fatty1.0/float float name=keywords:chloroplast1.0/float float name=keywords:lactobacillus1.0/float float name=keywords:glyoxylate1.0/float /lst /response Cheers, Andrew.
Re: javabin in .NET?
I meant the standard IO libraries. They are different enough that the code has to be manually ported. There were some automated tools back when Microsoft introduced .Net, but IIRC they never really worked. Anyway it's not a big deal, it should be a straightforward job. Testing it thoroughly cross-platform is another thing though. 2009/11/13 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com The javabin format does not have many dependencies. it may have 3-4 classes an that is it. On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer mauricioschef...@gmail.com wrote: Nope. It has to be manually ported. Not so much because of the language itself but because of differences in the libraries. 2009/11/13 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com Is there any tool to directly port java to .Net? then we can etxract out the client part of the javabin code and convert it. On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Has anyone looked into using the javabin response format from .NET (instead of SolrJ)? It's mainly a curiosity. How much better could performance/bandwidth/throughput be? How difficult would it be to implement some .NET code (C#, I'd guess being the best choice) to handle this response format? Thanks, Erik -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Resetting doc boosts
Hi, I'm trying to figure out if there is an easy way to reset all of the doc boosts you have made (for analytical purposes)... for example, if I run an index, gather a report, boost docs based on the report, and reset the boosts at the time of the next index... From just knowing how Lucene works, it would seem that I really need to reindex, since the boost is an attribute on the doc itself which would have to be modified, but there is no easy way to query for docs which have been boosted either. Any insight? Thanks. - Jon
Re: Question about the message Indexing failed. Rolled back all changes.
I'm getting the same thing. The process runs, seemingly successfully, and I can even go to other SOLR pages pointing to the same server and pull queries against the index with these just-added entries. But the response to the original import says failed and rolled back, both through the XML response and also in the logs. Why is the process reporting failure and saying it did not commit/rolled back, when it actually succeeded in importing and indexing? If it rolled back, as the logs say, I would expect to not be able to pull those rows out with new queries against the index. Avlesh Singh wrote: But even after I successfully index data using http://host:port/solr-example/dataimport?command=full-importcommit=trueclean=true, do solr search which returns meaningful results I am not sure what meaningful means. The full-import command starts an asynchronous process to start re-indexing. The response that you get in return to the above mentioned URL, (always) indicates that a full-import has been started. It does NOT know about anything that might go wrong with the process itself. and then visit http://host:port/solr-example/dataimport?command=status, I can see the following result ... The status URL is the one which tells you what is going on with the process. The message - Indexing failed. Rolled back all changes can come because of multiple reasons - missing database drivers, incorrect sql queries, runtime errors in custom transformers etc. Start the full-import once more. Keep a watch on the Solr server log. If you can figure out what's going wrong, great; otherwise, copy-paste the exception stack-trace from the log file for specific answers. Cheers Avlesh On Tue, Nov 10, 2009 at 1:32 PM, Bertie Shen bertie.s...@gmail.com wrote: No. I did not check the logs.
But even after I successfully index data using http://host:port /solr-example/dataimport?command=full-importcommit=trueclean=true, do solr search which returns meaningful results, and then visit http://host:port/solr-example/dataimport?command=status, I can see the following result response - lst name=responseHeader int name=status0/int int name=QTime1/int /lst - lst name=initArgs - lst name=defaults str name=configdata-config.xml/str /lst /lst str name=commandstatus/str str name=statusidle/str str name=importResponse/ - lst name=statusMessages str name=Time Elapsed0:2:11.426/str str name=Total Requests made to DataSource584/str str name=Total Rows Fetched1538/str str name=Total Documents Skipped0/str str name=Full Dump Started2009-11-09 23:54:41/str *str name=Indexing failed. Rolled back all changes./str* str name=Committed2009-11-09 23:54:42/str str name=Optimized2009-11-09 23:54:42/str str name=Rolledback2009-11-09 23:54:42/str /lst - str name=WARNING This response format is experimental. It is likely to change in the future. /str /response On Mon, Nov 9, 2009 at 7:39 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Sat, Nov 7, 2009 at 1:10 PM, Bertie Shen bertie.s...@gmail.com wrote: When I use http://localhost:8180/solr/admin/dataimport.jsp?handler=/dataimport to debug the indexing config file, I always see the status message on the right part str name=Indexing failed. Rolled back all changes./str, even the indexing process looks to be successful. I am not sure whether you guys have seen the same phenomenon or not. BTW, I usually check the checkbox Clean and sometimes check Commit box, and then click Debug Now button. Do you see any exceptions in the logs? -- Regards, Shalin Shekhar Mangar. -- View this message in context: http://old.nabble.com/Question-about-the-message-%22Indexing-failed.-Rolled-back-all--changes.%22-tp26242714p26338287.html Sent from the Solr - User mailing list archive at Nabble.com.
scanning folders recursively / Tika
Hello. I am working with Tika 0.5 and want to scan a folder system of about 10GB. Is there a comfortable way to scan folders recursively with an existing class, or do I have to write it myself? Any tips for best practice? Greetings, Peter
Re: scanning folders recursively / Tika
Have one thread recursing depth first down the directories adding to a queue (fixed size). Have many threads reading off of the queue and doing the work. -glen http://zzzoot.blogspot.com/ 2009/11/13 Peter Gabriel zarato...@gmx.net: Hello. I am on work with Tika 0.5 and want to scan a folder system about 10GB. Is there a comfortable way to scan folders recursively with an existing class or have i to write it myself? Any tips for best practise? Greetings, Peter -- Jetzt kostenlos herunterladen: Internet Explorer 8 und Mozilla Firefox 3.5 - sicherer, schneller und einfacher! http://portal.gmx.net/de/go/atbrowser -- -
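Glen's producer/consumer layout can be sketched with nothing but the JDK. A hedged sketch — the class and method names here are made up for illustration, and in real use the worker body would hand each file to a Tika parser instead of just recording the path:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

public class FolderScanner {
    // Sentinel telling each worker that no more files are coming.
    private static final Path POISON = Paths.get("");

    // One producer thread walks the tree and feeds a bounded queue;
    // nWorkers consumer threads drain it. Here the workers only record
    // the path -- in real use this is where each file would be parsed.
    public static List<Path> collectFiles(Path root, int nWorkers) throws Exception {
        BlockingQueue<Path> queue = new ArrayBlockingQueue<>(100);
        List<Path> seen = Collections.synchronizedList(new ArrayList<>());

        Thread producer = new Thread(() -> {
            try (Stream<Path> paths = Files.walk(root)) {
                for (Path p : (Iterable<Path>) paths.filter(Files::isRegularFile)::iterator) {
                    queue.put(p); // blocks when the queue is full
                }
                for (int i = 0; i < nWorkers; i++) queue.put(POISON); // one per worker
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        });

        ExecutorService workers = Executors.newFixedThreadPool(nWorkers);
        for (int i = 0; i < nWorkers; i++) {
            workers.submit(() -> {
                Path p;
                try {
                    while ((p = queue.take()) != POISON) {
                        seen.add(p); // real code: hand the file to Tika here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return null;
            });
        }
        producer.start();
        producer.join();
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        return seen;
    }
}
```

The bounded queue (capacity 100) keeps the directory walker from racing far ahead of slow parsing work, which matters on a 10GB tree.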
Re: Stop solr without losing documents
On Fri, Nov 13, 2009 at 4:32 AM, gwk g...@eyefi.nl wrote: I don't know if this is the best solution, or even if it's applicable to your situation but we do incremental updates from a database based on a timestamp, (from a simple seperate sql table filled by triggers so deletes Thanks, gwk! This doesn't exactly meet our needs, but helped us get to a solution. In short, we are manually committing in our outside updater process (instead of letting Solr autocommit), and marking which documents have been updated before a successful commit. Now stopping solr is as easy as kill -9. Michael
how to search against multiple attributes in the index
I want to build an AND search query against field1 AND field2, etc. Both these fields are stored in an index. I am migrating Lucene code to Solr. Following is my existing Lucene code: BooleanQuery currentSearchingQuery = new BooleanQuery(); currentSearchingQuery.add(titleDescQuery,Occur.MUST); highlighter = new Highlighter( new QueryScorer(titleDescQuery)); TermQuery searchTechGroupQyery = new TermQuery(new Term (techGroup,searchForm.getTechGroup())); currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST); TermQuery searchProgramQyery = new TermQuery(new Term(techProgram,searchForm.getTechProgram())); currentSearchingQuery.add(searchProgramQyery, Occur.MUST); } What's the equivalent Solr code for the above Lucene code? Any samples would be appreciated. Thanks, -- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html Sent from the Solr - User mailing list archive at Nabble.com.
The status of Local/Geo/Spatial/Distance Solr
Hey, I am interested in using LocalSolr to do Local/Geo/Spatial/Distance search. But the wiki of LocalSolr (http://wiki.apache.org/solr/LocalSolr) points to pretty old documentation. Is there a better document I can refer to for setting up LocalSolr, and some performance analysis? I just synced the Solr codebase and found LocalSolr is still NOT in the contrib package. Is there a plan to incorporate it? I downloaded a LocalSolr lib, localsolr-1.5.jar, from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and noticed that its namespace is com.pjaol.search.*, while the LocalLucene package is in the Lucene codebase and the package name is org.apache.lucene.spatial.*. But localsolr-1.5.jar from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ does not work with the lucene-spatial-3.0-dev.jar I built from the Lucene codebase directly. After I restart Tomcat, I cannot load the Solr admin page. The error is as follows. It looks like Solr is still looking for the old class names. Thanks. HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong.
If you want solr to continue after configuration errors, change: abortOnConfigurationErrorfalse/abortOnConfigurationError in null - java.lang.NoClassDefFoundError: com/pjaol/search/geo/utils/DistanceFilter at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413) at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525) at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:833) at org.apache.solr.core.SolrCore.init(SolrCore.java:551) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302) at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:78) at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635) at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222) at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760) at org.apache.catalina.core.ContainerBase.access$0(ContainerBase.java:744) at org.apache.catalina.core.ContainerBase$PrivilegedAddChild.run(ContainerBase.java:144) at java.security.AccessController.doPrivileged(Native Method) at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:738) at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544) at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:626) at 
org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553) at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488) at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138) at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311) at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1022) at org.apache.catalina.core.StandardHost.start(StandardHost.java:736) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1014) at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443) at org.apache.catalina.core.StandardService.start(StandardService.java:448) at org.apache.catalina.core.StandardServer.start(StandardServer.java:700) at org.apache.catalina.startup.Catalina.start(Catalina.java:552) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:295) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:177) Caused by: java.lang.ClassNotFoundException: com.pjaol.search.geo.utils.DistanceFilter at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1362) at
Re: how to search against multiple attributes in the index
Dive in - http://wiki.apache.org/solr/Solrj Cheers Avlesh On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev vika...@yahoo.com wrote: I want to build AND search query against field1 AND field2 etc. Both these fields are stored in an index. I am migrating lucene code to Solr. Following is my existing lucene code BooleanQuery currentSearchingQuery = new BooleanQuery(); currentSearchingQuery.add(titleDescQuery,Occur.MUST); highlighter = new Highlighter( new QueryScorer(titleDescQuery)); TermQuery searchTechGroupQyery = new TermQuery(new Term (techGroup,searchForm.getTechGroup())); currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST); TermQuery searchProgramQyery = new TermQuery(new Term(techProgram,searchForm.getTechProgram())); currentSearchingQuery.add(searchProgramQyery, Occur.MUST); } What's the equivalent Solr code for above Luce code. Any samples would be appreciated. Thanks, -- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: The status of Local/Geo/Spatial/Distance Solr
It looks like solr+spatial will get some attention in 1.5, check: https://issues.apache.org/jira/browse/SOLR-1561 Depending on your needs, that may be enough. More robust/scaleable solutions will hopefully work their way into 1.5 (any help is always appreciated!) On Nov 13, 2009, at 11:12 AM, Bertie Shen wrote: Hey, I am interested in using LocalSolr to go Local/Geo/Spatial/Distance search. But the wiki of LocalSolr(http://wiki.apache.org/solr/LocalSolr ) points to pretty old documentation. Is there a better document I refer to for the setting up of LocalSolr and some performance analysis? Just sync-ed Solr codebase and found LocalSolr is still NOT in the contrib package. Do we have a plan to incorporate it? I download a LocalSolr lib localsolr-1.5.jar from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and notice that the namespace is com.pjaol.search. blah blah, while LocalLucene package is in Lucene codebase and the package name is org.apache.lucene.spatial blah blah. But localsolr-1.5.jar from from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ does not work with lucene-spatial-3.0-dev.jar I build from Lucene codebase directly. After I restart tomcat, I could not load solr admin page. The error is as follows. It looks solr is still looking for old named classes. Thanks. HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. 
Re: The status of Local/Geo/Spatial/Distance Solr
Also: https://issues.apache.org/jira/browse/SOLR-1302 On Nov 13, 2009, at 11:12 AM, Bertie Shen wrote: Hey, I am interested in using LocalSolr to go Local/Geo/Spatial/Distance search. But the wiki of LocalSolr(http://wiki.apache.org/solr/LocalSolr ) points to pretty old documentation. Is there a better document I refer to for the setting up of LocalSolr and some performance analysis? Just sync-ed Solr codebase and found LocalSolr is still NOT in the contrib package. Do we have a plan to incorporate it? I download a LocalSolr lib localsolr-1.5.jar from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and notice that the namespace is com.pjaol.search. blah blah, while LocalLucene package is in Lucene codebase and the package name is org.apache.lucene.spatial blah blah. But localsolr-1.5.jar from from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ does not work with lucene-spatial-3.0-dev.jar I build from Lucene codebase directly. After I restart tomcat, I could not load solr admin page. The error is as follows. It looks solr is still looking for old named classes. Thanks. HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. 
Obtaining list of dynamic fields being available in index
Hi there! How can we retrieve the complete list of dynamic fields which are currently available in the index? Thank you in advance! -- Eugene N Dzhurinsky
Re: how to search against multiple attributes in the index
For a starting point, this might be a good read - http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query Cheers Avlesh On Fri, Nov 13, 2009 at 10:02 PM, javaxmlsoapdev vika...@yahoo.com wrote: I already did dive in before. I am using solrj API and SolrQuery object to build query. but its not clear/written how to build booleanQuery ANDing bunch of different attributes in the index. Any samples please? Avlesh Singh wrote: Dive in - http://wiki.apache.org/solr/Solrj Cheers Avlesh On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev vika...@yahoo.com wrote: I want to build AND search query against field1 AND field2 etc. Both these fields are stored in an index. I am migrating lucene code to Solr. Following is my existing lucene code BooleanQuery currentSearchingQuery = new BooleanQuery(); currentSearchingQuery.add(titleDescQuery,Occur.MUST); highlighter = new Highlighter( new QueryScorer(titleDescQuery)); TermQuery searchTechGroupQyery = new TermQuery(new Term (techGroup,searchForm.getTechGroup())); currentSearchingQuery.add(searchTechGroupQyery, Occur.MUST); TermQuery searchProgramQyery = new TermQuery(new Term(techProgram,searchForm.getTechProgram())); currentSearchingQuery.add(searchProgramQyery, Occur.MUST); } What's the equivalent Solr code for above Luce code. Any samples would be appreciated. Thanks, -- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339025.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339402.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Return doc if one or more query keywords occur multiple times
Anyone?

Original message -- Date: Thu, 12 Nov 2009 13:29:20 +0100 From: gistol...@gmx.de To: solr-user@lucene.apache.org Subject: Return doc if one or more query keywords occur multiple times

Hello, I am using the Dismax request handler for queries: ...select?q=foo bar foo2 bar2&qt=dismax&mm=2... With the parameter mm=2 I configure that at least 2 of the optional clauses must match, regardless of how many clauses there are. But now I want to change this to the following: list all documents that have at least 2 of the optional clauses OR that have at least one of the query terms (e.g. foo) more than once. Is this possible? Thanks, Gisto
Re: Obtaining list of dynamic fields beind available in index
Luke Request Handler? - http://wiki.apache.org/solr/LukeRequestHandler /admin/luke?numTerms=0 Cheers Avlesh On Fri, Nov 13, 2009 at 10:05 PM, Eugene Dzhurinsky b...@redwerk.com wrote: Hi there! How can we retrieve the complete list of dynamic fields that are currently available in the index? Thank you in advance! -- Eugene N Dzhurinsky
Re: how to search against multiple attributes in the index
I think I found the answer. Needed to read more API documentation :-) You can do it using solrQuery.setFilterQueries() and build AND queries of multiple parameters.

Avlesh Singh wrote: For a starting point, this might be a good read - http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query Cheers Avlesh

-- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26339903.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to search against multiple attributes in the index
you can do it using solrQuery.setFilterQueries() and build AND queries of multiple parameters. Nope. You would need to read more - http://wiki.apache.org/solr/FilterQueryGuidance For your impatience, here's a quick starter -

    // AND between two fields
    solrQuery.setQuery("+field1:foo +field2:bar");
    // OR between two fields
    solrQuery.setQuery("field1:foo field2:bar");

Cheers Avlesh

On Fri, Nov 13, 2009 at 10:35 PM, javaxmlsoapdev vika...@yahoo.com wrote: I think I found the answer. Needed to read more API documentation :-) You can do it using solrQuery.setFilterQueries() and build AND queries of multiple parameters.
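Putting Avlesh's starter into a compilable form, here is a minimal sketch of composing the AND query string — plain string building only, with no SolrJ dependency, so it stands alone. The helper class and method names are made up for illustration; in real code you would pass the result to SolrQuery.setQuery(). The field names come from the Lucene code earlier in this thread; the values are illustrative.

```java
// Sketch: build the Solr query-string equivalent of a Lucene BooleanQuery
// where every clause is Occur.MUST, i.e. an AND of field:value pairs.
public class AndQueryBuilder {

    // Each pair becomes a "+field:value" clause; "+" marks a MUST clause
    // in the standard Solr/Lucene query syntax.
    static String andQuery(String[][] fieldValuePairs) {
        StringBuilder sb = new StringBuilder();
        for (String[] pair : fieldValuePairs) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('+').append(pair[0]).append(':').append(pair[1]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String q = andQuery(new String[][] {
            {"techGroup", "foo"},
            {"techProgram", "bar"}
        });
        System.out.println(q); // +techGroup:foo +techProgram:bar
    }
}
```

In SolrJ this string would then go to solrQuery.setQuery(q), as in Avlesh's starter.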
Re: Reseting doc boosts
AFAIK there is no way to reset the doc boost. You would need to re-index. Moreover, there is no way to search by boost. Cheers Avlesh On Fri, Nov 13, 2009 at 8:17 PM, Jon Baer jonb...@gmail.com wrote: Hi, I'm trying to figure out if there is an easy way to basically reset all of the doc boosts which you have made (for analytical purposes) ... for example if I run an index, gather a report, doc boost on the report, and reset the boosts @ time of next index ... It would seem, from just knowing how Lucene works, that I would really need to reindex since it's an attribute on the doc itself which would have to be modified, but there is no easy way to query for docs which have been boosted either. Any insight? Thanks. - Jon
Re: The status of Local/Geo/Spatial/Distance Solr
Hi Ian and Ryan, Thanks for the reply. Ian, I checked your pasted config; I am using the same one, including the values <int name="startTier">4</int> and <int name="endTier">25</int>. Basically I use the setup specified at http://www.gissearch.com/localsolr. But I still get the same error I pasted in the previous email. Ryan, I just checked out the lib lucene-spatial-2.9.1.jar Grant checked in today. Previously I built lucene-spatial-3.0-dev.jar from the Lucene Java code base directly. There is still no luck after the lib replacement. I do not think any other lib matters in this case.

On Fri, Nov 13, 2009 at 8:34 AM, Ian Ibbotson iani...@googlemail.com wrote: Heya.. could it be a problem with your solr config files? I seem to recall a change from the docs as they were to get this working.. I have...

    <updateRequestProcessorChain>
      <processor class="com.pjaol.search.solr.update.LocalUpdateProcessorFactory">
        <str name="latField">lat</str>
        <str name="lngField">lng</str>
        <int name="startTier">4</int>
        <int name="endTier">25</int>
      </processor>
      <processor class="solr.RunUpdateProcessorFactory" />
      <processor class="solr.LogUpdateProcessorFactory" />
    </updateRequestProcessorChain>

    <searchComponent name="localsolr"
                     class="com.pjaol.search.solr.component.LocalSolrQueryComponent" />

    <requestHandler name="geo" class="org.apache.solr.handler.component.SearchHandler">
      <arr name="components">
        <str>localsolr</str>
        <str>facet</str>
        <str>mlt</str>
        <str>highlight</str>
        <str>debug</str>
      </arr>
    </requestHandler>

Does that tie up with your config? I'd basically interpreted the current packaging as: what used to be locallucene has definitely merged into lucene-spatial in this build, no more locallucene. However, you still need to build localsolr for now...
My solr jars are: commons-beanutils-1.8.0.jar commons-codec-1.4.jar commons-dbcp-1.2.2.jar commons-fileupload-1.2.1.jar commons-httpclient-3.1.jar commons-io-1.3.2.jar commons-logging-1.1.1.jar commons-pool-1.5.3.jar geoapi-nogenerics-2.1M2.jar geronimo-stax-api_1.0_spec-1.0.1.jar gt2-referencing-2.3.1.jar jsr108-0.01.jar localsolr-1.5.2-rc1.jar log4j-1.2.13.jar lucene-analyzers-2.9.1-ki-rc3.jar lucene-core-2.9.1-ki-rc3.jar lucene-highlighter-2.9.1-ki-rc3.jar lucene-memory-2.9.1-ki-rc3.jar lucene-misc-2.9.1-ki-rc3.jar lucene-queries-2.9.1-ki-rc3.jar lucene-snowball-2.9.1-ki-rc3.jar lucene-spatial-2.9.1-ki-rc3.jar lucene-spellchecker-2.9.1-ki-rc3.jar org.codehaus.woodstox-wstx-asl-3.2.7.jar serializer-2.7.1.jar slf4j-api-1.5.5.jar slf4j-log4j12-1.5.5.jar solr-commons-csv-1.4.0-ki-rc1.jar solr-core-1.4.0-ki-rc1.jar solr-solrj-1.4.0-ki-rc1.jar stax-1.2.0.jar stax-api-1.0.jar stax-utils-20040917.jar woodstox-wstx-asl-3.2.7.jar xalan-2.7.1.jar xercesImpl-2.9.1.jar xml-apis-1.3.04.jar xpp3-1.1.3.4.O.jar Sorry for dumping the info at you... hope it helps though. Ian. 2009/11/13 Bertie Shen bertie.s...@gmail.com: Hey, I am interested in using LocalSolr to do Local/Geo/Spatial/Distance search. But the wiki of LocalSolr (http://wiki.apache.org/solr/LocalSolr) points to pretty old documentation. Is there a better document I can refer to for the setting up of LocalSolr and some performance analysis? Just sync-ed the Solr codebase and found LocalSolr is still NOT in the contrib package. Do we have a plan to incorporate it? I downloaded a LocalSolr lib localsolr-1.5.jar from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and noticed that the namespace is com.pjaol.search. blah blah, while the LocalLucene package is in the Lucene codebase and the package name is org.apache.lucene.spatial blah blah. But localsolr-1.5.jar from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ does not work with the lucene-spatial-3.0-dev.jar I build from the Lucene codebase directly.
After I restart tomcat, I could not load the solr admin page. The error is as follows. It looks like Solr is still looking for the old named classes. Thanks. HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in null - java.lang.NoClassDefFoundError: com/pjaol/search/geo/utils/DistanceFilter at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413) at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498) at
Re: Question about the message Indexing failed. Rolled back all changes.
The process initially completes with:

    <str name="Full Dump Started">2009-11-13 09:40:46</str>
    <str name="">Indexing completed. Added/Updated: 20 documents. Deleted 0 documents.</str>

...but then it fails with:

    <str name="Full Dump Started">2009-11-13 09:40:46</str>
    <str name="">Indexing failed. Rolled back all changes.</str>
    <str name="Committed">2009-11-13 09:41:10</str>
    <str name="Optimized">2009-11-13 09:41:10</str>
    <str name="Rolledback">2009-11-13 09:41:10</str>

I think it may have something to do with this, which I found by using the DataImport.jsp:

    (Thread.java:636) Caused by: java.sql.SQLException: Illegal value for setFetchSize().
    at com.mysql.jdbc.Statement.setFetchSize(Statement.java:1864)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:242)
    ... 28 more</str>

-- View this message in context: http://old.nabble.com/Question-about-the-message-%22Indexing-failed.-Rolled-back-all--changes.%22-tp26242714p26340360.html Sent from the Solr - User mailing list archive at Nabble.com.
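For what it's worth, the "Illegal value for setFetchSize()" error from the MySQL driver is commonly worked around by setting batchSize="-1" on the DIH data source: JdbcDataSource translates -1 into Integer.MIN_VALUE, which is the value the MySQL driver expects to enable row streaming. A hypothetical data-config.xml fragment (driver, URL, and credentials are placeholders):

```xml
<!-- batchSize="-1" makes JdbcDataSource pass Integer.MIN_VALUE to
     setFetchSize(), telling the MySQL driver to stream rows instead
     of rejecting the fetch size. Connection details are illustrative. -->
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb"
            user="solr"
            password="..."
            batchSize="-1"/>
```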
Re: how to search against multiple attributes in the index
great. thanks. that was helpful

Avlesh Singh wrote: you can do it using solrQuery.setFilterQueries() and build AND queries of multiple parameters. Nope. You would need to read more - http://wiki.apache.org/solr/FilterQueryGuidance For your impatience, here's a quick starter - // AND between two fields solrQuery.setQuery("+field1:foo +field2:bar"); // OR between two fields solrQuery.setQuery("field1:foo field2:bar"); Cheers Avlesh

-- View this message in context: http://old.nabble.com/how-to-search-against-multiple-attributes-in-the-index-tp26339025p26340776.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: scanning folders recursively / Tika
Peter - if you want, download the code from Lucene in Action 1 or 2; it has index traversal and indexing. The 2nd edition uses Tika. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Peter Gabriel zarato...@gmx.net To: solr-user@lucene.apache.org Sent: Fri, November 13, 2009 10:26:48 AM Subject: scanning folders recursively / Tika Hello. I am at work with Tika 0.5 and want to scan a folder system of about 10GB. Is there a comfortable way to scan folders recursively with an existing class, or do I have to write it myself? Any tips for best practice? Greetings, Peter
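Absent the book code, here is a minimal sketch of the recursive traversal part using only the JDK; the Tika hand-off is left as a placeholder comment, since the exact parsing setup isn't shown in this thread, and the class name is made up for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch: walk a directory tree and collect every regular file; each file
// found is where you would hand off to a Tika parser for text extraction.
public class FolderScanner {

    public static List<File> scan(File dir) {
        List<File> files = new ArrayList<File>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return files; // not a directory, or an I/O error
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                files.addAll(scan(entry)); // recurse into subfolders
            } else {
                files.add(entry);
                // here: hand 'entry' to a Tika parser and index the
                // extracted text into Solr
            }
        }
        return files;
    }

    public static void main(String[] args) {
        for (File f : scan(new File("."))) {
            System.out.println(f.getPath());
        }
    }
}
```

For a 10GB tree this keeps memory low as long as you process files as you find them rather than accumulating extracted text.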
Re: Customizing Field Score (Multivalued Field)
On Thu, Nov 12, 2009 at 3:00 PM, Stephen Duncan Jr stephen.dun...@gmail.com wrote: On Thu, Nov 12, 2009 at 2:54 PM, Chris Hostetter hossman_luc...@fucit.org wrote: oh man, so you were parsing the stored field values of every matching doc at query time? ouch. Assuming i'm understanding your goal, the conventional way to solve this type of problem is payloads ... you'll find lots of discussion on it in the various Lucene mailing lists, and if you look online Michael Busch has various slides that talk about using them. they let you say things like "in this document, at this position of field 'x' the word 'microsoft' is worth 37.4, but at this other position (or in this other document) 'microsoft' is only worth 17.2". The simplest way to use them in Solr (as i understand it) is to use something like the DelimitedPayloadTokenFilterFactory when indexing, and then write yourself a simple little custom QParser that generates a BoostingTermQuery on your field. should be a lot simpler to implement than the Query you are describing, and much faster. -Hoss Thanks. I finally got around to looking at this again today and was looking at a similar path, so I appreciate the confirmation.
-- Stephen Duncan Jr www.stephenduncanjr.com

For posterity, here's the rest of what I discovered trying to implement this: You'll need to write a PayloadSimilarity as described here: http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/ (here's my updated version due to deprecation of the method mentioned in that article):

    @Override
    public float scorePayload(int docId, String fieldName, int start, int end,
                              byte[] payload, int offset, int length) {
        // can ignore length here, because we know it is encoded as 4 bytes
        return PayloadHelper.decodeFloat(payload, offset);
    }

You'll need to register that similarity in your Solr schema.xml (this was hard to figure out, as I didn't realize that the similarity has to be applied globally to the writer/searcher used generally, even though I only care about payloads on one field, so I wasted time trying to figure out how to plug the similarity into my query parser). You'll want to use the payloads type, or something based on it, that's in the example schema.xml. The latest and greatest query type to use is PayloadTermQuery. I use it in my custom query parser class, overriding getFieldQuery, checking for my field name, and then:

    return new PayloadTermQuery(new Term(field, queryText),
                                new AveragePayloadFunction());

Due to the global nature of the Similarity, I guess you'd have to modify it to look at the field name and base behavior on that if you wanted different kinds of payloads on different fields in one schema. Also, whereas in my original implementation I controlled the score completely (and therefore if I set a score of 0.8, the doc came back with a score of 0.8), in this technique the payload is just used as a boost/addition to the score, so my scores came out higher than before. Since they're still in the same relative order, that still satisfied my needs, but it did require updating my test cases. -- Stephen Duncan Jr www.stephenduncanjr.com
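For reference, registering a similarity in schema.xml looks like the following; the class name is a placeholder for wherever your PayloadSimilarity lives. It is a top-level element that applies globally, which is why per-field payload behavior has to be handled inside the class itself, as described above:

```xml
<!-- Hypothetical class name; goes at the top level of schema.xml and
     applies to the whole index, not to a single field. -->
<similarity class="com.example.PayloadSimilarity"/>
```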
Making search results more stable as index is updated
If documents are being added to and removed from an index (and commits are being issued) while a user is searching, then the experience of paging through search results using the obvious Solr mechanism (start=100&rows=10) may be disorienting for the user. For one example, by the time the user clicks next page for the first time, a document that they saw on page 1 may have been pushed onto page 2. (This may be especially pronounced if docs are being sorted by date.) I'm wondering what are the best options available for presenting a more stable set of search results to users in such cases. The obvious candidates to me are:

#1: Cache results in the user session of the web tier. (In particular, maybe just cache the uniqueKey of each matching document.) Pro: Simple. Con: May require capping the # of search results in order to make the initial query (which now has a Solr rows param much larger than the web page size) fast enough. For example, maybe it's only practical to cache the first 500 records.

#2: Create some kind of per-user results cache in Solr. (One simple implementation idea: you could make your Solr search handler take a userid parameter, and cache each user's last search in a special per-user results cache. You then also provide an API that says: give me records n through m of userid #1334's last search. For your subsequent queries, you consult the latter API rather than redoing your search. Because Lucene docids are unstable across commits and such, I think this means caching the uniqueKey of each matching document. This in turn means looking up the uniqueKey of each matching document at search time. It also means you can't use the existing Solr caches, but need to make a new one.) Pro: Maybe faster than #1?? (Saves on data transfer between Solr and the web tier, at least during the initial query.) Con: More complicated than #1.

#3: Use filter queries to attempt to make your subsequent queries (for page 2, page 3, etc.) return results consistent with your original query.
(One idea is to give each document a docAddedTimestamp field, which would have precision down to the millisecond or something. On your initial query, you could note the current time, T. Then for the subsequent queries you add a filter query for docAddedTimestamp <= T. Hopefully with a trie date field this would be fast. This should hopefully keep any docs newly added after T from showing up in the user's search results as they page through them. However, it won't necessarily protect you from docs that were *reindexed* (i.e. re-adding a doc with the same uniqueKey as an existing doc) or docs that were deleted.) Pro: Doesn't require a new cache, and no cap on # of search results. Con: Maybe doesn't provide total stability.

Any feedback on these options? Are there other ideas to consider? Thanks, Chris
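To make option #3 concrete, the request flow might look like this on the wire (the field name follows the idea above; the timestamp value and query are illustrative, assuming a trie-based date field):

```
Initial query -- note the current time T (say 2009-11-13T12:00:00Z):
  /select?q=foo&start=0&rows=10

Subsequent pages -- filter to docs indexed at or before T:
  /select?q=foo&start=10&rows=10&fq=docAddedTimestamp:[* TO 2009-11-13T12:00:00Z]
```

Because the fq is identical on every page request, it is also cached by Solr's filter cache across the user's paging session.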
Re: having solr generate and execute other related queries automatically
tpunder wrote: Maybe I misunderstand what you are trying to do (or the facet.query feature). If I did an initial query on my data-set that left me with the following questions: ... http://localhost:8983/solr/select/?q=*%3A*&start=0&rows=0&facet=on&facet.query=brand_id:1&facet.query=brand_id:2&facet.query=+%2Bbrand_id:5+%2Bcategory_id:4051 ... Thanks for the reply Tim. I can't provide you with an example as I don't have anything prototyped as yet; I am still trying to work things through in my head. The +20 queries would allow us to suggest other possibilities to users in a facet-like way (but not returning the exact same info as facets). With the technique you mention I would have to specify the list of query params for each facet.query. That would work for relatively simple queries. Unfortunately, the queries I was looking at doing would be fairly long (say hundreds of AND/OR statements). That said, I don't think Solr would be able to handle the query size I would end up with (at least not efficiently), because the resulting query would consist of thousands of AND/OR statements (isn't there a limit of sorts in Solr?). I think that my best bet would be to extend the SearchComponent and perform the additional query generation and execution in the extension. That approach should also allow me to have access to the facet values that the base query would generate (which would allow me to generate and execute the other queries). Thanks again. -- View this message in context: http://old.nabble.com/having-solr-generate-and-execute-other-related-queries-automatically-tp26327032p26343409.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Multicore solr.xml schemaName parameter not being recognized
: On the CoreAdmin wiki page. thanks FWIW: The only time the string schemaName appears on the CoreAdmin wiki page is when it mentions that solr.core.schemaName is a property that is available to cores by default. The documentation for core specifically says... The core tag accepts the following attributes: ... * schema - The schema file name for a given core. The default is ... So the documentation is correct. -Hoss
Re: Solr 1.3 query and index perf tank during optimize
Mark Miller markrmil...@gmail.com wrote on 11/12/2009 07:18:03 PM: Ah, the pains of optimization. It's kind of just how it is. One solution is to use two boxes and replication - optimize on the master, and then queries only hit the slave. Out of reach for some though, and adds many complications. Yes, in my use case 2 boxes isn't a great option. Another kind of option is to use the partial optimize feature: <optimize maxOptimizeSegments="5"/> Using this, you can optimize down to n segments and take a shorter hit each time. Is this a 1.4 feature? I'm planning to migrate to 1.4, but it'll take a while since I have to port custom code forward, including a query parser. Also, if optimizing is so painful, you might lower the merge factor to amortize that pain better. That's another way to slowly get there - if you lower the merge factor, as merging takes place, the new merge factor will be respected, and segments will merge down. A merge factor of 2 (the lowest) will make it so you only ever have 2 segments. Sometimes that works reasonably well - you could try 3-6 or something as well. Then when you do your partial optimizes (and eventually a full optimize perhaps), you won't have so far to go. So this will slow down indexing but speed up optimize somewhat? Unfortunately right now I lose docs I'm indexing, as well as slowing searching to a crawl. Ugh. I've got plenty of CPU horsepower. This is where having the ability to optimize on another filesystem would be useful. Would it perhaps make sense to set up a master/slave on the same machine? Then I suppose I can have an index being optimized that might not clobber the search. Would new indexed items still be dropped on the floor? Thanks, Jerry
Re: Stop solr without losing documents
: which documents have been updated before a successful commit. Now : stopping solr is as easy as kill -9. please don't kill -9 ... it's grossly overkill, and doesn't give your servlet container a fair chance to clean things up. A lot of work has been done to make Lucene indexes robust to hard terminations of the JVM (or physical machine), but there's no reason to go out of your way to try and stab it in the heart when you could just shut it down cleanly. that's not to say your approach isn't a good one -- if you only have one client sending updates/commits, then having it keep track of what was indexed prior to the last successful commit is a viable way to deal with what happens if solr stops responding (either because you shut it down, or because it crashed for some other reason). Alternately, you could take advantage of the enabled feature from your client (just have it test the enabled url every N updates or so) and when it sees that you have disabled the port, it can send one last commit and then stop sending updates until it sees the enabled URL work again -- as soon as you see the updates stop, you can safely shut down the port. -Hoss
Re: Solr 1.3 query and index perf tank during optimize
ysee...@gmail.com wrote on 11/13/2009 09:06:29 AM: On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless luc...@mikemccandless.com wrote: I think we sorely need a Directory impl that down-prioritizes IO performed by merging. It's unclear if this case is caused by IO contention, or the OS cache of the hot parts of the index being lost by that extra IO activity. Of course the latter would lead to the former, but without that OS disk cache, the searches may be too slow even w/o the extra IO. Is there a way to configure things so that search and new data indexing get cached under the control of solr/lucene? Then we'd be less reliant on the OS behavior. Alternatively if there are OS params I can tweak (RHEL/Centos 5) to solve the problem, that's an option for me. Would you know if 1.4 is better behaved than 1.3? Thanks, Jerry
Re: Solr 1.3 query and index perf tank during optimize
ysee...@gmail.com wrote on 11/13/2009 09:06:29 AM: On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless luc...@mikemccandless.com wrote: I think we sorely need a Directory impl that down-prioritizes IO performed by merging. It's unclear if this case is caused by IO contention, or the OS cache of the hot parts of the index being lost by that extra IO activity. Of course the latter would lead to the former, but without that OS disk cache, the searches may be too slow even w/o the extra IO. On linux there's the ionice command to try to throttle processes. Would it be possible and make sense to have a separate process for optimizing that had ionice set it to idle? Can the index be shared this way? Thanks, Jerry
Re: NPE when trying to view a specific document via Luke
: I'm seeing this stack trace when I try to view a specific document, e.g.
: /admin/luke?id=1 but luke appears to be working correctly when I just
: view /admin/luke. Does this look familiar to anyone? Our sysadmin just
: upgraded us to the 1.4 release, I'm not sure if this occurred before
: that.
:
: Thanks,
: Jake

FWIW: I was able to reproduce this using the example setup (i picked a doc id at random) suspecting it was a bug in docFreq when using multiple segments, i tried optimizing and still got an NPE, but then my entire computer crashed (unrelated) before i could look any deeper. I have to go out now, but i'll try to dig into this more when i get back ... given where it happens in the code, it seems like a potentially serious lucene bug (either that, or LukeRequestHandler is doing something it really shouldn't be, but i can't imagine how it could trigger an NPE that deep in the lucene code).

: java.lang.NullPointerException
:   at org.apache.lucene.index.TermBuffer.set(TermBuffer.java:95)
:   at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:158)
:   at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:232)
:   at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:179)
:   at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:975)
:   at org.apache.lucene.index.DirectoryReader.docFreq(DirectoryReader.java:627)
:   at org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:308)
:   at org.apache.solr.handler.admin.LukeRequestHandler.getDocumentFieldsInfo(LukeRequestHandler.java:248)
:   at org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:124)
:   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
:   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
:   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
:   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
:   at com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:76)
:   at com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:158)
:   at com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:178)
:   at com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
:   at com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:435)
:   at com.caucho.server.port.TcpConnection.run(TcpConnection.java:586)
:   at com.caucho.util.ThreadPool$Item.runTasks(ThreadPool.java:690)
:   at com.caucho.util.ThreadPool$Item.run(ThreadPool.java:612)
:   at java.lang.Thread.run(Thread.java:619)
:
: Date: Fri, 13 Nov 2009 02:19:54 GMT
: Server: Apache/2.2.3 (Red Hat)
: Cache-Control: no-cache, no-store
: Pragma: no-cache
: Expires: Sat, 01 Jan 2000 01:00:00 GMT
: Content-Type: text/html; charset=UTF-8
: Vary: Accept-Encoding,User-Agent
: Content-Encoding: gzip
: Content-Length: 1066
: Connection: close

-Hoss
Re: Request assistance with distributed search multi shard/core setup and configuration
Distributed search requires a list of shard names in the URL. That's all. Note that a distributed search does not use the data of the Solr instance you call. You can create an entry point for your distributed search by adding a new requestHandler element in solrconfig.xml. You would add the shard list parameter to the defaults list. Do not have it call the same requestHandler path - you'll get an infinite loop. On Tue, Nov 10, 2009 at 6:44 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hm, I don't follow. You don't need to create a custom (request) handler to make use of Solr's distributed search. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Turner, Robbin J robbin.j.tur...@boeing.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tue, November 10, 2009 6:41:32 PM Subject: RE: Request assistance with distributed search multi shard/core setup and configuration Thanks, I had already read through this url. I guess my request was: is there a way to set up something that is already part of solr itself to pass the URL[shard...] rather than having to create a custom handler. thanks -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Tuesday, November 10, 2009 6:09 PM To: solr-user@lucene.apache.org Subject: Re: Request assistance with distributed search multi shard/core setup and configuration Right, that's http://wiki.apache.org/solr/DistributedSearch Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Turner, Robbin J To: solr-user@lucene.apache.org Sent: Tue, November 10, 2009 6:05:19 PM Subject: RE: Request assistance with distributed search multi shard/core setup and configuration I've already done the single Solr, that's why my request.
I read on some site that there is a way to setup the configuration so I can send a query to one solr instance and have it pass it on or distribute it across all the instances? Btw, thanks for the quick reply. RJ -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Tuesday, November 10, 2009 6:02 PM To: solr-user@lucene.apache.org Subject: Re: Request assistance with distributed search multi shard/core setup and configuration RJ, You may want to take a simpler step - single Solr core (no solr.xml needed) per machine. Then distributed search really only requires that you specify shard URLs in the URL of the search requests. In practice/production you rarely benefit from distributed search against multiple cores on the same server anyway. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR From: Turner, Robbin J To: solr-user@lucene.apache.org Sent: Tue, November 10, 2009 5:58:52 PM Subject: Request assistance with distributed search multi shard/core setup and configuration I've been looking through all the documentation. I've set up a single solr instance, and one multicore instance. If someone would be willing to share some configuration examples and/or advise for setting up solr for distributing the search, I would really appreciate it. I've read that there is a way to do it, but most of the current documentation doesn't provide enough example on what to do with solr.xml, and the solrconfig.xml. Also, I'm using tomcat 6 for the servlet container. I deployed the solr 1.4.0 released yesterday. Thanks RJ -- Lance Norskog goks...@gmail.com
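Lance's suggestion might look like the following in solrconfig.xml (the handler name and shard hosts below are placeholders, not from the thread). The key point is that the entry point's own path must not appear in its shard list, or the handler will call itself in a loop:

```xml
<!-- Hypothetical entry point for distributed search (Solr 1.4).
     Shard hosts are placeholders; list your real shard URLs, without "http://". -->
<requestHandler name="/distrib" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards">shard1.example.com:8983/solr,shard2.example.com:8983/solr</str>
  </lst>
</requestHandler>
```

A query sent to /distrib then fans out to the listed shards and merges the results, while querying /select on any one shard still searches only that shard's local index.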
Re: NPE when trying to view a specific document via Luke
On Fri, Nov 13, 2009 at 5:41 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I'm seeing this stack trace when I try to view a specific document, e.g. : /admin/luke?id=1 but luke appears to be working correctly when I just [...] : FWIW: I was able to reproduce this using the example setup (i picked a : doc id at random) suspecting it was a bug in docFreq

Probably just a null being passed in the text part of the term. I bet Luke expects all field values to be strings, but some are binary. -Yonik http://www.lucidimagination.com
Fwd: Lucene MMAP Usage with Solr
Folks, I am trying to get Lucene MMAP to work in Solr. I am assuming that when I configure MMAP the entire index will be loaded into RAM. Is that the right assumption? I have tried the following ways for using MMAP:

Option 1. Using the solr config below for MMAP configuration: -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory With this config, when I start solr with a 30G index, I expected that the RAM usage should go up, but it did not.

Option 2. By code change. I made the following code change: changed org.apache.solr.core.StandardDirectoryFactory to use MMapDirectory instead of FSDirectory. Code snippet pasted below.

Could you help me to understand if these are the right ways to use MMAP? Thanks much /ST.

Code snippet for Option 2:

package org.apache.solr.core;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.File;
import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;

/**
 * Directory provider which mimics original Solr FSDirectory based behavior.
 */
public class StandardDirectoryFactory extends DirectoryFactory {

  public Directory open(String path) throws IOException {
    return MMapDirectory.open(new File(path));
  }
}
Re: any docs on solr.EdgeNGramFilterFactory?
Thanks for the link - there doesn't seem to be a fix version specified, so I guess this will not officially ship with lucene 2.9? -Peter On Wed, Nov 11, 2009 at 10:36 PM, Robert Muir rcm...@gmail.com wrote: Peter, here is a project that does this: http://issues.apache.org/jira/browse/LUCENE-1488 That's kind of interesting - in general can I build a custom tokenizer from existing tokenizers that treats different parts of the input differently based on the utf-8 range of the characters? E.g. use a porter stemmer for stretches of Latin text and n-gram or something else for CJK? -Peter On Tue, Nov 10, 2009 at 9:21 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Yes, that's the n-gram one. I believe the existing CJK one in Lucene is really just an n-gram tokenizer, so no different than the normal n-gram tokenizer. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Peter Wolanin peter.wola...@acquia.com To: solr-user@lucene.apache.org Sent: Tue, November 10, 2009 7:34:37 PM Subject: Re: any docs on solr.EdgeNGramFilterFactory? So, this is the normal N-gram one? NGramTokenizerFactory Digging deeper - there are actually CJK and Chinese tokenizers in the Solr codebase: http://lucene.apache.org/solr/api/org/apache/solr/analysis/CJKTokenizerFactory.html http://lucene.apache.org/solr/api/org/apache/solr/analysis/ChineseTokenizerFactory.html The CJK one uses the lucene CJKTokenizer http://lucene.apache.org/java/2_9_1/api/contrib-analyzers/org/apache/lucene/analysis/cjk/CJKTokenizer.html and there seems to be another one even that no one has wrapped into Solr: http://lucene.apache.org/java/2_9_1/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/package-summary.html So seems like the existing options are a little better than I thought, though it would be nice to have some docs on properly configuring these.
-Peter On Tue, Nov 10, 2009 at 6:05 PM, Otis Gospodnetic wrote: Peter, For CJK and n-grams, I think you don't want the *Edge* n-grams, but just n-grams. Before you take the n-gram route, you may want to look at the smart Chinese analyzer in Lucene contrib (I think it works only for Simplified Chinese) and Sen (on java.net). I also spotted a Korean analyzer in the wild a few months back. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Peter Wolanin To: solr-user@lucene.apache.org Sent: Tue, November 10, 2009 4:06:52 PM Subject: any docs on solr.EdgeNGramFilterFactory? This fairly recent blog post: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ describes the use of the solr.EdgeNGramFilterFactory as the tokenizer for the index. I don't see any mention of that tokenizer on the Solr wiki - is it just waiting to be added, or is there any other documentation in addition to the blog post? In particular, there was a thread last year about using an N-gram tokenizer to enable reasonable (if not ideal) searching of CJK text, so I'd be curious to know how people are configuring their schema (with this tokenizer?) for that use case. Thanks, Peter -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com -- Robert Muir rcm...@gmail.com -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
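For reference, a minimal schema.xml sketch of the kind of EdgeNGramFilterFactory setup the blog post describes (the field type name, tokenizer choice, and gram sizes below are illustrative assumptions, not taken from the post):

```xml
<!-- Hypothetical autosuggest field type: edge n-grams at index time only,
     so a query prefix like "sol" matches the indexed grams of "solr". -->
<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note the asymmetry: only the index-time analyzer produces the grams; the query side leaves the user's prefix intact.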
Re: Resetting doc boosts
I'm not sure this is what you are looking for, but there is FieldNormModifier tool in Lucene. Koji -- http://www.rondhuit.com/en/ Avlesh Singh wrote: AFAIK there is no way to reset the doc boost. You would need to re-index. Moreover, there is no way to search by boost. Cheers Avlesh On Fri, Nov 13, 2009 at 8:17 PM, Jon Baer jonb...@gmail.com wrote: Hi, Im trying to figure out if there is an easy way to basically reset all of any doc boosts which you have made (for analytical purposes) ... for example if I run an index, gather report, doc boost on the report, and reset the boosts @ time of next index ... It would seem to be from just knowing how Lucene works that I would really need to reindex since its a attrib on the doc itself which would have to be modified, but there is no easy way to query for docs which have been boosted either. Any insight? Thanks. - Jon
Re: any docs on solr.EdgeNGramFilterFactory?
ah, thanks, i'll tentatively set one in the future, but definitely not 2.9.x more just to show you the idea, you can do different things depending on different runs of writing systems in text. but it doesnt solve everything: you only know its Latin script, not english, so you can't safely automatically do anything like stemming. say your content is only chinese, english: the analyzer won't know your latin script text is english, versus say, french from the unicode, so it won't stem it. but that analyzer will lowercase it. it won't know if your ideographs are chinese or japanese, but it will use n-gram tokenization, you get the drift. in that impl, it puts the script code in the flags so downstream you could do something like stemming if you happen to know more than is evident from the unicode. On Fri, Nov 13, 2009 at 6:23 PM, Peter Wolanin peter.wola...@acquia.comwrote: Thanks for the link - there doesn't seem a be a fix version specified, so I guess this will not officially ship with lucene 2.9? -Peter On Wed, Nov 11, 2009 at 10:36 PM, Robert Muir rcm...@gmail.com wrote: Peter, here is a project that does this: http://issues.apache.org/jira/browse/LUCENE-1488 That's kind of interesting - in general can I build a custom tokenizer from existing tokenizers that treats different parts of the input differently based on the utf-8 range of the characters? E.g. use a porter stemmer for stretches of Latin text and n-gram or something else for CJK? -Peter On Tue, Nov 10, 2009 at 9:21 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Yes, that's the n-gram one. I believe the existing CJK one in Lucene is really just an n-gram tokenizer, so no different than the normal n-gram tokenizer. 
Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Peter Wolanin peter.wola...@acquia.com To: solr-user@lucene.apache.org Sent: Tue, November 10, 2009 7:34:37 PM Subject: Re: any docs on solr.EdgeNGramFilterFactory? So, this is the normal N-gram one? NGramTokenizerFactory Digging deeper - there are actualy CJK and Chinese tokenizers in the Solr codebase: http://lucene.apache.org/solr/api/org/apache/solr/analysis/CJKTokenizerFactory.html http://lucene.apache.org/solr/api/org/apache/solr/analysis/ChineseTokenizerFactory.html The CJK one uses the lucene CJKTokenizer http://lucene.apache.org/java/2_9_1/api/contrib-analyzers/org/apache/lucene/analysis/cjk/CJKTokenizer.html and there seems to be another one even that no one has wrapped into Solr: http://lucene.apache.org/java/2_9_1/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/package-summary.html So seems like the existing options are a little better than I thought, though it would be nice to have some docs on properly configuring these. -Peter On Tue, Nov 10, 2009 at 6:05 PM, Otis Gospodnetic wrote: Peter, For CJK and n-grams, I think you don't want the *Edge* n-grams, but just n-grams. Before you take the n-gram route, you may want to look at the smart Chinese analyzer in Lucene contrib (I think it works only for Simplified Chinese) and Sen (on java.net). I also spotted a Korean analyzer in the wild a few months back. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Peter Wolanin To: solr-user@lucene.apache.org Sent: Tue, November 10, 2009 4:06:52 PM Subject: any docs on solr.EdgeNGramFilterFactory? 
This fairly recent blog post: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ describes the use of the solr.EdgeNGramFilterFactory as the tokenizer for the index. I don't see any mention of that tokenizer on the Solr wiki - is it just waiting to be added, or is there any other documentation in addition to the blog post? In particular, there was a thread last year about using an N-gram tokenizer to enable reasonable (if not ideal) searching of CJK text, so I'd be curious to know how people are configuring their schema (with this tokenizer?) for that use case. Thanks, Peter -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com -- Robert Muir rcm...@gmail.com -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com -- Robert Muir
Re: NPE when trying to view a specific document via Luke
: FWIW: I was able to reproduce this using the example setup (i picked a : doc id at random) suspecting it was a bug in docFreq : : Probably just a null being passed in the text part of the term. : I bet Luke expects all field values to be strings, but some are binary. I'm not sure i follow you ... i think you're saying that naive assumptions in the LukeRequestHandler could result in it asking for the docFreq of a term that has a null string value because some field types are binary, except that... 1) 1.3 didn't have this problem 2) LukeRequestHandler.getDocumentFieldsInfo didn't change from 1.3 to 1.4 I tried to reproduce this in 1.4 using an index/configs created with 1.3, but i got a *different* NPE when loading this url... http://localhost:8983/solr/admin/luke?id=SP2514N

SEVERE: java.lang.NullPointerException
at org.apache.solr.util.NumberUtils.SortableStr2int(NumberUtils.java:127)
at org.apache.solr.util.NumberUtils.SortableStr2float(NumberUtils.java:83)
at org.apache.solr.util.NumberUtils.SortableStr2floatStr(NumberUtils.java:89)
at org.apache.solr.schema.SortableFloatField.indexedToReadable(SortableFloatField.java:62)
at org.apache.solr.schema.SortableFloatField.toExternal(SortableFloatField.java:53)
at org.apache.solr.handler.admin.LukeRequestHandler.getDocumentFieldsInfo(LukeRequestHandler.java:245)

...all three of these stack traces seem to suggest that some impl of Fieldable.stringValue in 2.9 is returning null in cases where it returned *something* else in the 2.4-dev jar used by Solr 1.3. That seems like it could have other impacts besides LukeRequestHandler. -Hoss
Re: NPE when trying to view a specific document via Luke
: I tried to reproduce this in 1.4 using an index/configs created with 1.3, : but i got a *different* NPE when loading this url... I should have tried a simpler test ... i get NPEs just trying to execute a simple search for *:* when i try to use the example index built in 1.3 (with the 1.3 configs) in 1.4. same (apparent) cause: code is attempting to deref a string returned by Fieldable.stringValue() which is null...

java.lang.NullPointerException
at org.apache.solr.schema.SortableIntField.write(SortableIntField.java:72)
at org.apache.solr.schema.SchemaField.write(SchemaField.java:108)
at org.apache.solr.request.XMLWriter.writeDoc(XMLWriter.java:311)
at org.apache.solr.request.XMLWriter$3.writeDocs(XMLWriter.java:483)
at org.apache.solr.request.XMLWriter.writeDocuments(XMLWriter.java:420)
at org.apache.solr.request.XMLWriter.writeDocList(XMLWriter.java:457)
at org.apache.solr.request.XMLWriter.writeVal(XMLWriter.java:520)
at org.apache.solr.request.XMLWriter.writeResponse(XMLWriter.java:130)
at org.apache.solr.request.XMLResponseWriter.write(XMLResponseWriter.java:34)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:325)

This really does smell like something in Lucene changed behavior drastically. I've been looking at diffs from java/tr...@691741 and java/tags/lucene_2_9_1 but nothing jumps out at me that would explain this. If nothing else, i'm opening a solr issue... -Hoss
StreamingUpdateSolrServer commit?
When does StreamingUpdateSolrServer commit? I know there's a threshold and thread pool as params but I don't see a commit timeout. Do I have to manage this myself?
Re: exclude some fields from copying dynamic fields | schema.xml
There is no direct way. Let's say you have a nocopy_s and you do not want a copy nocopy_str_s. This might work: declare nocopy_str_s as a field and make it not indexed and not stored. I don't know if this will work. It requires two overrides to work: 1) that declaring a field name that matches a wildcard will override the default wildcard rule, and 2) that stored=false indexed=false works. On Fri, Nov 13, 2009 at 3:23 AM, Vicky_Dev vikrantv_shirbh...@yahoo.co.in wrote: Hi, we are using the following entry in schema.xml to make a copy of one type of dynamic field to another : copyField source=*_s dest=*_str_s / Is it possible to exclude some fields from copying. We are using Solr1.3 ~Vikrant -- View this message in context: http://old.nabble.com/exclude-some-fields-from-copying-dynamic-fields-%7C-schema.xml-tp26335109p26335109.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
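A sketch of Lance's workaround in schema.xml terms (untested, as he says - it relies on the two overrides he lists: an explicit field declaration shadowing the wildcard, and indexed="false" stored="false" being honored; the field type is an assumption):

```xml
<!-- Existing dynamic fields and the catch-all copy from the thread -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_str_s" type="string" indexed="true" stored="true"/>
<copyField source="*_s" dest="*_str_s"/>

<!-- Hypothetical override: explicitly declare the one copy target you want
     suppressed, so its copied content is neither indexed nor stored -->
<field name="nocopy_str_s" type="string" indexed="false" stored="false"/>
```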
Re: Resetting doc boosts
This looks exactly like what I was needing ... this looks like it would be a great tool / addition to Solr web interface but it looks like it only takes (Directory d, Similarity s) (vs. subset collection of documents) ... Either way great find, thanks for your help ... - Jon On Nov 13, 2009, at 6:40 PM, Koji Sekiguchi wrote: I'm not sure this is what you are looking for, but there is FieldNormModifier tool in Lucene. Koji -- http://www.rondhuit.com/en/ Avlesh Singh wrote: AFAIK there is no way to reset the doc boost. You would need to re-index. Moreover, there is no way to search by boost. Cheers Avlesh On Fri, Nov 13, 2009 at 8:17 PM, Jon Baer jonb...@gmail.com wrote: Hi, Im trying to figure out if there is an easy way to basically reset all of any doc boosts which you have made (for analytical purposes) ... for example if I run an index, gather report, doc boost on the report, and reset the boosts @ time of next index ... It would seem to be from just knowing how Lucene works that I would really need to reindex since its a attrib on the doc itself which would have to be modified, but there is no easy way to query for docs which have been boosted either. Any insight? Thanks. - Jon
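FieldNormModifier lives in Lucene's contrib/misc and is driven from the command line over a whole index directory, which matches Jon's observation that it works on (Directory, Similarity) rather than a subset of documents. A hedged invocation sketch (jar names, classpath, and argument order may differ by Lucene version - verify against your distribution):

```shell
# Recompute the norms for the listed fields across the whole index using
# DefaultSimilarity -- effectively resetting any per-document boosts baked
# into those fields' norms. Paths and field names are placeholders.
java -cp lucene-core.jar:lucene-misc.jar \
  org.apache.lucene.misc.FieldNormModifier \
  /path/to/index org.apache.lucene.search.DefaultSimilarity title body
```

Run it only on a closed index (no IndexWriter open), and back the index up first, since it rewrites the norms files in place.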
Re: Making search results more stable as index is updated
This is one case where permanent caches are interesting. Another case is highlighting: in some cases highlighting takes a lot of work, and this work is not cached. It might be a cleaner architecture to have session-maintaining code in a separate front-end app, and leave Solr session-free. On Fri, Nov 13, 2009 at 12:48 PM, Chris Harris rygu...@gmail.com wrote: If documents are being added to and removed from an index (and commits are being issued) while a user is searching, then the experience of paging through search results using the obvious solr mechanism (start=100&rows=10) may be disorienting for the user. For one example, by the time the user clicks next page for the first time, a document that they saw on page 1 may have been pushed onto page 2. (This may be especially pronounced if docs are being sorted by date.) I'm wondering what are the best options available for presenting a more stable set of search results to users in such cases. The obvious candidates to me are: #1: Cache results in the user session of the web tier. (In particular, maybe just cache the uniqueKey of each matching document.) Pro: Simple Con: May require capping the # of search results in order to make the initial query (which now has Solr numRows param >> web pageSize) fast enough. For example, maybe it's only practical to cache the first 500 records. #2: Create some kind of per-user results cache in Solr. (One simple implementation idea: You could make your Solr search handler take a userid parameter, and cache each user's last search in a special per-user results cache. You then also provide an API that says, give me records n through m of userid #1334's last search. For your subsequent queries, you consult the latter API rather than redoing your search. Because Lucene docids are unstable across commits and such, I think this means caching the uniqueKey of each matching document. This in turn means looking up the uniqueKey of each matching document at search time.
It also means you can't use the existing Solr caches, but need to make a new one.) Pro: Maybe faster than #1?? (Saves on data transfer between Solr and web tier, at least during the initial query.) Con: More complicated than #1. #3: Use filter queries to attempt to make your subsequent queries (for page 2, page 3, etc.) return results consistent with your original query. (One idea is to give each document a docAddedTimestamp field, which would have precision down to the millisecond or something. On your initial query, you could note the current time, T. Then for the subsequent queries you add a filter query for docAddedTimestamp <= T. Hopefully with a trie date field this would be fast. This should hopefully keep any docs newly added after T from showing up in the user's search results as they page through them. However, it won't necessarily protect you from docs that were *reindexed* (i.e. re-add a doc with the same uniqueKey as an existing doc) or docs that were deleted.) Pro: Doesn't require a new cache, and no cap on # of search results Con: Maybe doesn't provide total stability. Any feedback on these options? Are there other ideas to consider? Thanks, Chris -- Lance Norskog goks...@gmail.com
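Option #3 above can be sketched as a pair of requests (the field name follows the example in the thread; the host, query, and timestamp value are placeholders):

```shell
# First page: record the request time T (here, hypothetically, 2009-11-13T20:48:00Z)
curl 'http://localhost:8983/solr/select' -d q=foo -d start=0 -d rows=10

# Later pages: pin the result set to documents indexed at or before T, so
# newly added documents cannot shift items between pages as the user browses
curl 'http://localhost:8983/solr/select' -d q=foo -d start=10 -d rows=10 \
  --data-urlencode 'fq=docAddedTimestamp:[* TO 2009-11-13T20:48:00Z]'
```

As Chris notes, this guards against adds after T but not against reindexed or deleted documents.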
Re: StreamingUpdateSolrServer commit?
Unless I slept through it, you still need to explicitly commit, even with SUSS. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: erikea...@yahoo.com erikea...@yahoo.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Fri, November 13, 2009 9:43:53 PM Subject: StreamingUpdateSolrServer commit? When does StreamingUpdateSolrServer commit? I know there's a threshhold and thread pool as params but I don't see a commit timeout. Do I have to manage this myself?
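A minimal SolrJ sketch of that (the URL and queue/thread sizes are illustrative): StreamingUpdateSolrServer buffers and streams adds in the background, but nothing becomes searchable until you commit explicitly or the server's autoCommit fires.

```java
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class StreamingCommitExample {
    public static void main(String[] args) throws Exception {
        // queue up to 20 docs, stream with 4 background threads (illustrative values)
        StreamingUpdateSolrServer server =
            new StreamingUpdateSolrServer("http://localhost:8983/solr", 20, 4);

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        server.add(doc);  // queued and streamed asynchronously

        // drain the background queue, then commit explicitly --
        // there is no commit timeout in SUSS itself
        server.blockUntilFinished();
        server.commit();
    }
}
```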
Re: Fwd: Lucene MMAP Usage with Solr
I thought that was the way to use it (but I've never had to use it myself) and that it means memory through the roof, yes. If you look at the Solr Admin statistics page, does it show you which Directory you are using? For example, on 1 Solr instance I'm looking at I see: readerDir : org.apache.lucene.store.NIOFSDirectory@/mnt/ Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: ST ST stst2...@gmail.com To: solr-user@lucene.apache.org Sent: Fri, November 13, 2009 6:03:57 PM Subject: Fwd: Lucene MMAP Usage with Solr Folks, I am trying to get Lucene MMAP to work in solr. I am assuming that when I configure MMAP the entire index will be loaded into RAM. Is that the right assumption ? I have tried the following ways for using MMAP: Option 1. Using the solr config below for MMAP configuration -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory With this config, when I start solr with a 30G index, I expected that the RAM usage should go up, but it did not. Option 2. By Code Change I made the following code change : Changed org.apache.solr.core.StandardDirectoryFactory to use MMAP instead of FSDirectory. Code snippet pasted below. Could you help me to understand if these are the right way to use MMAP? Thanks much /ST. Code SNippet for Option 2: package org.apache.solr.core; /** * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the License); you may not use this file except in compliance with * the License. 
You may obtain a copy of the License at * *http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an AS IS BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ import java.io.File; import java.io.IOException; import org.apache.lucene.store.Directory; import org.apache.lucene.store.MMapDirectory; /** * Directory provider which mimics original Solr FSDirectory based behavior. * */ public class StandardDirectoryFactory extends DirectoryFactory { public Directory open(String path) throws IOException { return MMapDirectory.open(new File(path)); } }
Re: Stop solr without losing documents
So I think the question is really: If I stop the servlet container, does Solr issue a commit in the shutdown hook in order to ensure all buffered docs are persisted to disk before the JVM exits. I don't have the Solr source handy, but if I did, I'd look for Shutdown, Hook and finalize in the code. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Chris Hostetter hossman_luc...@fucit.org To: solr-user@lucene.apache.org Sent: Fri, November 13, 2009 4:09:00 PM Subject: Re: Stop solr without losing documents : which documents have been updated before a successful commit. Now : stopping solr is as easy as kill -9. please don't kill -9 ... it's grossly overkill, and doesn't give your servlet container a fair chance to clean things up. A lot of work has been done to make Lucene indexes robust to hard terminations of the JVM (or physical machine) but there's no reason to go out of your way to try and stab it in the heart when you could just shut it down cleanly. that's not to say your approach isn't a good one -- if you only have one client sending updates/commits then having it keep track of what was indexed prior to the last successful commit is a viable way to deal with what happens if solr stops responding (either because you shut it down, or because it crashed for some other reason). Alternately, you could take advantage of the enabled feature from your client (just have it test the enabled url every N updates or so) and when it sees that you have disabled the port it can send one last commit and then stop sending updates until it sees the enabled URL work again -- as soon as you see the updates stop, you can safely shut down the port. -Hoss
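If the single update client also controls the shutdown, the sequence described above can be made explicit (the URL uses the stock example port; adjust for your container and core path):

```shell
# 1. Stop the client from sending updates, then force a final commit so all
#    buffered documents are flushed to the index before the JVM goes away
curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
  --data-binary '<commit/>'

# 2. Only then shut the servlet container down cleanly (no kill -9)
```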
changes to highlighting config or syntax in 1.4?
I'm testing out the final release of Solr 1.4 as compared to the build I have been using from around June. I'm using the dismax handler for searches. I'm finding that highlighting is completely broken as compared to previously. Much more text is returned than it should for each string in <lst name="highlighting">, but the search words are never highlighted in that response. Setting usePhraseHighlighter=false makes no difference. Any pointers appreciated. -Peter -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Solr 1.3 query and index perf tank during optimize
Let's take a step back. Why do you need to optimize? You said: As long as I'm not optimizing, search and indexing times are satisfactory. :) You don't need to optimize just because you are continuously adding and deleting documents. On the contrary! Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Jerome L Quinn jlqu...@us.ibm.com To: solr-user@lucene.apache.org Sent: Thu, November 12, 2009 6:30:42 PM Subject: Solr 1.3 query and index perf tank during optimize Hi, everyone, this is a problem I've had for quite a while, and have basically avoided optimizing because of it. However, eventually we will get to the point where we must delete as well as add docs continuously. I have a Solr 1.3 index with ~4M docs at around 90G. This is a single instance running inside tomcat 6, so no replication. Merge factor is the default 10. ramBufferSizeMB is 32. maxWarmingSearchers=4. autoCommit is set at 3 sec. We continually push new data into the index, at somewhere between 1-10 docs every 10 sec or so. Solr is running on a quad-core 3.0GHz server. under IBM java 1.6. The index is sitting on a local 15K scsi disk. There's nothing else of substance running on the box. Optimizing the index takes about 65 min. As long as I'm not optimizing, search and indexing times are satisfactory. When I start the optimize, I see massive problems with timeouts pushing new docs into the index, and search times balloon. A typical search while optimizing takes about 1 min instead of a few seconds. Can anyone offer me help with fixing the problem? Thanks, Jerry Quinn
Re: Solr 1.3 query and index perf tank during optimize
The 'maxSegments' feature is new with 1.4. I'm not sure that it will cause any less disk I/O during optimize. The 'mergeFactor=2' idea is not what you think: in this case the index is always mostly optimized, so you never need to run optimize. Indexing is always slower, because you amortize the optimize time into little continuous chunks during indexing. You never stop indexing. You should not lose documents. On Fri, Nov 13, 2009 at 1:07 PM, Jerome L Quinn jlqu...@us.ibm.com wrote: Mark Miller markrmil...@gmail.com wrote on 11/12/2009 07:18:03 PM: Ah, the pains of optimization. It's kind of just how it is. One solution is to use two boxes and replication - optimize on the master, and then queries only hit the slave. Out of reach for some though, and adds many complications. Yes, in my use case 2 boxes isn't a great option. Another kind of option is to use the partial optimize feature: <optimize maxOptimizeSegments="5"/> Using this, you can optimize down to n segments and take a shorter hit each time. Is this a 1.4 feature? I'm planning to migrate to 1.4, but it'll take a while since I have to port custom code forward, including a query parser. Also, if optimizing is so painful, you might lower the merge factor to amortize that pain better. That's another way to slowly get there - if you lower the merge factor, as merging takes place, the new merge factor will be respected, and segments will merge down. A merge factor of 2 (the lowest) will make it so you only ever have 2 segments. Sometimes that works reasonably well - you could try 3-6 or something as well. Then when you do your partial optimizes (and eventually a full optimize perhaps), you won't have so far to go. So this will slow down indexing but speed up optimize somewhat? Unfortunately right now I lose docs I'm indexing, as well as slowing searching to a crawl. Ugh. I've got plenty of CPU horsepower. This is where having the ability to optimize on another filesystem would be useful.
Would it perhaps make sense to set up a master/slave on the same machine? Then I suppose I can have an index being optimized that might not clobber the search. Would new indexed items still be dropped on the floor? Thanks, Jerry -- Lance Norskog goks...@gmail.com
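[For reference, the partial optimize discussed above is, in Solr 1.4, an XML message posted to the update handler. A minimal sketch; the core URL is an assumption, and note the 1.4 attribute is named maxSegments, per Lance's reply:

```xml
<!-- POST to http://localhost:8983/solr/update (URL assumed).      -->
<!-- Merges the index down to at most 5 segments instead of 1, so  -->
<!-- each optimize pass is a shorter I/O hit than a full optimize. -->
<optimize maxSegments="5" waitFlush="true" waitSearcher="true"/>
```

The merge-factor idea is the mergeFactor element in solrconfig.xml; lowering it, e.g. to 3, keeps the index continuously merged down at the cost of slower indexing.]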
Re: changes to highlighting config or syntax in 1.4?
Apparently one of my conf files was broken - odd that I didn't see any exceptions. Anyhow - excuse my haste, I don't see the problem now.

-Peter

On Fri, Nov 13, 2009 at 11:06 PM, Peter Wolanin peter.wola...@acquia.com wrote:

> I'm testing out the final release of Solr 1.4 as compared to the build I
> have been using from around June. I'm using the dismax handler for
> searches. I'm finding that highlighting is completely broken compared to
> before: much more text than there should be is returned for each string in
> <lst name="highlighting">, but the search words are never highlighted in
> that response. Setting usePhraseHighlighter=false makes no difference.
> Any pointers appreciated.
>
> -Peter

--
Peter M. Wolanin, Ph.D.
Momentum Specialist, Acquia. Inc.
peter.wola...@acquia.com
Re: Data import problem with child entity from different database
I am unable to get the file http://old.nabble.com/file/p26335171/dataimport.temp.xml

On Fri, Nov 13, 2009 at 4:57 PM, Andrew Clegg andrew.cl...@gmail.com wrote:

> Noble Paul നോബിള്‍ नोब्ळ्-2 wrote:
>> no obvious issues. you may post your entire data-config.xml
>
> Here it is, exactly as last attempt but with usernames etc. removed.
> Ignore the comments and the unused FileDataSource...
> http://old.nabble.com/file/p26335171/dataimport.temp.xml dataimport.temp.xml
>
> Noble Paul നോബിള്‍ नोब्ळ्-2 wrote:
>> do w/o CachedSqlEntityProcessor first and then apply that later
>
> Yep, that was just a bit of a wild stab in the dark to see if it made any
> difference.
>
> Thanks, Andrew.
>
> --
> View this message in context:
> http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26335171.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Stop solr without losing documents
I would go with polling Solr to find what is not yet there. In production, it is better to assume that things will break, and have backstop janitors that fix them. And then test those janitors regularly.

On Fri, Nov 13, 2009 at 8:02 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

> So I think the question is really: if I stop the servlet container, does
> Solr issue a commit in the shutdown hook, in order to ensure all buffered
> docs are persisted to disk before the JVM exits? I don't have the Solr
> source handy, but if I did, I'd look for Shutdown, Hook and finalize in
> the code.
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
> ----- Original Message ----
> From: Chris Hostetter hossman_luc...@fucit.org
> To: solr-user@lucene.apache.org
> Sent: Fri, November 13, 2009 4:09:00 PM
> Subject: Re: Stop solr without losing documents
>
>> : which documents have been updated before a successful commit. Now
>> : stopping solr is as easy as kill -9.
>>
>> Please don't kill -9 ... it's grossly overkill, and doesn't give your
>> servlet container a fair chance to clean things up. A lot of work has
>> been done to make Lucene indexes robust to hard terminations of the JVM
>> (or physical machine), but there's no reason to go out of your way to
>> try and stab it in the heart when you could just shut it down cleanly.
>>
>> That's not to say your approach isn't a good one -- if you only have one
>> client sending updates/commits, then having it keep track of what was
>> indexed prior to the last successful commit is a viable way to deal with
>> what happens if Solr stops responding (either because you shut it down,
>> or because it crashed for some other reason).
>>
>> Alternately, you could take advantage of the enabled feature from your
>> client (just have it test the enabled URL every N updates or so): when
>> it sees that you have disabled the port, it can send one last commit and
>> then stop sending updates until it sees the enabled URL work again. As
>> soon as you see the updates stop, you can safely shut down the port.
>>
>> -Hoss

--
Lance Norskog
goks...@gmail.com
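[The enabled-URL polling Hoss describes can be sketched client-side. A rough Python sketch; the base URL, the /admin/ping path, and the check interval are assumptions, and the probe parameter is injectable so the stop/continue logic can be exercised without a live Solr:

```python
# Client-side sketch of "test the enabled URL every N updates":
# probe the ping URL; if it stops answering 200, stop feeding updates
# so the caller can send one final <commit/> and wait.
import urllib.request
import urllib.error

PING_URL = "http://localhost:8983/solr/admin/ping"  # assumed enabled URL

def solr_enabled(ping_url=PING_URL, probe=None):
    """Return True if the enabled URL answers HTTP 200."""
    if probe is None:
        def probe(url):
            try:
                return urllib.request.urlopen(url, timeout=5).getcode()
            except urllib.error.URLError:
                return None
    return probe(ping_url) == 200

def feed(docs, every_n=100, probe=None):
    """Send docs to Solr, re-checking the enabled URL every `every_n` docs.

    Returns the docs that were NOT sent; a non-empty result means the
    caller should issue one final commit and retry the rest later.
    """
    pending = list(docs)
    sent = 0
    while pending:
        if sent % every_n == 0 and not solr_enabled(probe=probe):
            return pending  # stop feeding; caller commits and waits
        pending.pop(0)      # stand-in for the actual update request
        sent += 1
    return []
```

A non-empty return value is the signal to send one last commit and pause until the enabled URL answers again.]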
Re: javabin in .NET?
OK. Is there anyone trying it out? Where is this code? I can try to help.

On Fri, Nov 13, 2009 at 8:10 PM, Mauricio Scheffer mauricioschef...@gmail.com wrote:

> I meant the standard IO libraries. They are different enough that the code
> has to be manually ported. There were some automated tools back when
> Microsoft introduced .Net, but IIRC they never really worked. Anyway it's
> not a big deal, it should be a straightforward job. Testing it thoroughly
> cross-platform is another thing though.
>
> 2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
>
>> The javabin format does not have many dependencies. It may have 3-4
>> classes and that is it.
>>
>> On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer
>> mauricioschef...@gmail.com wrote:
>>
>>> Nope. It has to be manually ported. Not so much because of the language
>>> itself but because of differences in the libraries.
>>>
>>> 2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
>>>
>>>> Is there any tool to directly port Java to .Net? Then we can extract
>>>> out the client part of the javabin code and convert it.
>>>>
>>>> On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
>>>>
>>>>> Has anyone looked into using the javabin response format from .NET
>>>>> (instead of SolrJ)? It's mainly a curiosity. How much better could
>>>>> performance/bandwidth/throughput be? How difficult would it be to
>>>>> implement some .NET code (C#, I'd guess being the best choice) to
>>>>> handle this response format?
>>>>>
>>>>> Thanks, Erik

--
Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Data import problem with child entity from different database
<dataConfig>
  <dataSource name="caffdubya" driver="org.postgresql.Driver"
              url="jdbc:postgresql://db1/cathdb_v3_3_0" user="USER" password="PASS"/>
  <dataSource name="sinatra" driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@db2:1521:biomapwh" user="USER" password="PASS"/>

  <!-- The following path is on bsmcmp11's local disk for speed. -->
  <!-- The master copy (compressed) lives at /cath/data/current/pdb-XML-noatom -->
  <!-- For convenience, there's a script at bsmcmp11:/export/local/refresh_pdb to copy and unpack it. -->
  <dataSource name="filesystem" type="FileDataSource"
              basePath="/export/local/pdb-XML-noatom/" encoding="UTF-8"
              connectionTimeout="5000" readTimeout="1"/>

  <document>
    <entity name="domain" dataSource="caffdubya" query="select * from domain_text">
      <!-- Subquery for related PubMed IDs (we could pull the actual text in later...) ... NOT WORKING! :-( -->
      <entity name="domain_pubmed_ids" dataSource="sinatra" onError="continue"
              query="select id as pdb_code, related_id as related_ids
                     from biomap_admin.uniprot_pdb_pubmed_for_solr
                     where id = '${domain.pdb_code}'"/>
    </entity>
    <!-- REMOVED MOST ENTITIES FOR TEST PURPOSES, RESTORE FROM PREVIOUS REVISION -->
  </document>
</dataConfig>

2009/11/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com:

> I am unable to get the file http://old.nabble.com/file/p26335171/dataimport.temp.xml
>
> On Fri, Nov 13, 2009 at 4:57 PM, Andrew Clegg andrew.cl...@gmail.com wrote:
>
>> Noble Paul നോബിള്‍ नोब्ळ्-2 wrote:
>>> no obvious issues. you may post your entire data-config.xml
>>
>> Here it is, exactly as last attempt but with usernames etc. removed.
>> Ignore the comments and the unused FileDataSource...
>> http://old.nabble.com/file/p26335171/dataimport.temp.xml dataimport.temp.xml
>>
>> Noble Paul നോബിള്‍ नोब्ळ्-2 wrote:
>>> do w/o CachedSqlEntityProcessor first and then apply that later
>>
>> Yep, that was just a bit of a wild stab in the dark to see if it made any
>> difference.
>>
>> Thanks, Andrew.
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Data-import-problem-with-child-entity-from-different-database-tp26334948p26335171.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
> --
> Noble Paul | Principal Engineer | AOL | http://aol.com

--
Lance Norskog
goks...@gmail.com