Re: Nested table support ability

2010-06-23 Thread amit_ak

Hi Otis, Thanks for the update.

My parametric search has to span the customer table and 30 child tables.
We have close to 1 million customers. Do you think Lucene/Solr is the right
solution for such requirements, or would a database search be more optimal?

Regards,
Amit

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Nested-table-support-ability-tp905253p916087.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field missing when use distributed search + dismax

2010-06-23 Thread Scott Zhang
Hi. Lance.

Thanks for replying.

Yes. I especially checked the schema.xml and did another simple test.
The broker is running on localhost:7499/solr.  A solr instance is running on
localhost:7498/solr. For this test, I only use these 2 instances. 7499's
index is empty. 7498 has 12 documents in index. I copied the schema.xml from
7498 to 7499 before test.
1. http://localhost:7498/solr/select
I get:
.
<result name="response" numFound="12" start="0">
<doc>
<str name="id">gppost_6179</str>
<str name="type">gppost</str>
</doc>
...

2. http://localhost:7499/solr/select
I get:
<result name="response" numFound="0" start="0"/>

3. http://localhost:7499/solr/select?shards=localhost:7498/solr
I get:
<result name="response" numFound="12" start="0">
<doc>
<str name="id">gppost_6179</str>
</doc>
<doc>
<str name="id">gppost_6282</str>
</doc>

So strange!

I then checked with the standard search handler.
1. http://localhost:7499/solr/select?shards=localhost:7498/solr&q=marship
<result name="response" numFound="1" start="0">
<doc>
<str name="id">member_marship11</str>
<str name="type">member</str>
<date name="date">2010-01-21T00:00:00Z</date>
</doc>
</result>

And 2.
http://localhost:7499/solr/select?shards=localhost:7498/solr&q=marship&qt=dismax
<result name="response" numFound="1" start="0">
<doc>
<str name="id">member_marship11</str>
</doc>
</result>

So strange!

On Wed, Jun 23, 2010 at 11:12 AM, Lance Norskog goks...@gmail.com wrote:

 Do all of the Solr instances, including the broker, use the same
 schema.xml?

 On 6/22/10, Scott Zhang macromars...@gmail.com wrote:
  Hi. All.
 I was using distributed search over 30 solr instances; the previous one
  was using the standard query handler. And the result was returned
 correctly.
  each result has 2 fields. ID and type.
 Today I want to use search with dismax, I tried search with each
  instance with dismax. It works correctly, return ID and type for each
  result. The strange thing is when I
  use distributed search, the result only have ID. The field type
  disappeared. I need that type to know what the ID refer to. Why solr
  eat my type?
 
 
  Thanks.
  Regards.
  Scott
 


 --
 Lance Norskog
 goks...@gmail.com



Re: Nested table support ability

2010-06-23 Thread Govind Kanshi
Amit - unless you test, it won't be apparent. The key piece, as Otis
mentioned, is to flatten everything. This requires effort from your side to
actually create documents in a manner suitable for your searches: the
relationships need to be merged into the documents. To avoid storing text
representations, you may want to store just the identifier and use the front
end to translate between the human-readable text and the stored identifier.
Taking your case further - rather than storing "ADMIN", store just a
representation, maybe a smallint, with the customer information.
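For illustration, a flattened customer document along these lines might look like the sketch below (the field names and values here are hypothetical, invented for the example):

```xml
<!-- One denormalized document per customer: rows from the child tables
     are merged in as (multi-valued) fields, and the role is stored as a
     small code rather than the text "ADMIN". -->
<add>
  <doc>
    <field name="customer_id">12345</field>
    <field name="name">Acme Corp</field>
    <!-- values pulled from a child table, flattened into the parent doc -->
    <field name="order_id">98001</field>
    <field name="order_id">98002</field>
    <!-- e.g. 1 = ADMIN; the front end maps the code back to display text -->
    <field name="role_code">1</field>
  </doc>
</add>
```

Each of the 30 child tables contributes fields to the single customer document this way, so one Solr query can replace the multi-table join.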

On Wed, Jun 23, 2010 at 11:30 AM, amit_ak amit...@mindtree.com wrote:


 Hi Otis, Thanks for the update.

 My parametric search has to span the customer table and 30 child tables.
 We have close to 1 million customers. Do you think Lucene/Solr is the right
 solution for such requirements, or would a database search be more optimal?

 Regards,
 Amit

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Nested-table-support-ability-tp905253p916087.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Field Collapsing SOLR-236

2010-06-23 Thread Rakhi Khatwani
Hi,
   Patching did work, but when I build the trunk, I get the following
exception:

[SolrTrunk]# ant compile
Buildfile: /testWorkspace/SolrTrunk/build.xml

init-forrest-entities:
  [mkdir] Created dir: /testWorkspace/SolrTrunk/build
  [mkdir] Created dir: /testWorkspace/SolrTrunk/build/web

compile-lucene:

BUILD FAILED
/testWorkspace/SolrTrunk/common-build.xml:207:
/testWorkspace/modules/analysis/common does not exist.

Regards,
Raakhi

On Wed, Jun 23, 2010 at 2:39 AM, Martijn v Groningen 
martijn.is.h...@gmail.com wrote:

 What exactly did not work? Patching, compiling or running it?

 On 22 June 2010 16:06, Rakhi Khatwani rkhatw...@gmail.com wrote:
  Hi,
   I tried checking out the latest code (rev 956715) the patch did not
  work on it.
  In fact I even tried hunting for the revision mentioned earlier in this
  thread (i.e. rev 955615) but cannot find it in the repository. (it has
  revision 955569 followed by revision 955785).
 
  Any pointers??
  Regards
  Raakhi
 
  On Tue, Jun 22, 2010 at 2:03 AM, Martijn v Groningen 
  martijn.is.h...@gmail.com wrote:
 
  Oh in that case is the code stable enough to use it for production?
  -  Well this feature is a patch and I think that says it all.
  Although bugs are fixed it is definitely an experimental feature
  and people should keep that in mind when using one of the patches.
  Does it support features which solr 1.4 normally supports?
 - As far as I know yes.
 
  am using facets as a workaround but then i am not able to sort on any
  other field. is there any workaround to support this feature??
 - Maybe http://wiki.apache.org/solr/Deduplication prevents you from
  adding duplicates in your index, but then you miss the collapse counts
  and other computed values
 
  On 21 June 2010 09:04, Rakhi Khatwani rkhatw...@gmail.com wrote:
   Hi,
  Oh in that case is the code stable enough to use it for production?
   Does it support features which solr 1.4 normally supports?
  
   I am using facets as a workaround but then i am not able to sort on
 any
   other field. is there any workaround to support this feature??
  
   Regards,
   Raakhi
  
   On Fri, Jun 18, 2010 at 6:14 PM, Martijn v Groningen 
   martijn.is.h...@gmail.com wrote:
  
   Hi Rakhi,
  
   The patch is not compatible with 1.4. If you want to work with the
   trunk. I'll need to get the src from
   https://svn.apache.org/repos/asf/lucene/dev/trunk/
  
   Martijn
  
   On 18 June 2010 13:46, Rakhi Khatwani rkhatw...@gmail.com wrote:
Hi Moazzam,
   
 Where did u get the src code from??
   
I am downloading it from
https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4
   
and the latest revision in this location is 955469.
   
so applying the latest patch(dated 17th june 2010) on it still
  generates
errors.
   
Any Pointers?
   
Regards,
Raakhi
   
   
On Fri, Jun 18, 2010 at 1:24 AM, Moazzam Khan moazz...@gmail.com
   wrote:
   
I knew it wasn't me! :)
   
I found the patch just before I read this and applied it to the
 trunk
and it works!
   
Thanks Mark and martijn for all your help!
   
- Moazzam
   
On Thu, Jun 17, 2010 at 2:16 PM, Martijn v Groningen
martijn.is.h...@gmail.com wrote:
 I've added a new patch to the issue, so building the trunk (rev
 955615) with the latest patch should not be a problem. Due to
  recent
 changes in the Lucene trunk the patch was not compatible.

 On 17 June 2010 20:20, Erik Hatcher erik.hatc...@gmail.com
  wrote:

 On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote:

 p.s. I'd be glad to contribute our Maven build re-organization
  back
   to
the
 community to get Solr properly Mavenized so that it can be
   distributed
and
 released more often.  For us the benefit of this structure is
  that
   we
will
 be able to overlay addons such as RequestHandlers and other
 third
   party
 support without having to rebuild Solr from scratch.

 But you don't have to rebuild Solr from scratch to add a new
  request
handler
 or other plugins - simply compile your custom stuff into a JAR
 and
   put
it in
 solr-home/lib (or point to it with lib in solrconfig.xml).

  Ideally, a Maven Archetype could be created that would allow
 one
rapidly
 produce a Solr webapp and fire it up in Jetty in mere seconds.

 How's that any different than cd example; java -jar start.jar?
  Or
  do
you
 mean a Solr client webapp?

 Finally, with projects such as Bobo, integration with Spring
  would
   make
 configuration more consistent and request significantly less
 java
coding
 just to add new capabilities everytime someone authors a new
RequestHandler.

 It's one line of config to add a new request handler.  How many
ridiculously
 ugly confusing lines of Spring XML would it take?

  The biggest thing I learned 

Re: Field Collapsing SOLR-236

2010-06-23 Thread Rakhi Khatwani
Oops, this is probably because I didn't check out the modules folder from the trunk.
Doing that right now :)

Regards
Raakhi

On Wed, Jun 23, 2010 at 1:12 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:

 Hi,
    Patching did work, but when I build the trunk, I get the following
 exception:

 [SolrTrunk]# ant compile
 Buildfile: /testWorkspace/SolrTrunk/build.xml

 init-forrest-entities:
   [mkdir] Created dir: /testWorkspace/SolrTrunk/build
   [mkdir] Created dir: /testWorkspace/SolrTrunk/build/web

 compile-lucene:

 BUILD FAILED
 /testWorkspace/SolrTrunk/common-build.xml:207:
 /testWorkspace/modules/analysis/common does not exist.

 Regards,
 Raakhi

 On Wed, Jun 23, 2010 at 2:39 AM, Martijn v Groningen 
 martijn.is.h...@gmail.com wrote:

 What exactly did not work? Patching, compiling or running it?

 On 22 June 2010 16:06, Rakhi Khatwani rkhatw...@gmail.com wrote:
  Hi,
   I tried checking out the latest code (rev 956715) the patch did not
  work on it.
  In fact I even tried hunting for the revision mentioned earlier in this
  thread (i.e. rev 955615) but cannot find it in the repository. (it has
  revision 955569 followed by revision 955785).
 
  Any pointers??
  Regards
  Raakhi
 
  On Tue, Jun 22, 2010 at 2:03 AM, Martijn v Groningen 
  martijn.is.h...@gmail.com wrote:
 
  Oh in that case is the code stable enough to use it for production?
  -  Well this feature is a patch and I think that says it all.
  Although bugs are fixed it is definitely an experimental feature
  and people should keep that in mind when using one of the patches.
  Does it support features which solr 1.4 normally supports?
 - As far as I know yes.
 
  am using facets as a workaround but then i am not able to sort on any
  other field. is there any workaround to support this feature??
 - Maybe http://wiki.apache.org/solr/Deduplication prevents you from
  adding duplicates in your index, but then you miss the collapse counts
  and other computed values
 
  On 21 June 2010 09:04, Rakhi Khatwani rkhatw...@gmail.com wrote:
   Hi,
  Oh in that case is the code stable enough to use it for
 production?
   Does it support features which solr 1.4 normally supports?
  
   I am using facets as a workaround but then i am not able to sort on
 any
   other field. is there any workaround to support this feature??
  
   Regards,
   Raakhi
  
   On Fri, Jun 18, 2010 at 6:14 PM, Martijn v Groningen 
   martijn.is.h...@gmail.com wrote:
  
   Hi Rakhi,
  
   The patch is not compatible with 1.4. If you want to work with the
   trunk. I'll need to get the src from
   https://svn.apache.org/repos/asf/lucene/dev/trunk/
  
   Martijn
  
   On 18 June 2010 13:46, Rakhi Khatwani rkhatw...@gmail.com wrote:
Hi Moazzam,
   
 Where did u get the src code from??
   
I am downloading it from
https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4
   
and the latest revision in this location is 955469.
   
so applying the latest patch(dated 17th june 2010) on it still
  generates
errors.
   
Any Pointers?
   
Regards,
Raakhi
   
   
On Fri, Jun 18, 2010 at 1:24 AM, Moazzam Khan moazz...@gmail.com
 
   wrote:
   
I knew it wasn't me! :)
   
I found the patch just before I read this and applied it to the
 trunk
and it works!
   
Thanks Mark and martijn for all your help!
   
- Moazzam
   
On Thu, Jun 17, 2010 at 2:16 PM, Martijn v Groningen
martijn.is.h...@gmail.com wrote:
 I've added a new patch to the issue, so building the trunk (rev
 955615) with the latest patch should not be a problem. Due to
  recent
 changes in the Lucene trunk the patch was not compatible.

 On 17 June 2010 20:20, Erik Hatcher erik.hatc...@gmail.com
  wrote:

 On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote:

 p.s. I'd be glad to contribute our Maven build
 re-organization
  back
   to
the
 community to get Solr properly Mavenized so that it can be
   distributed
and
 released more often.  For us the benefit of this structure is
  that
   we
will
 be able to overlay addons such as RequestHandlers and other
 third
   party
 support without having to rebuild Solr from scratch.

 But you don't have to rebuild Solr from scratch to add a new
  request
handler
 or other plugins - simply compile your custom stuff into a JAR
 and
   put
it in
 solr-home/lib (or point to it with lib in solrconfig.xml).

  Ideally, a Maven Archetype could be created that would allow
 one
rapidly
 produce a Solr webapp and fire it up in Jetty in mere
 seconds.

 How's that any different than cd example; java -jar start.jar?
  Or
  do
you
 mean a Solr client webapp?

 Finally, with projects such as Bobo, integration with Spring
  would
   make
 configuration more consistent and request significantly less
 java
coding
 just to add new capabilities everytime someone authors 

Re: Field missing when use distributed search + dismax

2010-06-23 Thread Scott Zhang
Hi. All.

I found out more about the missing-fields issue.
I tried the default distributed search example, which configures 2 instances,
one on 8983 and another on 7574.
When I search with the standard query handler, the result fields are all
right.
When I search with the default dismax, some fields disappear. Not sure
why.

Can anyone test this and confirm the reason?

Thanks.
Regards.
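One thing worth checking here (an assumption on my part, not a confirmed diagnosis): the example dismax handler in solrconfig.xml typically ships with an fl default that restricts the returned field list, which would make fields appear under the standard handler but vanish under qt=dismax. Something along these lines:

```xml
<!-- Hypothetical excerpt: if the dismax handler defines an "fl" default
     like this, only the listed fields come back from a dismax query. -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="fl">id,name,price,score</str> <!-- "type" is not listed -->
  </lst>
</requestHandler>
```

Overriding it per request with &fl=*,score (or removing the default) would quickly confirm or rule out this cause.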


On Wed, Jun 23, 2010 at 2:50 PM, Scott Zhang macromars...@gmail.com wrote:

 Hi. Lance.

 Thanks for replying.

 Yes. I especially checked the schema.xml and did another simple test.
 The broker is running on localhost:7499/solr.  A solr instance is running
 on localhost:7498/solr. For this test, I only use these 2 instances. 7499's
 index is empty. 7498 has 12 documents in index. I copied the schema.xml from
 7498 to 7499 before test.
 1. http://localhost:7498/solr/select
 I get:
 .
 <result name="response" numFound="12" start="0">
 <doc>
 <str name="id">gppost_6179</str>
 <str name="type">gppost</str>
 </doc>
 ...

 2. http://localhost:7499/solr/select
 I get:
 <result name="response" numFound="0" start="0"/>

 3. http://localhost:7499/solr/select?shards=localhost:7498/solr
 I get:
 <result name="response" numFound="12" start="0">
 <doc>
 <str name="id">gppost_6179</str>
 </doc>
 <doc>
 <str name="id">gppost_6282</str>
 </doc>

 So strange!

 I then checked with the standard search handler.
 1. http://localhost:7499/solr/select?shards=localhost:7498/solr&q=marship
 <result name="response" numFound="1" start="0">
 <doc>
 <str name="id">member_marship11</str>
 <str name="type">member</str>
 <date name="date">2010-01-21T00:00:00Z</date>
 </doc>
 </result>

 And 2.
 http://localhost:7499/solr/select?shards=localhost:7498/solr&q=marship&qt=dismax
 <result name="response" numFound="1" start="0">
 <doc>
 <str name="id">member_marship11</str>
 </doc>
 </result>

 So strange!


 On Wed, Jun 23, 2010 at 11:12 AM, Lance Norskog goks...@gmail.com wrote:

 Do all of the Solr instances, including the broker, use the same
 schema.xml?

 On 6/22/10, Scott Zhang macromars...@gmail.com wrote:
  Hi. All.
 I was using distributed search over 30 solr instance, the previous
 one
  was using the standard query handler. And the result was returned
 correctly.
  each result has 2 fields. ID and type.
 Today I want to use search with dismax, I tried search with each
  instance with dismax. It works correctly, return ID and type for
 each
  result. The strange thing is when I
  use distributed search, the result only have ID. The field type
  disappeared. I need that type to know what the ID refer to. Why solr
  eat my type?
 
 
  Thanks.
  Regards.
  Scott
 


 --
 Lance Norskog
 goks...@gmail.com





Re: Searching across multiple repeating fields

2010-06-23 Thread Mark Allan

Cheers, Geert-Jan, that's very helpful.

We won't always be searching with dates and we wouldn't want  
duplicates to show up in the results, so your second suggestion looks  
like a good workaround if I can't solve the actual problem.  I didn't  
know about FieldCollapsing, so I'll definitely keep it in mind.


Thanks
Mark

On 22 Jun 2010, at 3:44 pm, Geert-Jan Brits wrote:


Perhaps my answer is useless, because I don't have an answer to your direct
question, but: you *might* want to consider whether your concept of a
solr-document is on the correct granular level, i.e.:

your problem posted could be tackled (afaik) by defining a document as being
a 'sub-event' with only 1 daterange. So each event-doc you have now is
replaced by several sub-event docs in this proposed situation.

Additionally, each sub-event doc gets an additional field 'parent-eventid'
which maps to something like an event-id (which you're probably using).
So several sub-event docs can point to the same event-id.

Lastly, all sub-event docs belonging to a particular event implement all the
other fields that you may have stored in that particular event-doc.

Now you can query for events based on date-ranges like you envisioned, but
instead of returning events you return sub-event-docs. However, since all
data of the original event (except the multiple dateranges) is available in
the sub-event doc, this shouldn't really bother the client. If you need to
display all dates of an event (the only info missing from the returned
solr-doc) you could easily store them in a RDB and fetch them using the
defined parent-eventid.

The only caveat I see is that possibly multiple sub-events with the same
'parent-eventid' might get returned for a particular query.
This however depends on the type of queries you envision, i.e.:
1) If you always issue queries with date-filters, and *assuming* that
sub-events of a particular event don't temporally overlap, you will never
get multiple sub-events returned.
2) If 1) doesn't hold, and assuming you *do* mind getting multiple
sub-events of the same actual event, you could try to use Field Collapsing on
'parent-eventid' to only return the first sub-event per parent-eventid that
matches the rest of your query. (Note, however, that Field Collapsing is a
patch at the moment. http://wiki.apache.org/solr/FieldCollapsing)

Not sure if this helped you at all, but at the very least it was a nice
conceptual exercise ;-)

Cheers,
Geert-Jan
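As a concrete sketch of this proposal (document ids hypothetical, dateranges taken from the original post), the event document with two dateranges would become two sub-event documents sharing a parent-eventid:

```xml
<add>
  <!-- sub-event 1: first daterange of the original event -->
  <doc>
    <field name="id">event42_sub1</field>
    <field name="parent-eventid">event42</field>
    <field name="daterange">19820402,19820614</field>
    <!-- ...all other fields copied from the original event-doc... -->
  </doc>
  <!-- sub-event 2: second daterange of the same event -->
  <doc>
    <field name="id">event42_sub2</field>
    <field name="parent-eventid">event42</field>
    <field name="daterange">1990,2000</field>
  </doc>
</add>
```

A query for 1985 would then match neither document, because each doc carries exactly one range and the subfield positions can no longer cross-match.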


2010/6/22 Mark Allan mark.al...@ed.ac.uk


Hi all,

Firstly, I apologise for the length of this email but I need to  
describe

properly what I'm doing before I get to the problem!

I'm working on a project just now which requires the ability to  
store and
search on temporal coverage data - ie. a field which specifies a  
date range

during which a certain event took place.

I hunted around for a few days and couldn't find anything which  
seemed to
fit, so I had a go at writing my own field type based on  
solr.PointType.

It's used as follows:
schema.xml
   <fieldType name="temporal" class="solr.TemporalCoverage" dimension="2" subFieldSuffix="_i"/>
   <field name="daterange" type="temporal" indexed="true" stored="true" multiValued="true"/>
data.xml
   <add>
   <doc>
   ...
   <field name="daterange">1940,1945</field>
   </doc>
   </add>

Internally, this gets stored as:
   <arr name="daterange"><str>1940,1945</str></arr>
   <int name="daterange_0_i">1940</int>
   <int name="daterange_1_i">1945</int>

In due course, I'll declare the subfields as a proper date type, but in the
meantime, this works absolutely fine.  I can search for an individual date
and Solr will check (queryDate > daterange_0 AND queryDate < daterange_1)
and the correct documents are returned.  My code also allows the user to
input a date range in the query, but I won't complicate matters with that
just now!

The problem arises when a document has more than one daterange  
field
(imagine a news broadcast which covers a variety of topics and  
hence time

periods).

A document with two daterange fields
   <doc>
   ...
   <field name="daterange">19820402,19820614</field>
   <field name="daterange">1990,2000</field>
   </doc>
gets stored internally as
   <arr name="daterange"><str>19820402,19820614</str><str>1990,2000</str></arr>
   <arr name="daterange_0_i"><int>19820402</int><int>1990</int></arr>
   <arr name="daterange_1_i"><int>19820614</int><int>2000</int></arr>


In this situation, searching for 1985 should yield zero results as  
it is
contained within neither daterange, however, the above document is  
returned
in the result set.  What Solr is doing is checking that the  
queryDate (1985)
is greater than *any* of the values in daterange_0 AND queryDate is  
less

than *any* of the values in daterange_1.

How can I get Solr to respect the positions of each item in the  
daterange_0
and _1 arrays?  Ideally I'd like the search to use the following  
logic, thus
preventing the above document from being returned in a search for  
1985:

Re: OOM on sorting on dynamic fields

2010-06-23 Thread Matteo Fiandesio
Hi to all,
we moved Solr with the patched Lucene FieldCache into our production
environment. During tests we noticed random ConcurrentModificationExceptions
when calling the getCacheEntries method, due to this bug:

https://issues.apache.org/jira/browse/LUCENE-2273

We applied that patch as well, and added an abstract int
getCacheSize() method to the FieldCache abstract class, with its
implementation in the abstract Cache inner class in FieldCacheImpl, which
returns the cache size without instantiating a CacheEntry array.

Response times are slower on cache purging but acceptable from the user's
point of view.
Regards,
Matteo


On 22 June 2010 22:41, Matteo Fiandesio matteo.fiande...@gmail.com wrote:
 The fields I'm sorting over are dynamic, so one query sorts on
 erick_time_1, erick_timeA_1 and another sorts on erick_time_2, and so
 on. What we see in the heap are a lot of arrays, most of them filled
 with 0s, maybe due to the fact that these timestamp fields are not
 present in all the documents.

 By the way,
 I have a script that generates the OOM in 10 minutes on our solr
 instance, and with the temporary patch it ran without any problems.
 The side effect is that when the cache is purged, the next query that
 regenerates the cache is a little bit slower.

 I'm aware that the solution is inelegant and we are investigating how to
 solve the problem in another way.
 Regards,
 Matteo


 On 22 June 2010 19:25, Erick Erickson erickerick...@gmail.com wrote:
 Hmmm, I'm missing something here then. Sorting over 15 fields of type long
 shouldn't use much memory, even if all the values are unique. When you say
 12-15 dynamic fields, are you talking about 12-15 fields per query out of
 XXX total fields? And is XXX large? At a guess, how many different fields
 do
 you think you're sorting over cumulative by the time you get your OOM?
 Note if you sort over the field erick_time in 10 different queries, I'm
 only counting that as 1 field. I guess another way of asking this is
 how many dynamic fields are there total?.

 If this is really a sorting issue, you should be able to force this to
 happen
 almost immediately by firing off enough sort queries at the server. It'll
 tell you a lot if you can't make this happen, even on a relatively small
 test machine.

 Best
 Erick

 On Tue, Jun 22, 2010 at 12:59 PM, Matteo Fiandesio 
 matteo.fiande...@gmail.com wrote:

 Hi Erick,
 the index is quite small (1691145 docs) but sorting is massive and
 often on unique timestamp fields.

 OOM occur after a range of time between three and four hours.
 Depending as well if users browse a part of the application.

 We use solrj to make the queries so we did not use Readers objects
 directly.

 Without sorting we don't see the problem
 Regards,
 Matteo

 On 22 June 2010 17:01, Erick Erickson erickerick...@gmail.com wrote:
  H.. A couple of details I'm wondering about. How many
  documents are we talking about in your index? Do you get
  OOMs when you start fresh or does it take a while?
 
  You've done some good investigations, so it seems like there
  could well be something else going on here than just the usual
  suspects of sorting
 
  I'm wondering if you aren't really closing readers somehow.
  Are you updating your index frequently and re-opening readers often?
  If so, how?
 
  I'm assuming that if you do NOT sort on all these fields, you don't have
  the problem, is that true?
 
  Best
  Erick
 
  On Fri, Jun 18, 2010 at 10:52 AM, Matteo Fiandesio 
  matteo.fiande...@gmail.com wrote:
 
  Hello,
  we are experiencing OOM exceptions in our single core solr instance
  (on a (huge) amazon EC2 machine).
  We investigated a lot in the mailing list and through jmap/jhat dump
  analyzing and the problem resides in the lucene FieldCache that fills
  the heap and blows up the server.
 
  Our index is quite small but we have a lot of sort queries  on fields
  that are dynamic,of type long representing timestamps and are not
  present in all the documents.
  Those queries apply sorting on 12-15 of those fields.
 
  We are using solr 1.4 in production and the dump shows a lot of
  Integer/Character and Byte Array filled up with 0s.
  With solr's trunk code things does not change.
 
  In the mailing list we saw a lot of messages related to this issues:
  we tried truncating the dates to day precision,using missingSortLast =
  true,changing the field type from slong to long,setting autowarming to
  different values,disabling and enabling caches with different values
  but we did not manage to solve the problem.
 
  We were thinking to implement an LRUFieldCache field type to manage
  the FieldCache as an LRU and preventing but, before starting a new
  development, we want to be sure that we are not doing anything wrong
  in the solr configuration or in the index generation.
 
  Any help would be appreciated.
  Regards,
  Matteo
 
 





Re: Field Collapsing SOLR-236

2010-06-23 Thread Govind Kanshi
The fieldType error "analyzer without class or tokenizer & filter list" seems
to point to the config - you may want to correct it.
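For context, that error is thrown when a fieldType declares an <analyzer> element that has neither a class attribute nor a tokenizer (plus optional filters) inside it. A valid declaration takes roughly one of these two forms (type names here are just examples):

```xml
<!-- Either name a complete Analyzer class... -->
<fieldType name="text_std" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
</fieldType>

<!-- ...or build the analyzer from a tokenizer and a filter list. -->
<fieldType name="text_ws" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

An empty <analyzer/> element, by contrast, triggers exactly the exception quoted in this thread.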


On Wed, Jun 23, 2010 at 3:09 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:

 Hi,
   I checked out modules & lucene from the trunk.
 Performed a build using the following commands
 ant clean
 ant compile
 ant example

 Which compiled successfully.


 I then put my existing index(using schema.xml from solr1.4.0/conf/solr/) in
 the multicore folder, configured solr.xml and started the server

 When i type in http://localhost:8983/solr

 i get the following error:
 org.apache.solr.common.SolrException: Plugin init failure for [schema.xml]
 fieldType:analyzer without class or tokenizer & filter list
 at

 org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:168)
 at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:480)
 at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:122)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:429)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:286)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:198)
 at

 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:123)
 at
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:86)
 at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at

 org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:662)
 at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
 at

 org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
 at
 org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
 at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at

 org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
 at

 org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at

 org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at
 org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
 at org.mortbay.jetty.Server.doStart(Server.java:224)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at

 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.mortbay.start.Main.invokeMain(Main.java:194)
 at org.mortbay.start.Main.start(Main.java:534)
 at org.mortbay.start.Main.start(Main.java:441)
 at org.mortbay.start.Main.main(Main.java:119)
 Caused by: org.apache.solr.common.SolrException: analyzer without class or
 tokenizer & filter list
 at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:908)
 at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:60)
 at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:450)
 at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:435)
 at

 org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:142)
 ... 32 more


 Then i picked up an existing index (schema.xml from solr1.3/solr/conf) and
 put it in multicore folder, configured solr.xml and restarted my index

 Collapsing worked fine.

 Any pointers, which part of schema.xml (solr 1.4) is causing this
 exception?

 Regards,
 Raakhi



 On Wed, Jun 23, 2010 at 1:35 PM, Rakhi Khatwani rkhatw...@gmail.com
 wrote:

 
  Oops this is probably i didn't checkout the modules file from the trunk.
  doing that right now :)
 
  Regards
  Raakhi
 
  On Wed, Jun 23, 2010 at 1:12 PM, Rakhi Khatwani rkhatw...@gmail.com
 wrote:
 
  Hi,
  Patching did work, but when I build the trunk, I get the
 following
  exception:
 
  [SolrTrunk]# ant compile
  Buildfile: /testWorkspace/SolrTrunk/build.xml
 
  init-forrest-entities:
[mkdir] Created dir: /testWorkspace/SolrTrunk/build
[mkdir] Created dir: /testWorkspace/SolrTrunk/build/web
 
  compile-lucene:
 
  BUILD FAILED
  /testWorkspace/SolrTrunk/common-build.xml:207:
  /testWorkspace/modules/analysis/common does not exist.
 
  Regards,
  Raakhi
 
  On Wed, Jun 23, 2010 at 2:39 AM, Martijn v Groningen 
  martijn.is.h...@gmail.com wrote:
 
  What exactly did not work? Patching, compiling or running it?
 
  On 22 June 2010 16:06, Rakhi Khatwani rkhatw...@gmail.com wrote:
   Hi,
I tried checking out the latest code (rev 956715) the patch did
  not
   work on it.
   

TermsComponent - AutoComplete - Multiple Term Suggestions Inclusive Search?

2010-06-23 Thread Saïd Radhouani
Hi,

I'm using the Terms Component to set up the autocomplete feature based on a
String field. Here are the params I'm using:

terms=true&terms.fl=type&terms.lower=cat&terms.prefix=cat&terms.lower.incl=false

With the above params, I've been able to get suggestions for terms that start
with the specified prefix. I'm wondering whether it's possible to:

- have inclusive search, i.e., by typing "cat", get "category",
"subcategory", etc.?

- start suggestions from any word in the field, i.e., by typing "cat", get
"The best category..."?

Thanks!

 -Saïd
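One common approach to the second question (suggesting from any word in the field) is to copy the string field into a tokenized field and point the TermsComponent at that copy, so each word becomes its own term. A sketch, where the field and type names are hypothetical and assume a standard tokenized "text" type exists in the schema:

```xml
<!-- schema.xml: keep the original string field, add a tokenized copy -->
<field name="type" type="string" indexed="true" stored="true"/>
<field name="type_tokens" type="text" indexed="true" stored="false"/>
<copyField source="type" dest="type_tokens"/>
```

With terms.fl=type_tokens&terms.prefix=cat, a stored value like "The best category" can yield the suggestion "category", because prefix matching now applies per word rather than to the whole field value. True infix matching (cat matching "subcategory") would still need something beyond a plain prefix lookup.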




Re: Field Collapsing SOLR-236

2010-06-23 Thread Rakhi Khatwani
Hi,
   But there are almost no settings in my config.
Here's a snapshot of what I have in my solrconfig.xml:

<config>
<updateHandler class="solr.DirectUpdateHandler2" />

<requestDispatcher handleSelect="true">
<requestParsers enableRemoteStreaming="false"
multipartUploadLimitInKB="2048" />
</requestDispatcher>

<requestHandler name="standard" class="solr.StandardRequestHandler"
default="true" />
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
<requestHandler name="/admin/"
class="org.apache.solr.handler.admin.AdminHandlers" />

<!-- config for the admin interface -->
<admin>
<defaultQuery>*:*</defaultQuery>
</admin>

<!-- config for field collapsing -->
<searchComponent name="query"
class="org.apache.solr.handler.component.CollapseComponent" />
</config>

Am I going wrong anywhere?
Regards,
Raakhi

On Wed, Jun 23, 2010 at 3:28 PM, Govind Kanshi govind.kan...@gmail.comwrote:

 fieldType:analyzer without class or tokenizer & filter list seems to point
 to the config - you may want to correct it.


 On Wed, Jun 23, 2010 at 3:09 PM, Rakhi Khatwani rkhatw...@gmail.com
 wrote:

  Hi,
  I checked out modules & lucene from the trunk.
  Performed a build using the following commands
  ant clean
  ant compile
  ant example
 
  Which compiled successfully.
 
 
  I then put my existing index (using schema.xml from solr1.4.0/conf/solr/)
  in the multicore folder, configured solr.xml and started the server.
 
  When I type in http://localhost:8983/solr
 
  i get the following error:
  org.apache.solr.common.SolrException: Plugin init failure for [schema.xml]
  fieldType:analyzer without class or tokenizer & filter list
  at
 
 
 org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:168)
  at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:480)
  at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:122)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:429)
  at org.apache.solr.core.CoreContainer.load(CoreContainer.java:286)
  at org.apache.solr.core.CoreContainer.load(CoreContainer.java:198)
  at
 
 
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:123)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:86)
  at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
  at
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
  at
 
 
 org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:662)
  at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
  at
 
 
 org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
  at
  org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
  at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
  at
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
  at
 
 
 org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
  at
 
 
 org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
  at
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
  at
 
 
 org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
  at
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
  at
  org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
  at org.mortbay.jetty.Server.doStart(Server.java:224)
  at
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
  at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
 
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at
 
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.mortbay.start.Main.invokeMain(Main.java:194)
  at org.mortbay.start.Main.start(Main.java:534)
  at org.mortbay.start.Main.start(Main.java:441)
  at org.mortbay.start.Main.main(Main.java:119)
  Caused by: org.apache.solr.common.SolrException: analyzer without class or
  tokenizer & filter list
  at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:908)
  at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:60)
  at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:450)
  at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:435)
  at
 
 
 org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:142)
  ... 32 more
 
 
  Then I picked up an existing index (schema.xml from solr1.3/solr/conf),
  put it in the multicore folder, configured solr.xml and restarted.
 
  Collapsing worked fine.
 
  Any pointers, which part of schema.xml (solr 1.4) is causing this
  exception?
 
  Regards,
  Raakhi
 
 
 
  On Wed, Jun 23, 2010 at 1:35 PM, Rakhi Khatwani rkhatw...@gmail.com
  wrote:
 
  
   Oops this is probably i didn't 

Import XML files different format?

2010-06-23 Thread scrapy
Hi,

I'm new to solr. It looks great.

I would like to add an XML document in the following format to Solr:

<?xml version="1.0" encoding="utf-8"?>
<race>
  <go>
    <id><![CDATA[...]]></id>
    <title><![CDATA[...]]></title>
    <url><![CDATA[...]]></url>
    <content><![CDATA[...]]></content>
    <city><![CDATA[...]]></city>
    <postcode><![CDATA[...]]></postcode>
    <contract><![CDATA[...]]></contract>
    <category><![CDATA[...]]></category>
    <date><![CDATA[...]]></date>
    <time><![CDATA[...]]></time>
  </go>

  etc...
</race>



Is there a way to do this? If yes how?

Or do I need to convert it with some script to this:

<add>
  <doc>
    <field name="authors">Patrick Eagar</field>
    <field name="subject">Sports</field>
etc...


Thanks for your help

Regards


Re: Import XML files different format?

2010-06-23 Thread Erik Hatcher

You can use DataImportHandler's XML/XPath capabilities to do this:

  http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource 



or you could, of course, convert your XML to Solr's XML format.

Another fine option, given what this data looks like: CSV format.

I'd imagine you have the original data in a relational database though?

Erik


On Jun 23, 2010, at 7:59 AM, scr...@asia.com wrote:


Hi,

I'm new to solr. It looks great.

I would like to add a XML document in the following format in solr:

?xml version=1.0 encoding=utf-8?
race
go
   id![CDATA[...]]/id
   title![CDATA[...]]/title
   url![CDATA[...]]/url
   content![CDATA[...]]/content
   city![CDATA[...]]/city
   postcode![CDATA[...]]/postcode
   contract![CDATA[...]]/contract
   category![CDATA[...]]/category
   date![CDATA[...]]/date
   time![CDATA[...]]/time
/go

etc...
/race



Is there a way to do this? If yes how?

Or i need to convert it with some scripts to this:

add
doc
  field name=authorsPatrick Eagar/field
  field name=subjectSports/field
etc...


Thanks for your help

Regards
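
To make Erik's DIH suggestion above concrete, here is a minimal
data-config.xml sketch for the race/go sample. The file path, entity name,
and xpaths are assumptions taken from the sample XML in this thread, not a
tested configuration:

```xml
<dataConfig>
  <!-- FileDataSource reads the XML from disk; URLDataSource would fetch over HTTP -->
  <dataSource type="FileDataSource" encoding="utf-8"/>
  <document>
    <!-- one Solr document per /race/go element -->
    <entity name="go" processor="XPathEntityProcessor"
            url="/path/to/race.xml" forEach="/race/go">
      <field column="id"       xpath="/race/go/id"/>
      <field column="title"    xpath="/race/go/title"/>
      <field column="url"      xpath="/race/go/url"/>
      <field column="content"  xpath="/race/go/content"/>
      <field column="city"     xpath="/race/go/city"/>
      <field column="postcode" xpath="/race/go/postcode"/>
      <field column="contract" xpath="/race/go/contract"/>
      <field column="category" xpath="/race/go/category"/>
      <field column="date"     xpath="/race/go/date"/>
      <field column="time"     xpath="/race/go/time"/>
    </entity>
  </document>
</dataConfig>
```

Each column would still need a matching field declaration in schema.xml.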




Re: Import XML files different format?

2010-06-23 Thread scrapy
Thanks Erik for your answer.

I'll try to use DIH via data-config.xml as I might index other content with
different XML structures in the future...

Will I need a different data-config for each XML structure/content file?
And then manually change between them?

 


Re: TermsComponent - AutoComplete - Multiple Term Suggestions Inclusive Search?

2010-06-23 Thread Chantal Ackermann
Hi Saïd,

I think your problem is the field's type: String. You have to use a
TextField and apply tokenizers that will find subcategory if you put
in cat. (Not sure which filter does that, though. I wouldn't think
that the PorterStemmer cuts off prefix syllables of that kind?)

If, however, you search on an analyzed version of the field it should
return hits as usual according to the analyzer chain, and you can thus
use the values of that field listed in the hits as suggestions.

Example:
input: potter
field type: solr.TextField (with porter stemmer)
finds: Harry Potter and Whatever
and also Potters and Plums


Cheers,
Chantal


On Wed, 2010-06-23 at 13:17 +0200, Saïd Radhouani wrote:
 Hi,
 
 I'm using the Terms Component to set up the autocomplete feature based on a
 String field. Here are the params I'm using:
 
 terms=true&terms.fl=type&terms.lower=cat&terms.prefix=cat&terms.lower.incl=false
 
 With the above params, I've been able to get suggestions for terms that start
 with the specified prefix. I'm wondering whether it's possible to:
 
 - have inclusive search, i.e., by typing cat, we get category,
 subcategory, etc.?
 
 - start suggestions from any word in the field, i.e., by typing cat, we get
 The best category...?
 
 Thanks!
 
  -Saïd
 
 





Alphabetic range

2010-06-23 Thread Sophie M.

Hello all,

I have been trying for several days to build an alphabetical range. I will
explain all the steps (I have the Solr 1.4 Enterprise Search Server book
written by Smiley and Pugh).

I want to get all artists by their first two letters. If I request mi, I
want michael jackson and all artist names beginning with mi in the
response.

I defined a field type similar to Smiley and Pugh's example p. 148

<fieldType name="bucketFirstTwoLetters" class="solr.TextField"
    sortMissingLast="true" omitNorms="true">
  <analyser type="index">
    <tokenizer class="solr.PatternTokenizerFactory"
        pattern="^([a-zA-Z])([a-zA-Z]).*" group="2"/> <!-- the first two letters -->
  </analyser>
  <analyser type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyser>
</fieldType>

I defined the field ArtistSort like:

<field name="ArtistSort" type="bucketFirstTwoLetters" stored="true"
    multivalued="false"/>
To the request:

http://localhost:8983/solr/music/select?indent=on&q=yu&qt=standard&wt=standard&facet=on&facet.field=ArtistSort&facet.sort=lex&facet.missing=on&facet.method=enum&fl=ArtistSort

I get:

http://lucene.472066.n3.nabble.com/file/n916716/select.xml select.xml 

I don't understand why the pattern doesn't match exactly. For example, An An Yu
matches, but I only want artists whose names begin with yu. And I know that an
artist named ReYu would match, because ReYu would be interpreted as Re Yu (as
two words).

I also tried another type of query, like:

http://localhost:8983/solr/music/select?indent=on&version=2.2&q=ArtistSort:mi*&fq=&start=0&rows=10&fl=ArtistSort&qt=standard&wt=standard&explainOther=&hl.fl=

I get exactly what I want. I made several tries, and I get only artist names
which begin with the right first two letters.

But I get very few responses, see here:

<result name="response" numFound="6" start="0">
  <doc>
    <str name="ArtistSort">mike manne and tiger blues</str>
  </doc>
  <doc>
    <str name="ArtistSort">mimika</str>
  </doc>
  <doc>
    <str name="ArtistSort">miduno</str>
  </doc>
  <doc>
    <str name="ArtistSort">milue macïro</str>
  </doc>
  <doc>
    <str name="ArtistSort">mister pringle</str>
  </doc>
  <doc>
    <str name="ArtistSort">mimmai</str>
  </doc>
</result>


There are more than 80,000 artists in my index... I really don't understand
why I can't get more responses. I have been thinking about the problem for
days and days and now my brain freezes.

Thank you in advance.

Sophie
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Alphabetic-range-tp916716p916716.html
Sent from the Solr - User mailing list archive at Nabble.com.
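
One detail worth checking in the fieldType above: PatternTokenizerFactory
with group=2 emits capture group 2 as the token, which for the pattern
^([a-zA-Z])([a-zA-Z]).* is the second letter alone, not the first two
letters. A quick sketch with Python's re module (whose group semantics
match Java's here) illustrates the difference; collapsing the prefix into
a single group is one way to get a two-letter bucket:

```python
import re

# Sophie's pattern: two single-letter capture groups, then .* swallows the rest.
two_groups = re.compile(r"^([a-zA-Z])([a-zA-Z]).*")

m = two_groups.match("michael jackson")
print(m.group(1))  # 'm' -- first letter only
print(m.group(2))  # 'i' -- second letter only: what group="2" would emit

# A single group covering both letters yields the intended two-letter bucket.
one_group = re.compile(r"^([a-zA-Z]{2})")
print(one_group.match("michael jackson").group(1))  # 'mi'
```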


Setting up Eclipse with merged Lucene Solr source tree

2010-06-23 Thread Ukyo Virgden
Hi,

I'm trying to set up an Eclipse environment for the combined Lucene/Solr
tree. I've created a Lucene project containing /trunk/lusolr/lucene
and /trunk/lusolr/modules as one project and /trunk/lusolr/solr as another.
I've removed the Solr libs from the Lucene project and added the Lucene
project to the dependencies of the Solr project.

Lucene source tree is fine but in the Solr tree I get 5 errors

The method getTextContent() is undefined for the type Node TestConfig.java
/Solr/src/test/org/apache/solr/core line 91
The method getTextContent() is undefined for the type Node TestConfig.java
/Solr/src/test/org/apache/solr/core line 94
The method setXIncludeAware(boolean) is undefined for the type
DocumentBuilderFactory Config.java /Solr/src/java/org/apache/solr/core line
113
The method setXIncludeAware(boolean) is undefined for the type
DocumentBuilderFactory DataImporter.java
/Solr/contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport
line
The method setXIncludeAware(boolean) is undefined for the type Object
TestXIncludeConfig.java /Solr/src/test/org/apache/solr/core line 32

Is this the correct way to setup eclipse after the source tree merge?

Thanks in advance
Ukyo


dataimport.properties is not updated on delta-import

2010-06-23 Thread warb

Hello!

I am having some difficulties getting dataimport (DIH) to behave correctly
in Solr 1.4.0. Indexing itself works just as it is supposed to with both
full-import and delta-import adding modified or newly created records to the
index. The problem is however that the date and time of the last
delta-import is not updated in the dataimport.properites file. The only
time the file gets updated is when performing a full-import. 

Now, this is not a huge problem since delta-import will simply disregard
records already imported (due to the primary key), but it seems wasteful to
fetch records which have already been added on previous runs. Also, as the
database grows the delta-imports will take longer and longer.

Does anyone know of anything I might have overlooked or known bugs?

Thanks in advance!

Johan Andersson
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/dataimport-properties-is-not-updated-on-delta-import-tp916753p916753.html
Sent from the Solr - User mailing list archive at Nabble.com.


Indexing Rich Format Documents using Data Import Handler (DIH) and the TikaEntityProcessor

2010-06-23 Thread Tod

Please refer to this thread for history:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201006.mbox/%3c4c1b6bb6.7010...@gmail.com%3e


I'm trying to integrate the TikaEntityProcessor as suggested.  I'm using 
Solr Version: 1.4.0 and getting the following error:


java.lang.ClassNotFoundException: Unable to load BinURLDataSource or 
org.apache.solr.handler.dataimport.BinURLDataSource


curl -s http://test.html|curl 
http://localhost:9080/solr/update/extract?extractOnly=true --data-binary 
@-  -H 'Content-type:text/html'


... works fine so presumably my Tika processor is working.


My data-config.xml looks like this:

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@whatever:12345:whatever"
              user="me"
              name="ds-db"
              password="secret"/>

  <dataSource type="BinURLDataSource"
              name="ds-url"/>

  <document>
    <entity name="my_database"
            dataSource="ds-db"
            query="select * from my_database where rownum &lt;= 2">
      <field column="CONTENT_ID"  name="content_id"/>
      <field column="CMS_TITLE"   name="cms_title"/>
      <field column="FORM_TITLE"  name="form_title"/>
      <field column="FILE_SIZE"   name="file_size"/>
      <field column="KEYWORDS"    name="keywords"/>
      <field column="DESCRIPTION" name="description"/>
      <field column="CONTENT_URL" name="content_url"/>
    </entity>

    <entity name="my_database_url"
            dataSource="ds-url"
            query="select CONTENT_URL from my_database where
                   content_id='${my_database.CONTENT_ID}'">

      <entity processor="TikaEntityProcessor"
              dataSource="ds-url"
              format="text"
              url="http://www.mysite.com/${my_database.content_url}">
        <field column="text"/>
      </entity>
    </entity>

  </document>
</dataConfig>

I added the <entity name="my_database_url"> section to an existing
(working) database entity to be able to have Tika index the content
pointed to by the content_url.


Is there anything obviously wrong with what I've tried so far? It is not
working; it keeps rolling back with the error above.



Thanks - Tod


Re: TermsComponent - AutoComplete - Multiple Term Suggestions Inclusive Search?

2010-06-23 Thread Sophie M.

To build your autocompletion, you can use the NGramFilterFactory. If you type
cat, it will match subcategory and the best category.

If you change your mind and no longer want to match subcategory, you can
use the EdgeNGramFilterFactory.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/TermsComponent-AutoComplete-Multiple-Term-Suggestions-Inclusive-Search-tp916530p916769.html
Sent from the Solr - User mailing list archive at Nabble.com.
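
In config terms, Sophie's suggestion might look like the sketch below. The
field-type name and gram sizes are made-up values, so treat it as a starting
point rather than a drop-in:

```xml
<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <!-- NGramFilterFactory indexes grams from anywhere inside a term, so
         "cat" matches "subcategory"; use EdgeNGramFilterFactory instead to
         match only from the start of each term ("category"). -->
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- leave the query side un-grammed so the typed prefix stays one term -->
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
  </analyzer>
</fieldType>
```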


Re: dataimport.properties is not updated on delta-import

2010-06-23 Thread Stefan Moises

Hi,

what I have experienced is that the primary key seems to be case
sensitive for the delta queries, at least for some jdbc drivers... see
http://lucene.472066.n3.nabble.com/Problem-with-DIH-delta-import-on-JDBC-tp763469p765262.html
... so make sure you specify it with the correct case (e.g. ID instead
of id) in your db-data-config.xml.


Maybe that's the problem...

Cheers,
Stefan

Am 23.06.2010 15:09, schrieb warb:

Hello!

I am having some difficulties getting dataimport (DIH) to behave correctly
in Solr 1.4.0. Indexing itself works just as it is supposed to with both
full-import and delta-import adding modified or newly created records to the
index. The problem, however, is that the date and time of the last
delta-import is not updated in the dataimport.properties file. The only
time the file gets updated is when performing a full-import.

Now, this is not a huge problem since delta-import will simply disregard
records already imported (due to the primary key), but it seems wasteful to
fetch records which have already been added on previous runs. Also, as the
database grows the delta-imports will take longer and longer.

Does anyone know of anything I might have overlooked or known bugs?

Thanks in advance!

Johan Andersson
   


--
***
Stefan Moises
Senior Softwareentwickler

shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck

Tel.: 0911/25566-25
Fax:  0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
***
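
For reference, a hedged sketch of the delta attributes Stefan is talking
about, with the primary-key column upper-cased to match what the JDBC driver
reports. The table and column names here are invented:

```xml
<entity name="item" pk="ID"
        query="select * from item"
        deltaQuery="select ID from item
                    where last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="select * from item
                          where ID='${dataimporter.delta.ID}'">
  <field column="ID" name="id"/>
</entity>
```

The ${dataimporter.last_index_time} variable is read from
dataimport.properties, which is why a stale file makes every delta run
re-fetch rows that were already imported.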



fuzzy query performance

2010-06-23 Thread Peter Karich
Hi!

How can I improve the performance of a fuzzy search like mihchael~0.7
over a relatively large index (~1 million docs)?
It currently takes over 15 seconds when we perform it on the
normal text search field.
I searched the web and the jira and couldn't find anything related to that.

Any pointers or ideas would be appreciated!

Regards,
Peter.


Re: fuzzy query performance

2010-06-23 Thread Mark Miller

On 6/23/10 9:48 AM, Peter Karich wrote:

Hi!

How can I improve the performance of a fuzzy search like: mihchael~0.7
through a relative large index (~1 million docs)?
It takes over 15 seconds at the moment if we would perform it on the
normal text search field.
I searched the web and the jira and couldn't find anything related to that.

Any pointers or ideas would be appreciated!

Regards,
Peter.


Solr trunk should have much improved fuzzy speeds (due to some very cool 
work that was done in Lucene) - you using 1.4?


--
- Mark

http://www.lucidimagination.com
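
For background on why 1.4-era fuzzy queries are slow: FuzzyQuery essentially
enumerates the field's terms and scores each one against the query term by
edit distance, so cost grows with the number of unique terms. A rough sketch
of the per-term work (classic dynamic-programming Levenshtein plus the old
similarity formula as I recall it; not Lucene's actual code):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with insert/delete/substitute, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def fuzzy_similarity(query: str, term: str) -> float:
    # Roughly the old FuzzyTermEnum formula (with an empty common prefix).
    return 1.0 - levenshtein(query, term) / min(len(query), len(term))

# "mihchael" is one deletion away from "michael", so it clears the 0.7 bar.
print(fuzzy_similarity("mihchael", "michael"))
```

Doing that against every term in a million-doc index is the 15 seconds being
observed; the trunk work Mark mentions avoids the full term scan.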


Re: Setting up Eclipse with merged Lucene Solr source tree

2010-06-23 Thread Erick Erickson
Did you see this page?
http://wiki.apache.org/solr/HowToContribute

Especially down near the end, the section
Development Environment Tips

HTH
Erick

On Wed, Jun 23, 2010 at 8:57 AM, Ukyo Virgden ukyovirg...@gmail.com wrote:

 Hi,

 I'm trying to setup and eclipse environment for combined Lusolr tree. I've
 created a Lucene project containing /trunk/lusolr/lucene
 and /trunk/lusolr/modules as one project and /trunk/lusolr/solr as another.
 I've added lucene project as a dependency to Solr project, removed solr
 libs
 from lucene project and added Lucene project to dependencies of Solr
 project.

 Lucene source tree is fine but in the Solr tree I get 5 errors

 The method getTextContent() is undefined for the type Node TestConfig.java
 /Solr/src/test/org/apache/solr/core line 91
 The method getTextContent() is undefined for the type Node TestConfig.java
 /Solr/src/test/org/apache/solr/core line 94
 The method setXIncludeAware(boolean) is undefined for the type
 DocumentBuilderFactory Config.java /Solr/src/java/org/apache/solr/core line
 113
 The method setXIncludeAware(boolean) is undefined for the type
 DocumentBuilderFactory DataImporter.java

 /Solr/contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport
 line
 The method setXIncludeAware(boolean) is undefined for the type Object
 TestXIncludeConfig.java /Solr/src/test/org/apache/solr/core line 32

 Is this the correct way to setup eclipse after the source tree merge?

 Thanks in advance
 Ukyo



Re: Help with highlighting

2010-06-23 Thread noel
Here's my request:
q=ASA+AND+minisite_id%3A36&version=1.3&json.nl=map&rows=10&start=0&wt=json&hl=true&hl.fl=%2A&hl.simple.pre=%3Cspan+class%3D%22hl%22%3E&hl.simple.post=%3C%2Fspan%3E&hl.fragsize=0&hl.mergeContiguous=false

And here's what happened:
It didn't return highlighted results, even when I used an asterisk to specify
which fields to highlight. I tried other fields and that didn't work either;
however, all_text is the only one that works. Any other ideas why the other
fields won't highlight? Thanks.

-Original Message-
From: Erik Hatcher erik.hatc...@gmail.com
Sent: Tuesday, June 22, 2010 9:49pm
To: solr-user@lucene.apache.org
Subject: Re: Help with highlighting

You need to share with us the Solr request you made, and any custom
request handler settings that might map to it.  Chances are you just need
to twiddle with the highlighter parameters (see wiki for docs) to get
it to do what you want.

Erik

On Jun 22, 2010, at 4:42 PM, n...@frameweld.com wrote:

 Hi, I need help with highlighting fields that would match a query.  
 So far, my results only highlight if the field is from all_text, and  
 I would like it to use other fields. It simply isn't the case if I  
 just turn highlighting on. Any ideas why it only applies to  
 all_text? Here is my schema:

 <?xml version="1.0" ?>

 <schema name="Search" version="1.1">
   <types>
     <!-- Basic Solr Bundled Data Types -->

     <!-- Rudimentary types -->
     <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true" />
     <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true" />

     <!-- Non-sortable numeric types -->
     <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
     <fieldType name="long" class="solr.LongField" omitNorms="true"/>
     <fieldType name="float" class="solr.FloatField" omitNorms="true"/>
     <fieldType name="double" class="solr.DoubleField" omitNorms="true"/>

     <!-- Sortable numeric types -->
     <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
     <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
     <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
     <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>

     <!-- Date/Time types -->
     <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>

     <!-- Pseudo types -->
     <fieldType name="random" class="solr.RandomSortField" indexed="true" />

     <!-- Analyzing types -->
     <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
       <analyzer>
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       </analyzer>
     </fieldType>

     <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
       <analyzer type="index">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <!-- <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> -->
         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                 generateNumberParts="1" catenateWords="1" catenateNumbers="1"
                 catenateAll="0" splitOnCaseChange="1"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
       </analyzer>
       <analyzer type="query">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                 ignoreCase="true" expand="true"/>
         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                 generateNumberParts="1" catenateWords="0" catenateNumbers="0"
                 catenateAll="0" splitOnCaseChange="1"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
       </analyzer>
     </fieldType>

     <fieldType name="textTight" class="solr.TextField" positionIncrementGap="100">
       <analyzer>
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter

Re: Help with highlighting

2010-06-23 Thread dan sutton
It looks to me like a tokenisation issue: the all_text content and the query
text will match, but the string fieldtype fields might not, and therefore
will not be highlighted.

On Wed, Jun 23, 2010 at 4:40 PM, n...@frameweld.com wrote:

 Here's my request:
 q=ASA+AND+minisite_id%3A36&version=1.3&json.nl=map&rows=10&start=0&wt=json&hl=true&hl.fl=%2A&hl.simple.pre=%3Cspan+class%3D%22hl%22%3E&hl.simple.post=%3C%2Fspan%3E&hl.fragsize=0&hl.mergeContiguous=false

 And here's what happened:
 It didn't return results, even when I applied an asterisk for which fields
 highlight. I tried other fields and that didn't work either, however
 all_text is the only one that works. Any other ideas why the other fields
 won't highlight? Thanks.

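
To illustrate dan's tokenisation point with a hedged example (the field
names below are invented): a string field is indexed as a single untokenized
term, so a word-level query rarely matches it and the highlighter has
nothing to mark up, while an analyzed text field gives the highlighter term
matches to work with. The field must also be stored for highlighting:

```xml
<!-- string: one exact term; a query for ASA only matches the whole value -->
<field name="title_s" type="string" indexed="true" stored="true"/>

<!-- text: tokenized and lowercased, so ASA can match inside the value
     and hl.fl=title_t has fragments to highlight -->
<field name="title_t" type="text" indexed="true" stored="true"/>
```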

remove from list

2010-06-23 Thread Susan Rust
Hey SOLR folks -- There's too much info for me to digest, so please  
remove me from the email threads.


However, if we can build you a forum, bulletin board or other web-based
tool, please let us know. For that matter, we would be happy to
build you a new website.


Bill O'Connor is our CTO and the Drupal.org SOLR Redesign Lead. So we  
love SOLR! Let us know how we can support your efforts.


Susan Rust
VP of Client Services

If you wish to travel quickly, go alone
If you wish to travel far, go together

Achieve Internet
1767 Grand Avenue, Suite 2
San Diego, CA 92109

800-618-8777 x106
858-453-5760 x106

Susan-Rust (skype)
@Susan_Rust (twitter)
@Achieveinternet (twitter)
@drupalsandiego (San Diego Drupal Users' Group Twitter)



On Jun 23, 2010, at 1:52 AM, Mark Allan wrote:


Cheers, Geert-Jan, that's very helpful.

We won't always be searching with dates and we wouldn't want  
duplicates to show up in the results, so your second suggestion  
looks like a good workaround if I can't solve the actual problem.  I  
didn't know about FieldCollapsing, so I'll definitely keep it in mind.


Thanks
Mark

On 22 Jun 2010, at 3:44 pm, Geert-Jan Brits wrote:

Perhaps my answer is useless, bc I don't have an answer to your direct
question, but:
You *might* want to consider if your concept of a solr-document is on the
correct granular level, i.e:

your problem posted could be tackled (afaik) by defining a document being a
'sub-event' with only 1 daterange.
So for each event-doc you have now, this is replaced by several sub-event
docs in this proposed situation.

Additionally each sub-event doc gets an additional field 'parent-eventid'
which maps to something like an event-id (which you're probably using).
So several sub-event docs can point to the same event-id.

Lastly, all sub-event docs belonging to a particular event implement all the
other fields that you may have stored in that particular event-doc.

Now you can query for events based on date-ranges like you envisioned, but
instead of returning events you return sub-event-docs. However since all
data of the original event (except the multiple dateranges) is available in
the sub-event doc, this shouldn't really bother the client. If you need to
display all dates of an event (the only info missing from the returned
solr-doc) you could easily store it in an RDB and fetch it using the defined
parent-eventid.

The only caveat I see is that possibly multiple sub-events with the same
'parent-eventid' might get returned for a particular query.
This however depends on the type of queries you envision. i.e:
1) If you always issue queries with date-filters, and *assuming* that
sub-events of a particular event don't temporally overlap, you will never
get multiple sub-events returned.
2) If 1) doesn't hold, and assuming you *do* mind multiple sub-events of
the same actual event, you could try to use Field Collapsing on
'parent-eventid' to only return the first sub-event per parent-eventid that
matches the rest of your query. (Note however, that Field Collapsing is a
patch at the moment. http://wiki.apache.org/solr/FieldCollapsing)

Not sure if this helped you at all, but at the very least it was a nice
conceptual exercise ;-)

Cheers,
Geert-Jan
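
To make the denormalization concrete, a hedged sketch of what the sub-event
docs might look like (ids and field names are hypothetical):

```xml
<add>
  <!-- one original event with two dateranges becomes two sub-event docs -->
  <doc>
    <field name="id">event42_sub1</field>
    <field name="parent-eventid">event42</field>
    <field name="daterange">1940,1945</field>
    <!-- ...all other fields of event42 duplicated here... -->
  </doc>
  <doc>
    <field name="id">event42_sub2</field>
    <field name="parent-eventid">event42</field>
    <field name="daterange">1950,1955</field>
  </doc>
</add>
```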


2010/6/22 Mark Allan mark.al...@ed.ac.uk


Hi all,

Firstly, I apologise for the length of this email but I need to describe
properly what I'm doing before I get to the problem!

I'm working on a project just now which requires the ability to store and
search on temporal coverage data - ie. a field which specifies a date range
during which a certain event took place.

I hunted around for a few days and couldn't find anything which seemed to
fit, so I had a go at writing my own field type based on solr.PointType.
It's used as follows:

schema.xml
  <fieldType name="temporal" class="solr.TemporalCoverage" dimension="2" subFieldSuffix="_i"/>
  <field name="daterange" type="temporal" indexed="true" stored="true" multiValued="true"/>

data.xml
  <add>
    <doc>
      ...
      <field name="daterange">1940,1945</field>
    </doc>
  </add>

Internally, this gets stored as:
 arr 

RE: remove from list

2010-06-23 Thread Markus Jelsma
If you want to unsubscribe, then you can do so [1] without trying to sell 
something ;)

 

[1]: http://lucene.apache.org/solr/mailing_lists.html

 

Cheers!
 
-Original message-
From: Susan Rust su...@achieveinternet.com
Sent: Wed 23-06-2010 18:23
To: solr-user@lucene.apache.org; Erik Hatcher erik.hatc...@gmail.com; 
Subject: remove from list

Hey SOLR folks -- There's too much info for me to digest, so please  
remove me from the email threads.

However, if we can build you a forum, bulletin board or other web- 
based tool, please let us know. For that matter, we would be happy to  
build you a new website.

Bill O'Connor is our CTO and the Drupal.org SOLR Redesign Lead. So we  
love SOLR! Let us know how we can support your efforts.


Re: remove from list

2010-06-23 Thread Susan Rust

Will do -- but wasn't selling -- trying to donate!


Re: Help with highlighting

2010-06-23 Thread noel
Thanks, that's exactly the problem. I've tried different types, even a
fieldType that had no tokenizers, and that didn't work. However, the text
type gives me the results I want.
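
For anyone hitting the same thing, a minimal sketch of the distinction
(field names hypothetical):

```xml
<!-- a string field is a single untokenized term; the highlighter rarely
     produces fragments for it because query terms seldom match verbatim -->
<field name="label" type="string" indexed="true" stored="true"/>

<!-- an analyzed type such as "text" is tokenized, so the highlighter
     can find and mark the individual matching terms -->
<field name="label_hl" type="text" indexed="true" stored="true"/>
```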

-Original Message-
From: dan sutton danbsut...@gmail.com
Sent: Wednesday, June 23, 2010 12:06pm
To: solr-user@lucene.apache.org
Subject: Re: Help with highlighting

It looks to me like a tokenisation issue, all_text content and the query
text will match, but the string fieldtype fields 'might not' and therefore
will not be highlighted.

On Wed, Jun 23, 2010 at 4:40 PM, n...@frameweld.com wrote:

 Here's my request:
 q=ASA+AND+minisite_id%3A36&version=1.3&json.nl=map&rows=10&start=0&wt=json&hl=true&hl.fl=%2A&hl.simple.pre=%3Cspan+class%3D%22hl%22%3E&hl.simple.post=%3C%2Fspan%3E&hl.fragsize=0&hl.mergeContiguous=false

 And here's what happened:
 It didn't return results, even when I applied an asterisk for which fields
 highlight. I tried other fields and that didn't work either, however
 all_text is the only one that works. Any other ideas why the other fields
 won't highlight? Thanks.

 -Original Message-
 From: Erik Hatcher erik.hatc...@gmail.com
 Sent: Tuesday, June 22, 2010 9:49pm
 To: solr-user@lucene.apache.org
 Subject: Re: Help with highlighting

 You need to share with us the Solr request you made, any any custom
 request handler settings that might map to.  Chances are you just need
 to twiddle with the highlighter parameters (see wiki for docs) to get
 it to do what you want.

Erik

 On Jun 22, 2010, at 4:42 PM, n...@frameweld.com wrote:

  Hi, I need help with highlighting fields that would match a query.
  So far, my results only highlight if the field is from all_text, and
  I would like it to use other fields. It simply isn't the case if I
  just turn highlighting on. Any ideas why it only applies to
  all_text? Here is my schema:
 
   <?xml version="1.0" ?>
  
   <schema name="Search" version="1.1">
     <types>
       <!-- Basic Solr Bundled Data Types -->
  
       <!-- Rudimentary types -->
       <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
       <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
  
       <!-- Non-sortable numeric types -->
       <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
       <fieldType name="long" class="solr.LongField" omitNorms="true"/>
       <fieldType name="float" class="solr.FloatField" omitNorms="true"/>
       <fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
  
       <!-- Sortable numeric types -->
       <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
       <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
       <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
       <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
  
       <!-- Date/Time types -->
       <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
  
       <!-- Pseudo types -->
       <fieldType name="random" class="solr.RandomSortField" indexed="true"/>
  
       <!-- Analyzing types -->
       <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
         <analyzer>
           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         </analyzer>
       </fieldType>
  
       <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
         <analyzer type="index">
           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
           <!-- <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> -->
           <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
           <filter class="solr.LowerCaseFilterFactory"/>
           <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
           <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
         </analyzer>
  
         <analyzer type="query">
           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
           <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
           <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
           <filter class="solr.LowerCaseFilterFactory"/>
  

Highlight question

2010-06-23 Thread Gregg Hoshovsky
I just started working with highlighting. I am using the default 
configurations. I have a field for which I can get a single highlight to occur 
marking the data.

What I would like to do is this:

Given a word, say 'tumor', and the sentence

 "the lower tumor grew 1.5 cm. blah blah blah  we need to remove the tumor in 
the next surgery"

I would like to get "<em>the lower tumor grew 1.5 cm</em>. blah blah 
blah  we need to ... <em>remove the tumor in the next</em> surgery"

Thus finding multiple references to the word and only grabbing a few words 
around each one.

In the solrconfig.xml I have been able to change the hl.simple.pre/post 
variables, but when I try to change the hl.regex pattern or hl.snippets they 
don't have any effect. I thought hl.snippets would allow me to find more 
than one match and highlight it, and I tried a bunch of regex patterns but they 
didn't do anything.

here is a snippet of the config file.

Any help is appreciated.

Gregg


   <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
   <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
     <lst name="defaults">
       <!-- slightly smaller fragsizes work better because of slop -->
       <int name="hl.snippets">4</int>  <int name="hl.fragsize">70</int>
       <!-- allow 50% slop on fragment sizes -->
       <float name="hl.regex.slop">0.2</float>
       <!-- a basic sentence pattern -->
       <str name="hl.regex.pattern">[-\w ,/\n\']{1,1}</str>
     </lst>
   </fragmenter>

   <!-- Configure the standard formatter -->
   <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
     <lst name="defaults">
       <int name="hl.snippets">4</int>
       <int name="hl.fragsize">100</int>
       <str name="hl.simple.pre"><![CDATA[...<em>]]></str>
       <str name="hl.simple.post"><![CDATA[</em>]]></str>
     </lst>
   </formatter>



Help with sorting

2010-06-23 Thread Adi Neacsu
Hi everyone,
I'm stuck on sorting with Solr. I have documents from some institutions,
differentiated by an id named "instanta". I indexed all those documents
and, among other things, I put in the index the date the document was
created and the id of the institution. When I want to sort the documents
which contain a certain word by date or by institution, all I get is an
order that I don't understand.

<field name="datecreated" type="date" indexed="true" stored="false" />
<field name="instanta" type="int" indexed="true" stored="false" required="true" />

QueryOptions options = new QueryOptions
{
    Rows = resultsPerPage,
    Start = (pageNumber - 1) * resultsPerPage,
    OrderBy = new[] { new SortOrder("instanta", Order.DESC) }
};

Thank you in advance 
 
jud. Adrian Neacsu
Presedinte Tribunalul Vrancea
http://www.adrianneacsu.jurindex.ro

www.jurisprudenta.org

www.societateapentrujustitie.ro
 (+40) 0721949875  ;  (+40) 0749182508  
fax 0337814221


  

DIH and dynamicField

2010-06-23 Thread Boyd Hemphill
I am new to the list, so any coaching on asking questions is much appreciated.

I am  having a problem where importing with DIH and attempting to use
dynamicField produces no result.  I get no error, nor do I get a message in
the log.

I found this:  https://issues.apache.org/jira/browse/SOLR-742 which says the
issue was closed in bulk for the 1.4 release.  The messages above seem to
indicate the patch was in/out/good/bad, so I am not sure if the issue was
fixed as we are seeing the same behavior described in the bug.

Has this issue, in fact, been resolved?  Is anyone using DIH and
dynamicField successfully together?

Solr is truly fantastic (so is DIH for that matter).  Thank you!

Boyd Hemphill


Re: fuzzy query performance

2010-06-23 Thread Peter Karich
Hi Mark!

 Solr trunk should have much improved fuzzy speeds (due to some very
cool work that was done in Lucene) - you using 1.4?

yes.
So, you mean I should try it out here:
http://svn.apache.org/viewvc/lucene/dev/trunk/solr/

or some 'more stable' branch?
http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.5-dev/

What would you choose?

Regards,
Peter.

 Hi!

 How can I improve the performance of a fuzzy search like: mihchael~0.7
 through a relative large index (~1 million docs)?
 It takes over 15 seconds at the moment if we would perform it on the
 normal text search field.
 I searched the web and the jira and couldn't find anything related to
 that.

 Any pointers or ideas would be appreciated!

 Regards,
 Peter.

 Solr trunk should have much improved fuzzy speeds (due to some very
 cool work that was done in Lucene) - you using 1.4?



Stemmed and/or unStemmed field

2010-06-23 Thread Vishal A.
Hello all,

 

One quick question; I'm trying to find out what scenario would work best.

We have a huge free-text dataset containing product titles and descriptions.
Unfortunately, we don't have the data categorized, so we rely heavily on
'search relevancy + synonyms' to categorize.

Here is what I am trying to do: someone clicks on 'Comforters & Pillows',
and we would want the results to be filtered where the title has the keyword
'Comforter' or 'Pillows', but we have been getting results with the word
'comfort' in the title. I assume it is because of stemming. What is the
right way to handle this?

I am thinking of creating another, unstemmed field, 'title_unstemmed', which
stores the data unstemmed. So basically, with dismax, I could boost the score
on the unstemmed field. I can think of other scenarios where stemming would be
needed, so the stemmed field would still match.

Does that sound like something that will work? Any suggestions please?

 

Much appreciated 
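
For what it's worth, a minimal sketch of the setup I have in mind (field and
type names hypothetical; "text_ws" stands in for any analyzed-but-unstemmed
type):

```xml
<!-- schema.xml: index the title both stemmed and unstemmed -->
<field name="title" type="text" indexed="true" stored="true"/>
<field name="title_unstemmed" type="text_ws" indexed="true" stored="false"/>
<copyField source="title" dest="title_unstemmed"/>

<!-- solrconfig.xml: dismax boosts exact (unstemmed) matches over stemmed ones -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title_unstemmed^2.0 title^1.0</str>
  </lst>
</requestHandler>
```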



Can solr return pretty text as the content?

2010-06-23 Thread JohnRodey

When I feed pretty text into solr for indexing from lucene and search for it,
the content is always returned as one long line of text.  Is there a way for
solr to return the pretty formatted text to me?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-solr-return-pretty-text-as-the-content-tp917912p917912.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Can solr return pretty text as the content?

2010-06-23 Thread caman

Define "pretty text."

1) Are you saying the XML/JSON returned by Solr is not pretty?

If so, try indent=on in your query params.

2) Or are you talking about data in a certain field?

Solr returns what you feed it. Look at your filters for that field
type. Your filters/tokenizer may be stripping the formatting.

 

 

 

From: JohnRodey [via Lucene]
[mailto:ml-node+917912-920852633-124...@n3.nabble.com] 
Sent: Wednesday, June 23, 2010 1:19 PM
To: caman
Subject: Can solr return pretty text as the content?

 

When I feed pretty text into solr for indexing from lucene and search for
it, the content is always returned as one long line of text.  Is there a way
for solr to return the pretty formatted text to me? 

 


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-solr-return-pretty-text-as-the-content-tp917912p917966.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Highlight question

2010-06-23 Thread Ahmet Arslan
 In the solrconfig.xml I have been able to change the
 hl.simple.pre/post variable, but when I try to change the
 hl,regex pattern or the hl.snippets they don't have any
 effect. I thought the hl.snippets would alow me to find more
 than one and highlight it, and well I tried a bunch of regex
 patterns but they didn't do anything.

The <int name="hl.snippets">4</int> param should go under the "defaults" section of 
your default SearchHandler:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">all</str>
    <int name="hl.snippets">4</int>
  </lst>
</requestHandler>

Also the hl.fragmenter=regex parameter is required to activate the 
regular-expression-based fragmenter.





Re: Help with sorting

2010-06-23 Thread Ahmet Arslan


 When I want to sort the documents
 which contain a certain word by date or by institution, all I
 get is an order that I don't understand.

 <field name="datecreated" type="date" indexed="true" stored="false" />
 <field name="instanta" type="int" indexed="true" stored="false" required="true" />

You need to use a sortable type: sint with Solr 1.3; tint with Solr 1.4.

<field name="instanta" type="tint" ... />
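
For reference, a sketch of what this implies in a Solr 1.4 schema (the
precisionStep value is just the example-schema default):

```xml
<!-- schema.xml: trie-based int type that sorts numerically -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<field name="instanta" type="tint" indexed="true" stored="false" required="true"/>
```

Sorting is then requested with e.g. sort=instanta desc (SolrNet's SortOrder
maps to that parameter).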


  


Re: fuzzy query performance

2010-06-23 Thread Robert Muir
On Wed, Jun 23, 2010 at 3:34 PM, Peter Karich peat...@yahoo.de wrote:


 So, you mean I should try it out her:
 http://svn.apache.org/viewvc/lucene/dev/trunk/solr/


yes, the speedups are only in trunk.

-- 
Robert Muir
rcm...@gmail.com


Re: DIH and dynamicField

2010-06-23 Thread Robert Zotter


Boyd Hemphill-2 wrote:
 
 I am  having a problem where importing with DIH and attempting to use
 dynamicField produces no result.  I get no error, nor do I get a message
 in
 the log.

It would help if you posted the relevant parts of your data-config.xml and
schema.xml. If you are doing a straight column-to-name mapping, my first
guess would be that you have those backwards or there is some
misconfiguration in your schema.xml. For example, if you have a database
column "foo" and you want to add it to the "foo_dynamic" field, you should be
using something like this:

schema.xml
<dynamicField name="*_dynamic" ... />

data-config.xml
<field column="foo" name="foo_dynamic"/>
Hope this helps. 

- Robert Zotter
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-and-dynamicField-tp917823p918189.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Stemmed and/or unStemmed field

2010-06-23 Thread Robert Muir
On Wed, Jun 23, 2010 at 3:58 PM, Vishal A.
aboxfortheotherst...@gmail.comwrote:


 Here is what I am trying to do: someone clicks on 'Comforters & Pillows',
 and we would want the results to be filtered where the title has the keyword
 'Comforter' or 'Pillows', but we have been getting results with the word
 'comfort' in the title. I assume it is because of stemming. What is the
 right way to handle this?


from your examples, it seems a more lightweight stemmer might be an easy
option: https://issues.apache.org/jira/browse/LUCENE-2503

-- 
Robert Muir
rcm...@gmail.com


RE: Stemmed and/or unStemmed field

2010-06-23 Thread caman

Ahh, perfect.

Will take a look. Thanks!

 

From: Robert Muir [via Lucene]
[mailto:ml-node+918302-232685105-124...@n3.nabble.com] 
Sent: Wednesday, June 23, 2010 4:17 PM
To: caman
Subject: Re: Stemmed and/or unStemmed field

 




 


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Stemmed-and-or-unStemmed-field-tp917876p918309.html
Sent from the Solr - User mailing list archive at Nabble.com.


Some minor Solritas layout tweaks

2010-06-23 Thread Ken Krugler
I grabbed the latest & greatest from trunk, and then had to make a few  
minor layout tweaks.


1. In main.css, the .query-box input { height} isn't tall enough (at  
least on my Mac 10.5/FF 3.6 config), so character descenders get  
clipped.


I bumped it from 40px to 50px, and that fixed the issue for me.

2. The constraint text (for removing facet constraints) overlaps with  
the Solr logo.


It looks like the div that contains this anchor text is missing a  
class=constraints, as I see a .constraints in the CSS.


I added this class name, and also (to main.css):

.constraints {
  margin-top: 10px;
}

But IANAWD, so this is probably not the best way to fix the issue.

3. And then I see a .constraints-title in the CSS, but it's not used.

Was the intent of this to set the '' character to gray?

4. It seems silly to open JIRA issues for these types of things, but I  
also don't want to add to noise on the list.


Which approach is preferred?

Thanks,

-- Ken




Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






Multiple Solr Webapps in Glassfish with JNDI

2010-06-23 Thread Kelly Taylor

Does anybody know how to setup multiple Solr webapps in Glassfish with JNDI?

-Kelly
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-Solr-Webapps-in-Glassfish-with-JNDI-tp918383p918383.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Setting up Eclipse with merged Lucene Solr source tree

2010-06-23 Thread Lance Norskog
I have found it easier to make these projects in my Eclipse workspace
and make remote links to the parts that I really want. This cuts the
total stuff in the project- cuts build times, 'search everywhere'
times, menus full of classes named '*file*', etc.

But git may have problems with this, and git is a lifesaver for
playing with patches etc.

Lance

On Wed, Jun 23, 2010 at 8:03 AM, Erick Erickson erickerick...@gmail.com wrote:
 Did you see this page?
 http://wiki.apache.org/solr/HowToContribute

 Especially down near the end,
 the section
 "Development Environment Tips"

 HTH
 Erick

 On Wed, Jun 23, 2010 at 8:57 AM, Ukyo Virgden ukyovirg...@gmail.com wrote:

 Hi,

 I'm trying to setup and eclipse environment for combined Lusolr tree. I've
 created a Lucene project containing /trunk/lusolr/lucene
 and /trunk/lusolr/modules as one project and /trunk/lusolr/solr as another.
 I've added lucene project as a dependency to Solr project, removed solr
 libs
 from lucene project and added Lucene project to dependencies of Solr
 project.

 Lucene source tree is fine but in the Solr tree I get 5 errors

 The method getTextContent() is undefined for the type Node TestConfig.java
 /Solr/src/test/org/apache/solr/core line 91
 The method getTextContent() is undefined for the type Node TestConfig.java
 /Solr/src/test/org/apache/solr/core line 94
 The method setXIncludeAware(boolean) is undefined for the type
 DocumentBuilderFactory Config.java /Solr/src/java/org/apache/solr/core line
 113
 The method setXIncludeAware(boolean) is undefined for the type
 DocumentBuilderFactory DataImporter.java

 /Solr/contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport
 line
 The method setXIncludeAware(boolean) is undefined for the type Object
 TestXIncludeConfig.java /Solr/src/test/org/apache/solr/core line 32

 Is this the correct way to setup eclipse after the source tree merge?

 Thanks in advance
 Ukyo





-- 
Lance Norskog
goks...@gmail.com


Re: DIH and dynamicField

2010-06-23 Thread Lance Norskog
A side comment about patches and JIRA: the second-to-last comment on
SOLR-742 says 'Committed'. That means one of the committers (Shalin
in this case) committed the fix. It was in 2008, so it's in Solr 1.4.

https://issues.apache.org/jira/browse/SOLR-742?focusedCommentId=12643747page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12643747

But, yes, Robert is right: post what you can of your config files.

On Wed, Jun 23, 2010 at 3:11 PM, Robert Zotter robertzot...@gmail.com wrote:


 Boyd Hemphill-2 wrote:

 I am  having a problem where importing with DIH and attempting to use
 dynamicField produces no result.  I get no error, nor do I get a message
 in
 the log.

 It would help if you posted the relevant parts of your data-config.xml and
 schema.xml. If you are doing a straight column-to-name mapping, my first
 guess would be that you have those backwards or there is some
 misconfiguration in your schema.xml. For example, if you have a database
 column "foo" and you want to add it to the "foo_dynamic" field, you should be
 using something like this:

 schema.xml:
 <dynamicField name="*_dynamic" .../>

 data-config.xml:
 <field column="foo" name="foo_dynamic"/>

 Hope this helps.
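A slightly fuller sketch of that mapping, for reference (the table, column, and field-type names here are made up for illustration, not from the original poster's config):

```xml
<!-- schema.xml: catch-all dynamic field (type name assumed) -->
<dynamicField name="*_dynamic" type="string" indexed="true" stored="true"/>

<!-- data-config.xml: map each database column onto a dynamic field -->
<document>
  <entity name="item" query="select id, foo from items">
    <field column="id"  name="id"/>
    <field column="foo" name="foo_dynamic"/>
  </entity>
</document>
```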

 - Robert Zotter
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/DIH-and-dynamicField-tp917823p918189.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Lance Norskog
goks...@gmail.com


Re: Multiple Solr Webapps in Glassfish with JNDI

2010-06-23 Thread Otis Gospodnetic
Hi Kelly,

I'm not much of a Glassfish user, but have you tried following the JNDI 
instructions for Tomcat? Maybe that works for Glassfish, too:

http://search-lucene.com/?q=jndi&fc_project=Solr
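For reference, the Tomcat flavor of that setup usually means one context file per webapp, each pointing at its own solr/home (the paths and context names below are assumptions for illustration):

```xml
<!-- conf/Catalina/localhost/solr1.xml -->
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/solr1" override="true"/>
</Context>

<!-- conf/Catalina/localhost/solr2.xml: same war, different home -->
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/solr2" override="true"/>
</Context>
```

Whether Glassfish accepts an equivalent per-application JNDI environment entry would need checking in its own docs.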
 
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Kelly Taylor wired...@hotmail.com
 To: solr-user@lucene.apache.org
 Sent: Wed, June 23, 2010 8:03:48 PM
 Subject: Multiple Solr Webapps in Glassfish with JNDI
 
 
Does anybody know how to set up multiple Solr webapps in Glassfish with JNDI?

-Kelly
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-Solr-Webapps-in-Glassfish-with-JNDI-tp918383p918383.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: fuzzy query performance

2010-06-23 Thread Otis Gospodnetic
Btw. here you can see Robert's presentation on what he did to speed up fuzzy 
queries:  http://www.slideshare.net/otisg
 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch




- Original Message 
 From: Robert Muir rcm...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wed, June 23, 2010 5:13:10 PM
 Subject: Re: fuzzy query performance
 
 On Wed, Jun 23, 2010 at 3:34 PM, Peter Karich peat...@yahoo.de wrote:

  So, you mean I should try it out here:
  http://svn.apache.org/viewvc/lucene/dev/trunk/solr/

 yes, the speedups are only in trunk.

 -- 
 Robert Muir
 rcm...@gmail.com


Re: Alphabetic range

2010-06-23 Thread Otis Gospodnetic
Sophie,

Go to your Solr Admin page, look for the Analysis page link, go there, enter 
some artist names, enter the query, check the verbose checkboxes, and submit.  
This will tell you what is going on with your analysis at index and at search 
time.
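As a side note on the quoted config below: Solr expects the element to be spelled <analyzer>, and PatternTokenizerFactory with group="2" emits only the second capture group, i.e. only the second letter. A sketch that buckets on the first two letters might look like this (untested; field and type names taken from the question, the lowercase filter is an added assumption):

```xml
<fieldType name="bucketFirstTwoLetters" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <!-- capture the first two letters as a single token -->
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="^([a-zA-Z]{2}).*" group="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```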
 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Sophie M. sop...@beezik.com
 To: solr-user@lucene.apache.org
 Sent: Wed, June 23, 2010 8:56:39 AM
 Subject: Alphabetic range
 
 
Hello all,

I have been trying for several days to build up an alphabetical range. I will
explain all the steps (I have the "Solr 1.4 Enterprise Search Server" book
written by Smiley and Pugh).

I want to get all artists beginning with the first two letters. If I request
"mi", I want to have as response "michael jackson" and all artist names
beginning with "mi".

I defined a field type similar to Smiley and Pugh's example p.148:

<fieldType name="bucketFirstTwoLetters" class="solr.TextField"
    sortMissingLast="true" omitNorms="true">
  <analyser type="index">
    <tokenizer class="solr.PatternTokenizerFactory"
        pattern="^([a-zA-Z])([a-zA-Z]).*" group="2"/> <!-- the first two letters -->
  </analyser>
  <analyser type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyser>
</fieldType>

I defined the field ArtistSort like this:

<field name="ArtistSort" type="bucketFirstTwoLetters" stored="true"
    multivalued="false"/>
To the request:

http://localhost:8983/solr/music/select?indent=on&q=yu&qt=standard&wt=standard&facet=on&facet.field=ArtistSort&facetsort=lex&facet.missing=on&facet.method=enum&fl=ArtistSort

I get:

http://lucene.472066.n3.nabble.com/file/n916716/select.xml

I don't understand why the pattern doesn't match exactly. For example "An An Yu"
matches, but I only want artists whose name begins with "yu". And I know that an
artist named "ReYu" would match because "ReYu" would be interpreted as "Re Yu"
(as two words).

I also tried to make another type of query, like:

http://localhost:8983/solr/music/select?indent=on&version=2.2&q=ArtistSort:mi*&fq=&start=0&rows=10&fl=ArtistSort&qt=standard&wt=standard&explainOther=&hl.fl=

I get exactly what I want. I made several tries; I get only artist names
which begin with the right first two letters.

But I get very few responses, see below:

<result name="response" numFound="6" start="0">
  <doc>
    <str name="ArtistSort">mike manne and tiger blues</str>
  </doc>
  <doc>
    <str name="ArtistSort">mimika</str>
  </doc>
  <doc>
    <str name="ArtistSort">miduno</str>
  </doc>
  <doc>
    <str name="ArtistSort">milue macïro</str>
  </doc>
  <doc>
    <str name="ArtistSort">mister pringle</str>
  </doc>
  <doc>
    <str name="ArtistSort">mimmai</str>
  </doc>
</result>


In my index there are more than 80,000 artists... I really don't understand
why I can't get more responses. I have been thinking about the problem for
days and now my brain freezes.

Thank you in advance.

Sophie
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Alphabetic-range-tp916716p916716.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance related question on DISMAX handler..

2010-06-23 Thread Otis Gospodnetic
BB,

Dismax could be slower than standard, depending on what kinds of queries you 
throw at either handler.
Millions of docs is a bit imprecise (2M or 22M or 222M or 999M, tweet-sized 
docs or book sized docs), but given adequate hardware and proper treatment 
shouldn't be a problem.
 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: bbarani bbar...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tue, June 22, 2010 2:27:05 PM
 Subject: Performance related question on DISMAX handler..
 
 
Hi,

I just want to know if there will be any overhead / performance degradation
if I use the Dismax search handler instead of the standard search handler?

We are planning to index millions of documents and are not sure if using
Dismax will slow down search performance. Would be great if someone can
share their thoughts.

Thanks,
BB
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Performance-related-question-on-DISMAX-handler-tp914892p914892.html
Sent from the Solr - User mailing list archive at Nabble.com.


Spatial types and DIH

2010-06-23 Thread Eric Angel
I'm using solr 4.0-2010-06-23_08-05-33 and can't figure out how to add the 
spatial types (LatLon, Point, GeoHash or SpatialTile) using DataImportHandler.  
My lat/lngs from the database are in separate fields.  Does anyone know how to 
do this?

Eric
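One approach worth trying (the entity, column, and field names below are assumptions, not from Eric's schema) is DIH's TemplateTransformer, which can concatenate two database columns into the single "lat,lon" string form the spatial field types expect:

```xml
<entity name="store" transformer="TemplateTransformer"
        query="select id, lat, lng from stores">
  <field column="id" name="id"/>
  <!-- build a single "lat,lng" value from the two separate columns -->
  <field column="latlon" template="${store.lat},${store.lng}"/>
</entity>
```

The ${entityName.columnName} placeholders are resolved per row by the transformer.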

Re: Field missing when use distributed search + dismax

2010-06-23 Thread Otis Gospodnetic
Make sure you list it in ...&fl=ID,type on the request, or set it in the 
defaults section of your handler.
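In solrconfig.xml, the second option can look like this (the handler name is an assumption; the fl value matches the fields from the question):

```xml
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- always return both fields, including from distributed shards -->
    <str name="fl">id,type</str>
  </lst>
</requestHandler>
```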
 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Scott Zhang macromars...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tue, June 22, 2010 11:04:07 AM
 Subject: Field missing when use distributed search + dismax
 
 Hi. All.
   I was using distributed search over 30 solr instances; the previous setup
was using the standard query handler, and the results were returned correctly.
Each result has 2 fields: ID and type.

 Today I wanted to search with dismax. I tried searching each instance
with dismax; it works correctly, returning ID and type for each result. The
strange thing is that when I use distributed search, the result only has ID.
The field type disappeared. I need that type to know what the ID refers
to. Why does solr eat my type?

Thanks.
Regards.
Scott


Re: anyone use hadoop+solr?

2010-06-23 Thread Otis Gospodnetic
Marc is referring to the very informative post by Ted Dunning from maybe a
month or so ago.

For what it's worth, we just used Hadoop Streaming, JRuby, and EmbeddedSolr to 
speed up indexing by parallelizing it.
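As a rough illustration of the streaming approach Otis mentions (this is a generic sketch, not the Sematext code; it assumes one document per input line as tab-separated id and text):

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming mapper sketch: each mapper task turns its
# slice of the input lines into Solr XML <doc> snippets; a downstream
# step (reducer or post-process) would feed these to a Solr core.
import sys
from xml.sax.saxutils import escape


def line_to_doc(line):
    """Turn one 'id<TAB>text' input line into a Solr <doc> XML snippet."""
    doc_id, text = line.rstrip("\n").split("\t", 1)
    return ('<doc><field name="id">%s</field>'
            '<field name="text">%s</field></doc>'
            % (escape(doc_id), escape(text)))


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hadoop Streaming hands each mapper a share of the input on stdin.
    for line in stdin:
        if line.strip():
            stdout.write(line_to_doc(line) + "\n")


if __name__ == "__main__":
    main()
```

Parallelism comes entirely from Hadoop running many such mappers at once; the script itself stays trivial.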
 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Marc Sturlese marc.sturl...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tue, June 22, 2010 12:43:27 PM
 Subject: Re: anyone use hadoop+solr?
 
 
Well, the patch consumes the data from a csv. You have to modify the input to
use TableInputFormat (I don't remember if it's called exactly like that) and
it will work.
Once you've done that, you have to specify as many reducers as shards you
want.

I know 2 ways to index using hadoop:

method 1 (solr-1301 & nutch):
-Map: just get data from the source and create key-value pairs
-Reduce: does the analysis and indexes the data
So, the index is built on the reducer side

method 2 (hadoop lucene index contrib):
-Map: does analysis and opens an IndexWriter to add docs
-Reduce: merges the small indexes built in the map
So, indexes are built on the map side

method 2 has no good integration with Solr at the moment.

In the jira (SOLR-1301) there's a good explanation of the advantages and
disadvantages of indexing on the map or reduce side. I recommend you read
all the comments on the jira in detail to know exactly how it works.


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914625.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr with hadoop

2010-06-23 Thread Otis Gospodnetic
I don't think it's ever been discussed - your Q below is #1 hit currently: 
http://search-lucene.com/?q=%2B%28dih+OR+dataimporthandler%29+hdfs
 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Jon Baer jonb...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tue, June 22, 2010 12:47:14 PM
 Subject: Re: solr with hadoop
 
 I was playing around w/ Sqoop the other day, it's a simple Cloudera tool for 
 imports (mysql -> hdfs) @ http://www.cloudera.com/developers/downloads/sqoop/

It seems to me (it would be pretty efficient) to dump to HDFS and have
something like Data Import Handler be able to read from hdfs:// directly ...

Has this route been discussed / developed before (ie DIH w/ hdfs:// handler)?

- Jon

On Jun 22, 2010, at 12:29 PM, MitchK wrote:

 
 I wanted to add a Jira-issue about exactly what Otis is asking here.
 Unfortunately, I haven't had time for it because of my exams.
 
 However, I'd like to add a question to Otis' ones:
 If you distribute the indexing process this way, are you able to replicate
 the different documents correctly?
 
 Thank you.
 - Mitch
 
 
 Otis Gospodnetic-2 wrote:
 
 Stu,
 
 Interesting!  Can you provide more details about your setup?  By "load
 balance the indexing stage" you mean distribute the indexing process,
 right?  Do you simply take your content to be indexed, split it into N
 chunks where N matches the number of TaskNodes in your Hadoop cluster, and
 provide a map function that does the indexing?  What does the reduce
 function do?  Does that call IndexWriter.addAllIndexes or do you do that
 outside Hadoop?
 
 Thanks,
 Otis
 
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 - Original Message 
 From: Stu Hood stuh...@webmail.us
 To: solr-user@lucene.apache.org
 Sent: Monday, January 7, 2008 7:14:20 PM
 Subject: Re: solr with hadoop
 
 As Mike suggested, we use Hadoop to organize our data en route to Solr.
 Hadoop allows us to load balance the indexing stage, and then we use
 the raw Lucene IndexWriter.addAllIndexes method to merge the data to be
 hosted on Solr instances.
 
 Thanks,
 Stu
 
 
 -----Original Message-----
 From: Mike Klaas mike.kl...@gmail.com
 Sent: Friday, January 4, 2008 3:04pm
 To: solr-user@lucene.apache.org
 Subject: Re: solr with hadoop
 
 On 4-Jan-08, at 11:37 AM, Evgeniy Strokin wrote:
 
 I have a huge index base (about 110 million documents, 100 fields
 each). But the size of the index base is reasonable, it's about 70 Gb.
 All I need is to increase performance, since some queries, which match
 a big number of documents, are running slow.
 So I was thinking: is there any benefit to using hadoop for this? And if
 so, what direction should I go? Has anybody done something for
 integrating Solr with Hadoop? Does it give any performance boost?
 
 Hadoop might be useful for organizing your data en route to Solr, but
 I don't see how it could be used to boost performance over a huge
 Solr index.  To accomplish that, you need to split it up over two
 machines (for which you might find hadoop useful).
 
 -Mike
 
 
 -- 
 View this message in context: 
 http://lucene.472066.n3.nabble.com/solr-with-hadoop-tp482688p914589.html
 
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Nested table support ability

2010-06-23 Thread Otis Gospodnetic
Amit,

I'd say it depends on the types of queries you need to run.  Maybe you 
mentioned that already, but your reply cut it off (Nabble).  I can say this 
with certainty: 1M is a small number and 30 fields is not a big deal.
 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: amit_ak amit...@mindtree.com
 To: solr-user@lucene.apache.org
 Sent: Wed, June 23, 2010 2:00:50 AM
 Subject: Re: Nested table support ability
 
 
Hi Otis, Thanks for the update.

My parametric search has to span across a customer table and 30 child tables.
We have close to 1 million customers. Do you think Lucene/Solr is the right
solution for such requirements? Or would a database search be more optimal?

Regards,
Amit

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Nested-table-support-ability-tp905253p916087.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Non-prefix, hierarchical autocomplete? Would SOLR-1316 work? Solritas?

2010-06-23 Thread Otis Gospodnetic
Hi Andy,

I didn't check out SOLR-1316 yet, other than looking at the comments.  It sounds 
more complicated than it should be, but maybe it's great and I really need to 
try it.
Solritas uses TermsComponent, which should work well for individual terms 
(which country and city names are not, unless you tokenize them as single 
tokens).
I don't think there is anything that will do everything you need out of the box.
You can get autocompletion on the country field, but you then need to do a bit 
of JS work to restrict cities to the country specified in the country field.  
Actually, now that I wrote this, I think we did something very much like that 
with http://sematext.com/products/autocomplete/index.html .
Finally, for dealing with commas or spaces as tag separators, you can peek at 
the JS in a service like delicious.com and see how they do it.  Their 
implementation of tag entry is nice.

And here is another slick auto-complete with extra niceness in the search form 
itself, from one of our customers: http://www.etsy.com/explorer 
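For the hierarchical part, one simple server-side sketch (field names are assumptions) is to drive the suggestions from faceting rather than TermsComponent, restricting by the already-chosen country with an fq:

```
http://localhost:8983/solr/select?q=*:*&rows=0
  &facet=on&facet.field=city&facet.prefix=mi
  &facet.limit=10&fq=country:France
```

facet.prefix does the "starts with what the user typed" part, and the fq scopes the counts to the selected country; the same pattern nests one more level for neighborhoods (fq on both country and city). It only covers prefix matching, though, so the non-prefix requirement in 2) would need a separate n-gram or shingle-based field.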
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Andy angelf...@yahoo.com
 To: solr-user@lucene.apache.org
 Sent: Sat, June 19, 2010 3:28:15 AM
 Subject: Non-prefix, hierarchical autocomplete? Would SOLR-1316 work? 
 Solritas?
 
 Hi,

I've seen some posts on using SOLR-1316 or Solritas for autocomplete.
I wondered what is the best solution for my use case:

1) I would like to have a hierarchical autocomplete. For example, I have a
Country dropdown list and a City textbox. A user would select a country from
the dropdown list, and then type out the City in the textbox. Based on which
country he selected, I want to limit the autocomplete suggestions to cities
that are relevant for the selected country.

This hierarchy could be multi-level. For example, there may be a Neighborhood
textbox. The autocomplete suggestions for Neighborhood would be limited to
neighborhoods that are relevant for the city entered by the user in the City
textbox.

2) I want to have autocomplete suggestions that include non-prefix matches.
For example, if the user types "auto", the autocomplete suggestions should
include terms such as "automata" and "build automation".

3) I'm doing autocomplete for tags. I would like to allow multi-word tags and
use comma (,) as a separator for tags. So when the user hits the space bar, he
is still typing out the same tag, but when he hits the comma key, he's
starting a new tag.

Would SOLR-1316 or Solritas work for the above requirements? If they do, how
do I set it up? I can't really find much documentation on SOLR-1316 or
Solritas in this area.

Thanks.


Re: Indexing Different Types

2010-06-23 Thread Otis Gospodnetic
Stephen,

Sure, multiple cores, one for each type is one approach.  Another one is just 
adding a 'type' field and restricting auto-completion by type.  In our AC 
implementation we have a piece made for very similar situations, where you have 
multiple types of entities, but want a single input field (search box) to give 
you suggestions from all entity types, yet have suggestions for different types 
visually grouped together.  I don't think we have a demo of that anywhere, 
though you can see AC in action on http://search-lucene.com/ for example.
 
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Divine Mercy itsl...@hotmail.com
 To: solr-user@lucene.apache.org
 Sent: Mon, June 21, 2010 4:59:55 PM
 Subject: Indexing Different Types
 
 
Hi,

I have a requirement and I am wondering what is the best way to handle this
through Solr.

I have different types of unrelated data, for example categories, tags and
some address information.

I would like to implement auto complete on this information, so there would
be an auto complete form for each one.

What would be the best way for implementing this using SOLR?

Would this be using multiple indexes: one index for tags, categories
and address?

Regards

Stephen


   
 
_

 href=http://clk.atdmt.com/UKM/go/19780/direct/01/; target=_blank 
 http://clk.atdmt.com/UKM/go/19780/direct/01/
We want to hear all 
 your funny, exciting and crazy Hotmail stories. Tell us now