how to save a snapshot of an index?

2010-07-12 Thread Li Li
When I add some docs via post.jar (org.apache.solr.util.SimplePostTool), it
commits after all docs are added. That calls IndexWriter.commit(), a new
segment is added, and sometimes this triggers segment merging. New index
files are generated (.fnm, .tii, .tis, ...), and old segments are deleted
once all references to them are closed (all the readers that have them open).
That's fine. But I want to back up a version of my index so that I can fall
back to it when something goes wrong. I could write a script to back up all
the files in the index directory every day, but while indexing is in progress
the script might back up the wrong files, so it would have to obtain the
***.lock file to do the right thing. Are there any built-in tools in Solr for
this? I just want to back up the index periodically (e.g. at 0:00 every day).


RE: Database connections during data import

2010-07-12 Thread Willem Van Riet
Hi Gora

Also indexing 4mil + records from a MS-SQL database - index size is
about 25Gb.

I managed to solve both the performance and recovery issue by
segmenting the indexing process along with the
CachedSqlEntityProcessor. 

Basically I populate a temp table with a subset of primary keys (I use a
modulus of the productId to achieve this) and inner join from that table
on both the primary query and all the child queries. As a result when a
segment fails (usually also due to connectivity being interrupted) only
one segment has to be re-done. Imports are managed by a custom built
service running on the SOLR box. Its smart enough to pick up stalled
imports when polling dataimport and restart that segment.

With indexing segmented data sets become small enough for
CachedSqlEntityProcessor to load it all into RAM (the box has 8GB).
Doing this reduced indexing time from 27hours to 2.5hours! (Due to
currency changes we need a full re-index every day). I suspect that
latency kills import speed whenever there's child queries involved.
Databases are also generally much better at 1 query with 300,000 rows
than 100,000 queries with 2-4.
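
(For readers wondering what such a segmented setup looks like in
DataImportHandler terms, here is a rough, hypothetical data-config.xml
fragment. Table, column and entity names are made up; only the
inner-join-on-a-temp-table pattern and the CachedSqlEntityProcessor usage
are the point.)

<!-- Hypothetical fragment: index only the rows whose primary keys were
     copied into tmp_segment for the current segment of the import. -->
<entity name="product"
        query="SELECT p.productId, p.name
               FROM products p
               INNER JOIN tmp_segment s ON s.productId = p.productId">

  <!-- Child data is loaded once into RAM and joined locally, avoiding one
       query per parent row. -->
  <entity name="price"
          processor="CachedSqlEntityProcessor"
          query="SELECT pr.productId, pr.amount
                 FROM prices pr
                 INNER JOIN tmp_segment s ON s.productId = pr.productId"
          where="productId=product.productId"/>
</entity>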

The 4GB (actually 3.2GB) limit only applies to the 32bit version of
Windows/SQL Server. That being said SQL server is not much of a RAM hog.
After its basic querying needs memory is only used to cache indexes and
query plans. SQL is pretty happy with 4GB but if you can upgrade the OS
another 2GB for the disk cache will help a lot. 

Regards,
Willem 

PS: You are using the JTDS driver? (http://jtds.sourceforge.net/) I find
it faster and more stable than the MS one.



-Original Message-
From: Gora Mohanty [mailto:g...@srijan.in] 
Sent: 10 July 2010 03:31 PM
To: solr-user@lucene.apache.org
Subject: Database connections during data import

Hi,

  We are indexing a large amount of data into Solr from a MS-SQL
database (don't ask!). There are approximately 4 million records,
and a total database size of the order of 20GB. There is also a need
for incremental updates, but these are only a few % of the total.

  After some trials-and-error, things are working great. Indexing is
a little slow as per our original expectations, but this is
probably to be expected, given that:
  * There are a fair number of queries per record indexed into Solr
  * Only one database server is in use at the moment, and this
could well be a bottle-neck (please see below).
  * The index has many fields, and we are also storing everything
in this phase, so that we can recover data directly from the
Solr index.
  * Transformers are used pretty liberally
  * Finally, we are no longer so concerned about the indexing speed
of a single Solr instance, as thanks to the possibility of
merging indexes, we can simply throw more hardware at the
problem.
(Incidentally, a big thank-you to everyone who has contributed to
 Solr. The above work was way easier than we had feared.)

As a complete indexing takes about 20h, sometimes the process gets
interrupted due to a loss of the database connection. I can tell
that that a loss of connection is the problem from the Solr Tomcat
logs, but it is difficult to tell whether it is the database
dropping connections (the database server is at 60-70% CPU
utilisation, but close to being maxed out at 4GB, and I am told
that MS-SQL/the OS cannot handle more RAM), or a network glitch.
What happens is that the logs report a reconnection, but the number
of processed records reported by the DataImportHandler
at /solr/dataimport?command=full-import stops incrementing, even
several hours after the reconnection. Is there any way to recover
from a reconnection, and continue DataImportHandler indexing at the
point where the process left off?

Regards,
Gora

P.S. Incidentally, would there be any interest in a
 GDataRequestHandler for Solr queries, and a
 GDataResponseWriter? We wrote one in the interests
 of trying to adhere to a de-facto standard, and can consider
 contributing these, after further testing, and cleanup.


Re: Modifications to AbstractSubTypeFieldType

2010-07-12 Thread Mark Allan

On 7 Jul 2010, at 6:24 pm, Yonik Seeley wrote:
On Wed, Jul 7, 2010 at 8:15 AM, Grant Ingersoll  
gsing...@apache.org wrote:
Originally, I had intended that it was just for one Field Sub Type,  
thinking that if we ever wanted multiple sub types, that a new,  
separate class would be needed



Right - this was my original thinking too.  AbstractSubTypeFieldType
is only a convenience class to create compound types... people can do
it other ways.


Just for clarification, does that mean my modifications won't be  
included?  If so, can you let me know so that I can extract the  
changes and maintain them in a different package structure from the  
main Solr code please.


Cheers
Mark

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: how to save a snapshot of an index?

2010-07-12 Thread Ahmet Arslan
 That's ok. But I want to backup a version of my index so that when
 something wrong happen I can use it. I can write a script to backup
 all the files in the index directory everyday. But it may happen that
 when it's indexing, the script may backup wrong files. So it must
 obtain the ***.lock file to make things right. Is there any built in
 tools in solr for my need? I just want to back up the index
 periodly (such as 0 clock every day).

Under the bin directory there are some scripts for that purpose.
http://wiki.apache.org/solr/CollectionDistribution
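
(For Solr 1.4 the HTTP-based ReplicationHandler is an alternative to those
shell scripts. Below is a minimal sketch of a solrconfig.xml entry, following
the SolrReplication wiki page; once enabled, a snapshot can be taken on
demand, e.g. from a nightly cron job, by requesting
/replication?command=backup.)

<!-- Sketch only, Solr 1.4+ -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>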






Re: how to save a snapshot of an index?

2010-07-12 Thread Peter Karich
Hi Li Li,

If the changes are not that frequently just copy the data folder:
http://wiki.apache.org/solr/SolrOperationsTools

Or see this question + answer:
http://stackoverflow.com/questions/3083314/solr-incremental-backup-on-real-time-system-with-heavy-index

where those direct links could help:
http://wiki.apache.org/solr/CollectionDistribution (solr < 1.4)
http://wiki.apache.org/solr/SolrReplication (solr >= 1.4)

Regards,
Peter.

 When I add some docs by
 post.jar(org.apache.solr.util.SimplePostTool), It commits after all
 docs are added. It will call IndexWriter.commit(). And a new segment
 will be added and sometimes it triggers segment merging. New index
 files will be generated(frm, tii,tis, ). Old segments will be
 deleted after all references are closed(All the reader which open it).
 That's ok. But I want to backup a version of my index so that when
 something wrong happen I can use it. I can write a script to backup
 all the files in the index directory everyday. But it may happen that
 when it's indexing, the script may backup wrong files. So it must
 obtain the ***.lock file to make things right. Is there any built in
 tools in solr for my need ? I just want to back up the index
 periodly(such as 0 clock every day).
   



Re: Field Collapsing SOLR-236

2010-07-12 Thread Rakhi Khatwani
Hi Mozzam,
  I finally got it working
Thanks a ton guys :)

Regards
Raakhi

On Sat, Jul 10, 2010 at 10:45 AM, Moazzam Khan moazz...@gmail.com wrote:

 Hi Rakhi,

 Sorry, I didn't see this email until just now. Did you get it working?


 If not here's some things that might help.


 - Download the patch first.
 - Check the date on which the patch was released.
 - Download the version of the trunk that existed at that date.
 - Apply the patch using the patch program in linux. There is a Windows
 program for patching but I can't remember right now.
 - After applying the patch just compile the whole thing


 It might be better if you used the example folder first and modify the
 config to work for multicore (at least that's what I did) . You can
 compile example by doing

 ant example

 (if I remember correctly)

 For config stuff refer to this link :

 http://wiki.apache.org/solr/FieldCollapsing


 HTH :)

 - Moazzam


 I'd give you the



 On Wed, Jun 23, 2010 at 7:23 AM, Rakhi Khatwani rkhatw...@gmail.com
 wrote:
  Hi,
  But there are almost no settings in my config.
  Here's a snapshot of what I have in my solrconfig.xml:
 
  <config>
  <updateHandler class="solr.DirectUpdateHandler2" />
 
  <requestDispatcher handleSelect="true">
  <requestParsers enableRemoteStreaming="false"
  multipartUploadLimitInKB="2048" />
  </requestDispatcher>
 
  <requestHandler name="standard" class="solr.StandardRequestHandler"
  default="true" />
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
  <requestHandler name="/admin/"
  class="org.apache.solr.handler.admin.AdminHandlers" />
 
  <!-- config for the admin interface -->
  <admin>
  <defaultQuery>*:*</defaultQuery>
  </admin>
 
  <!-- config for field collapsing -->
  <searchComponent name="query"
  class="org.apache.solr.handler.component.CollapseComponent" />
  </config>
 
  Am i goin wrong anywhere?
  Regards,
  Raakhi
 
  On Wed, Jun 23, 2010 at 3:28 PM, Govind Kanshi govind.kan...@gmail.com
 wrote:
 
  fieldType: analyzer without class or tokenizer & filter list seems to
  point to the config - you may want to correct.
 
 
  On Wed, Jun 23, 2010 at 3:09 PM, Rakhi Khatwani rkhatw...@gmail.com
  wrote:
 
   Hi,
  I checked out modules  lucene from the trunk.
   Performed a build using the following commands
   ant clean
   ant compile
   ant example
  
   Which compiled successfully.
  
  
   I then put my existing index(using schema.xml from
 solr1.4.0/conf/solr/)
  in
   the multicore folder, configured solr.xml and started the server
  
   When i type in http://localhost:8983/solr
  
   i get the following error:
   org.apache.solr.common.SolrException: Plugin init failure for
  [schema.xml]
    fieldType:analyzer without class or tokenizer & filter list
   at
  
  
 
 org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:168)
   at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:480)
   at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:122)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:429)
   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:286)
   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:198)
   at
  
  
 
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:123)
   at
  
 
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:86)
   at
 org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
   at
  org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at
  
  
 
 org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:662)
   at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
   at
  
  
 
 org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
   at
  
 org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
   at
 org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
   at
  org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at
  
  
 
 org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
   at
  
  
 
 org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
   at
  org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at
  
  
 
 org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
   at
  org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at
  
 org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
   at org.mortbay.jetty.Server.doStart(Server.java:224)
   at
  org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at
  
  
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at
  
  
 
 

Re: Sort by Day - Use of DateMathParser in Function Query?

2010-07-12 Thread Chantal Ackermann
Hi Hoss,

 ...somewhere you got confused, or missunderstood something.  There is no 
 default date field in Solr, there are only recomendations and examples 
 provided in the example schema.xml -- in Solr 1.4.1 *and* in Solr 1.4 the 
 recommended field for dealing with dates is solr.TrieDateField
 

The idea of a default date type came while reading this on
http://wiki.apache.org/solr/FunctionQuery:


Arguments may be numerically indexed date fields such as TrieDate (the
default in 1.4), or date math (examples in SolrQuerySyntax) based on a
constant date or NOW. 


And now that I revisited that sentence, I see that it answers my
question on whether I can use date math in those queries.
Sorry for not reading more thoroughly...


 As noted in the FunctionQuery wiki page you mentioned, the ms() function 
 does not work with solr.DateField.  
 
 (most likely your schema.xml originally started from the example in SOlr 
 1.3 or earlier ... *OR* ... you needed the 
 sortMissingLast/sortMissingFirst functionality that DateField supports but 
 TrieDateField does not.  the 1.4 example schema.xml explains the 
 differences)

Actually, right now, I don't need sortMissingLast because the date is
required for all documents. It is good that you mention it, though. I
will keep it in mind when considering changing a field to TrieDate.

Thanks!
Chantal





Re: Filter multivalue fields from search result

2010-07-12 Thread Alex J. G. Burzyński

Hi,

So if those are separate documents how should I handle paging? Two 
separate queries?
First to return all matching courses-events pairs, and second one to get 
courses for given page?


Is this common design described in details somewhere?

Thanks,
Alex

On 2010-07-09 01:50, Lance Norskog wrote:

Yes, denormalizing the index into separate (name,town) pairs is the
common design for this problem.

2010/7/8 Alex J. G. Burzyńskimailing-s...@ajgb.net:
   

Hi,

Is it possible to remove from search results the multivalued fields that
don't pass the search criteria?

My schema is defined as:

<!-- course_id -->
<field name="id" type="string" indexed="true" stored="true"
required="true" />
<!-- course_name -->
<field name="name" type="string" indexed="true" stored="true"/>
<!-- events.event_town -->
<field name="town" type="string" indexed="true" stored="true"
multiValued="true"/>
<!-- events.event_date -->
<field name="date" type="tdate" indexed="true" stored="true"
multiValued="true"/>

And example docs are:

++--+++
| id | name | town   | date   |
++--+++
| 1  | Microsoft Excel  | London | 2010-08-20 |
||  | Glasgow| 2010-08-24 |
||  | Leeds  | 2010-08-28 |
| 2  | Microsoft Word   | Aberdeen   | 2010-08-21 |
||  | Reading| 2010-08-25 |
||  | London | 2010-08-29 |
| 2  | Microsoft Powerpoint | Birmingham | 2010-08-22 |
||  | Leeds  | 2010-08-26 |
++--+++

so the query for q=name:Microsoft town:Leeds returns docs 1 & 3.

How would I remove London/Glasgow from doc 1 and Birmingham from doc 3?

Or is it that I should create separate doc for each name-event?

Thanks,
Alex

 



   


Two analyzer per field

2010-07-12 Thread Mark N
Is it possible to specify two analyzers per field?

For example, consider a field F1 (keyword analyzer) = "cheers mate" and a
field F2 (keyword analyzer) = "hello world".

There is also a copy field TEXT (standard analyzer) which will store the
terms { cheers mate hello world }.

Now, when a user performs any search we will only be looking at the copy
field TEXT, which uses the standard analyzer. Suppose the user searches for
the phrase "hello world"; it will not return any result, as the hello and
world terms are tokenized.

Is it possible to also index "hello world" as-is into the TEXT field? I.e.
can I use a keyword analyzer as well as a standard analyzer for the field
TEXT? What would be a better approach to handle this situation?
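
(One common workaround, sketched below with made-up field and type names, is
to keep two catch-all fields with different analysis, copy the source fields
into both, and query both fields, for example via a dismax qf of
"TEXT TEXT_exact".)

<!-- Hypothetical schema.xml fragment -->
<field name="TEXT"       type="text"   indexed="true" stored="false" multiValued="true"/>
<!-- "string" keeps each copied value as a single, untokenized term -->
<field name="TEXT_exact" type="string" indexed="true" stored="false" multiValued="true"/>

<copyField source="F1" dest="TEXT"/>
<copyField source="F2" dest="TEXT"/>
<copyField source="F1" dest="TEXT_exact"/>
<copyField source="F2" dest="TEXT_exact"/>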





-- 
Nipen Mark


fq= more then one ?

2010-07-12 Thread Jörg Agatz
Hallo,

I tried to create a new search for mails, and ran into a problem.

If I search:
http://172.20.1.33:8983/solr/select/?q=*:*&start=0&fq=EMAIL_HEADER_FROM:t...@mail.de

it works, I only get e-mails from t...@mail.de.
But I need something like this:

http://172.20.1.33:8983/solr/select/?q=*:*&start=0&fq=EMAIL_HEADER_FROM:t...@mail.de&fq=EMAIL_HEADER_TO:t...@mail.de

But that doesn't work; it looks like I can only use one fq parameter.

Maby you can help me.

King


Re: fq= more then one ?

2010-07-12 Thread Chantal Ackermann
Hi Jörg,

filter queries are restrictive: you can specify as many as you want,
but every document that does not match all of them will be excluded from
your result.

You can specify an OR clause in a single filter query to achieve what
you want:

fq=(EMAIL_HEADER_FROM:t...@mail.de OR EMAIL_HEADER_TO:t...@mail.de)

Cheers,
Chantal

On Mon, 2010-07-12 at 11:05 +0200, Jörg Agatz wrote:
 Hallo,
 
 i tryes to ceate a new Search for mails, and become a Problem..
 
 If i search:
 http://172.20.1.33:8983/solr/select/?q=*:*&start=0&fq=EMAIL_HEADER_FROM:t...@mail.de
 
 it works, i only get E-Mails from t...@mail.de
 But i need something like That:
 
 http://172.20.1.33:8983/solr/select/?q=*:*&start=0&fq=EMAIL_HEADER_FROM:t...@mail.de&fq=EMAIL_HEADER_TO:t...@mail.de
 
 But that, dosent work, it looks like, i can Only one parameter in FQ..
 
 Maby you can help me.
 
 King



Re: fq= more then one ?

2010-07-12 Thread Rebecca Watson
hi,

you shouldn't have two fq parameters -- some solr params work like
that, but fq doesn't

 http://172.20.1.33:8983/solr/select/?q=*:*&start=0&fq=EMAIL_HEADER_FROM:t...@mail.de&fq=EMAIL_HEADER_TO:t...@mail.de

you need to combine it into a single param i.e. try putting it as an
OR or AND if you're using the standard request handler:

fq=EMAIL_HEADER_FROM:t...@mail.de%20or%20email_header_to:t...@mail.de

or put something like + if you're using dismax (i think but i don't use it :) )

hope that helps,

bec :)


Re: fq= more then one ?

2010-07-12 Thread Rebecca Watson
oops - i thought you couldn't put more than one - ignore my answer then :)

On 12 July 2010 17:20, Rebecca Watson bec.wat...@gmail.com wrote:
 hi,

 you shouldn't have two fq parameters -- some solr params work like
 that, but fq doesn't

  http://172.20.1.33:8983/solr/select/?q=*:*&start=0&fq=EMAIL_HEADER_FROM:t...@mail.de&fq=EMAIL_HEADER_TO:t...@mail.de

 you need to combine it into a single param i.e. try putting it as an
 OR or AND if you're using the standard request handler:

 fq=EMAIL_HEADER_FROM:t...@mail.de%20or%20email_header_to:t...@mail.de

 or put something like + if you're using dismax (i think but i don't use it :) 
 )

 hope that helps,

 bec :)



Re: Filter multivalue fields from search result

2010-07-12 Thread Chantal Ackermann
Hi Alex,

I think you have to explain the complete use case. Paging is done by
specifying the parameter start (and rows if you want to have more or
less than 10 hits per page). For each page you need of course a new
query, but the queries differ only in the parameter value start (first
page start=0, second page start=10 etc. if rows=10). The other
parameters remain the same.

You should also have a look at facets. They might help you to get a list
of the values of your multi valued fields that you can display in the
UI, allowing the user to drill down the results further.

Chantal

On Mon, 2010-07-12 at 10:26 +0200, Alex J. G. Burzyński wrote:
 Hi,
 
 So if those are separate documents how should I handle paging? Two 
 separate queries?
 First to return all matching courses-events pairs, and second one to get 
 courses for given page?
 
 Is this common design described in details somewhere?
 
 Thanks,
 Alex
 
 On 2010-07-09 01:50, Lance Norskog wrote:
  Yes, denormalizing the index into separate (name,town) pairs is the
  common design for this problem.
 
  2010/7/8 Alex J. G. Burzyńskimailing-s...@ajgb.net:
 
  Hi,
 
  Is it possible to remove from search results the multivalued fields that
  don't pass the search criteria?
 
  My schema is defined as:
 
  <!-- course_id -->
  <field name="id" type="string" indexed="true" stored="true"
  required="true" />
  <!-- course_name -->
  <field name="name" type="string" indexed="true" stored="true"/>
  <!-- events.event_town -->
  <field name="town" type="string" indexed="true" stored="true"
  multiValued="true"/>
  <!-- events.event_date -->
  <field name="date" type="tdate" indexed="true" stored="true"
  multiValued="true"/>
 
  And example docs are:
 
  ++--+++
  | id | name | town   | date   |
  ++--+++
  | 1  | Microsoft Excel  | London | 2010-08-20 |
  ||  | Glasgow| 2010-08-24 |
  ||  | Leeds  | 2010-08-28 |
  | 2  | Microsoft Word   | Aberdeen   | 2010-08-21 |
  ||  | Reading| 2010-08-25 |
  ||  | London | 2010-08-29 |
  | 2  | Microsoft Powerpoint | Birmingham | 2010-08-22 |
  ||  | Leeds  | 2010-08-26 |
  ++--+++
 
  so the query for q=name:Microsoft town:Leeds returns docs 1 & 3.
 
  How would I remove London/Glasgow from doc 1 and Birmingham from doc 3?
 
  Or is it that I should create separate doc for each name-event?
 
  Thanks,
  Alex
 
   
 
 
 




Re: fq= more then one ?

2010-07-12 Thread Jörg Agatz
OK...
Thanks..

It works if I try it directly,
but in PHP it doesn't:

*Warning*: file_get_contents(http://@mail.de OR
EMAIL_HEADER_TO:t...@mail.de email_header_to%3at...@mail.de)) [
function.file-get-contentshttp://172.20.1.33/new/function.file-get-contents]:
failed to open stream: HTTP request failed! HTTP/1.1 400 Bad Request in *
/var/www/new/msearchres.php* on line *32*


Code:

$url='
http://172.20.1.33:8983/solr/select?wt=phps&q='.urlencode($q).'&sort='.urlencode($sort).'%20'.$direction.'&fq=(EMAIL_HEADER_FROM:'.$wo.'
OR EMAIL_HEADER_TO:'.$wo.')';

if(isset($_GET['s'])) $url.="&start=".$_GET['s'];
$serializedResult = file_get_contents($url);
$results = unserialize($serializedResult);


Ranking position in solr

2010-07-12 Thread Chamnap Chhorn
I wonder whether there is a proper way to fulfill this requirement. A book
has several keyphrases, each consisting of one to three words. An author can
either buy a keyphrase position or not buy one. Note: each author can buy
more than one keyphrase. The keyphrase search must be exact and case
sensitive.

For example: Book A, keyphrases: agile, web, development. Book B, keyphrases:
css, html, web.

Let's say the author of Book A buys search result position 1 for the
keyphrase "web"; his book should then be in the first position, listed
before Book B.

Anyone has any suggestions on how to implement this in solr?

-- 
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/


Re: Filter multivalue fields from search result

2010-07-12 Thread Alex J. G. Burzyński

Hi Chantal,

The paging problem I've asked about is that, having course-event pairs,
specifying rows limits the number of pairs returned, not the number of courses.


+---+--+++
| id-id | name | town   | date   |
+---+--+++
| 1-1   | Microsoft Excel  | London | 2010-08-20 |
| 1-2   | Microsoft Excel  | Glasgow| 2010-08-24 |
| 1-3   | Microsoft Excel  | Leeds  | 2010-08-28 |
| 2-1   | Microsoft Word   | Aberdeen   | 2010-08-21 |
| 2-2   | Microsoft Word   | Reading| 2010-08-25 |
| 2-3   | Microsoft Word   | London | 2010-08-29 |
| 3-1   | Microsoft Powerpoint | Birmingham | 2010-08-22 |
| 3-2   | Microsoft Powerpoint | Leeds  | 2010-08-26 |
| 3-3   | Microsoft Powerpoint | Leeds  | 2010-08-30 |
+---+--+++


And from a UI point of view I'm returning fewer courses than events -
that's why I've asked about paging.


The search for q=name:Microsoft town:Leeds with rows=2 should match:
1-3, 3-2, 3-3

But 3-3 will obviously be on page 2.

I hope that makes my questions clearer.

Thanks,
Alex


On 2010-07-12 10:26, Chantal Ackermann wrote:

Hi Alex,

I think you have to explain the complete use case. Paging is done by
specifying the parameter start (and rows if you want to have more or
less than 10 hits per page). For each page you need of course a new
query, but the queries differ only in the parameter value start (first
page start=0, second page start=10 etc. if rows=10). The other
parameters remain the same.

You should also have a look at facets. They might help you to get a list
of the values of your multi valued fields that you can display in the
UI, allowing the user to drill down the results further.

Chantal

On Mon, 2010-07-12 at 10:26 +0200, Alex J. G. Burzyński wrote:
   

Hi,

So if those are separate documents how should I handle paging? Two
separate queries?
First to return all matching courses-events pairs, and second one to get
courses for given page?

Is this common design described in details somewhere?

Thanks,
Alex

On 2010-07-09 01:50, Lance Norskog wrote:
 

Yes, denormalizing the index into separate (name,town) pairs is the
common design for this problem.

2010/7/8 Alex J. G. Burzyńskimailing-s...@ajgb.net:

   

Hi,

Is it possible to remove from search results the multivalued fields that
don't pass the search criteria?

My schema is defined as:

<!-- course_id -->
<field name="id" type="string" indexed="true" stored="true"
required="true" />
<!-- course_name -->
<field name="name" type="string" indexed="true" stored="true"/>
<!-- events.event_town -->
<field name="town" type="string" indexed="true" stored="true"
multiValued="true"/>
<!-- events.event_date -->
<field name="date" type="tdate" indexed="true" stored="true"
multiValued="true"/>

And example docs are:

++--+++
| id | name | town   | date   |
++--+++
| 1  | Microsoft Excel  | London | 2010-08-20 |
||  | Glasgow| 2010-08-24 |
||  | Leeds  | 2010-08-28 |
| 2  | Microsoft Word   | Aberdeen   | 2010-08-21 |
||  | Reading| 2010-08-25 |
||  | London | 2010-08-29 |
| 2  | Microsoft Powerpoint | Birmingham | 2010-08-22 |
||  | Leeds  | 2010-08-26 |
++--+++

so the query for q=name:Microsoft town:Leeds returns docs 1 & 3.

How would I remove London/Glasgow from doc 1 and Birmingham from doc 3?

Or is it that I should create separate doc for each name-event?

Thanks,
Alex


 



   


   


Query: URl too long

2010-07-12 Thread Frederico Azeiteiro
Hi,

 

I need to perform a search using a list of values (about 2000).

 

I'm using SolrNET QueryInList function that creates the searchstring
like:

 

fieldName: value1 OR fieldName: value2 OR fieldName: value3... (2000
values)

 

This method creates a string of about 100,000 chars, and the web
request fails with "URI too long" (C#).

 

I'm trying to update an old Lucene app that performs this kind of
searches. 

How can I achieve this with Solr?

 

What are my options here?

 

Thank you,

Frederico



Re: Query: URl too long

2010-07-12 Thread Chantal Ackermann
Hi Frederico,

not sure about solrNET, but changing the http method from GET to POST
worked for me (using SolrJ).

Chantal

On Mon, 2010-07-12 at 12:18 +0200, Frederico Azeiteiro wrote:
 Hi,
 
  
 
 I need to perform a search using a list of values (about 2000).
 
  
 
 I'm using SolrNET QueryInList function that creates the searchstring
 like:
 
  
 
 fieldName: value1 OR fieldName: value2 OR fieldName: value3... (2000
 values)
 
  
 
 This method created a string with about 100 000 chars and the Web
 Request fails with URI too long (C#).
 
  
 
 I'm trying to update an old Lucene app that performs this kind of
 searches. 
 
 How can I achieve this with Solr?
 
  
 
 What are my options here?
 
  
 
 Thank you,
 
 Frederico





Re: Query: URl too long

2010-07-12 Thread Ahmet Arslan
 I'm using SolrNET QueryInList function that creates the searchstring
 like:

 fieldName: value1 OR fieldName: value2 OR fieldName: value3... (2000
 values)

 This method created a string with about 100 000 chars and the Web
 Request fails with URI too long (C#).

Not sure about SolrNet but you can use POST method instead of GET or configure 
maxHttpHeaderSize setting of your servlet container. For example for tomcat
http://wiki.apache.org/solr/SolrTomcat#Enabling_Longer_Query_Requests
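
(For reference, a hypothetical server.xml Connector entry with a raised
header limit; the values are illustrative only, per the wiki page above:)

<Connector port="8080" protocol="HTTP/1.1"
           maxHttpHeaderSize="65536"
           connectionTimeout="20000"
           redirectPort="8443"/>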


  


Re: Ranking position in solr

2010-07-12 Thread Ahmet Arslan
 I wonder there is a proper way to fulfill this requirement. A book has
 several keyphrases. Each keyphrase consists from one word to 3 words. The
 author could either buy keyphrase position or don't buy position. Note: each
 author could buy more than 1 keyphrase. The keyphrase search must be exact
 and case sensitive.

 For example: Book A, keyphrases: agile, web, development
 Book B, keyphrases: css, html, web

 Let's say Author of Book A buys search result position 1 with keyphrase
 web, so his book should be in the first position. His book should be
 listed before the Book B.

 Anyone has any suggestions on how to implement this in solr?

http://wiki.apache.org/solr/QueryElevationComponent - which is used to 
elevate results based on editorial decisions - may help.
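
(As an illustration, a hypothetical elevate.xml entry pinning Book A to the
top for the bought keyphrase "web" might look like the sketch below. The
document id is made up, and whether the match is exact and case sensitive
depends on the queryFieldType configured for the elevation component.)

<elevate>
  <query text="web">
    <doc id="book-a"/>
  </query>
</elevate>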


  


Re: Query: URl too long

2010-07-12 Thread Jon Poulton
Hi there,
We had a similar issue. It's an easy fix, simply change the request type from 
GET to POST. 

Jon

On 12 Jul 2010, at 11:18, Frederico Azeiteiro wrote:

 Hi,
 
 
 
 I need to perform a search using a list of values (about 2000).
 
 
 
 I'm using SolrNET QueryInList function that creates the searchstring
 like:
 
 
 
 fieldName: value1 OR fieldName: value2 OR fieldName: value3... (2000
 values)
 
 
 
 This method created a string with about 100 000 chars and the Web
 Request fails with URI too long (C#).
 
 
 
 I'm trying to update an old Lucene app that performs this kind of
 searches. 
 
 How can I achieve this with Solr?
 
 
 
 What are my options here?
 
 
 
 Thank you,
 
 Frederico
 



Re: Filter multivalue fields from search result

2010-07-12 Thread Chantal Ackermann
Hi Alex,

feedback inline:

On Mon, 2010-07-12 at 12:03 +0200, Alex J. G. Burzyński wrote:
 Hi Chantal,
 
 The paging problem I've asked about is that having course-event pairs 
 and specifying rows limits the number of pairs returned not the courses
 
 +---+--+++
 | id-id | name | town   | date   |
 +---+--+++
 | 1-1   | Microsoft Excel  | London | 2010-08-20 |
 | 1-2   | Microsoft Excel  | Glasgow| 2010-08-24 |
 | 1-3   | Microsoft Excel  | Leeds  | 2010-08-28 |
 | 2-1   | Microsoft Word   | Aberdeen   | 2010-08-21 |
 | 2-2   | Microsoft Word   | Reading| 2010-08-25 |
 | 2-3   | Microsoft Word   | London | 2010-08-29 |
 | 3-1   | Microsoft Powerpoint | Birmingham | 2010-08-22 |
 | 3-2   | Microsoft Powerpoint | Leeds  | 2010-08-26 |
 | 3-3   | Microsoft Powerpoint | Leeds  | 2010-08-30 |
 +---+--+++
 
 
 And from UI point of view I'm returning less courses then events - 
 that's why I've asked about paging.
 
 The search for q=name:Microsoft town:Leeds with rows=2 should return:
 1-3  3-2  3-3

If you want to list all available courses in a query and also display
how often and where they take place, then query for name (in your
table) and facet on town per name. This might require the use of the
facet.query parameter.

Otherwise use your query from above and group afterwards in the client
or your server backend. Of course, you should increase the rows value.
But I see your point with paging, so facetting might be a better option.
Or maybe field collapsing is what you need (there is a patch - search
for solr field collapsing and you should find a lot about it). (I
haven't tried that, however, and it's just a guess.)

Chantal

 
 But 3-3 will be obviously on page 2.
 
 I hope that it makes my questions more clear.
 
 Thanks,
 Alex
 




indexing with pdf files problem

2010-07-12 Thread satya swaroop
hi all,
  I am working with Solr on Tomcat. Indexing works fine for XML files,
but when I send doc or html files or PDFs through curl I get a lazy
loading error. Can you tell me what is wrong? The output below is what I get
when I send a PDF file. I am working on Ubuntu; the Solr home is /opt/example
and Tomcat is /opt/tomcat6.


HTTP Status 500 - lazy loading error

org.apache.solr.common.SolrException: lazy loading error
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.solr.common.SolrException:
java.lang.NullPointerException
at
org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:76)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
... 16 more
Caused by: java.lang.NullPointerException
at
org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:73)
at
org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
at org.apache.tika.config.TikaConfig.lt;initgt;(TikaConfig.java:99)
at org.apache.tika.config.TikaConfig.lt;initgt;(TikaConfig.java:84)
at org.apache.tika.config.TikaConfig.lt;initgt;(TikaConfig.java:61)
at
org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:74)
... 17 more

RE: Query: URl too long

2010-07-12 Thread Frederico Azeiteiro
Hi,

A closer look shows that the problem is not on the request but on the
creation of the URI object.

The exception is thrown when trying to access the URI object inside the
URIbuilder.

I have tried to google it, but without luck...


-Original Message-
From: Jon Poulton [mailto:jon.poul...@vyre.com] 
Sent: segunda-feira, 12 de Julho de 2010 11:56
To: solr-user@lucene.apache.org
Subject: Re: Query: URl too long

Hi there,
We had a similar issue. It's an easy fix, simply change the request type
from GET to POST. 

Jon

On 12 Jul 2010, at 11:18, Frederico Azeiteiro wrote:

 Hi,
 
 
 
 I need to perform a search using a list of values (about 2000).
 
 
 
 I'm using SolrNET QueryInList function that creates the searchstring
 like:
 
 
 
 fieldName: value1 OR fieldName: value2 OR fieldName: value3... (2000
 values)
 
 
 
 This method created a string with about 100 000 chars and the Web
 Request fails with URI too long (C#).
 
 
 
 I'm trying to update an old Lucene app that performs this kind of
 searches. 
 
 How can I achieve this with Solr?
 
 
 
 What are my options here?
 
 
 
 Thank you,
 
 Frederico
 



RE: Query: URl too long

2010-07-12 Thread Frederico Azeiteiro
Yes, I guess I can't create a URI object that long.

Can anyone think of other options?
I'm thinking about options that avoid the HTTP request... 

My best bet is using Lucene again for searching but keeping Solr for indexing.

Do you think this is a good approach? 



-Original Message-
From: Frederico Azeiteiro [mailto:frederico.azeite...@cision.com] 
Sent: segunda-feira, 12 de Julho de 2010 12:10
To: solr-user@lucene.apache.org
Subject: RE: Query: URl too long

Hi,

A closer look shows that the problem is not on the request but on the
creation of the URI object.

The exception is sent when trying to access the URI object inside the
URIbuilder.

Trying to google it but without luck...


-Original Message-
From: Jon Poulton [mailto:jon.poul...@vyre.com] 
Sent: segunda-feira, 12 de Julho de 2010 11:56
To: solr-user@lucene.apache.org
Subject: Re: Query: URl too long

Hi there,
We had a similar issue. It's an easy fix, simply change the request type
from GET to POST. 

Jon

On 12 Jul 2010, at 11:18, Frederico Azeiteiro wrote:

 Hi,
 
 
 
 I need to perform a search using a list of values (about 2000).
 
 
 
 I'm using SolrNET QueryInList function that creates the searchstring
 like:
 
 
 
 fieldName: value1 OR fieldName: value2 OR fieldName: value3... (2000
 values)
 
 
 
 This method created a string with about 100 000 chars and the Web
 Request fails with URI too long (C#).
 
 
 
 I'm trying to update an old Lucene app that performs this kind of
 searches. 
 
 How can I achieve this with Solr?
 
 
 
 What are my options here?
 
 
 
 Thank you,
 
 Frederico
 



Using stored terms for faceting

2010-07-12 Thread Peter Karich
Hi,

is it possible to use the stored terms of a field for a faceted search?

I mean, I don't want to get the term frequency per document as it is
shown here:
http://wiki.apache.org/solr/TermVectorComponentExampleOptions

I want to get the frequencies of the terms for my particular search, show
only the 10 most frequent terms, and do all the nice things that I can do
with faceting.

At the moment I am calculating the terms for every document and indexing
them into a separate multivalued field where I can then easily apply
faceting.
But is there a better way?

Regards,
Peter.



RE: Query: URl too long

2010-07-12 Thread Ahmet Arslan
 Yes, i guess i can't create an URI
 object that long.
 
 Can someone remember other options?

You can shorten your String by not repeating OR and fieldName. e.g.

fieldName: value1 OR fieldName: value2 OR fieldName: value3...

q=value1 value2 value3&q.op=OR&df=fieldName

By the way how are you generating these value1 value2 etc? If the above does 
not solve your problem you can embed this logic into a custom SearchHandler.


  


Re: Query: URl too long

2010-07-12 Thread kenf_nc

Frederico, 
You should also pose your question on the SolrNet forum,
http://groups.google.com/group/solrnet?hl=en
Switching from GET to POST isn't a Solr issue, but a SolrNet issue.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-URl-too-long-tp959990p960208.html
Sent from the Solr - User mailing list archive at Nabble.com.


Problem during indexing

2010-07-12 Thread sarfaraz masood
I am trying to add 20 million documents to my index from another index that
contains these documents (I can't help this architecture; it's something I
will have to follow). Now the problems I am facing are the following:

1) A "Too many open files" error. It happens in the code which is adding
documents to my index:

    IndexWriter w = new IndexWriter(index, analyzer,
            IndexWriter.MaxFieldLength.UNLIMITED);
    w.setMergeFactor(1000);
    w.setMaxBufferedDocs(1000);
    w.setMaxMergeDocs(60);

    for (i = 0; i < reader1.numDocs(); i++)
    {
        System.out.println(i);
        addDoc(w, reader1.document(i).getField("url").stringValue(),
               reader1.document(i).getField("content").stringValue()
                      .replace('.', ' ').replace('-', ' '));
    }
    w.optimize();
    w.close();
    reader1.close();

Due to the merge-factor parameters I have set, around 1300 .cfs files are
now open in the index, but there is only one .fdt file. These files seem to
be the reason for this error.

Are these files not closed? Do I have to call IndexWriter.commit() in the
loop to close these open files?




RE: Query: URl too long

2010-07-12 Thread Frederico Azeiteiro
Not an option, because the query has other fields to query as well.
The values are generated from a list of choices (which could grow to around
5000 strings of 7 chars each).

I don't know if this could be considered off-topic (please advise...)
but:

I'm doing some tests with Lucene (Lucene.Net 2.9.2), but the results with
date range queries are not similar (0 hits on Lucene, 900 with Solr).
Does Lucene support date range queries?

Thank you for your help.

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: segunda-feira, 12 de Julho de 2010 13:16
To: solr-user@lucene.apache.org
Subject: RE: Query: URl too long

 Yes, i guess i can't create an URI
 object that long.
 
 Can someone remember other options?

You can shorten your String by not repeating OR and fieldName. e.g.

fieldName: value1 OR fieldName: value2 OR fieldName: value3...

q=value1 value2 value3&q.op=OR&df=fieldName

By the way how are you generating these value1 value2 etc? If the above
does not solve your problem you can embed this logic into a custom
SearchHandler.


  


ShingleFilter failing with more words than indexed phrase

2010-07-12 Thread Ethan Collins
I am using Solr 1.4.1 (Lucene 2.9.3) on Windows and am trying to
understand ShingleFilter. I wrote the following configuration and find that
if I provide more words than the actual phrase indexed in the field, then
the search on that field fails (no score found with debugQuery=true).

Here is an example to reproduce, with field names:
Id: 1
title_1: Nina Simone
title_2: I put a spell on you

Query (dismax) with: “Nina Simone I put”  - Fails i.e. no score shown
from title_1 search (using debugQuery)
“Nina Simone” - Success

I checked the index with luke and it showed correct indexes. I used
Solr’s Field Analysis with the ‘shingle’ field and tried “Nina Simone
I put” and it succeeds, as I would expect as correct behavior. It’s
only during the query that no score is provided. I also checked
‘parsedquery’ and it shows disjunctionMaxQuery issuing the string
“Nina_Simone Simone_I I_put” to the title_1 field.

title_1 and title_2 fields are of type ‘shingle’, defined as:

<fieldType name="shingle" class="solr.TextField"
positionIncrementGap="100" indexed="true" stored="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ShingleFilterFactory"
maxShingleSize="2" outputUnigrams="false"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ShingleFilterFactory"
maxShingleSize="2" outputUnigrams="false"/>
</analyzer>
</fieldType>

Note that I also have a catchall field which is text. I have qf set
to: 'id catchall' and pf set to: 'title_1 title_2'

Am I missing something here in my expectation or is there a bug somewhere?

-Ethan


Re: Query: URl too long

2010-07-12 Thread Mauricio Scheffer
Frederico,
This is indeed a SolrNet issue. You can switch to POST in queries by
implementing a ISolrConnection decorator. In the Get() method you'd build a
POST request instead of the standard GET.
Please use the SolrNet forum for further questions about SolrNet.

Cheers,
Mauricio

On Mon, Jul 12, 2010 at 9:33 AM, kenf_nc ken.fos...@realestate.com wrote:


 Frederico,
 You should also pose your question on the SolrNet forum,
 http://groups.google.com/group/solrnet?hl=en
 Switching from GET to POST isn't a Solr issue, but a SolrNet issue.
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Query-URl-too-long-tp959990p960208.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Copy Date Field and/or Using DateMathParser in DataImportHandler

2010-07-12 Thread Chantal Ackermann
Hi and back again,

I want to create a copy of my date field that holds only the date with no
time (i.e. time = 0:00h).

The question is:
Do I have to create the new date (without time) in my own transformer
(using a Calendar object) or is there some convenient way to use the
DateMathParser during indexing time when using DataImportHandler?


I checked out:
https://issues.apache.org/jira/browse/SOLR-469
which looks like the original Jira issue tracking the DataImportHandler
development.
And http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer
and the thread:
[solr-user] Sorting dates with reduced precision
http://search.lucidimagination.com/search/document/f2313ffae081bf79/sorting_dates_with_reduced_precision#46566037750d7b5

In the latter, it's said:

Append /DAY to the date value you index, for example
1995-12-31T23:59:59Z/DAY will yield 1995-12-31
[...]
Thanks, this happens at indexing time?
Yes


Well, I tried the most simple idea that came to my mind:
<copyField source="start_date/DAY" dest="start_day" />
but this does not work. Certainly - the slash is not a reserved
character and SOLR expects a field called start_date/DAY, in this
case.

Is it possible to use the DateMathParser syntax to create that new field
from the existing date field or the sourcing date string?

In the Jira issue listed above I found this:

A new interface called Evaluator has been added which makes it possible
to plugin new expression evaluators (for resolving variable names)
Using the same Evaluator interface, a few new evaluators have been added
formatDate - use as ${dataimporter.functions.formatDate('NOW',yyyy-MM-dd
HH:mm)}, this will format NOW as per the given format and return a
string which can be used in queries or urls. It supports the full
DateMathParser syntax. You can also format fields e.g.
${dataimporter.functions.formatDate(A.purchase_date,dd-MM-yyyy)}

This is from 2008, is this still true for the current DataImportHandler?

Just looking for the best method to solve this. Any insights very much
appresciated!

Thanks,
Chantal
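
(For what it's worth, one way to do this inside DIH without a custom
transformer is to combine RegexTransformer and TemplateTransformer. The
sketch below is untested and assumes the source column arrives as a string
starting with yyyy-MM-dd; if the JDBC driver hands back a Timestamp instead,
a small custom transformer is probably the simpler route. Entity, column and
field names are made up.)

<entity name="event" transformer="RegexTransformer,TemplateTransformer"
        query="SELECT id, start_date FROM events">
  <!-- keep only the leading yyyy-MM-dd part of the source string -->
  <field column="day_part" sourceColName="start_date"
         regex="^(\d{4}-\d{2}-\d{2}).*$" replaceWith="$1"/>
  <!-- rebuild a full ISO date at midnight for the start_day field -->
  <field column="start_day" template="${event.day_part}T00:00:00Z"/>
</entity>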





RE: Query: URl too long

2010-07-12 Thread Frederico Azeiteiro
OK, I posted on the SolrNet forum asking how I can send the query using the
POST method.

But I'm giving SolrJ a try; I think that may be the right way to do it.


-Original Message-
From: Mauricio Scheffer [mailto:mauricioschef...@gmail.com] 
Sent: segunda-feira, 12 de Julho de 2010 14:31
To: solr-user@lucene.apache.org
Subject: Re: Query: URl too long

Frederico,
This is indeed a SolrNet issue. You can switch to POST in queries by
implementing a ISolrConnection decorator. In the Get() method you'd
build a
POST request instead of the standard GET.
Please use the SolrNet forum for further questions about SolrNet.

Cheers,
Mauricio

On Mon, Jul 12, 2010 at 9:33 AM, kenf_nc ken.fos...@realestate.com
wrote:


 Frederico,
 You should also pose your question on the SolrNet forum,
 http://groups.google.com/group/solrnet?hl=en
 Switching from GET to POST isn't a Solr issue, but a SolrNet issue.
 --
 View this message in context:

http://lucene.472066.n3.nabble.com/Query-URl-too-long-tp959990p960208.ht
ml
 Sent from the Solr - User mailing list archive at Nabble.com.



AW: Copy Date Field and/or Using DateMathParser in DataImportHandler

2010-07-12 Thread Bastian Spitzer
Hi Chantal,

Where is your Solr integrated? Where is this date coming from? I personally
wouldn't use Solr for such conversions. It's nice if there are such built-in
features, but sticking to Java/.NET or whatever generates your documents
seems much more comfortable. In Java, for example, this is just two lines of
code and you are done.

cheers.

-Ursprüngliche Nachricht-
Von: Chantal Ackermann [mailto:chantal.ackerm...@btelligent.de] 
Gesendet: Montag, 12. Juli 2010 16:05
An: solr-user@lucene.apache.org
Betreff: Copy Date Field and/or Using DateMathParser in DataImportHandler

Hi and back again,

to create a copy of my date field that holds only the date with no time 
(=0:00h time).

The question is:
Do I have to create the new date (without time) in my own transformer (using a 
Calendar object) or is there some convenient way to use the DateMathParser 
during indexing time when using DataImportHandler?


I checked out:
https://issues.apache.org/jira/browse/SOLR-469
which looks like the original Jira issue tracking the DataImportHandler 
development.
And http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer
and the thread:
[solr-user] Sorting dates with reduced precision
http://search.lucidimagination.com/search/document/f2313ffae081bf79/sorting_dates_with_reduced_precision#46566037750d7b5

In the latter, it's said:

Append /DAY to the date value you index, for example 
1995-12-31T23:59:59Z/DAY will yield 1995-12-31
[...]
Thanks, this happens at indexing time?
Yes


Well, I tried the most simple idea that came to my mind:
<copyField source="start_date/DAY" dest="start_day" /> but this does 
not work. Certainly - the slash is not a reserved character and SOLR expects a 
field called start_date/DAY, in this case.

Is it possible to use the DateMathParser syntax to create that new field from 
the existing date field or the sourcing date string?

In the Jira issue listed above I found this:

A new interface called Evaluator has been added which makes it possible to 
plugin new expression evaluators (for resolving variable names) Using the same 
Evaluator interface, a few new evaluators have been added formatDate - use as 
${dataimporter.functions.formatDate('NOW',-MM-dd
HH:mm)}, this will format NOW as per the given format and return a string which 
can be used in queries or urls. It supports the full DateMathParser syntax. You 
can also format fields e.g.
${dataimporter.functions.formatDate(A.purchase_date,dd-MM-)}

This is from 2008, is this still true for the current DataImportHandler?

Just looking for the best method to solve this. Any insights very much 
appresciated!

Thanks,
Chantal





AW: Copy Date Field and/or Using DateMathParser in DataImportHandler

2010-07-12 Thread Bastian Spitzer
Hm, it seems I didn't read the first part of your question :/ Forget what I
just wrote. :)

-Ursprüngliche Nachricht-
Von: Bastian Spitzer [mailto:bspit...@magix.net] 
Gesendet: Montag, 12. Juli 2010 16:41
An: solr-user@lucene.apache.org
Betreff: AW: Copy Date Field and/or Using DateMathParser in DataImportHandler

Hi Chantal,

where is your Solr integrated? Where is this Date comin from? I personaly 
wouldnt use SOLR for such conversions, ist nice if there are such built-in 
features, but sticking to java/.net or whatever generates your documents seems 
much more comfortable. In Java f.e. this is just 2 lines of code and you are 
done.

cheers.

-Ursprüngliche Nachricht-
Von: Chantal Ackermann [mailto:chantal.ackerm...@btelligent.de]
Gesendet: Montag, 12. Juli 2010 16:05
An: solr-user@lucene.apache.org
Betreff: Copy Date Field and/or Using DateMathParser in DataImportHandler

Hi and back again,

to create a copy of my date field that holds only the date with no time 
(=0:00h time).

The question is:
Do I have to create the new date (without time) in my own transformer (using a 
Calendar object) or is there some convenient way to use the DateMathParser 
during indexing time when using DataImportHandler?


I checked out:
https://issues.apache.org/jira/browse/SOLR-469
which looks like the original Jira issue tracking the DataImportHandler 
development.
And http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer
and the thread:
[solr-user] Sorting dates with reduced precision
http://search.lucidimagination.com/search/document/f2313ffae081bf79/sorting_dates_with_reduced_precision#46566037750d7b5

In the latter, it's said:

Append /DAY to the date value you index, for example 
1995-12-31T23:59:59Z/DAY will yield 1995-12-31
[...]
Thanks, this happens at indexing time?
Yes


Well, I tried the most simple idea that came to my mind:
<copyField source="start_date/DAY" dest="start_day" /> but this does 
not work. Certainly - the slash is not a reserved character and SOLR expects a 
field called start_date/DAY, in this case.

Is it possible to use the DateMathParser syntax to create that new field from 
the existing date field or the sourcing date string?

In the Jira issue listed above I found this:

A new interface called Evaluator has been added which makes it possible to 
plugin new expression evaluators (for resolving variable names) Using the same 
Evaluator interface, a few new evaluators have been added formatDate - use as 
${dataimporter.functions.formatDate('NOW',-MM-dd
HH:mm)}, this will format NOW as per the given format and return a string which 
can be used in queries or urls. It supports the full DateMathParser syntax. You 
can also format fields e.g.
${dataimporter.functions.formatDate(A.purchase_date,dd-MM-)}

This is from 2008, is this still true for the current DataImportHandler?

Just looking for the best method to solve this. Any insights very much 
appresciated!

Thanks,
Chantal





CommonsHttpSolrServer add document hangs

2010-07-12 Thread Max Lynch
Hey guys,
I'm using Solr 1.4.1 and I've been having some problems lately with code
that adds documents through a CommonsHttpSolrServer.  It seems that randomly
the call to theserver.add() will hang.  I am currently running my code in a
single thread, but I noticed this would happen in multi threaded code as
well.  The jar version of commons-httpclient is 3.1.

I got a thread dump of the process, and one thread seems to be waiting on
the org.apache.commons.httpclient.MultiThreadedHttpConnectionManager as
shown below.  All other threads are in a RUNNABLE state (besides the
Finalizer daemon).

 [java] Full thread dump Java HotSpot(TM) 64-Bit Server VM (16.3-b01
mixed mode):
 [java]
 [java] MultiThreadedHttpConnectionManager cleanup daemon prio=10
tid=0x7f441051c800 nid=0x527c in Object.wait() [0x7f4417e2f000]
 [java]java.lang.Thread.State: WAITING (on object monitor)
 [java] at java.lang.Object.wait(Native Method)
 [java] - waiting on 0x7f443ae5b290 (a
java.lang.ref.ReferenceQueue$Lock)
 [java] at
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
 [java] - locked 0x7f443ae5b290 (a
java.lang.ref.ReferenceQueue$Lock)
 [java] at
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
 [java] at
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)

Any ideas?

Thanks.


RE: CommonsHttpSolrServer add document hangs

2010-07-12 Thread Robert Petersen
Maybe solr is busy doing a commit or optimize?

-Original Message-
From: Max Lynch [mailto:ihas...@gmail.com] 
Sent: Monday, July 12, 2010 9:59 AM
To: solr-user@lucene.apache.org
Subject: CommonsHttpSolrServer add document hangs

Hey guys,
I'm using Solr 1.4.1 and I've been having some problems lately with code
that adds documents through a CommonsHttpSolrServer.  It seems that
randomly
the call to theserver.add() will hang.  I am currently running my code
in a
single thread, but I noticed this would happen in multi threaded code as
well.  The jar version of commons-httpclient is 3.1.

I got a thread dump of the process, and one thread seems to be waiting
on
the org.apache.commons.httpclient.MultiThreadedHttpConnectionManager as
shown below.  All other threads are in a RUNNABLE state (besides the
Finalizer daemon).

 [java] Full thread dump Java HotSpot(TM) 64-Bit Server VM (16.3-b01
mixed mode):
 [java]
 [java] MultiThreadedHttpConnectionManager cleanup daemon prio=10
tid=0x7f441051c800 nid=0x527c in Object.wait() [0x7f4417e2f000]
 [java]java.lang.Thread.State: WAITING (on object monitor)
 [java] at java.lang.Object.wait(Native Method)
 [java] - waiting on 0x7f443ae5b290 (a
java.lang.ref.ReferenceQueue$Lock)
 [java] at
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
 [java] - locked 0x7f443ae5b290 (a
java.lang.ref.ReferenceQueue$Lock)
 [java] at
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
 [java] at
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$Referen
ceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)

Any ideas?

Thanks.


Re: Database connections during data import

2010-07-12 Thread Gora Mohanty
On Sun, 11 Jul 2010 07:22:51 -0700 (PDT)
osocurious2 ken.fos...@realestate.com wrote:

 
 Gora,
 Our environment, currently under development, is very nearly the
 exact same thing as yours. My DB is currently only about 10GB,
 but likely to grow.
[...]

Thanks for your response. It is good to hear from people dealing
with similar issues.

 I'm still trying out different architectures to deal with this.
 I've tried doing a Bulk Copy from the DB to some flat files and
 importing from there. File handles seem to be more stable than
 database connections. But it brings its own issues to the party.

Yes, we tried that too, but creating the XMLs turned out to be as
time-consuming. We ended up using multiple cores on several Solr
instances. Please see some further details in a separate response
to Willem.

 I'm also currently looking at using queuing (either MSMQ or
 Amazons Simple Queue service) so the database piece isn't used
 for 20 hours, but gets its part over fairly quickly. I haven't
 done this using DataImportHandler however, not sure yet how, so
 I'm writing my own Import manager.
[...]

We are considering using Amazon, but at this point I believe that
we will have the indexing time down to our requirements through
multiple cores on multiple Solr instances.

The DataImportHandler docs are pretty good, but I will try to get
the time to write up an example on using transformers, etc., which
turned out to be a little tricky. Or, at least it took me some
trial-and-error beyond the available documentation.

 As to the GData handler and response writer. I would be very
 interested in OData versions, which wouldn't be too much of a
 stretch from GData to deal with. Would you be moving in that
 direction later? Or if you put your contrib out there could
 someone else (maybe me if time allows) be able to take it there?
 That would be a great edition for our work in a few months.

Yes, we would be happy to do that, though I do need to look at how
closely our solution meets the GData specifications. Also, at the
moment, we have only implemented the GET part, i.e., search results
can only be retrieved through the GData interface.

 Good luck, and I'd love to keep in touch about your solutions,
 I'm sure I could get some great ideas from them for our own work.
[...]

Likewise, I am sure that we can learn much from you guys. Willem
and you have already given me some ideas. We should maybe start
getting use cases up on the Solr Wiki, or at least on a blog
somewhere.

Regards,
Gora


Re: Database connections during data import

2010-07-12 Thread Gora Mohanty
On Mon, 12 Jul 2010 09:20:05 +0200
Willem Van Riet willem.vanr...@sa.24.com wrote:

 Hi Gora
 
 Also indexing 4mil + records from a MS-SQL database - index size
 is about 25Gb.

Thanks for some great pointers. More detailed responses below.

 I managed to solve both the performance and recovery issue by
 segmenting the indexing process along with the
 CachedSqlEntityProcessor. 

 Basically I populate a temp table with a subset of primary keys
 (I use a modulus of the productId to achieve this) and inner join
 from that table on both the primary query and all the child
 queries.
[...]

Thanks for that pointer. I had read about the
CachedSqlEntityProcessor, but my eyes must have been glazing over
at that point. That sounds like a great possibility, especially
your point on breaking up the data into chunks small enough to fit
into physical RAM.

We came up with something of a brute-force solution. We discovered
that indexing on each of several cores on a single multi-core Solr
instance was comparably fast to indexing on separate Solr
instances. So, we have broken up our hardware into 15 cores on five
Solr instances (three/instance seems to peg the CPU on each Solr
server at ~80%), and two MS-SQL database servers, and seem to be
down to about 6 hours for indexing (scaling almost exactly by the
number of cores). Tomorrow, we plan to bring online another five
Solr instances, and a third database server, in order to halve that
time. Beyond that, we are probably going to something like Amazon.

 The 4GB (actually 3.2GB) limit only applies to the 32bit version
 of Windows/SQL Server. That being said SQL server is not much of
 a RAM hog. After its basic querying needs memory is only used to
 cache indexes and query plans. SQL is pretty happy with 4GB but
 if you can upgrade the OS another 2GB for the disk cache will
 help a lot. 
[...]

Yes, it turns out that I was (somewhat) unwarrantedly bad-mouthing
Microsoft. The database server stands up quite well in terms of CPU
usage, though 3-4 Solr DIH instances hitting the DB seem to get up
to the RAM limit almost at once. Unfortunately, upgrading the OS is
not an option at the moment, but the database server is hardly the
bottle-neck now.

 PS: You are using the JTDS driver? (http://jtds.sourceforge.net/)
 I find it faster and more stable than the MS one.

Oh, saw that driver, but did not know that it was better than the
MS one. Thanks for the tip.

Regards,
Gora


/select handler statistics

2010-07-12 Thread Vladimir Sutskever
Hi All,

I am looking at the stats.jsp page in the SOLR admin panel.

I do not see statistics for the /select request handler.

I want to know total # of search requests  + avg time of request ... etc

Am I overlooking something?



Kind regards,

Vladimir Sutskever
Investment Bank - Technology
JPMorgan Chase, Inc.



This email is confidential and subject to important disclaimers and
conditions including on offers for the purchase or sale of
securities, accuracy and completeness of information, viruses,
confidentiality, legal privilege, and legal entity disclaimers,
available at http://www.jpmorgan.com/pages/disclosures/email.  

Indexing large amount of data

2010-07-12 Thread sarfaraz masood
I have a large amount of data (120 GB) to index, so I want to improve the 
performance of indexing it. I went through the documentation on the Lucene 
website, which mentions various ways the performance can be improved.

I am working on Debian Linux (amd64), so very large files are supported. The 
Java version is 1.6.

I tried many of the points mentioned in that documentation but got unusual results.

1) Reusing Field and Document objects to reduce GC overhead, using the 
Field.setValue() method. By doing this, instead of speeding up, the indexing 
speed dropped drastically. I know this is unusual, but that's what happened.

2) Tuning parameters via setMergeFactor() and setMaxBufferedDocs(). 
The default value for both is 10. I increased the value to 1000; by doing so, the 
number of .cfs files in the index folder increased many-fold, and I got 
java.io.IOException: Too many open files. 
    If I keep the default value of 10 for both parameters, this error is 
avoided, but then the size of the .fdt file in the index becomes really large.

So where am I going wrong? How can I overcome these problems and speed up my 
indexing process?
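
For reference, a minimal sketch of a conservative bulk-indexing setup, assuming
the Lucene 2.9/3.0 IndexWriter API; the index path and buffer size are
placeholders:

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class BulkIndexer {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(new File("/path/to/index"));
    IndexWriter writer = new IndexWriter(dir,
        new StandardAnalyzer(Version.LUCENE_29), true,
        IndexWriter.MaxFieldLength.UNLIMITED);

    // Flush by RAM usage rather than by a huge document count; this gives
    // fewer, larger flushes without leaving thousands of tiny segments behind.
    writer.setRAMBufferSizeMB(128);

    // Keep the merge factor modest (10 is the default). Very large values keep
    // many segments open at once, which is what triggers "Too many open files";
    // raising the OS file-handle limit only postpones the problem.
    writer.setMergeFactor(10);

    // ... writer.addDocument(doc) for each document ...

    writer.close(); // waits for pending merges by default
  }
}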




RE: /select handler statistics

2010-07-12 Thread Markus Jelsma
Hi,

 

I think you're looking for the statistics for the standard request handler.

 

Cheers,
 
-Original message-
From: Vladimir Sutskever vladimir.sutske...@jpmorgan.com
Sent: Mon 12-07-2010 19:44
To: solr-user@lucene.apache.org; 
Subject: /select handler statistics

Hi All,

I am looking at the stats.jsp page in the SOLR admin panel.

I do not see statistics for the /select request handler.

I want to know total # of search requests  + avg time of request ... etc

Am I overlooking something?



Kind regards,

Vladimir Sutskever
Investment Bank - Technology
JPMorgan Chase, Inc.



This email is confidential and subject to important disclaimers and
conditions including on offers for the purchase or sale of
securities, accuracy and completeness of information, viruses,
confidentiality, legal privilege, and legal entity disclaimers,
available at http://www.jpmorgan.com/pages/disclosures/email.   

RE: /select handler statistics

2010-07-12 Thread Vladimir Sutskever
Yup that did it.

Thank you Markus

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@buyways.nl] 
Sent: Monday, July 12, 2010 2:30 PM
To: solr-user@lucene.apache.org
Subject: RE: /select handler statistics

Hi,

 

I think you're looking for the statistics for the standard request handler.

 

Cheers,
 
-Original message-
From: Vladimir Sutskever vladimir.sutske...@jpmorgan.com
Sent: Mon 12-07-2010 19:44
To: solr-user@lucene.apache.org; 
Subject: /select handler statistics

Hi All,

I am looking at the stats.jsp page in the SOLR admin panel.

I do not see statistics for the /select request handler.

I want to know total # of search requests  + avg time of request ... etc

Am I overlooking something?



Kind regards,

Vladimir Sutskever
Investment Bank - Technology
JPMorgan Chase, Inc.



This email is confidential and subject to important disclaimers and
conditions including on offers for the purchase or sale of
securities, accuracy and completeness of information, viruses,
confidentiality, legal privilege, and legal entity disclaimers,
available at http://www.jpmorgan.com/pages/disclosures/email.   


Re: CommonsHttpSolrServer add document hangs

2010-07-12 Thread Max Lynch
Thanks Robert,

My script did start going again, but it was waiting for about half an hour
which seems a bit excessive to me.  Is there some tuning I can do on the
solr end to optimize for my use case, which is very heavy on commits and
very light on searches (I do most of my searches on the raw Lucene index in
the background)?

Thanks.

On Mon, Jul 12, 2010 at 12:06 PM, Robert Petersen rober...@buy.com wrote:

 Maybe solr is busy doing a commit or optimize?

 -Original Message-
 From: Max Lynch [mailto:ihas...@gmail.com]
 Sent: Monday, July 12, 2010 9:59 AM
 To: solr-user@lucene.apache.org
 Subject: CommonsHttpSolrServer add document hangs

 Hey guys,
 I'm using Solr 1.4.1 and I've been having some problems lately with code
 that adds documents through a CommonsHttpSolrServer.  It seems that
 randomly
 the call to theserver.add() will hang.  I am currently running my code
 in a
 single thread, but I noticed this would happen in multi threaded code as
 well.  The jar version of commons-httpclient is 3.1.

 I got a thread dump of the process, and one thread seems to be waiting
 on
 the org.apache.commons.httpclient.MultiThreadedHttpConnectionManager as
 shown below.  All other threads are in a RUNNABLE state (besides the
 Finalizer daemon).

 [java] Full thread dump Java HotSpot(TM) 64-Bit Server VM (16.3-b01
 mixed mode):
 [java]
 [java] MultiThreadedHttpConnectionManager cleanup daemon prio=10
 tid=0x7f441051c800 nid=0x527c in Object.wait() [0x7f4417e2f000]
 [java]java.lang.Thread.State: WAITING (on object monitor)
 [java] at java.lang.Object.wait(Native Method)
 [java] - waiting on 0x7f443ae5b290 (a
 java.lang.ref.ReferenceQueue$Lock)
 [java] at
 java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
 [java] - locked 0x7f443ae5b290 (a
 java.lang.ref.ReferenceQueue$Lock)
 [java] at
 java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
 [java] at
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$Referen
 ceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)

 Any ideas?

 Thanks.



RE: CommonsHttpSolrServer add document hangs

2010-07-12 Thread Robert Petersen
You could try a master/slave setup using replication, perhaps; then the
slave serves searches, and indexing commits on the master won't hold up
searches, at least...

Here is the description:  http://wiki.apache.org/solr/SolrReplication


-Original Message-
From: Max Lynch [mailto:ihas...@gmail.com] 
Sent: Monday, July 12, 2010 11:57 AM
To: solr-user@lucene.apache.org
Subject: Re: CommonsHttpSolrServer add document hangs

Thanks Robert,

My script did start going again, but it was waiting for about half an
hour
which seems a bit excessive to me.  Is there some tuning I can do on the
solr end to optimize for my use case, which is very heavy on commits and
very light on searches (I do most of my searches on the raw Lucene index
in
the background)?

Thanks.

On Mon, Jul 12, 2010 at 12:06 PM, Robert Petersen rober...@buy.com
wrote:

 Maybe solr is busy doing a commit or optimize?

 -Original Message-
 From: Max Lynch [mailto:ihas...@gmail.com]
 Sent: Monday, July 12, 2010 9:59 AM
 To: solr-user@lucene.apache.org
 Subject: CommonsHttpSolrServer add document hangs

 Hey guys,
 I'm using Solr 1.4.1 and I've been having some problems lately with
code
 that adds documents through a CommonsHttpSolrServer.  It seems that
 randomly
 the call to theserver.add() will hang.  I am currently running my code
 in a
 single thread, but I noticed this would happen in multi threaded code
as
 well.  The jar version of commons-httpclient is 3.1.

 I got a thread dump of the process, and one thread seems to be waiting
 on
 the org.apache.commons.httpclient.MultiThreadedHttpConnectionManager
as
 shown below.  All other threads are in a RUNNABLE state (besides the
 Finalizer daemon).

 [java] Full thread dump Java HotSpot(TM) 64-Bit Server VM
(16.3-b01
 mixed mode):
 [java]
 [java] MultiThreadedHttpConnectionManager cleanup daemon prio=10
 tid=0x7f441051c800 nid=0x527c in Object.wait()
[0x7f4417e2f000]
 [java]java.lang.Thread.State: WAITING (on object monitor)
 [java] at java.lang.Object.wait(Native Method)
 [java] - waiting on 0x7f443ae5b290 (a
 java.lang.ref.ReferenceQueue$Lock)
 [java] at
 java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
 [java] - locked 0x7f443ae5b290 (a
 java.lang.ref.ReferenceQueue$Lock)
 [java] at
 java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
 [java] at

org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$Referen
 ceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)

 Any ideas?

 Thanks.



Problem with Wildcard searches in Solr

2010-07-12 Thread imranak


Hi,

I am having a problem doing wildcard searches in Lucene syntax using the
edismax handler. I have a Solr 4.0 nightly build from the trunk.

A general search like 'computer' returns results, but 'com*er' doesn't return
any results. Similarly, a search like 'co?mput?r' returns no results. The
only type of wildcard search currently working is one with a trailing
wildcard (like compute? or comput*).

I want to be able to do searches with wildcards at the beginning (*puter)
and in between (com*er). Could someone please tell me what I am doing wrong
and how to fix it.

Thanks.

Regards,
Imran.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-Wildcard-searches-in-Solr-tp961448p961448.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to find first document for the ALL search

2010-07-12 Thread Ian Connor
I have found that this search crashes:

/solr/select?q=*%3A*&fq=&start=0&rows=1&fl=id

SEVERE: java.lang.IndexOutOfBoundsException: Index: 114, Size: 90
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:288)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:217)
at
org.apache.lucene.index.SegmentReader.document(SegmentReader.java:948)
at
org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:506)
at
org.apache.solr.search.SolrIndexReader.document(SolrIndexReader.java:259)

but this one works:

/solr/select?q=*%3A*&fq=&start=1&rows=1&fl=id

It looks like just that first document is bad. I am happy to delete it - but
not sure how to get to it. Does anyone know how to find it?

- Ian


range faceting with integers

2010-07-12 Thread Jonathan Rochkind
So I want to provide some range facets on an integer Solr field (probably 
tint, that is, a trie field with a non-zero precisionStep).


It's clear enough how to do this, along the lines of facet.query=[1 TO 
100]&facet.query=[101 TO 200]&facet.query=[201 TO 300]


etc.

The issue is that I'd like to calculate N equal ranges based on the min 
and max value found in the field. 

I can't think of any way to do this that doesn't require two queries -- 
one to get the min and max (within the current search set), then 
calculate the ranges client-side (possibly making the boundaries 'nice' 
numbers instead of strictly equal ranges), then do another query with 
the calculated facet.queries set.


Is there any other trick I'm missing here?  If there were date values, 
you could possibly use facet.date.gap, although I'm not even sure if 
that works without explicitly setting the facet.date.start, not sure if 
you can leave facet.date.start unset meaning the minimum value in the 
field or not.  But I'm not dealing with dates here anyway, but with 
integers.


So is there anything I'm missing, or should I just have the client do two 
queries? For that matter, is there an easy way to ask for the minimum and 
maximum values of a field within a result set?
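
Two queries seem hard to avoid, but the StatsComponent added in 1.4 makes the
first one cheap: it can return the min and max of a field over the current
result set. A rough SolrJ sketch, assuming getFieldStatsInfo() is available in
your SolrJ build; the field name, URL, and bucket arithmetic are placeholders:

import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FieldStatsInfo;
import org.apache.solr.client.solrj.response.QueryResponse;

public class RangeFacetSketch {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    String field = "price";   // hypothetical tint field
    int buckets = 5;

    // Query 1: min/max within the result set (use the same q/fq as the real search).
    SolrQuery statsQuery = new SolrQuery("*:*");
    statsQuery.setRows(0);
    statsQuery.set("stats", true);
    statsQuery.set("stats.field", field);
    FieldStatsInfo stats = server.query(statsQuery).getFieldStatsInfo().get(field);
    long min = ((Number) stats.getMin()).longValue();
    long max = ((Number) stats.getMax()).longValue();

    // Query 2: N facet.query ranges computed from the observed min/max.
    SolrQuery facetQuery = new SolrQuery("*:*");
    facetQuery.setFacet(true);
    long step = Math.max(1, (max - min + 1) / buckets);
    for (int i = 0; i < buckets; i++) {
      long lo = min + i * step;
      long hi = (i == buckets - 1) ? max : lo + step - 1;
      facetQuery.addFacetQuery(field + ":[" + lo + " TO " + hi + "]");
    }
    QueryResponse rsp = server.query(facetQuery);
    for (Map.Entry<String, Integer> e : rsp.getFacetQuery().entrySet()) {
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }
}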


Thanks for any advice,
Jonathan


RE: Problem with Wildcard searches in Solr

2010-07-12 Thread Markus Jelsma
Hi,

 

The DisMaxQParser does not support wildcards in its q parameter [1]. You must 
use the LuceneQParser instead. AFAIK, in DisMax, wildcards are part of the 
search query and may get filtered out in your query analyzer.

 

[1]: http://wiki.apache.org/solr/DisMaxRequestHandler#q

 

Cheers,
 
-Original message-
From: imranak imranak...@gmail.com
Sent: Mon 12-07-2010 22:40
To: solr-user@lucene.apache.org; 
Subject: Problem with Wildcard searches in Solr



Hi,

I am having a problem doing wildcard searches in lucene syntax using the
edismax handler. I have Solr 4.0 nightly build from the trunk.

A general search like 'computer' returns results but 'com*er' doesn't return
any results. Similary, a search like 'co?mput?r' returns no results. The
only type of wildcard searches working currrently is ones with trailing
wildcards(like compute? or comput*).

I want to be able to do searches with wildcards at the beginning (*puter)
and in between (com*er). Could someone please tell me what I am doing wrong
and how to fix it.

Thanks.

Regards,
Imran.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-Wildcard-searches-in-Solr-tp961448p961448.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Problem with Wildcard searches in Solr

2010-07-12 Thread imranak

Hi,

Thanks for your response. The dismax query parser doesn't support it, but I
heard the edismax parser supports all kinds of wildcards. I've been trying it out
but without any luck. Could someone please help me with that? I'm unable to
make leading and in-the-middle wildcard searches work.

Thanks.

Imran.


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-Wildcard-searches-in-Solr-tp961448p961617.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Problem with Wildcard searches in Solr

2010-07-12 Thread Markus Jelsma
Hi,

 

Check edismax' JIRA page and its unresolved related issues [1]. AFAIK, it 
hasn't been committed yet.

 

[1]: https://issues.apache.org/jira/browse/SOLR-1553

 

Cheers,
 
-Original message-
From: imranak imranak...@gmail.com
Sent: Mon 12-07-2010 23:55
To: solr-user@lucene.apache.org; 
Subject: RE: Problem with Wildcard searches in Solr


Hi,

Thanks for you response. The dismax query parser doesn't support it but I
heard the edismax parser supports all kinds of wildcards. Been trying it out
but without any luck. Could someone please help me with that. I'm unable to
make leading and in-the-middle wildcard searches work.

Thanks.

Imran.


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-Wildcard-searches-in-Solr-tp961448p961617.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with Wildcard searches in Solr

2010-07-12 Thread Yonik Seeley
On Mon, Jul 12, 2010 at 4:39 PM, imranak imranak...@gmail.com wrote:
 A general search like 'computer' returns results but 'com*er' doesn't return
 any results.

This is due to issues with wildcards and stemming.
computer is indexed and searched as comput... but it's not
generally possible to stem wildcarded terms.

So comp*er won't match (the terms in the index are comput)
but comp*t or comput* should.

If wildcarding is important, use a field type without a stemmer.

-Yonik
http://www.lucidimagination.com
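
As a small illustration of the effect Yonik describes, here is a sketch
assuming a Lucene 3.1-style analysis API (PorterStemFilter stands in for
whatever stemmer the field type uses):

import java.io.StringReader;

import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class StemDemo {
  public static void main(String[] args) throws Exception {
    // Stem the word "computer"; the indexed term becomes "comput",
    // so the unanalyzed wildcard pattern com*er can never match it.
    TokenStream ts = new PorterStemFilter(
        new WhitespaceTokenizer(Version.LUCENE_31, new StringReader("computer")));
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      System.out.println(term.toString());   // prints: comput
    }
    ts.end();
    ts.close();
  }
}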



Re: Two analyzer per field

2010-07-12 Thread Erick Erickson
Could you handle this with the Dismax query handler? You could
specify that the search boosts the keyword-analyzed field quite high
if you wanted those documents to come up at the top of the list.

If this doesn't help, could you elaborate on the use-case? In particular,
I'm wondering why you want to use the keyword analyzer in the
first place.
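
For illustration, this kind of dismax boosting might look like the following
from SolrJ; the field names and the boost value are hypothetical:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DismaxBoostSketch {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

    SolrQuery q = new SolrQuery("hello world");
    q.set("defType", "dismax");
    // Search both fields; exact (keyword-analyzed) matches outrank tokenized ones.
    q.set("qf", "text_exact^10 text");

    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getResults().getNumFound() + " hits");
  }
}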


Best
Erick

On Mon, Jul 12, 2010 at 4:36 AM, Mark N nipen.m...@gmail.com wrote:

 Is it possible to specify two analyzers per field?

 For example, consider a field *F1* (keyword analyzer) = "cheers mate" and a
 field *F2* (keyword analyzer) = "hello world".

 There is also a copy field *TEXT* (standard analyzer) which will store
 the terms { cheers mate hello world }.

 Now, when a user performs any search, we will be looking at the copy field TEXT
 only, which uses the standard analyzer. Suppose the user searches for the
 "hello world" phrase;
 it will not return any result,
 as the hello and world terms are tokenized.

 Is it possible for me to index "hello world" as-is into the
 *TEXT* field as well? I.e. can I use the keyword analyzer as well as the standard
 analyzer for
 field TEXT?
 What would be the better approach to handle this situation?





 --
 Nipen Mark



Re: indexing with pdf files problem

2010-07-12 Thread Lance Norskog
You need to use the ExtractingRequestHandler to parse these kinds of
files. solr/update only takes a fixed XML format and a custom binary
format.

http://wiki.apache.org/solr/ExtractingRequestHandler
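
Once the handler loads correctly, posting a PDF from SolrJ looks roughly like
the sketch below; the /update/extract path matches the example solrconfig, and
the file path, document id, and field mapping are placeholders:

import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class PdfIndexer {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

    // Stream the PDF to the extracting handler; Tika parses it on the server side.
    ContentStreamUpdateRequest req =
        new ContentStreamUpdateRequest("/update/extract");
    req.addFile(new File("/opt/example/docs/sample.pdf"));
    req.setParam("literal.id", "sample-pdf-1"); // value for the uniqueKey field
    req.setParam("fmap.content", "text");       // map the extracted body into "text"

    server.request(req);
    server.commit();
  }
}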

On Mon, Jul 12, 2010 at 3:57 AM, satya swaroop sswaro...@gmail.com wrote:
 hi all,
      I am working with Solr on Tomcat. The indexing works fine for XML files,
  but when I send doc or HTML files or PDFs through curl I get a "lazy loading"
  error. Can you tell me the right way to do this? The output below is what I get
  when I send a PDF file. I am working on Ubuntu; the Solr home is /opt/example
   and Tomcat is /opt/tomcat6.


  Apache Tomcat/6.0.26 - Error report
  HTTP Status 500 - lazy loading error

 org.apache.solr.common.SolrException: lazy loading error
    at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
    at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
    at
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:619)
 Caused by: org.apache.solr.common.SolrException:
 java.lang.NullPointerException
    at
 org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:76)
    at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
    ... 16 more
 Caused by: java.lang.NullPointerException
    at
 org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:73)
    at
 org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
     at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:99)
     at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:84)
     at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:61)
    at
 org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:74)
    ... 17 more

Re: Supplementing already indexed data

2010-07-12 Thread Lance Norskog
There are two ways to interpret your mail:

1) You want to add the content to documents already in the index.
It is not possible to update only some fields of a document; you have to
delete and re-index the entire document.

2) You want to read a database record, use a file name from it, fetch that
file, and index both the database fields and the file content in one
document.

The DataImportHandler would let you read fields from the database, use
fields as file names, and load those files into other fields. This
is an advanced use, but it might be covered on the DIH page:

http://wiki.apache.org/solr/DataImportHandler

Look for FileDataSource and FieldReaderDataSource.
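
If interpretation (1) is what you are after, the delete-and-re-add cycle from
SolrJ looks roughly like this; it only works when every field you need is
stored, and the id and field names here are made up:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class ReindexWithContent {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

    // Fetch the already-indexed metadata (every field must be stored for this).
    SolrDocument old = server.query(new SolrQuery("id:doc42")).getResults().get(0);

    // Copy the stored fields into a fresh input document...
    SolrInputDocument doc = new SolrInputDocument();
    for (String name : old.getFieldNames()) {
      doc.addField(name, old.getFieldValue(name));
    }

    // ...add the external file content, then re-add: the old document is replaced.
    doc.addField("content", "text extracted from the external file");
    server.add(doc);
    server.commit();
  }
}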

On Sun, Jul 11, 2010 at 6:37 PM, Tod listac...@gmail.com wrote:
 I'm getting metadata from an RDB but the actual content is stored somewhere
  else.  I'd like to index the content too, but I don't want to overlay the
  already indexed metadata.  I know this can be done, but I just can't seem to
  dig up the correct docs; can anyone point me in the right direction?


 Thanks.




-- 
Lance Norskog
goks...@gmail.com