SEVERE: SolrIndexWriter was not closed prior to finalize

2011-07-05 Thread Rohit Gupta
Hi,

I am getting the following two errors in my Solr log file:

SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
and 

SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/solr/simplify360/multicore/tw/data/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1097)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:83)
at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)



I am not able to figure out what may be causing this.

Regards,
Rohit

How to boost a querystring at query time

2011-07-05 Thread Romi
I want to enable boosting for queries and search results. My dismax
request handler configuration is:

<requestHandler name="dismax" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="defType">dismax</str>
    <float name="tie">0.01</float>
    <str name="qf">
      text^0.5 name^1.0 description^1.5
    </str>
    <str name="fl">
      UID_PK,name,price,description,score
    </str>
    <str name="mm">
      2&lt;-1 5&lt;-2 6&lt;90%
    </str>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
    <str name="f.name.hl.fragsize">0</str>
    <str name="f.name.hl.alternateField">name</str>
    <str name="f.text.hl.fragmenter">regex</str>
  </lst>
</requestHandler>


When the query string is q=gold^2.0 ring (boosting gold) and qt=standard, I get
results for gold ring, but when qt=dismax I get no results. Why? Please
explain.




-
Thanks & Regards
Romi


Re: configure dismax requesthandlar for boost a field

2011-07-05 Thread Bill Bell
Add score to the fl parameter.

fl=*,score
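
For example (an illustrative URL; host, core path and query are placeholders):

http://localhost:8983/solr/select?q=gold+ring&fl=*,score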


On 7/4/11 11:09 PM, Romi romijain3...@gmail.com wrote:

I am not returning score for the queries. as i suppose it should be
reflected
in search results. means doc having query string in description field come
higher than the doc having query string in name field.

And yes i restarted solr after making changes in configuration.

-
Thanks & Regards
Romi




Re: configure dismax requesthandlar for boost a field

2011-07-05 Thread Romi
Will merely adding fl=score make a difference in the search results? I mean,
will I get the desired results now?

-
Thanks & Regards
Romi


Re: How many fields can SOLR handle?

2011-07-05 Thread roySolr
Hi,

I know I can add components to my request handler. In this situation the facets
depend on the category. So if a user chooses the category TV:

Inch:
32 inch(5)
34 inch(3)
40 inch(1)

Resolution:
Full HD(5)
HD ready(2)

When a user searches for the category Computer:

CPU:
Intel(12)
AMD(10)

GPU:
Ati(5)
Nvidia(2)

So I can't put it in my request handler as defaults. Every search can have
different facets. Do you understand what I mean?



Re: Exception when using result grouping and sorting by geodist() with Solr 3.3

2011-07-05 Thread Bill Bell
Did you add: fq={!geofilt} ??
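
For illustration, geofilt takes the field, point and distance as local params (the d radius in km below is a made-up value; sfield and pt are taken from your query):

fq={!geofilt sfield=user.location_p pt=48.20927,16.3728 d=100}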

On 7/3/11 11:14 AM, Thomas Heigl tho...@umschalt.com wrote:

Hello,

I just tried up(down?)grading our current Solr 4.0 trunk setup to Solr
3.3.0
as result grouping was the only reason for us to stay with the trunk.
Everything worked like a charm except for one of our queries, where we
group
results by the owning user and sort by distance.

A simplified example for my query (that still fails) looks like this:

q=*:*&group=true&group.field=user.uniqueId_s&group.main=true&group.format=grouped&sfield=user.location_p&pt=48.20927,16.3728&sort=geodist() asc


The exception thrown is:

Caused by: org.apache.solr.common.SolrException: Unweighted use of sort geodist(latlon(user.location_p),48.20927,16.3728)
 at org.apache.solr.search.function.ValueSource$1.newComparator(ValueSource.java:106)
 at org.apache.lucene.search.SortField.getComparator(SortField.java:413)
 at org.apache.lucene.search.grouping.AbstractFirstPassGroupingCollector.<init>(AbstractFirstPassGroupingCollector.java:81)
 at org.apache.lucene.search.grouping.TermFirstPassGroupingCollector.<init>(TermFirstPassGroupingCollector.java:56)
 at org.apache.solr.search.Grouping$CommandField.createFirstPassCollector(Grouping.java:587)
 at org.apache.solr.search.Grouping.execute(Grouping.java:256)
 at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:237)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
 at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:140)
 ... 39 more


Any ideas how to fix this or work around this error for now? I'd really
like
to move from the trunk to the stable 3.3.0 release and this is the only
problem currently keeping me from doing so.

Cheers,

Thomas




faceting on field with two values

2011-07-05 Thread elisabeth benoit
Hello,

I have two fields, TOWN and POSTALCODE, and I want to concatenate the two into
one field to do faceting on.

My two fields are declared as follows:

<field name="TOWN" type="string" indexed="true" stored="true"/>
<field name="POSTALCODE" type="string" indexed="true" stored="true"/>

The concatenated field is declared as follows:

<field name="TOWN_POSTALCODE" type="string" indexed="true" stored="true" multiValued="true"/>

and I do the copyField as follows:

<copyField source="TOWN" dest="TOWN_POSTALCODE"/>
<copyField source="POSTALCODE" dest="TOWN_POSTALCODE"/>


When I do faceting on TOWN_POSTALCODE field, I only get answers like

<lst name="TOWN_POSTALCODE">
  <int name="62200">5</int>
  <int name="62280">5</int>
  <int name="boulogne sur mer">5</int>
  <int name="saint martin boulogne">5</int>
...

Which means the faceting is done on the TOWN part or the POSTALCODE part of
TOWN_POSTALCODE.

But I would like to have answers like

<lst name="TOWN_POSTALCODE">
  <int name="boulogne sur mer 62200">5</int>
  <int name="paris 75016">5</int>

Is this possible with Solr?

Thanks,
Elisabeth


Re: faceting on field with two values

2011-07-05 Thread Bill Bell
The easiest way is to concat() the fields in SQL, and pass it to indexing
as one field already merged together.
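
For example (a sketch in MySQL-style SQL; the table and column names are made up):

SELECT CONCAT(town, ' ', postalcode) AS town_postalcode FROM addresses;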

Thanks,

On 7/5/11 1:12 AM, elisabeth benoit elisaelisael...@gmail.com wrote:

Hello,

I have two fields TOWN and POSTALCODE and I want to concat those two in
one
field to do faceting

My two fields  are declared as followed:

<field name="TOWN" type="string" indexed="true" stored="true"/>
<field name="POSTALCODE" type="string" indexed="true" stored="true"/>

The concat field is declared as followed:

<field name="TOWN_POSTALCODE" type="string" indexed="true" stored="true" multiValued="true"/>

and I do the copyfield as followed:

<copyField source="TOWN" dest="TOWN_POSTALCODE"/>
<copyField source="POSTALCODE" dest="TOWN_POSTALCODE"/>


When I do faceting on TOWN_POSTALCODE field, I only get answers like

<lst name="TOWN_POSTALCODE">
  <int name="62200">5</int>
  <int name="62280">5</int>
  <int name="boulogne sur mer">5</int>
  <int name="saint martin boulogne">5</int>
...

Which means the faceting is down on the TOWN part or the POSTALCODE part
of
TOWN_POSTALCODE.

But I would like to have answers like

<lst name="TOWN_POSTALCODE">
  <int name="boulogne sur mer 62200">5</int>
  <int name="paris 75016">5</int>

Is this possible with Solr?

Thanks,
Elisabeth




Re: How many fields can SOLR handle?

2011-07-05 Thread Bill Bell
This is taxonomy/index design...

One way is to have a series of fields by category:

TV - tv_size, resolution
Computer - cpu, gpu

Solr can have as many fields as you need, and if you do not store them
into the index they are ignored.
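
One way to keep the schema manageable with many per-category fields is a dynamicField declaration (an illustrative sketch; the naming pattern is an assumption):

<dynamicField name="*_facet" type="string" indexed="true" stored="false"/>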

So if a user picks TV, you pass these to Solr:

q=*:*&facet=true&facet.field=tv_size&facet.field=resolution

If a user picks Computer, you pass these to Solr:

q=*:*&facet=true&facet.field=cpu&facet.field=gpu

The other option is to return ALL of the fields facet'd, but this is not
recommended since
you would certainly have performance issues depending on the number of
fields.





On 7/5/11 1:00 AM, roySolr royrutten1...@gmail.com wrote:

Hi,

I know i can add components to my requesthandler. In this situation facets
are dependent of there category. So if a user choose for the category TV:

Inch:
32 inch(5)
34 inch(3)
40 inch(1)

Resolution:
Full HD(5)
HD ready(2)

When a user search for category Computer:

CPU:
Intel(12)
AMD(10)

GPU:
Ati(5)
Nvidia(2)

So i can't put it in my requesthandler as default. Every search there can
be
other facets. Do you understand what i mean?





Re: How many fields can SOLR handle?

2011-07-05 Thread roySolr
Thanks Bill,

That's exactly what I mean. But first I do a request to get the right
facet fields for a category. So when a user searches for TV, I do a request
to a DB to get tv_size and resolution. The next step is to add these to my
query, like this: facet.field=tv_size&facet.field=resolution.

I thought maybe it was possible to add the facet fields automatically to my
query (based on the category). I understand this isn't possible and that I
first need to do a request to get the facet.fields.




Re: faceting on field with two values

2011-07-05 Thread roySolr
Are you using the DIH?? You can use the transformer to concat the two fields
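
For example, a sketch of a data-config.xml entity using the DIH TemplateTransformer (the entity name, query and column names are assumptions based on this thread):

<entity name="address" query="SELECT TOWN, POSTALCODE FROM address" transformer="TemplateTransformer">
  <field column="TOWN_POSTALCODE" template="${address.TOWN} ${address.POSTALCODE}"/>
</entity>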



Different Indexing formats for Older Lucene versions and Solr?

2011-07-05 Thread Sowmya V.B.
Hi All

A quick question about the index files of Lucene and Solr.

I had an older version of Lucene (with UIMA) until recently, and had an index
built with it. I switched to Solr (3.3, with UIMA) and tried to use the same
index. While everything else seems fine, Solr does not seem to recognize the
index format. It still shows zero documents.

I noticed from an older Solr application I used that the index directory
contained files like .fdt, .fdx, .fnm, .nrm, .prx, .tii, .tis and
segments_2v, segments.gen files.
But my Lucene application contains only .cfx, .cfs and segments files.

Is there a way to use my old lucene index in the new Solr application?

Sowmya.

-- 
Sowmya V.B.

Losing optimism is blasphemy!
http://vbsowmya.wordpress.com



Re: Different Indexing formats for Older Lucene versions and Solr?

2011-07-05 Thread Tommaso Teofili
Which Lucene version were you using?
Regards,
Tommaso

2011/7/5 Sowmya V.B. vbsow...@gmail.com

 Hi All

 A quick doubt on the index files of Lucene and Solr.

 I had an older version of lucene (with UIMA) till recently, and had an
 index
 built thus.
 I shifted to Solr (3.3, with UIMA)..and tried to use the same index. While
 everything else seems fine, the Solr does not seem to recognize the index
 format.
 It still shows zero documents.

 I noticed from an older Solr application I used, that index directory
 contained files like .fdt, .fdx, .fnm, .nrm, .prx, .tii, .tis and
 segments_2v, segments_gen files.
 But, my lucene application contains only .cfx, .cfs and segments files.

 Is there a way to use my old lucene index in the new Solr application?

 Sowmya.

 --
 Sowmya V.B.
 
 Losing optimism is blasphemy!
 http://vbsowmya.wordpress.com
 



Re: Different Indexing formats for Older Lucene versions and Solr?

2011-07-05 Thread Sowmya V.B.
I was using 2.4 or 2.5. It was a 2yr old Lucene version.

On Tue, Jul 5, 2011 at 10:07 AM, Tommaso Teofili
tommaso.teof...@gmail.com wrote:

 Which Lucene version were you using?
 Regards,
 Tommaso

 2011/7/5 Sowmya V.B. vbsow...@gmail.com

  Hi All
 
  A quick doubt on the index files of Lucene and Solr.
 
  I had an older version of lucene (with UIMA) till recently, and had an
  index
  built thus.
  I shifted to Solr (3.3, with UIMA)..and tried to use the same index.
 While
  everything else seems fine, the Solr does not seem to recognize the index
  format.
  It still shows zero documents.
 
  I noticed from an older Solr application I used, that index directory
  contained files like .fdt, .fdx, .fnm, .nrm, .prx, .tii, .tis and
  segments_2v, segments_gen files.
  But, my lucene application contains only .cfx, .cfs and segments files.
 
  Is there a way to use my old lucene index in the new Solr application?
 
  Sowmya.
 
  --
  Sowmya V.B.
  
  Losing optimism is blasphemy!
  http://vbsowmya.wordpress.com
  
 




-- 
Sowmya V.B.

Losing optimism is blasphemy!
http://vbsowmya.wordpress.com



Re: Exception when using result grouping and sorting by geodist() with Solr 3.3

2011-07-05 Thread Thomas Heigl
I'm pretty sure my original query contained a distance filter as well. Do I
absolutely need to filter by distance in order to sort my results by it?

I'll write another unit test including a distance filter as soon as I get a
chance.

Cheers,

Thomas

On Tue, Jul 5, 2011 at 9:04 AM, Bill Bell billnb...@gmail.com wrote:

 Did you add: fq={!geofilt} ??

 On 7/3/11 11:14 AM, Thomas Heigl tho...@umschalt.com wrote:

 Hello,
 
 I just tried up(down?)grading our current Solr 4.0 trunk setup to Solr
 3.3.0
 as result grouping was the only reason for us to stay with the trunk.
 Everything worked like a charm except for one of our queries, where we
 group
 results by the owning user and sort by distance.
 
 A simplified example for my query (that still fails) looks like this:
 
  q=*:*&group=true&group.field=user.uniqueId_s&group.main=true&group.format=grouped&sfield=user.location_p&pt=48.20927,16.3728&sort=geodist() asc
 
 
 The exception thrown is:
 
  Caused by: org.apache.solr.common.SolrException: Unweighted use of sort geodist(latlon(user.location_p),48.20927,16.3728)
   at org.apache.solr.search.function.ValueSource$1.newComparator(ValueSource.java:106)
   at org.apache.lucene.search.SortField.getComparator(SortField.java:413)
   at org.apache.lucene.search.grouping.AbstractFirstPassGroupingCollector.<init>(AbstractFirstPassGroupingCollector.java:81)
   at org.apache.lucene.search.grouping.TermFirstPassGroupingCollector.<init>(TermFirstPassGroupingCollector.java:56)
   at org.apache.solr.search.Grouping$CommandField.createFirstPassCollector(Grouping.java:587)
   at org.apache.solr.search.Grouping.execute(Grouping.java:256)
   at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:237)
   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
   at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:140)
   ... 39 more
 
 
 Any ideas how to fix this or work around this error for now? I'd really
 like
 to move from the trunk to the stable 3.3.0 release and this is the only
 problem currently keeping me from doing so.
 
 Cheers,
 
 Thomas





Re: faceting on field with two values

2011-07-05 Thread elisabeth benoit
Hmmm... that sounds interesting and brings me somewhere else.

We are actually reindexing data every night, but the whole process is done by
Talend (reading and formatting data from a database), and this makes me
wonder if we should use Solr instead to do this.

In this case, concatenating the two fields, the change is quite heavy (we have
to change the Talend process, pollute the XML files we use to index data with
redundant fields, then modify the Solr process).

So do you think the DIH (which I just discovered) would be appropriate to do
the whole process (read a database, read fields from XML contained in some
of the database columns, add information from a CSV file)?

From what I just read about the DIH, it seems so, but I'm still very confused
about this DIH thing.

Thanks again,
Elisabeth

2011/7/5 roySolr royrutten1...@gmail.com

 Are you using the DIH?? You can use the transformer to concat the two
 fields




OOM at solr master node while updating document

2011-07-05 Thread Chengyang
Is there any memory leak when I update the index at the master node?
Here is the stack trace.

o.a.solr.servlet.SolrDispatchFilter - java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.handler.ReplicationHandler$FileStream.write(ReplicationHandler.java:1000)
at org.apache.solr.handler.ReplicationHandler$3.write(ReplicationHandler.java:887)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:322)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:230)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:179)
at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:84)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:157)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:262)
at org.apache.coyote.ajp.AjpAprProcessor.process(AjpAprProcessor.java:425)
at org.apache.coyote.ajp.AjpAprProtocol$AjpConnectionHandler.process(AjpAprProtocol.java:378)
at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1508)
at java.lang.Thread.run(Thread.java:619)


searching a subset of SOLR index

2011-07-05 Thread Jame Vaalet
Hi,
Say I have 10^10 documents in an index, with a unique id (the document id)
assigned to each of them, from 1 to 10^10.
Now I want to search for a particular query string in a subset of these
documents, say document ids 100 to 1000.

The question here is: will SOLR be able to search just this set of documents
rather than the entire index? If yes, what should the query be to limit the
search to this subset?

Regards,
JAME VAALET
Software Developer
EXT :8108
Capital IQ



Re: searching a subset of SOLR index

2011-07-05 Thread Shashi Kant
Range query
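
For example (an illustrative filter query; this assumes the id field is indexed with a numeric/sortable type, so the range is interpreted numerically rather than lexicographically):

q=your query terms&fq=id:[100 TO 1000]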


On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet jvaa...@capitaliq.com wrote:
 Hi,
 Let say, I have got 10^10 documents in an index with unique id being document 
 id which is assigned to each of those from 1 to 10^10 .
 Now I want to search a particular query string in a subset of these documents 
 say ( document id 100 to 1000).

 The question here is.. will SOLR able to search just in this set of documents 
 rather than the entire index ? if yes what should be query to limit search 
 into this subset ?

 Regards,
 JAME VAALET
 Software Developer
 EXT :8108
 Capital IQ




RE: searching a subset of SOLR index

2011-07-05 Thread Jame Vaalet
Thanks.
But does this range query just limit the universe logically, or does it have
some mechanism to limit it physically as well? Do we gain any time by using
the range query?

Regards,
JAME VAALET


-Original Message-
From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi 
Kant
Sent: Tuesday, July 05, 2011 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: searching a subset of SOLR index

Range query


On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet jvaa...@capitaliq.com wrote:
 Hi,
 Let say, I have got 10^10 documents in an index with unique id being document 
 id which is assigned to each of those from 1 to 10^10 .
 Now I want to search a particular query string in a subset of these documents 
 say ( document id 100 to 1000).

 The question here is.. will SOLR able to search just in this set of documents 
 rather than the entire index ? if yes what should be query to limit search 
 into this subset ?

 Regards,
 JAME VAALET
 Software Developer
 EXT :8108
 Capital IQ




Re: configure dismax requesthandlar for boost a field

2011-07-05 Thread Marian Steinbach
On Tue, Jul 5, 2011 at 08:46, Romi romijain3...@gmail.com wrote:
 will merely adding fl=score make difference in search results, i mean will i
 get desired results now???

The fl parameter stands for "field list" and allows you to configure
in a request which result fields should be returned.

If you try to tweak the boosts in order to change your result order,
it's wise to add the calculated score to the output by setting
something like fl=score,*.

This reminds me of another important question: Are you sorting the
result by score? Because if not, your changes to the boosts/score
won't ever have an effect on the ordering.
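
For example (an illustrative request; sorting by score descending is also Solr's default):

q=ring&defType=dismax&fl=*,score&sort=score desc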

http://wiki.apache.org/solr/CommonQueryParameters

Marian


Re: Spellchecker in zero-hit search result

2011-07-05 Thread Marian Steinbach
On Mon, Jul 4, 2011 at 17:19, Juan Grande juan.gra...@gmail.com wrote:
 Hi Marian,

 I guess that your problem isn't related to the number of results, but to the
 component's configuration. The configuration that you show is meant to set
 up an autocomplete component that will suggest terms from an incomplete user
 input (something similar to what google does while you're typing in the
 search box), see http://wiki.apache.org/solr/Suggester. That's why your
 suggestions to place are places and placed, all sharing the place
 prefix. But when you search for placw, the component doesn't return any
 suggestion, because in your index no term begins with placw.

 You can learn how to correctly configure a spellchecker here:
 http://wiki.apache.org/solr/SpellCheckComponent. Also, I'd recommend to take
 a look at the example's solrconfig, because it provides an example
 spellchecker configuration.

Juan, thanks for the information!

I have read through that page for quite a while before doing my tests,
but it seems as if I had a different mental model. Then all reading
might not be worth it. I thought that the SpellCheckComponent would be
able to fetch index terms which are similar to the query term. The use
case for that, mainly (but not only) in case of a zero-hit search
would be to display the famous "Did you mean ...?" hint.

So I'm going back to the docs and to the example. :)

Later,

Marian


Re: faceting on field with two values

2011-07-05 Thread Marian Steinbach
On Tue, Jul 5, 2011 at 10:21, elisabeth benoit
elisaelisael...@gmail.com wrote:
 ...

 so do you think the dih (which I just discovered) would be appropriate to do
 the whole process (read a database, read fields from xml contained in some
 of the database columns, add informations from csv file)???

 from what I just read about dih, it seems so, but I'm still very confused
 about this dih thing.

As far as I can tell, the DataImportHandler is very useful if you want
to get data (only) from a database directly to Solr, with only slight
manipulation, e.g. concatenations. For that, it's much more convenient
than the path via scripts to generate XML.

It sounds like you are doing more than that in your importers.


Re: OOM at solr master node while updating document

2011-07-05 Thread Marian Steinbach
2011/7/5 Chengyang atreey...@163.com:
 Is there any memory leak when I updating the index at the master node?
 Here is the stack trace.

 o.a.solr.servlet.SolrDispatchFilter - java.lang.OutOfMemoryError: Java heap 
 space

You don't need a memory leak to get an OOM error in Java. It might just
happen that the amount of RAM allocated to the virtual machine is used
up.

If you are running Solr as in the example, via Jetty on the command line, try

  java -server -jar start.jar

Or try the -Xmx parameter, e.g.

  java -Xmx1024M -jar start.jar

If you are using Tomcat or something else, you might want to look into
the docs on how to deal with the memory limit.
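
For Tomcat, for instance, one common approach (a sketch; the exact file and location vary by installation) is to raise the heap via JAVA_OPTS, e.g. in bin/setenv.sh:

  JAVA_OPTS="$JAVA_OPTS -Xmx1024m"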

Marian


Re: what is the diff between katta and solrcloud?

2011-07-05 Thread sinoantony
Why does katta store its index on HDFS? Any advantages?



RE: searching a subset of SOLR index

2011-07-05 Thread Pierre GOSSE
The limit will always be logical if you have all documents in the same index. 
But filters are very efficient when working with a subset of your index, 
especially if you reuse the same filter for many queries, since there is a cache.

If your subsets are always the same subsets, maybe you could use shards. But 
we would need to know more about what you intend to do to point you to an adequate 
solution.
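
For reference, the cache that makes repeated fq filters cheap is the filter cache configured in solrconfig.xml (illustrative sizes):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>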

Pierre

-Message d'origine-
De : Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Envoyé : mardi 5 juillet 2011 11:10
À : solr-user@lucene.apache.org
Objet : RE: searching a subset of SOLR index

Thanks.
But does this range query just limit the universe logically or does it have any 
mechanism to limit this physically as well .Do we leverage time factor by using 
the range query ?

Regards,
JAME VAALET


-Original Message-
From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi 
Kant
Sent: Tuesday, July 05, 2011 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: searching a subset of SOLR index

Range query


On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet jvaa...@capitaliq.com wrote:
 Hi,
 Let say, I have got 10^10 documents in an index with unique id being document 
 id which is assigned to each of those from 1 to 10^10 .
 Now I want to search a particular query string in a subset of these documents 
 say ( document id 100 to 1000).

 The question here is.. will SOLR able to search just in this set of documents 
 rather than the entire index ? if yes what should be query to limit search 
 into this subset ?

 Regards,
 JAME VAALET
 Software Developer
 EXT :8108
 Capital IQ




Re: Spellchecker in zero-hit search result

2011-07-05 Thread Marian Steinbach
On Tue, Jul 5, 2011 at 11:24, Marian Steinbach mar...@sendung.de wrote:
 On Mon, Jul 4, 2011 at 17:19, Juan Grande juan.gra...@gmail.com wrote:
 ...

 You can learn how to correctly configure a spellchecker here:
 http://wiki.apache.org/solr/SpellCheckComponent. Also, I'd recommend to take
 a look at the example's solrconfig, because it provides an example
 spellchecker configuration.


I found the problem. My suggest component had the line

  <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>

which I probably copied without knowing what I did. I removed it to
use the default IndexBasedSpellChecker. Now I get my suggestions on
zero-hit searches as well.
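
For reference, a minimal index-based spellchecker component looks roughly like this (the field name is an assumption; the example solrconfig has a fuller version):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">name</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>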

Thanks again!

Marian


Re: Does Nutch make any use of solr.WhitespaceTokenizerFactory defined in schema.xml?

2011-07-05 Thread Gabriele Kahlout
nice...where?

I'm trying to figure out 2 things:
1) How to create an analyzer that corresponds to the one in the schema.xml.

<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

2) I'd like to see the code that creates it by reading it from schema.xml.

On Tue, Jul 5, 2011 at 12:33 PM, Markus Jelsma
markus.jel...@openindex.io wrote:

 No. SolrJ only builds input docs from NutchDocument objects. Solr will do
 analysis. The integration is analogous to XML post of Solr documents.

 On Tuesday 05 July 2011 12:28:21 Gabriele Kahlout wrote:
  Hello,
 
  I'm trying to understand better Nutch and Solr integration. My
  understanding is that Documents are added to Solr index from SolrWriter's
  write(NutchDocument doc) method. But does it make any use of the
  WhitespaceTokenizerFactory?

 --
 Markus Jelsma - CTO - Openindex
 http://www.linkedin.com/in/markus17
 050-8536620 / 06-50258350




-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains [LON] or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with X.
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: full text searching in cloud for minor enterprises

2011-07-05 Thread Joe Scanlon
Look at searchblox

On Monday, July 4, 2011, Li Li fancye...@gmail.com wrote:
 hi all,
     I want to provide full text searching for some small websites.
 It seems cloud computing is  popular now. And it will save costs
 because it don't need employ engineer to maintain
 the machine.
     For now, there are many services such as amazon s3, google app
 engine, ms azure etc. I am not familiar with cloud computing. Anyone
 give me a direction or some advice? thanks

 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org



-- 
Joe Scanlon

jscan...@element115.net

Mobile: 603 459 3242
Office:  312 445 0018


RE: searching a subset of SOLR index

2011-07-05 Thread Jame Vaalet
I have got two applications:

1. Website
	The website will enable any user to search the document repository;
the set they search on is known as "website presentable".
2. Windows service
	The windows service will search all the documents in the repository
for a fixed set of keywords and store the results found in a database. This set
is the universal set of documents in the doc repository, including the website-
presentable ones.


The website is a high-priority app which should work smoothly without any
interference, whereas the windows service should run all day long, continuously,
without a break, to save results from incoming docs.
The problem here is that the website set is predefined, and I don't want the
windows service requests to Solr to slow down the website requests.

Suppose I segregate the website-presentable docs' index into a particular
core and the rest of them into a different core: will that solve the problem?
I have also read about multiple ports for listening to requests from different
apps; can this be used?



Regards,
JAME VAALET


-Original Message-
From: Pierre GOSSE [mailto:pierre.go...@arisem.com] 
Sent: Tuesday, July 05, 2011 3:52 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

The limit will always be logical if you have all documents in the same index. 
But filters are very efficient when working with subset of your index, 
especially if you reuse the same filter for many queries since there is a cache.

If your subsets are always the same subsets, maybe your could use shards. But 
we would need to know more about what you intend to do, to point to an adequate 
solution.

Pierre

-Message d'origine-
De : Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Envoyé : mardi 5 juillet 2011 11:10
À : solr-user@lucene.apache.org
Objet : RE: searching a subset of SOLR index

Thanks.
But does this range query just limit the universe logically or does it have any 
mechanism to limit this physically as well .Do we leverage time factor by using 
the range query ?

Regards,
JAME VAALET


-Original Message-
From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi 
Kant
Sent: Tuesday, July 05, 2011 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: searching a subset of SOLR index

Range query


On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet jvaa...@capitaliq.com wrote:
 Hi,
 Let say, I have got 10^10 documents in an index with unique id being document 
 id which is assigned to each of those from 1 to 10^10 .
 Now I want to search a particular query string in a subset of these documents 
 say ( document id 100 to 1000).

 The question here is.. will SOLR able to search just in this set of documents 
 rather than the entire index ? if yes what should be query to limit search 
 into this subset ?

 Regards,
 JAME VAALET
 Software Developer
 EXT :8108
 Capital IQ




Re: @field for child object

2011-07-05 Thread Mark Miller
Not yet - I've played around with support in this issue in the past though: 
https://issues.apache.org/jira/browse/SOLR-1945

On Jul 4, 2011, at 6:04 AM, Kiwi de coder wrote:

 hi,
 
 i wondering solrj @Field annotation support embedded child object ? e.g.
 
 class A {
 
  @field
  string somefield;
 
 @emebedded
  B b;
 
 }
 
 regards,
 kiwi

- Mark Miller
lucidimagination.com










Re: Feed index with analyzer output

2011-07-05 Thread Lox
Ok,

the very short question is:
is there a way to submit the analyzer response so that Solr already knows
what to do with it (that is, which fields are to be treated as payloads,
which are tokens, etc.)?


Chris Hostetter-3 wrote:
 
 can you explain a bit more about what you goal is here?  what info are you 
 planning on extracting?  what do you intend to change between the info you 
 get back in the first request and the info you want to send in the second 
 request?
 

I plan to add some payloads to some terms between request#1 and request#2.


Chris Hostetter-3 wrote:
 
 your analyziers and whatnot for request#1 would be exactly what you're use 
 to, but for request#2 you'd need to specify an analyzer that would let you 
 specify, in the field value, the details about the term and position, and 
 offsets, and payloads and what not ... the 
 DelimitedPayloadTokenFilterFactory / DelimitedPayloadTokenFilter can help 
 with some of that, but not all -- you'd either need your own custom 
 analyzer or custom FieldType or something depending on teh specific 
 changes you want to make.
 
 Frankly though i really believe you are going about this backwards -- if 
 you want to manipulate the Tokenstream after analysis but before indexing, 
 then why not implement this custom logic thta you want in a TokenFilter 
 and use it in the last TokenFilterFactory you have for your analyzer?
 
 

Yeah, I thought about that. I really wanted to know whether there was an
already-implemented way to do it, to avoid reinventing the wheel.

It would be cool if I were able to send info to Solr formatted in the way I
imagined in my last mail, so that a call to any Tokenizer or TokenFilter
wouldn't be necessary. It would be like using an empty analyzer while
still retaining the various token information.

Thank you!



RE: searching a subset of SOLR index

2011-07-05 Thread Pierre GOSSE
From what you tell us, I guess a separate index for the website docs would be 
best. If you fear that requests from the windows service would cripple your web 
site's performance, why not have a totally separate index on another server, 
and have your website documents indexed in both indexes?

Pierre

-Message d'origine-
De : Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Envoyé : mardi 5 juillet 2011 13:14
À : solr-user@lucene.apache.org
Objet : RE: searching a subset of SOLR index

I have got two applications 

1. website
The website will enable any user to search the document repository , 
and the set they search on is known as website presentable
2. windows service 
The windows service will search on all the documents in the repository 
for fixed set of key words and store the found result in database.this set  
 is universal set of documents in the doc repository including the website 
presentable.


Website is a high prioritized app which should work smoothly without any 
interference , where as windows service should run all day long continuously 
without break to save result from incoming docs.
The problem here is website set is predefined and I don't want the windows 
service request to SOLR to slow down website request.

Suppose am segregating the website presentable docs index into a particular 
core and rest of them into different core will it solve the problem ?
I have also read about multiple ports for listening request from different apps 
, can this be used. 



Regards,
JAME VAALET


-Original Message-
From: Pierre GOSSE [mailto:pierre.go...@arisem.com] 
Sent: Tuesday, July 05, 2011 3:52 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

The limit will always be logical if you have all documents in the same index. 
But filters are very efficient when working with subset of your index, 
especially if you reuse the same filter for many queries since there is a cache.

If your subsets are always the same subsets, maybe your could use shards. But 
we would need to know more about what you intend to do, to point to an adequate 
solution.

Pierre

-Message d'origine-
De : Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Envoyé : mardi 5 juillet 2011 11:10
À : solr-user@lucene.apache.org
Objet : RE: searching a subset of SOLR index

Thanks.
But does this range query just limit the universe logically or does it have any 
mechanism to limit this physically as well .Do we leverage time factor by using 
the range query ?

Regards,
JAME VAALET


-Original Message-
From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi 
Kant
Sent: Tuesday, July 05, 2011 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: searching a subset of SOLR index

Range query


On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet jvaa...@capitaliq.com wrote:
 Hi,
 Let say, I have got 10^10 documents in an index with unique id being document 
 id which is assigned to each of those from 1 to 10^10 .
 Now I want to search a particular query string in a subset of these documents 
 say ( document id 100 to 1000).

 The question here is.. will SOLR able to search just in this set of documents 
 rather than the entire index ? if yes what should be query to limit search 
 into this subset ?

 Regards,
 JAME VAALET
 Software Developer
 EXT :8108
 Capital IQ




Re: configure dismax requesthandlar for boost a field

2011-07-05 Thread Romi
I got the point that to boost search results I have to sort by score.
But in solrconfig, for the dismax request handler, I use

<str name="qf">
  text^0.5 name^1.0 description^1.5
</str>

because I want docs having the query string in the description field to come
up higher in the search results,
but what I am getting is a first doc in the search results that does not have
the query string in its description field.


-
Thanks & Regards
Romi


Re: Is solrj 3.3.0 ready for field collapsing?

2011-07-05 Thread Erick Erickson
Let's see the results of adding debugQuery=on to your URL. Are you getting
any documents back at all? If not, then your query isn't getting any
documents to group.

You haven't told us much about what you're trying to do, you might want to
review: http://wiki.apache.org/solr/UsingMailingLists

Best
Erick
On Jul 4, 2011 11:55 AM, Per Newgro per.new...@gmx.ch wrote:


Re: configure dismax requesthandlar for boost a field

2011-07-05 Thread Ahmet Arslan
 I got the point that to boost search
 result i have to sort by score.
 But as in solrconfig for dismax request handler i use 
 
 *str name=qf
         text^0.5 name^1.0
 description^1.5 
  /str*
 
 
 because i want docs having querystring in description field
 come upper in
 search results.
 but what i am getting is first doc in search result which
 does not have
 querystring in its description field.

Increase the boost factor of description until you get that; let's say make it 100. 
Adding debugQuery=on will show how the actual score is calculated.
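
For example (illustrative weights only):

<str name="qf">
  text^0.5 name^1.0 description^100
</str>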


Re: How to boost a querystring at query time

2011-07-05 Thread Ahmet Arslan
 When querystring is q=gold^2.0 ring(boost gold) and
 qt=standard i got the
 results for gold ring and when qt=dismax i got no result
 why so please
 explain

q=gold^2.0 ring(boost gold)&defType=dismax would return a document that 
contains exactly "gold^2.0 ring(boost gold)" in it.

dismax is designed to work with simple keyword queries. You can use only three 
special characters: + - "

The rest of the characters (: [ ] ^ ~ etc.) don't work, i.e. they are escaped.

Please see http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/


Re: Does Nutch make any use of solr.WhitespaceTokenizerFactory defined in schema.xml?

2011-07-05 Thread Gabriele Kahlout
I suspect the following should do (1). I'm just not sure about file
references, as in stopInit.put("words", "stopwords.txt"). (2) should
clarify.

1)
class SchemaAnalyzer extends Analyzer {

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // Stop filter, configured the same way the schema.xml entry would be
        HashMap<String, String> stopInit = new HashMap<String, String>();
        stopInit.put("words", "stopwords.txt");
        stopInit.put("ignoreCase", Boolean.TRUE.toString());
        StopFilterFactory stopFilterFactory = new StopFilterFactory();
        stopFilterFactory.init(stopInit);

        // Word-delimiter filter options, mirroring the schema attributes
        final HashMap<String, String> wordDelimInit = new HashMap<String, String>();
        wordDelimInit.put("generateWordParts", "1");
        wordDelimInit.put("generateNumberParts", "1");
        wordDelimInit.put("catenateWords", "1");
        wordDelimInit.put("catenateNumbers", "1");
        wordDelimInit.put("catenateAll", "0");
        wordDelimInit.put("splitOnCaseChange", "1");
        WordDelimiterFilterFactory wordDelimiterFilterFactory = new WordDelimiterFilterFactory();
        wordDelimiterFilterFactory.init(wordDelimInit);

        // Stemmer with its protected-words file
        HashMap<String, String> porterInit = new HashMap<String, String>();
        porterInit.put("protected", "protwords.txt");
        EnglishPorterFilterFactory englishPorterFilterFactory = new EnglishPorterFilterFactory();
        englishPorterFilterFactory.init(porterInit);

        // Chain: whitespace tokenizer -> stop -> word-delimiter -> lowercase -> stemmer -> dedup
        return new RemoveDuplicatesTokenFilter(
            englishPorterFilterFactory.create(
                new LowerCaseFilter(
                    wordDelimiterFilterFactory.create(
                        stopFilterFactory.create(
                            new WhitespaceTokenizer(reader))))));
    }
}

On Tue, Jul 5, 2011 at 1:00 PM, Gabriele Kahlout
gabri...@mysimpatico.comwrote:

 nice...where?

 I'm trying to figure out 2 things:
 1) How to create an analyzer that corresponds to the one in the schema.xml.


 <analyzer>
   <tokenizer class="solr.StandardTokenizerFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
 </analyzer>

 2) I'd like to see the code that creates it reading it from schema.xml .


 On Tue, Jul 5, 2011 at 12:33 PM, Markus Jelsma markus.jel...@openindex.io
  wrote:

 No. SolrJ only builds input docs from NutchDocument objects. Solr will do
 analysis. The integration is analogous to XML post of Solr documents.

 On Tuesday 05 July 2011 12:28:21 Gabriele Kahlout wrote:
  Hello,
 
  I'm trying to understand better Nutch and Solr integration. My
  understanding is that Documents are added to Solr index from
 SolrWriter's
  write(NutchDocument doc) method. But does it make any use of the
  WhitespaceTokenizerFactory?

 --
 Markus Jelsma - CTO - Openindex
 http://www.linkedin.com/in/markus17
 050-8536620 / 06-50258350




 --
 Regards,
 K. Gabriele

 --- unchanged since 20/9/10 ---
 P.S. If the subject contains [LON] or the addressee acknowledges the
 receipt within 48 hours then I don't resend the email.
 subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
 time(x)  Now + 48h) ⇒ ¬resend(I, this).

 If an email is sent by a sender that is not a trusted contact or the email
 does not contain a valid code then the email is not received. A valid code
 starts with a hyphen and ends with X.
 ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
 L(-[a-z]+[0-9]X)).




-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains [LON] or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with X.
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: Does Nutch make any use of solr.WhitespaceTokenizerFactory defined in schema.xml?

2011-07-05 Thread Gabriele Kahlout
Not yet an answer to 2), but this is where and how Solr initializes the
Analyzer defined in schema.xml:

// org.apache.solr.schema.IndexSchema
// Load the Tokenizer
// Although an analyzer only allows a single Tokenizer, we load a list to make sure
// the configuration is ok
//

final ArrayList<TokenizerFactory> tokenizers = new ArrayList<TokenizerFactory>(1);
AbstractPluginLoader<TokenizerFactory> tokenizerLoader =
  new AbstractPluginLoader<TokenizerFactory>( "[schema.xml] analyzer/tokenizer", false, false )
{
  @Override
  protected void init(TokenizerFactory plugin, Node node) throws Exception {
    if( !tokenizers.isEmpty() ) {
      throw new SolrException( SolrException.ErrorCode.SERVER_ERROR,
          "The schema defines multiple tokenizers for: " + node );
    }
    final Map<String,String> params = DOMUtil.toMapExcept(node.getAttributes(), "class");
    // copy the luceneMatchVersion from config, if not set
    if (!params.containsKey(LUCENE_MATCH_VERSION_PARAM))
      params.put(LUCENE_MATCH_VERSION_PARAM, solrConfig.luceneMatchVersion.toString());
    plugin.init( params );
    tokenizers.add( plugin );
  }

  @Override
  protected TokenizerFactory register(String name, TokenizerFactory plugin) throws Exception {
    return null; // used for map registration
  }
};
tokenizerLoader.load( loader, (NodeList)xpath.evaluate("./tokenizer", node, XPathConstants.NODESET) );

// Make sure something was loaded
if( tokenizers.isEmpty() ) {
  throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "analyzer without class or tokenizer & filter list");
}


// Load the Filters
//

final ArrayList<TokenFilterFactory> filters = new ArrayList<TokenFilterFactory>();
AbstractPluginLoader<TokenFilterFactory> filterLoader =
  new AbstractPluginLoader<TokenFilterFactory>( "[schema.xml] analyzer/filter", false, false )
{
  @Override
  protected void init(TokenFilterFactory plugin, Node node) throws Exception {
    if( plugin != null ) {
      final Map<String,String> params = DOMUtil.toMapExcept(node.getAttributes(), "class");
      // copy the luceneMatchVersion from config, if not set
      if (!params.containsKey(LUCENE_MATCH_VERSION_PARAM))
        params.put(LUCENE_MATCH_VERSION_PARAM, solrConfig.luceneMatchVersion.toString());
      plugin.init( params );
      filters.add( plugin );
    }
  }

  @Override
  protected TokenFilterFactory register(String name, TokenFilterFactory plugin) throws Exception {
    return null; // used for map registration
  }
};
filterLoader.load( loader, (NodeList)xpath.evaluate("./filter", node, XPathConstants.NODESET) );

return new TokenizerChain(charFilters.toArray(new CharFilterFactory[charFilters.size()]),
    tokenizers.get(0), filters.toArray(new TokenFilterFactory[filters.size()]));
  };


On Tue, Jul 5, 2011 at 2:26 PM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:

 I suspect the following should do (1). I'm just not sure about file
 references, as in stopInit.put("words", "stopwords.txt"). (2) should
 clarify.

 1)
 class SchemaAnalyzer extends Analyzer {

     @Override
     public TokenStream tokenStream(String fieldName, Reader reader) {
         HashMap<String, String> stopInit = new HashMap<String, String>();
         stopInit.put("words", "stopwords.txt");
         stopInit.put("ignoreCase", Boolean.TRUE.toString());
         StopFilterFactory stopFilterFactory = new StopFilterFactory();
         stopFilterFactory.init(stopInit);

         final HashMap<String, String> wordDelimInit = new HashMap<String, String>();
         wordDelimInit.put("generateWordParts", "1");
         wordDelimInit.put("generateNumberParts", "1");
         wordDelimInit.put("catenateWords", "1");
         wordDelimInit.put("catenateNumbers", "1");
         wordDelimInit.put("catenateAll", "0");
         wordDelimInit.put("splitOnCaseChange", "1");

         WordDelimiterFilterFactory wordDelimiterFilterFactory = new WordDelimiterFilterFactory();
         wordDelimiterFilterFactory.init(wordDelimInit);

         HashMap<String, String> porterInit = new HashMap<String, String>();
         porterInit.put("protected", "protwords.txt");
         EnglishPorterFilterFactory englishPorterFilterFactory = new EnglishPorterFilterFactory();
         englishPorterFilterFactory.init(porterInit);

         return new RemoveDuplicatesTokenFilter(
             englishPorterFilterFactory.create(
                 new LowerCaseFilter(
                     wordDelimiterFilterFactory.create(
                         stopFilterFactory.create(
                             new WhitespaceTokenizer(reader))))));
     }
 }

 On Tue, Jul 5, 2011 at 1:00 PM, Gabriele Kahlout 

RE: searching a subset of SOLR index

2011-07-05 Thread Jame Vaalet
But in case the website docs contribute around 50% of the entire docs, why 
recreate the indexes? Don't you think it's redundant?
Can two web apps (Solr instances) share a single index file to search on 
without interfering with each other?


Regards,
JAME VAALET
Software Developer 
EXT :8108
Capital IQ


-Original Message-
From: Pierre GOSSE [mailto:pierre.go...@arisem.com] 
Sent: Tuesday, July 05, 2011 5:12 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

From what you tell us, I guess a separate index for website docs would be the 
best. If you fear that request from the window service would cripple your web 
site performance, why not have a totally separated index on another server, 
and have your website documents index in both indexes ?

Pierre

-Message d'origine-
De : Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Envoyé : mardi 5 juillet 2011 13:14
À : solr-user@lucene.apache.org
Objet : RE: searching a subset of SOLR index

I have got two applications 

1. website
The website will enable any user to search the document repository , 
and the set they search on is known as website presentable
2. windows service 
The windows service will search on all the documents in the repository 
for fixed set of key words and store the found result in database.this set  
 is universal set of documents in the doc repository including the website 
presentable.


Website is a high prioritized app which should work smoothly without any 
interference , where as windows service should run all day long continuously 
without break to save result from incoming docs.
The problem here is website set is predefined and I don't want the windows 
service request to SOLR to slow down website request.

Suppose am segregating the website presentable docs index into a particular 
core and rest of them into different core will it solve the problem ?
I have also read about multiple ports for listening request from different apps 
, can this be used. 



Regards,
JAME VAALET


-Original Message-
From: Pierre GOSSE [mailto:pierre.go...@arisem.com] 
Sent: Tuesday, July 05, 2011 3:52 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

The limit will always be logical if you have all documents in the same index. 
But filters are very efficient when working with subset of your index, 
especially if you reuse the same filter for many queries since there is a cache.

If your subsets are always the same subsets, maybe your could use shards. But 
we would need to know more about what you intend to do, to point to an adequate 
solution.

Pierre

-Message d'origine-
De : Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Envoyé : mardi 5 juillet 2011 11:10
À : solr-user@lucene.apache.org
Objet : RE: searching a subset of SOLR index

Thanks.
But does this range query just limit the universe logically or does it have any 
mechanism to limit this physically as well .Do we leverage time factor by using 
the range query ?

Regards,
JAME VAALET


-Original Message-
From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi 
Kant
Sent: Tuesday, July 05, 2011 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: searching a subset of SOLR index

Range query


On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet jvaa...@capitaliq.com wrote:
 Hi,
 Let say, I have got 10^10 documents in an index with unique id being document 
 id which is assigned to each of those from 1 to 10^10 .
 Now I want to search a particular query string in a subset of these documents 
 say ( document id 100 to 1000).

 The question here is.. will SOLR able to search just in this set of documents 
 rather than the entire index ? if yes what should be query to limit search 
 into this subset ?

 Regards,
 JAME VAALET
 Software Developer
 EXT :8108
 Capital IQ




Re: How to boost a querystring at query time

2011-07-05 Thread Romi
Then what should I do to get the required result? I.e., if I want to boost
gold, which query type should I use?

-
Thanks & Regards
Romi


Re: Reading data from Solr MoreLikeThis

2011-07-05 Thread Sheetal
Hi Juan,

Thank you very much. Your code worked awesomely and was really
helpful... Great start of the day :)



Re: searching a subset of SOLR index

2011-07-05 Thread Erik Hatcher
I wouldn't share the same index across two Solr webapps, as they could step on 
each other's toes.  

In this scenario, I think having two Solr instances replicating from the same 
master is the way to go, to allow you to scale your load from each application 
separately.  
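
A rough sketch of the slave side of such a setup in solrconfig.xml (the master host and poll interval are placeholders):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>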

Erik



On Jul 5, 2011, at 09:04 , Jame Vaalet wrote:

 But incase the website docs contribute around 50 % of the entire docs , why 
 to recreate the indexes . don't you think its redundancy ?
 Can two web apps (solr instances ) share a single index file to search on it 
 without interfering each other 
 
 
 Regards,
 JAME VAALET
 Software Developer 
 EXT :8108
 Capital IQ
 
 



Re: How to boost a querystring at query time

2011-07-05 Thread Ahmet Arslan
 than what should i do to get the
 required result. ie. if i want to boost gold
 than which querytype i should use.

If you want to boost the keyword 'gold', you can use the bq parameter.

defType=dismax&bq=someField:gold^100

See the other parameters : 
http://wiki.apache.org/solr/DisMaxQParserPlugin#bq_.28Boost_Query.29
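
For example, a full request might look like this (host, core, and field names 
hypothetical):

    http://localhost:8983/solr/select?q=gold+ring&defType=dismax&bq=name:gold^100&fl=*,score

Note that the dismax parser does not interpret the standard ^ boost syntax 
inside q itself; with dismax, boosts belong in the qf and bq parameters.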


RE: searching a subset of SOLR index

2011-07-05 Thread Pierre GOSSE
It is redundant. You have to balance the cost of redundancy against the 
performance cost of your web index being queried by your windows service. If your 
windows service is not too aggressive in its requests, go for shards.

Pierre

-Original Message-
From: Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Sent: Tuesday, July 5, 2011 3:05 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

But in case the website docs contribute around 50% of the entire doc set, why 
recreate the indexes? Don't you think it's redundant?
Can two web apps (Solr instances) share a single index file to search on it 
without interfering with each other?


Regards,
JAME VAALET
Software Developer 
EXT :8108
Capital IQ


-Original Message-
From: Pierre GOSSE [mailto:pierre.go...@arisem.com] 
Sent: Tuesday, July 05, 2011 5:12 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

From what you tell us, I guess a separate index for website docs would be the 
best. If you fear that request from the window service would cripple your web 
site performance, why not have a totally separated index on another server, 
and have your website documents index in both indexes ?

Pierre

-Original Message-
From: Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Sent: Tuesday, July 5, 2011 1:14 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

I have got two applications:

1. website
        The website will enable any user to search the document repository; 
the set they search on is known as "website presentable".
2. windows service 
        The windows service will search all the documents in the repository 
for a fixed set of keywords and store the results found in a database. This 
set is the universal set of documents in the doc repository, including the 
website-presentable ones.


The website is a high-priority app which should work smoothly without any 
interference, whereas the windows service should run continuously all day 
without a break to save results from incoming docs.
The problem here is that the website set is predefined, and I don't want the 
windows service requests to SOLR to slow down website requests.

Suppose I segregate the website-presentable docs' index into a particular 
core and the rest of them into a different core; will that solve the problem?
I have also read about using multiple ports to listen for requests from 
different apps; can this be used?



Regards,
JAME VAALET






Nightly builds

2011-07-05 Thread Benson Margulies
The solr download link does not point to or mention nightly builds.
Are they out there?


Re: Does Nutch make any use of solr.WhitespaceTokenizerFactory defined in schema.xml?

2011-07-05 Thread Gabriele Kahlout
the answer to 2) is new IndexSchema(solrConf, schema).getAnalyzer();
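
As a minimal sketch of wiring that up outside Solr (file paths are hypothetical, 
and the exact SolrConfig/IndexSchema constructor signatures vary across Solr 
versions):

    SolrConfig solrConf = new SolrConfig("solrconfig.xml");
    IndexSchema schema = new IndexSchema(solrConf, "schema.xml", null);
    Analyzer analyzer = schema.getAnalyzer(); // the full analyzer chain from schema.xml
    TokenStream ts = analyzer.tokenStream("content", new StringReader("Hello Solr"));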


On Tue, Jul 5, 2011 at 2:48 PM, Gabriele Kahlout
gabri...@mysimpatico.comwrote:

 Not yet an answer to 2), but this is where and how Solr initializes the
 Analyzer defined in the schema.xml:

 //org.apache.solr.schema.IndexSchema
 // Load the Tokenizer
 // Although an analyzer only allows a single Tokenizer, we load a list to make sure
 // the configuration is ok
 // -------------------------------------------------------------------------
 final ArrayList<TokenizerFactory> tokenizers = new ArrayList<TokenizerFactory>(1);
 AbstractPluginLoader<TokenizerFactory> tokenizerLoader =
   new AbstractPluginLoader<TokenizerFactory>( "[schema.xml] analyzer/tokenizer", false, false )
 {
   @Override
   protected void init(TokenizerFactory plugin, Node node) throws Exception {
     if( !tokenizers.isEmpty() ) {
       throw new SolrException( SolrException.ErrorCode.SERVER_ERROR,
           "The schema defines multiple tokenizers for: " + node );
     }
     final Map<String,String> params = DOMUtil.toMapExcept(node.getAttributes(), "class");
     // copy the luceneMatchVersion from config, if not set
     if (!params.containsKey(LUCENE_MATCH_VERSION_PARAM))
       params.put(LUCENE_MATCH_VERSION_PARAM,
           solrConfig.luceneMatchVersion.toString());
     plugin.init( params );
     tokenizers.add( plugin );
   }

   @Override
   protected TokenizerFactory register(String name, TokenizerFactory plugin) throws Exception {
     return null; // used for map registration
   }
 };
 tokenizerLoader.load( loader, (NodeList)xpath.evaluate("./tokenizer", node, XPathConstants.NODESET) );

 // Make sure something was loaded
 if( tokenizers.isEmpty() ) {
   throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
       "analyzer without class or tokenizer & filter list");
 }

 // Load the Filters
 // -------------------------------------------------------------------------
 final ArrayList<TokenFilterFactory> filters = new ArrayList<TokenFilterFactory>();
 AbstractPluginLoader<TokenFilterFactory> filterLoader =
   new AbstractPluginLoader<TokenFilterFactory>( "[schema.xml] analyzer/filter", false, false )
 {
   @Override
   protected void init(TokenFilterFactory plugin, Node node) throws Exception {
     if( plugin != null ) {
       final Map<String,String> params = DOMUtil.toMapExcept(node.getAttributes(), "class");
       // copy the luceneMatchVersion from config, if not set
       if (!params.containsKey(LUCENE_MATCH_VERSION_PARAM))
         params.put(LUCENE_MATCH_VERSION_PARAM,
             solrConfig.luceneMatchVersion.toString());
       plugin.init( params );
       filters.add( plugin );
     }
   }

   @Override
   protected TokenFilterFactory register(String name, TokenFilterFactory plugin) throws Exception {
     return null; // used for map registration
   }
 };
 filterLoader.load( loader, (NodeList)xpath.evaluate("./filter", node, XPathConstants.NODESET) );

 return new TokenizerChain(charFilters.toArray(new CharFilterFactory[charFilters.size()]),
     tokenizers.get(0), filters.toArray(new TokenFilterFactory[filters.size()]));
 };



 On Tue, Jul 5, 2011 at 2:26 PM, Gabriele Kahlout gabri...@mysimpatico.com
  wrote:

  I suspect the following should do (1). I'm just not sure about file
  references as in stopInit.put("words", "stopwords.txt"). (2) should
  clarify.

 1)
  class SchemaAnalyzer extends Analyzer{

      @Override
      public TokenStream tokenStream(String fieldName, Reader reader) {
          HashMap<String, String> stopInit = new HashMap<String, String>();
          stopInit.put("words", "stopwords.txt");
          stopInit.put("ignoreCase", Boolean.TRUE.toString());
          StopFilterFactory stopFilterFactory = new StopFilterFactory();
          stopFilterFactory.init(stopInit);

          final HashMap<String, String> wordDelimInit = new HashMap<String, String>();
          wordDelimInit.put("generateWordParts", "1");
          wordDelimInit.put("generateNumberParts", "1");
          wordDelimInit.put("catenateWords", "1");
          wordDelimInit.put("catenateNumbers", "1");
          wordDelimInit.put("catenateAll", "0");
          wordDelimInit.put("splitOnCaseChange", "1");

          WordDelimiterFilterFactory wordDelimiterFilterFactory = new WordDelimiterFilterFactory();
          wordDelimiterFilterFactory.init(wordDelimInit);
          HashMap<String, String> porterInit = new HashMap<String, String>();
          porterInit.put("protected", "protwords.txt");
          EnglishPorterFilterFactory englishPorterFilterFactory = new EnglishPorterFilterFactory();
          englishPorterFilterFactory.init(porterInit);

          return new

Apache Nutch and Solr Integration

2011-07-05 Thread serenity keningston
Hello Friends,


I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and Solr 3.2
. I did the steps explained in the following two URL's :

http://wiki.apache.org/nutch/RunningNutchAndSolr

http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html


I downloaded both packages; however, I am getting the error (solrUrl is
not set, indexing will be skipped..) when I am trying to crawl using
Cygwin.

Can anyone please help me out to fix this issue ?
Else any other website suggesting for Apache Nutch and Solr integration
would be greatly helpful.



Thanks  Regards,
Serenity


Re: Nightly builds

2011-07-05 Thread Tom Gross

On 07/05/2011 04:08 PM, Benson Margulies wrote:

The solr download link does not point to or mention nightly builds.
Are they out there?


http://lmgtfy.com/?q=%2Bsolr+%2Bnightlybuildsl=1

--
Auther of the book Plone 3 Multimedia - http://amzn.to/dtrp0C

Tom Gross
email.@toms-projekte.de
skype.tom_gross
web.http://toms-projekte.de
blog...http://blog.toms-projekte.de



Re: Nightly builds

2011-07-05 Thread Benson Margulies
The reason for the email is not that I can't find them, but because
the project, I claim, should be advertising them more prominently on
the web site than buried in a wiki.

Where I come from, an lmgtfy link is rather hostile.


Oh, and you might want to fix the spelling of  'Author' in your own signature.


On Tue, Jul 5, 2011 at 10:19 AM, Tom Gross itconse...@gmail.com wrote:
 On 07/05/2011 04:08 PM, Benson Margulies wrote:

 The solr download link does not point to or mention nightly builds.
 Are they out there?

 http://lmgtfy.com/?q=%2Bsolr+%2Bnightlybuildsl=1

 --
 Auther of the book Plone 3 Multimedia - http://amzn.to/dtrp0C

 Tom Gross
 email.@toms-projekte.de
 skype.tom_gross
 web.http://toms-projekte.de
 blog...http://blog.toms-projekte.de




Re: MergerFactor and MaxMergerDocs effecting num of segments created

2011-07-05 Thread Shawn Heisey

On 7/4/2011 12:51 AM, Romi wrote:

Shawn when i reindex data using full-import i got:
_0.fdt  3310
_0.fdx  23
_0.frq  857
_0.nrm  31
_0.prx  1748
_0.tis  350
_1.fdt  3310
_1.fdx  23
_1.fnm  1
_1.frq  857
_1.nrm  31
_1.prx  1748
_1.tii  5
_1.tis  350
segments.gen  1
segments_3  1

Where all _1 files are marked as archived (A).

And when I ran full-import again (for testing) I got _1 and _2 files, where
all _2 files are marked as archive. What does that mean?
And the problem I don't understand: since full-import deletes the old index
and creates a new one, why am I getting the old one again?


By mentioning the Archive bit, it sounds like you are running on 
Windows.  I've only run it on Linux, but I understand from reading 
messages on this list that there are a lot of problems on Windows with 
deleting old files whenever you do anything that results in old segments 
going away -- reindex, optimize, replication, normal segment merging, 
etc.  The current solr version is 3.3, previous versions are 3.2, 3.1, 
then 1.4.1.  Others will have to comment about whether things have 
improved in more recent releases.


The archive bit is simply a DOS/Windows attribute that says this file 
needs to be backed up.  When you create or modify a file in a normal 
way, it is turned on.  Normally the only thing that turns that bit off 
is backup software, but Solr might be programmed to clear it on files 
that are no longer needed, in case the delete fails, so there's a way to 
detect that they should not be backed up.  I don't know if this is 
right, it's just speculation.


Thanks,
Shawn



RE: what s the optimum size of SOLR indexes

2011-07-05 Thread Burton-West, Tom
Hello,

On Mon, 2011-07-04 at 13:51 +0200, Jame Vaalet wrote:
 What would be the maximum size of a single SOLR index file for resulting in 
 optimum search time ?

How do you define optimum?  Do you want the fastest possible response time 
at any cost, or do you have a specific response time goal? 

Can you give us more details on your use case?   What kind of load are you 
expecting?  What kind of queries do you need to support?
Some of the trade-offs depend if you are CPU bound or I/O bound.

Assuming a fairly large index, if you *absolutely need* the fastest possible 
search response time and you can *afford the hardware*, you probably want to 
shard your index and size your indexes so they can all fit in memory (and do 
some work to make sure the index data is always in memory).  If you can't 
afford that much memory, but still need very fast response times, you might 
want to size your indexes so they all fit on SSD's.  As an example of a use 
case on the opposite side of the spectrum, here at HathiTrust, we have a very 
low number of queries per second and we are running an index that totals 6 TB 
in size with shards of about 500GB and average response times of 200ms (but 
99th percentile times of about 2 seconds).

Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search



Hit Rate

2011-07-05 Thread Briggs Thompson
Hello all,

Is there a good way to get the hit count of a search?

Example query:
textField:solr AND documentId:1000

Say the document with Id = 1000 has "solr" 13 times in the document. Any way to
extract that number [13] in the response? I know we can return the score,
which is loosely related to hit counts via tf-idf, but for this case I need
the actual hit counts. I believe you can get this information from the
logs, but that is less useful if the use case is on the presentation layer.

I tried faceting on the query but it seems like that returns the number of
documents that query matches rather than the hit count.
http://localhost:8080/solr/ExampleCore/select/?q=textField%3Asolr+AND+documentId%3A1246727&version=2.2&start=0&rows=10&indent=on&facet=true&facet.field=textField&facet.query=textField:solr

I was thinking that highlighting essentially returns the hit count if you
supply an unlimited number of snippets, but I imagine there must be a more
elegant solution.

Thanks in advance,
Briggs


Re: Feed index with analyzer output

2011-07-05 Thread Andrzej Bialecki

On 7/5/11 1:37 PM, Lox wrote:

Ok,

the very short question is:
Is there a way to submit the analyzer response so that solr already knows
what to do with that response? (that is, which fields are to be treated as
payloads, which are tokens, etc...)


Check this issue: http://issues.apache.org/jira/browse/SOLR-1535


--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Hit Rate

2011-07-05 Thread Ahmet Arslan

 Is there a good way to get the hit count of a search?
 
 Example query:
 textField:solr AND documentId:1000
 
 Say document with Id = 1000 has solr 13 times in the
 document. Any way to
 extract that number [13] in the response? 

Looks like you are looking for term frequency info:

Two separate solutions:
http://wiki.apache.org/solr/TermVectorComponent
http://wiki.apache.org/solr/FunctionQuery#tf
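
For example, with the TermVectorComponent registered in solrconfig.xml and 
exposed through a handler (handler name hypothetical), a request like this 
returns the per-document term frequencies:

    http://localhost:8080/solr/ExampleCore/tvrh?q=documentId:1000&tv=true&tv.tf=true&tv.fl=textField

Alternatively, depending on your Solr version, the tf() function can expose the 
raw count directly, e.g. q={!func}tf(textField,'solr')&fq=documentId:1000, where 
each returned score is the term frequency.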




Re: Apache Nutch and Solr Integration

2011-07-05 Thread Way Cool
Can you let me know when and where you were getting the error? A screen-shot
will be helpful.

On Tue, Jul 5, 2011 at 8:15 AM, serenity keningston 
serenity.kenings...@gmail.com wrote:

 Hello Friends,


 I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and Solr 3.2
 . I did the steps explained in the following two URL's :

 http://wiki.apache.org/nutch/RunningNutchAndSolr


 http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html


 I downloaded both the softwares, however, I am getting error (*solrUrl is
 not set, indexing will be skipped..*) when I am trying to crawl using
 Cygwin.

 Can anyone please help me out to fix this issue ?
 Else any other website suggesting for Apache Nutch and Solr integration
 would be greatly helpful.



 Thanks  Regards,
 Serenity



primary key made of multiple fields from multiple source tables

2011-07-05 Thread Mark juszczec
Hello all

I'm using Solr 3.2 and am trying to index a document whose primary key is
built from multiple columns selected from an Oracle DB.

I'm getting the following error:

java.lang.IllegalArgumentException: deltaQuery has no column to resolve to
declared primary key pk='ordersorderline_id'
at
org.apache.solr.handler.dataimport.DocBuilder.findMatchingPkColumn(DocBuilder.java:840)
~[apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30
23:09:08]
at
org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:891)
~[apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30
23:09:08]
at
org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:284)
~[apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30
23:09:08]
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:178)
~[apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30
23:09:08]
at
org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:374)
[apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30
23:09:08]
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:413)
[apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30
23:09:08]
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)
[apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30
23:09:08]


The deltaQuery is:

select orders.order_id || orders.order_booked_ind ||
order_line.order_line_id as ordersorderline_id, orders.order_id,
orders.order_booked_ind, order_line.order_line_id, orders.order_dt,
orders.cancel_dt, orders.account_manager_id, orders.of_header_id,
orders.order_status_lov_id, orders.order_type_id,
orders.approved_discount_pct, orders.campaign_nm,
orders.approved_by_cd,orders.advertiser_id, orders.agency_id,
order_line.accounting_comments_desc
from orders, order_line
where order_line.order_id = orders.order_id and order_line.order_booked_ind
= orders.order_booked_ind

I've just seen in the Solr Wiki Task List at
http://wiki.apache.org/solr/TaskList?highlight=%28composite%29 a Big Idea
for The Future is:

support for *composite* keys ... either with some explicit change to the
uniqueKey declaration or perhaps just copyField with some hidden magic
that concats the resulting terms into a single key Term

Does this prohibit my creating the key with the select as above?

Mark


Re: Hit Rate

2011-07-05 Thread Briggs Thompson
Yes indeed, that is what I was missing. Thanks Ahmet!

On Tue, Jul 5, 2011 at 12:48 PM, Ahmet Arslan iori...@yahoo.com wrote:


  Is there a good way to get the hit count of a search?
 
  Example query:
  textField:solr AND documentId:1000
 
  Say document with Id = 1000 has solr 13 times in the
  document. Any way to
  extract that number [13] in the response?

 Looks like you are looking for term frequency info:

 Two separate solutions:
 http://wiki.apache.org/solr/TermVectorComponent
 http://wiki.apache.org/solr/FunctionQuery#tf





Dynamic Facets

2011-07-05 Thread Way Cool
Hi, guys,

We have more than 1000 attributes scattered around 700K docs. Each doc might
have about 50 attributes. I would like Solr to return up to 20 facets for
every search, and each search can return facets dynamically depending on
the matched docs. Has anyone done that before? It would be awesome if the
returned facets changed after we drill down on facets.

I have looked at the following docs:
http://wiki.apache.org/solr/SimpleFacetParameters
http://www.lucidimagination.com/devzone/technical-articles/faceted-search-solr

Wondering what's the best way to accomplish that. Any advice?

Thanks,

YH


Re: After the query component has the results, can I do more filtering on them?

2011-07-05 Thread arian487
Sorry for being vague.  Okay so these scores exist on an external server and
they change often enough.  The score for each returned user is actually
dependent on the user doing the searching (if I'm making the request, and
you make the same request, the scores are different).  So what I'm doing is
getting a bunch of scores from the external server and aggregating that with the
current scores solr gave in my component.  So here's the flow (all numbers
are arbitrary):

1) Get 10,000 results from solr from the query component
2) return a list of scores and ids from the external server (it'll return a
lot of them)
3) Out of this 10,000, I take the top 3500 docs after aggregating the
external server's scores and netcons scores.  

The problem is, the score for each doc is specific to the user making the
request.  The algorithm in doing these scores is quite complex.  I cannot
simply re-index with new scores, hence I've written this component which
runs after querycomponent and does the magic of filtering.  

I've come up with a solution but it involved me changing a lot of solr code. 
First and foremost, I've made the queryResultCache public and developed a
small API in accessing and changing it.  I've also changed the
QueryResultKey to include a Long userId in its hashCode and equals
functions.  When a search is made, the QueryComponent caches its results,
and then in my custom component I go into that cache, get my superset,
filter it out from the scores in my external server, and throw it back into
cache.  Of course none of this happens if my custom scored stuff is already
cached, so it's actually decent.  

If you have any suggestions and improvements I'd greatly appreciate it. 
Sorry for the long response...I didn't want to be an XY problem again :D

--
View this message in context: 
http://lucene.472066.n3.nabble.com/After-the-query-component-has-the-results-can-I-do-more-filtering-on-them-tp3114775p3141652.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Apache Nutch and Solr Integration

2011-07-05 Thread Markus Jelsma
You are using the crawl job so you must specify the URL to your Solr instance.

The newly updated wiki has your answer:
http://wiki.apache.org/nutch/bin/nutch_crawl
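
For example (URL and paths hypothetical):

    bin/nutch crawl urls -dir crawl -solr http://localhost:8983/solr/ -depth 3 -topN 50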

 Hello Friends,
 
 
 I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and Solr 3.2
 . I did the steps explained in the following two URL's :
 
 http://wiki.apache.org/nutch/RunningNutchAndSolr
 
 http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apach
 e-solr.html
 
 
 I downloaded both the softwares, however, I am getting error (*solrUrl is
 not set, indexing will be skipped..*) when I am trying to crawl using
 Cygwin.
 
 Can anyone please help me out to fix this issue ?
 Else any other website suggesting for Apache Nutch and Solr integration
 would be greatly helpful.
 
 
 
 Thanks  Regards,
 Serenity


Re: Apache Nutch and Solr Integration

2011-07-05 Thread serenity keningston
Please find attached screenshot

On Tue, Jul 5, 2011 at 11:53 AM, Way Cool way1.wayc...@gmail.com wrote:

 Can you let me know when and where you were getting the error? A
 screen-shot
 will be helpful.




Re: Custom Cache cleared after a commit?

2011-07-05 Thread arian487
Sorry for my ignorance, but do you have any lead on where in the code to look
for this?  Also, I'd still need a way of finding out how long it's been in
the cache, because I don't want it to regenerate every time.  I'd want it to
regenerate only if it's been in the cache for less than 6 hours (or some time
frame which I deem to be good).  Thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-Cache-cleared-after-a-commit-tp3136345p3141673.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Dynamic Facets

2011-07-05 Thread Erik Hatcher
YH -

One technique (that the Smithsonian employs, I believe) is to index the field 
names for the attributes into a separate field, facet on that first, and then 
facet on the fields you'd like from that response in a second request to Solr.
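
For illustration, the two-step flow might look like this (field names 
hypothetical):

    # request 1: which attribute fields occur in the matching docs?
    q=laptop&facet=true&facet.field=attr_names&facet.limit=20
    # request 2: facet on the top fields returned above
    q=laptop&facet=true&facet.field=attr_screen_size&facet.field=attr_brand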

There's a basic hack here so the indexing client doesn't need to add the 
"fields used" field: https://issues.apache.org/jira/browse/SOLR-1280

Ideally, this could all be made part of one request to Solr - and I can 
envision a pre-faceting component (post querying) to dynamically figure out the 
best fields to facet on, set those into the request context, and the rest is 
magic.

Erik



On Jul 5, 2011, at 13:15 , Way Cool wrote:

 Hi, guys,
 
 We have more than 1000 attributes scattered around 700K docs. Each doc might
 have about 50 attributes. I would like Solr to return up to 20 facets for
 every searches, and each search can return facets dynamically depending on
 the matched docs. Anyone done that before? That'll be awesome if the facets
 returned will be changed after we drill down facets.
 
 I have looked at the following docs:
 http://wiki.apache.org/solr/SimpleFacetParameters
 http://www.lucidimagination.com/devzone/technical-articles/faceted-search-solr
 
 Wondering what's the best way to accomplish that. Any advice?
 
 Thanks,
 
 YH



Re: faceting on field with two values

2011-07-05 Thread Chris Hostetter

: I have two fields TOWN and POSTALCODE and I want to concat those two in one
: field to do faceting

As others have pointed out, copyField doesn't do a concat, it just 
adds the field values from the source field to the dest field (so with 
those two copyField lines you will typically get two values for each 
doc in the dest field).

if you don't want to go the DIH route, and you don't want to change your 
talend process, you could use a simple UpdateProcessor for this (update 
processors are used to process add/delete requests no matter what 
source they come from, before analysis happens) ... but i don't think we 
have any off-the-shelf Concat update processors in solr at the moment.

there is a patch for a script-based one which might be helpful...
https://issues.apache.org/jira/browse/SOLR-1725
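
As a rough sketch of such a processor (class and field names are hypothetical; 
the factory and solrconfig.xml wiring are omitted):

    import java.io.IOException;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;

    // Hypothetical processor: builds TOWN_POSTAL from TOWN and POSTALCODE
    // before the document reaches analysis/indexing.
    public class ConcatTownPostalProcessor extends UpdateRequestProcessor {

      public ConcatTownPostalProcessor(UpdateRequestProcessor next) {
        super(next);
      }

      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object town = doc.getFieldValue("TOWN");
        Object postal = doc.getFieldValue("POSTALCODE");
        if (town != null && postal != null) {
          doc.addField("TOWN_POSTAL", town + ", " + postal);
        }
        super.processAdd(cmd); // hand the doc to the rest of the chain
      }
    }

It would need a matching UpdateRequestProcessorFactory and an 
updateRequestProcessorChain entry in solrconfig.xml to be picked up.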

All of that said, based on what you've described about your usecase i 
would question from a UI standpoint whether this field would actually be a 
good idea...

isn't there an extremely large number of postal codes even in a single 
city?

why not let people facet on just the town field first, and then only when 
they click on one, offer them a facet on Postal code?

Otherwise your facet UI is going to have a tendency to look like this...

 Gender:
   * Male  (9000 results)
   * Female  (8000 results)
 Town/Postal:
   * paris, 75016  (560 results)
   * paris, 75015  (490 results)
   * paris, 75022  (487 results)
   * boulogne sur mer 62200 (468 results)
   * paris, 75018  (465 results)
   * (click to see more)
 Color:
   * Red (900 results)
   * Blue (800 results)

...and many of your users will never find the town they are looking for 
(let alone the post code)


-Hoss


Cannot I search documents added by IndexWriter after commit?

2011-07-05 Thread Gabriele Kahlout
    @Test
    public void testUpdate() throws IOException,
ParserConfigurationException, SAXException, ParseException {
        Analyzer analyzer = getAnalyzer();
        QueryParser parser = new QueryParser(Version.LUCENE_32, "content",
analyzer);
        Query allQ = parser.parse("*:*");

        IndexWriter writer = getWriter();
        IndexSearcher searcher = new IndexSearcher(IndexReader.open(writer,
true));
        TopDocs docs = searcher.search(allQ, 10);
        assertEquals(0, docs.totalHits); // empty/no index

        Document doc = getDoc();
        writer.addDocument(doc);
        writer.commit();

        docs = searcher.search(allQ, 10);
        assertEquals(1, docs.totalHits); // it fails here. docs.totalHits equals 0
    }
What am I doing wrong here?

If I initialize searcher with new IndexSearcher(directory) I'm told:
org.apache.lucene.index.IndexNotFoundException: no segments* file found in
org.apache.lucene.store.RAMDirectory@3caa4b lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@ed0220c:
files: []

-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains [LON] or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
 Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with X.
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: Dynamic Facets

2011-07-05 Thread darren

You can issue a new facet search as you drill down from your UI.
You have to specify the fields you want to facet on and they can be
dynamic.

Take a look at recent threads here on taxonomy faceting for help.
Also, look here[1]

[1] http://wiki.apache.org/solr/SimpleFacetParameters
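
For example, a drill-down sequence might look like this (field names 
hypothetical):

    q=*:*&facet=true&facet.field=category
    q=*:*&fq=category:books&facet=true&facet.field=author&facet.field=format

Each fq narrows the matched set, so the facet counts in the second response 
reflect only the drilled-down subset.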

On Tue, 5 Jul 2011 11:15:51 -0600, Way Cool way1.wayc...@gmail.com
wrote:
 Hi, guys,
 
 We have more than 1000 attributes scattered around 700K docs. Each doc
 might
 have about 50 attributes. I would like Solr to return up to 20 facets
for
 every searches, and each search can return facets dynamically depending
on
 the matched docs. Anyone done that before? That'll be awesome if the
facets
 returned will be changed after we drill down facets.
 
 I have looked at the following docs:
 http://wiki.apache.org/solr/SimpleFacetParameters

http://www.lucidimagination.com/devzone/technical-articles/faceted-search-solr
 
 Wondering what's the best way to accomplish that. Any advice?
 
 Thanks,
 
 YH


Re: A beginner problem

2011-07-05 Thread Chris Hostetter
: follow a receipe.  So I went to the the solr site, downloaded solr and
: tried to follow the tutorial.  In the  example folder of solr, using
: java -jar start.jar  I got:
: 
: 2011-07-04 13:22:38.439:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
: 2011-07-04 13:22:38.893:INFO::jetty-6.1-SNAPSHOT
: 2011-07-04 13:22:38.946:INFO::Started SocketConnector@0.0.0.0:8983

if that is everything you got in the logs, then i suspect:
  a) you downloaded a source release (ie: has *-src-* in its name) in 
which the solr.war app has not yet been compiled
  b) you did not run "ant example" to build solr and setup the example 
instance.

If i'm wrong, then yes please more details would be helpful: what exact 
URL did you download?

-Hoss


Re: Dynamic Facets

2011-07-05 Thread Way Cool
Thanks Erik and Darren.
A pre-faceting component (post querying) would be ideal, even though there may
be a little performance penalty. :-) I will try to implement one if no one
has done so.

Darren, I did look at the taxonomy faceting thread. My main concern is that
I want dynamic facets to be returned, because I don't know ahead of time what
facets I can specify as part of the query, and there are too
many search terms. ;-)

Thanks for help.

On Tue, Jul 5, 2011 at 11:49 AM, dar...@ontrenet.com wrote:


 You can issue a new facet search as you drill down from your UI.
 You have to specify the fields you want to facet on and they can be
 dynamic.

 Take a look at recent threads here on taxonomy faceting for help.
 Also, look here[1]

 [1] http://wiki.apache.org/solr/SimpleFacetParameters




Re: Cannot I search documents added by IndexWriter after commit?

2011-07-05 Thread Michael McCandless
After your writer.commit you need to reopen your searcher to see the changes.

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jul 5, 2011 at 1:48 PM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:
     @Test
     public void testUpdate() throws IOException,
 ParserConfigurationException, SAXException, ParseException {
         Analyzer analyzer = getAnalyzer();
         QueryParser parser = new QueryParser(Version.LUCENE_32, "content",
 analyzer);
         Query allQ = parser.parse("*:*");

         IndexWriter writer = getWriter();
         IndexSearcher searcher = new IndexSearcher(IndexReader.open(writer,
 true));
         TopDocs docs = searcher.search(allQ, 10);
         assertEquals(0, docs.totalHits); // empty/no index

         Document doc = getDoc();
         writer.addDocument(doc);
         writer.commit();

         docs = searcher.search(allQ, 10);
         assertEquals(1, docs.totalHits); // it fails here. docs.totalHits equals 0
     }
 What am I doing wrong here?

 If I initialize searcher with new IndexSearcher(directory) I'm told:
 org.apache.lucene.index.IndexNotFoundException: no segments* file found in
 org.apache.lucene.store.RAMDirectory@3caa4b lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@ed0220c:
 files: []

 --
 Regards,
 K. Gabriele




Re: Cannot I search documents added by IndexWriter after commit?

2011-07-05 Thread Gabriele Kahlout
and how do you do that? There is no reopen method

On Tue, Jul 5, 2011 at 8:09 PM, Michael McCandless 
luc...@mikemccandless.com wrote:

 After your writer.commit you need to reopen your searcher to see the
 changes.

 Mike McCandless

 http://blog.mikemccandless.com

 




-- 
Regards,
K. Gabriele



Re: Cannot I search documents added by IndexWriter after commit?

2011-07-05 Thread Michael McCandless
Sorry, you must reopen the underlying IndexReader, and then make a new
IndexSearcher from the reopened reader.
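
A minimal sketch of the usual pattern (note that reopen() returns a new reader 
instead of mutating the old one):

    IndexReader newReader = indexReader.reopen();
    if (newReader != indexReader) {
      indexReader.close();      // release the stale reader
      indexReader = newReader;  // search through the fresh one from here on
    }
    IndexSearcher searcher = new IndexSearcher(indexReader);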

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jul 5, 2011 at 2:12 PM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:
 and how do you do that? There is no reopen method




The OR operator in a query ?

2011-07-05 Thread duddy67
Hi all,

Could someone tell me what the OR syntax in SOLR is and how to use it in a
search query?
I tried:  

fq=sometag:1+sometag:5
fq=sometag:[1+5]
fq=sometag:[1OR5]
fq=sometag:1+5

and many more but impossible to get what I want.


Thanks in advance



--
View this message in context: 
http://lucene.472066.n3.nabble.com/The-OR-operator-in-a-query-tp3141843p3141843.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Cannot I search documents added by IndexWriter after commit?

2011-07-05 Thread Gabriele Kahlout
Still won't work (same as before).

 @Test
    public void testUpdate() throws IOException,
ParserConfigurationException, SAXException, ParseException {
        Analyzer analyzer = getAnalyzer();
        QueryParser parser = new QueryParser(Version.LUCENE_32, "content",
analyzer);
        Query allQ = parser.parse("*:*");

        IndexWriter writer = getWriter();
        final IndexReader indexReader = IndexReader.open(writer, true);

        IndexSearcher searcher = new IndexSearcher(indexReader);
        TopDocs docs = searcher.search(allQ, 10);
        assertEquals(0, docs.totalHits); // empty/no index

        Document doc = getDoc();
        writer.addDocument(doc);
        writer.commit();

        indexReader.reopen();
        searcher = new IndexSearcher(indexReader);
        docs = searcher.search(allQ, 10);
        assertEquals(1, docs.totalHits);
    }

    private Document getDoc() {
        Document doc = new Document();
        doc.add(new Field("id", "0", Field.Store.YES,
Field.Index.NOT_ANALYZED));
        return doc;
    }

    private IndexWriter getWriter() throws IOException { // 2
        return new IndexWriter(directory, new WhitespaceAnalyzer(), // 2
                IndexWriter.MaxFieldLength.UNLIMITED); // 2
    }

On Tue, Jul 5, 2011 at 8:15 PM, Michael McCandless 
luc...@mikemccandless.com wrote:

 Sorry, you must reopen the underlying IndexReader, and then make a new
 IndexSearcher from the reopened reader.

 Mike McCandless

 http://blog.mikemccandless.com




-- 
Regards,
K. Gabriele



Re: The OR operator in a query ?

2011-07-05 Thread Juan Grande
Hi,

These two are valid and equivalent:

   - fq=sometag:1 OR sometag:5
   - fq=sometag:(1 OR 5)

Also, beware that fq defines a filter query, which is different from a
regular query (http://wiki.apache.org/solr/CommonQueryParameters#fq). For
more details on the query syntax see
http://lucene.apache.org/java/2_4_0/queryparsersyntax.html
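
In a full request URL the space needs encoding (core name hypothetical), e.g.:

    http://localhost:8983/solr/select?q=*:*&fq=sometag:(1%20OR%205)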

Regards,

*Juan*



On Tue, Jul 5, 2011 at 3:15 PM, duddy67 san...@littlemarc.com wrote:

 Hi all,

 Someone could tell me what is the OR syntax in SOLR and how to use it in a
 search query ?
 I tried:

 fq=sometag:1+sometag:5
 fq=sometag:[1+5]
 fq=sometag:[1OR5]
 fq=sometag:1+5

 and many more but impossible to get what I want.


 Thanks for advance



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/The-OR-operator-in-a-query-tp3141843p3141843.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: The OR operator in a query ?

2011-07-05 Thread duddy67
Thanks for your response. I'll check this. 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/The-OR-operator-in-a-query-tp3141843p3141916.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is solrj 3.3.0 ready for field collapsing?

2011-07-05 Thread Per Newgro

Thanks for your response.

Am 05.07.2011 13:53, schrieb Erick Erickson:

Let's see the results of adding &debugQuery=on to your URL. Are you getting
any documents back at all? If not, then your query isn't getting any
documents to group.
I didn't get any docs back, but they were in the response (I saw
them in the debugger).
But the structure had changed, so DocumentBuilder didn't bring me
any results (getBeans()).
I investigated a bit further and found out that I had to set the
group.main param to true: 
https://builds.apache.org/job/Solr-trunk/javadoc/org/apache/solr/common/params/GroupParams.html#GROUP_MAIN

Now I get results. So the answer seems to be yes :-).
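
So, in effect, the params that work here are (field name hypothetical):

    group=true&group.field=myfield&group.main=true

group.main=true flattens the grouped result back into an ordinary doc list, 
which is what getBeans() expects.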


You haven't told us much about what you're trying to do, you might want to
review: http://wiki.apache.org/solr/UsingMailingLists

Sorry for that.


Best
Erick
On Jul 4, 2011 11:55 AM, Per Newgroper.new...@gmx.ch  wrote:


Cheers
Per


Re: A beginner problem

2011-07-05 Thread Way Cool
You can follow the links below to setup Nutch and Solr:
http://thetechietutorials.blogspot.com/2011/06/solr-and-nutch-integration.html

http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html
http://wiki.apache.org/nutch/RunningNutchAndSolr

Of course, more details will be helpful for troubleshooting your env issue.
:-)

Have fun!

On Tue, Jul 5, 2011 at 11:49 AM, Chris Hostetter
hossman_luc...@fucit.orgwrote:

 : follow a receipe.  So I went to the the solr site, downloaded solr and
 : tried to follow the tutorial.  In the  example folder of solr, using
 : java -jar start.jar  I got:
 :
 : 2011-07-04 13:22:38.439:INFO::Logging to STDERR via
 org.mortbay.log.StdErrLog
 : 2011-07-04 13:22:38.893:INFO::jetty-6.1-SNAPSHOT
 : 2011-07-04 13:22:38.946:INFO::Started SocketConnector@0.0.0.0:8983

 if that is everything you got in the logs, then i suspect:
  a) you download a source release (ie: has *-src-* in it's name) in
 which the solr.war app has not yet been compiled)
  b) you did not run ant example to build solr and setup the example
 instance.

 If i'm wrong, then yes please more details would be helpful: what exact
 URL did you download?

 -Hoss



Re: Is solrj 3.3.0 ready for field collapsing?

2011-07-05 Thread Yonik Seeley
On Mon, Jul 4, 2011 at 11:54 AM, Per Newgro per.new...@gmx.ch wrote:
 i've tried to add the params for group=true and group.field=myfield by using
 the SolrQuery.
 But the result is null. Do i have to configure something? In wiki part for
 field collapsing i couldn't
 find anything.

No specific (type-safe) support for grouping is in SolrJ currently.
But you should still have access to the complete generic solr response
via SolrJ regardless (i.e. use getResponse())
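
A rough sketch of pulling the grouped section out of that generic response 
(field name hypothetical):

    SolrQuery query = new SolrQuery("*:*");
    query.set(GroupParams.GROUP, true);
    query.set(GroupParams.GROUP_FIELD, "myfield");
    QueryResponse rsp = server.query(query);
    NamedList<?> grouped = (NamedList<?>) rsp.getResponse().get("grouped");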

-Yonik
http://www.lucidimagination.com


Re: Cannot I search documents added by IndexWriter after commit?

2011-07-05 Thread Gabriele Kahlout
Re-open doesn't work, but open does.

@Test
    public void testUpdate() throws IOException,
ParserConfigurationException, SAXException, ParseException {
        Analyzer analyzer = getAnalyzer();
        QueryParser parser = new QueryParser(Version.LUCENE_32, "content",
analyzer);
        Query allQ = parser.parse("*:*");

        IndexWriter writer = getWriter();
        final IndexReader indexReader = IndexReader.open(writer, true);

        IndexSearcher searcher = new IndexSearcher(indexReader);
        TopDocs docs = searcher.search(allQ, 10);
        assertEquals(0, docs.totalHits); // empty/no index

        Document doc = getDoc();
        writer.addDocument(doc);
        writer.commit();

        searcher = new IndexSearcher(IndexReader.open(writer, true)); // new IndexSearcher(directory);
        docs = searcher.search(allQ, 10);
        assertEquals(1, docs.totalHits);
    }

On Tue, Jul 5, 2011 at 8:23 PM, Gabriele Kahlout
gabri...@mysimpatico.comwrote:

 Still won't work (same as before).


  @Test
 public void testUpdate() throws IOException,
 ParserConfigurationException, SAXException, ParseException {
 Analyzer analyzer = getAnalyzer();
 QueryParser parser = new QueryParser(Version.LUCENE_32, content,
 analyzer);
 Query allQ = parser.parse(*:*);

 IndexWriter writer = getWriter();
 final IndexReader indexReader = IndexReader.open(writer, true);

 IndexSearcher searcher = new IndexSearcher(indexReader);

 TopDocs docs = searcher.search(allQ, 10);
 assertEquals(0, docs.totalHits); // empty/no index

 Document doc = getDoc();
 writer.addDocument(doc);
 writer.commit();

 *indexReader.reopen();
 searcher = new IndexSearcher(indexReader);

 docs = searcher.search(allQ, 10);
 *
 assertEquals(1,docs.totalHits);
 }

   private Document getDoc() {
 Document doc = new Document();
 doc.add(new Field(id, 0, Field.Store.YES,
 Field.Index.NOT_ANALYZED));
 return doc;
 }

  private IndexWriter getWriter() throws IOException {// 2
 return new IndexWriter(directory, new WhitespaceAnalyzer(), // 2
 IndexWriter.MaxFieldLength.UNLIMITED); // 2

 }

 On Tue, Jul 5, 2011 at 8:15 PM, Michael McCandless 
 luc...@mikemccandless.com wrote:

 Sorry, you must reopen the underlying IndexReader, and then make a new
 IndexSearcher from the reopened reader.

 Mike McCandless

 http://blog.mikemccandless.com

 On Tue, Jul 5, 2011 at 2:12 PM, Gabriele Kahlout
 gabri...@mysimpatico.com wrote:
  and how do you do that? There is no reopen method
 
  On Tue, Jul 5, 2011 at 8:09 PM, Michael McCandless 
  luc...@mikemccandless.com wrote:
 
  After your writer.commit you need to reopen your searcher to see the
  changes.
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 
  On Tue, Jul 5, 2011 at 1:48 PM, Gabriele Kahlout
  gabri...@mysimpatico.com wrote:
  @Test
  public void testUpdate() throws IOException,
   ParserConfigurationException, SAXException, ParseException {
  Analyzer analyzer = getAnalyzer();
  QueryParser parser = new QueryParser(Version.LUCENE_32,
 "content",
   analyzer);
  Query allQ = parser.parse("*:*");
  
  IndexWriter writer = getWriter();
  IndexSearcher searcher = new
  IndexSearcher(IndexReader.open(writer,
   true));
  TopDocs docs = searcher.search(allQ, 10);
   *assertEquals(0, docs.totalHits); // empty/no index*
  
  Document doc = getDoc();
  writer.addDocument(doc);
  writer.commit();
  
  docs = searcher.search(allQ, 10);
   *assertEquals(1,docs.totalHits); //it fails here.
 docs.totalHits
   equals 0*
  }
   What am I doing wrong here?
  
   If I initialize searcher with new IndexSearcher(directory) I'm told:
   org.apache.lucene.index.IndexNotFoundException: no segments* file found
   in org.apache.lucene.store.RAMDirectory@3caa4
   blockFactory=org.apache.lucene.store.SingleInstanceLockFactory@ed0220c:
   files: []
  
   --
   Regards,
   K. Gabriele
  
  
 
 
 
 
  --
  Regards,
  K. Gabriele
 

Re: Cannot I search documents added by IndexWriter after commit?

2011-07-05 Thread Robert Muir
re-open does work, but you cannot ignore its return value! see the
javadocs for an example.
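
A sketch of the pattern the reopen javadocs describe (note the quoted tests
declare the reader final, so a real fix also drops that modifier): reopen()
may return a *new* reader, and you must switch to it and close the stale one:

    IndexReader newReader = indexReader.reopen();
    if (newReader != indexReader) {
        indexReader.close();      // release the stale reader
        indexReader = newReader;  // search against the reopened one
    }
    IndexSearcher searcher = new IndexSearcher(indexReader);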

On Tue, Jul 5, 2011 at 3:10 PM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:
 Re-open doesn't work, but open does.


Re: Apache Nutch and Solr Integration

2011-07-05 Thread serenity keningston
On Tue, Jul 5, 2011 at 1:11 PM, Way Cool way1.wayc...@gmail.com wrote:

 Sorry, Serenity, somehow I don't see the attachment.

 On Tue, Jul 5, 2011 at 11:23 AM, serenity keningston 
 serenity.kenings...@gmail.com wrote:

  Please find attached screenshot
 
 
  On Tue, Jul 5, 2011 at 11:53 AM, Way Cool way1.wayc...@gmail.com
 wrote:
 
  Can you let me know when and where you were getting the error? A
  screen-shot
  will be helpful.
 
  On Tue, Jul 5, 2011 at 8:15 AM, serenity keningston 
  serenity.kenings...@gmail.com wrote:
 
   Hello Friends,
  
  
   I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and
 Solr
  3.2
   . I did the steps explained in the following two URL's :
  
   http://wiki.apache.org/nutch/RunningNutchAndSolr
  
  
  
 
 http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html
  
  
   I downloaded both packages; however, I am getting the error (*solrUrl
  is
   not set, indexing will be skipped..*) when I am trying to crawl using
   Cygwin.
  
   Can anyone please help me out to fix this issue ?
   Else any other website suggesting for Apache Nutch and Solr
 integration
   would be greatly helpful.
  
  
  
    Thanks & Regards,
   Serenity
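
(For what it's worth, that warning means the crawl command was started
without a Solr URL.  In Nutch 1.3 the crawl command accepts one via the
-solr option; a sketch, with the seed dir, depth, and topN values made up:

    bin/nutch crawl urls -solr http://localhost:8983/solr/ -dir crawl -depth 3 -topN 50

With -solr set, the crawl should end by indexing into that Solr instance
instead of skipping the indexing step.)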
  
 
 
 



Re: Solr vs Hibernate Search (Huge number of DB DMLs)

2011-07-05 Thread fire fox
Please suggest..

On Mon, Jul 4, 2011 at 10:37 PM, fire fox fyr3...@gmail.com wrote:

 From my exploration so far, I understood that we can opt Solr straightaway
 if the index changes are kept minimal. However, mine is absolutely the
 opposite. I'm still vague about the perfect solution for the scenario
 mentioned.

 Please share..


 On Mon, Jul 4, 2011 at 6:28 PM, fire fox fyr3...@gmail.com wrote:

 Hi all,
There were several places I could find a discussion on this, but I
 failed to find one suited to my case.

 I'd like to be clear on my requirements, so that you may suggest me the
 better solution.

 - A project deals with tons of database tables (with *millions* of
 records) out of which some are to be indexed which should be searchable
 of-course. It uses Hibernate for MySQL transactions.

  As per my knowledge, there could be two solutions to maintain sync
 between index and database effectively.

 -- There'd be a *huge number of transactions (DMLs) on the DB*, so I'm
 wondering which one of the following will be able to handle it
 effectively.

  1) Configure a *Solr* server, query it to search / send events to update.
 This might be better than handling Lucene directly, since Solr provides index
 read/write and load balancing. The problem here could be to implement &
 maintain sync between index and DB with no lag, as the updates (DMLs on DB)
 are very frequent. Too many events to be sent!

  2) Using *Hibernate Search*. I'm just wondering about its 
 *performance* considering high volume of transactions on DB every minute.

 Please suggest.

 Thanks in advance.





Can I invert the inverted index?

2011-07-05 Thread Gabriele Kahlout
Hello,

With an inverted index the term is the key, and the documents are the
values. Is it still however possible that given a document id I get the
terms indexed for that document?

-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains [LON] or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with X.
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: Can I invert the inverted index?

2011-07-05 Thread lboutros
Hi Gabriele,

I'm not sure to understand your problem, but the TermVectorComponent may fit
your needs ?

http://wiki.apache.org/solr/TermVectorComponent
http://wiki.apache.org/solr/TermVectorComponentExampleEnabled

Ludovic.

-
Jouve
France.
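
If term vectors are the route: the field has to be indexed with them, and
queries go through a handler that includes the component.  A sketch along
the lines of the wiki examples above (the field definition and tvrh handler
follow the stock example config; the id is hypothetical):

    <field name="content" type="text" indexed="true" stored="false"
           termVectors="true" termPositions="true" termOffsets="true"/>

    http://localhost:8983/solr/select?q=id:doc0&qt=tvrh&tv=true&tv.all=true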


Re: what s the optimum size of SOLR indexes

2011-07-05 Thread arian487
It depends on how many queries you'd be making per second.  I know for us, I
have a gradient of index sizes.  The first machine, which gets hit most
often, is about 2.5 gigs.  Most of the queries would only ever need to hit
this index, but then I have bigger indices of about 5-10 gigs each which
don't get queried as often, so I can afford them to be a little slower (and
hence the bigger index).



Re: Can I invert the inverted index?

2011-07-05 Thread Gabriele Kahlout
I had looked at term vectors but don't understand how they solve my problem.
Consider the following index entries:

t0, doc0, doc1
t1, doc0

From the 2nd entry we know that t1 is only present in doc0.
Now, my problem: given doc0, how can I know which terms occur in it (t0 and
t1) (without storing the content)?
One way is to go over all terms in the index using the term dictionary.
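
A brute-force sketch of that term-dictionary walk against the Lucene 3.x
reader API (reader and docId are assumed to be in scope; docId is the
internal Lucene document number):

    TermEnum terms = reader.terms();          // iterate the whole dictionary
    while (terms.next()) {
        Term t = terms.term();
        TermDocs td = reader.termDocs(t);     // postings for this term
        if (td.skipTo(docId) && td.doc() == docId) {
            System.out.println(t.field() + ":" + t.text()); // t occurs in docId
        }
        td.close();
    }
    terms.close();

It visits every term in the index, which is exactly the expensive path the
replies below warn about.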


On Tue, Jul 5, 2011 at 10:14 PM, lboutros boutr...@gmail.com wrote:

 Hi Gabriele,

 I'm not sure to understand your problem, but the TermVectorComponent may
 fit
 your needs ?

 http://wiki.apache.org/solr/TermVectorComponent
 http://wiki.apache.org/solr/TermVectorComponentExampleEnabled

 Ludovic.

 -
 Jouve
 France.




-- 
Regards,
K. Gabriele



Re: Can I invert the inverted index?

2011-07-05 Thread Rob Casson
sounds like the Luke request handler will get what you're after:

 http://wiki.apache.org/solr/LukeRequestHandler
 http://wiki.apache.org/solr/LukeRequestHandler#id

cheers,
rob
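
Per the second wiki link above, the handler takes the uniqueKey value
directly; something like this (the id value is hypothetical) lists the
indexed terms for a single document:

    http://localhost:8983/solr/admin/luke?id=doc0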

On Tue, Jul 5, 2011 at 3:59 PM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:
 Hello,

 With an inverted index the term is the key, and the documents are the
 values. Is it still however possible that given a document id I get the
 terms indexed for that document?

 --
 Regards,
 K. Gabriele




Re: Can I invert the inverted index?

2011-07-05 Thread Erick Erickson
You can do this, kind of, but it's a lossy process. Consider indexing
"the cat in the hat strikes back", with "the", "in" being stopwords and
"strikes" getting stemmed to "strike". At very best, you can reconstruct
that the original doc contained "cat", "hat", "strike", "back". Is
that sufficient?

And it's a very expensive process.

What is the problem you're trying to solve? Perhaps there are other ways
to get what you need.

Best
Erick

On Tue, Jul 5, 2011 at 4:22 PM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:
 I had looked at term vectors but don't understand how they solve my problem.
 Consider the following index entries:

 t0, doc0, doc1
 t1, doc0

 From the 2nd entry we know that t1 is only present in doc0.
 Now, my problem: given doc0, how can I know which terms occur in it (t0 and
 t1) (without storing the content)?
 One way is to go over all terms in the index using the term dictionary.


 On Tue, Jul 5, 2011 at 10:14 PM, lboutros boutr...@gmail.com wrote:

 Hi Gabriele,

 I'm not sure to understand your problem, but the TermVectorComponent may
 fit
 your needs ?

 http://wiki.apache.org/solr/TermVectorComponent
 http://wiki.apache.org/solr/TermVectorComponentExampleEnabled

 Ludovic.

 -
 Jouve
 France.




 --
 Regards,
 K. Gabriele




Re: Using FieldCache in SolrIndexSearcher - crazy idea?

2011-07-05 Thread Chris Hostetter

: Correct me if I am wrong:  In a standard distributed search with 
: QueryComponent, the first query sent to the shards asks for 
: fl=myUniqueKey or fl=myUniqueKey,score.  When the response is being 
: generated to send back to the coordinator, SolrIndexSearcher.doc (int i, 
: Set<String> fields) is called for each document.  As I understand it, 
: this will read each document from the index _on disk_ and retrieve the 
: myUniqueKey field value for each document.
: 
: My idea is to have a FieldCache for the myUniqueKey field in 
: SolrIndexSearcher (or somewhere else?) that would be used in cases where 
: the only field that needs to be retrieved is myUniqueKey.  Is this 
: something that would improve performance?

Quite probably ... you typically can't assume that a FieldCache can be 
constructed for *any* field, but it should be a safe assumption for the 
uniqueKey field, so for that initial request of the multiphase distributed 
search it's quite possible it would speed things up.

if you want to try this and report back results, i'm sure a lot of people 
would be interested in a patch ... i would guess the best place to make 
the change would be in the QueryComponent so that it used the FieldCache 
(probably best to do it via getValueSource() on the uniqueKey's 
SchemaField) to put the ids in the response instead of using a 
SolrDocList.

Hmm, actually...

there's no reason why this kind of optimization would need to be specific 
to distributed queries, it could be done by the ResponseWriters directly 
-- if the field list they are being asked to return only contains the 
uniqueKeyField and computed values (like score) then don't bother calling 
SolrIndexSearcher.doc at all ... the only hitch is that with distributed 
search and using function values as pseudo fields and what not there are 
more places calling SolrIndexSearcher.doc than there used to be ... so 
maybe putting this change directly into SolrIndexSearcher.doc would make 
the most sense?



-Hoss


Re: A beginner problem

2011-07-05 Thread carmmello

Thank you for your answer. I downloaded Solr from the link you suggested
and now it is ok, I can see the administration page.  But it is strange
that a download from the Solr site does not work. Thanks also to Way Cool.




 I don't know why, but the same happened to me in the past (with
 3.2). Apparently the zip I downloaded was not correct. I think you have to
 have a solr.war file on the webapps directory, do you have it?
 Do you know which version of Solr you downloaded?
 Download this one:
 http://apache.dattatec.com/lucene/solr/3.3.0/apache-solr-3.3.0.zip
 I just tried it and it's there.

 On Mon, Jul 4, 2011 at 1:49 PM, carmme...@qualidade.info wrote:

 I use nutch, as a search engine.  Until now nutch did the crawl and the
 search functions.  The newest version, however, delegated the search to
 solr. I don't know almost nothing about programming, but i'm able to
 follow a receipe.  So I went to the the solr site, downloaded solr and
 tried to follow the tutorial.  In the  example folder of solr, using
 java -jar start.jar  I got:

 2011-07-04 13:22:38.439:INFO::Logging to STDERR via
 org.mortbay.log.StdErrLog
 2011-07-04 13:22:38.893:INFO::jetty-6.1-SNAPSHOT
 2011-07-04 13:22:38.946:INFO::Started SocketConnector@0.0.0.0:8983

 When I tried  to go to http://localhost:8983/solr/admin/  I got:

 HTTP ERROR: 404
 Problem accessing /solr/admin/. Reason:
 NOT_FOUND

 Can someone help me with this?

 Thanks







Re: ampersand, dismax, combining two fields, one of which is keywordTokenizer

2011-07-05 Thread Chris Hostetter

: Maybe what I really need is a query parser that does not do disjunction
: maximum at all, but somehow still combines different 'qf' type fields with
: different boosts on each field. I personally don't _neccesarily_ need the
: actual disjunction max calculation, but I do need combining of mutiple
: fields with different boosts. Of course, I'm not sure exactly how it would
: combine multiple fields if not disjunction maximum, but perhaps one is
: conceivable that wouldn't be subject to this particular gotcha with differing
: analysis.

you can sort of do that today, something like this should work...

 q  = _query_:$q1^100 _query_:$q2^10 _query_:$q3^5 _query_:$q4
 q1 = {!lucene df=title v=$qq}
 q2 = {!lucene df=summary v=$qq}
 q3 = {!lucene df=author v=$qq}
 q4 = {!lucene df=body v=$qq}
 qq = ...user input here...

..but you might want to replace lucene with field depending on what 
metacharacters you want to support.
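
Spelled out as request parameters, one workable spelling of that sketch
(field names and user input are just the placeholders from above; the
_query_ values are quoted so the magic field parses):

    q=_query_:"{!query v=$q1}"^100 _query_:"{!query v=$q2}"^10
      _query_:"{!query v=$q3}"^5 _query_:"{!query v=$q4}"
    &q1={!lucene df=title v=$qq}
    &q2={!lucene df=summary v=$qq}
    &q3={!lucene df=author v=$qq}
    &q4={!lucene df=body v=$qq}
    &qq=albino elephant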

in general though the reason i wrote the dismax parser (instead of a
parser that works like this) is because of how multiword queries wind up 
matching/scoring.  A guy named Chuck Williams wrote the earliest 
version of the DisjunctionMaxQuery class and his albino elephant 
example totally sold me on this approach back in 2005...

http://www.lucidimagination.com/search/document/8ce795c4b6752a1f/contribution_better_multi_field_searching
https://issues.apache.org/jira/browse/LUCENE-323

: I also remain kind of confused about how the existing dismax figures out how
: many terms for the 'mm' type calculations. If someone wanted to explain that,
: I would find it enlightening and helpful for understanding what's going on.

it's not really about terms -- it's just the total number of clauses in 
the outer BooleanQuery that it builds.  if a chunk of input produces a 
valid DisjunctionMaxQuery (because the analyzer for at least one qf field 
generated tokens) then that's a clause, if a chunk of input doesn't 
produce a token (because none of the analyzers from any of the qf fields 
generated tokens) then that's not a clause.


-Hoss


Re: CopyField into another CopyField?

2011-07-05 Thread Chris Hostetter

: In solr, is it possible to 'chain' copyfields so that you can copy the value
: of one into another?
...
: <copyField source="name" dest="autocomplete" />
: <copyField source="autocomplete" dest="ac_spellcheck" />
: 
: Point being, every time I add a new field to the autocomplete, I want it to
: automatically also be added to ac_spellcheck without having to do it twice.

Sorry no, the IndexSchema won't recursively apply copyFields.  As i 
recall it was implemented this way partly for simplicity, and largely to 
protect people against the risk of infinite loops.  we could probably 
make a more sophisticated impl that detects infinite loops, but that check 
would slow things down and all solr could really do with it is throw an 
error.
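
The flat workaround is to fan the source field out to every destination
yourself, e.g. (a schema.xml sketch using the field names from the question):

    <copyField source="name" dest="autocomplete" />
    <copyField source="name" dest="ac_spellcheck" />

Adding a new source field then means adding two copyField lines instead of
one, but the two destinations stay in sync.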


-Hoss


Re: Using FieldCache in SolrIndexSearcher - crazy idea?

2011-07-05 Thread Yonik Seeley
On Tue, Jul 5, 2011 at 5:13 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
 : Correct me if I am wrong:  In a standard distributed search with
 : QueryComponent, the first query sent to the shards asks for
 : fl=myUniqueKey or fl=myUniqueKey,score.  When the response is being
 : generated to send back to the coordinator, SolrIndexSearcher.doc (int i,
 : Set<String> fields) is called for each document.  As I understand it,
 : this will read each document from the index _on disk_ and retrieve the
 : myUniqueKey field value for each document.
 :
 : My idea is to have a FieldCache for the myUniqueKey field in
 : SolrIndexSearcher (or somewhere else?) that would be used in cases where
 : the only field that needs to be retrieved is myUniqueKey.  Is this
 : something that would improve performance?

 Quite probably ... you typically can't assume that a FieldCache can be
 constructed for *any* field, but it should be a safe assumption for the
 uniqueKey field, so for that initial request of the mutiphase distributed
 search it's quite possible it would speed things up.

Ah, thanks Hoss - I had meant to respond to the original email, but
then I lost track of it.

Via pseudo-fields, we actually already have the ability to retrieve
values via FieldCache.
fl=id:{!func}id

But using CSF would probably be better here - no memory overhead for
the FieldCache entry.

-Yonik
http://www.lucidimagination.com






Re: Using FieldCache in SolrIndexSearcher - crazy idea?

2011-07-05 Thread Ryan McKinley

 Ah, thanks Hoss - I had meant to respond to the original email, but
 then I lost track of it.

 Via pseudo-fields, we actually already have the ability to retrieve
 values via FieldCache.
 fl=id:{!func}id

 But using CSF would probably be better here - no memory overhead for
 the FieldCache entry.


Not sure if this is related, but we should also consider using the
memory codec for id field
https://issues.apache.org/jira/browse/LUCENE-3209


Re: Is solrj 3.3.0 ready for field collapsing?

2011-07-05 Thread Ryan McKinley
patches are always welcome!


On Tue, Jul 5, 2011 at 3:04 PM, Yonik Seeley yo...@lucidimagination.com wrote:
 On Mon, Jul 4, 2011 at 11:54 AM, Per Newgro per.new...@gmx.ch wrote:
 i've tried to add the params for group=true and group.field=myfield by using
 the SolrQuery.
 But the result is null. Do i have to configure something? In wiki part for
 field collapsing i couldn't
 find anything.

 No specific (type-safe) support for grouping is in SolrJ currently.
 But you should still have access to the complete generic solr response
 via SolrJ regardless (i.e. use getResponse())

 -Yonik
 http://www.lucidimagination.com



Re: configure dismax requesthandlar for boost a field

2011-07-05 Thread Chris Hostetter

: But i am not finding any effect in my search results. do i need to do some
: more configuration to see the effect.

posting the solrconfig.xml section for your dismax handler is a helpful
start, but to provide any meaningful assistance to you we need a lot more 
info than just that...

 * an example of the URL you are using so we can see what params you are 
sending at query time

 * the scores & ids of docs that you don't feel are being returned in the 
order you think they should

 * the score explanation output for those docs using debugQuery=true (and 
if needed: explainOther) so we can see what kinds of scores are being 
computed for various docs and why and help you understand that same output 
to make any changes as needed.
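
For example, a debug request against a dismax handler might look like this
(handler name, query, and the explainOther id are all hypothetical):

    http://localhost:8983/solr/select?qt=dismax&q=gold+ring
        &fl=*,score&debugQuery=true&explainOther=id:1234

The "explain" section of the response then shows, per document, which
fields matched and how each boost contributed to the score.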



 


-Hoss


Re: Can I invert the inverted index?

2011-07-05 Thread Trey Grainger
Gabriele,

I created a patch that does this about a year ago.  See
https://issues.apache.org/jira/browse/SOLR-1837.  It was written for Solr
1.4 and is based upon the Document Reconstructor in Luke.  The patch adds a
link to the main solr admin page to a docinspector page which will
reconstruct the document given a uniqueid (required).  Keep in mind that
you're only looking at what's in the index for non-stored fields, not the
original text.

If you have any issues using this on the most recent release, let me know
and I'd be happy to create a new patch for solr 3.3.  One of these days I'll
remove the JSP dependency and this may eventually make it into trunk.

Thanks,

-Trey Grainger
Search Technology Development Team Lead, Careerbuilder.com
Site Architect, Celiaccess.com


On Tue, Jul 5, 2011 at 3:59 PM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:

 Hello,

 With an inverted index the term is the key, and the documents are the
 values. Is it still however possible that given a document id I get the
 terms indexed for that document?

 --
 Regards,
 K. Gabriele
