help: DIH and Multivalue does not work

2010-12-23 Thread lun zhong
Hi, Solr gurus:

I am totally new to solr and hope somebody can help me on this.

I have a multivalued field channel in my schema:

<field name="channel" type="string" indexed="true" stored="true"
multivalue="true"/>

and the DIH config:

<entity name="autocomplete" query="select * from view_autocomplete"
        transformer="RegexTransformer">
  <field column="channel" sourceColcolumn="channels" splitBy=","/>
</entity>

The DIH full-import is OK for the other fields, but channel has nothing (Solr
query: q=*:*).

If I remove <field column="channel" sourceColcolumn="channels" splitBy=","/>
or make my db return the channel column directly, channel does get the correct
data, i.e. "a,b,c", but then the problem is that "a,b,c" is a single value
rather than multiple values.

I am using PostgreSQL 9 and the SQL is pretty simple: select ...,
'a,b,c'::text AS channels from table-xxx; I tried both Solr 1.4.1 and a 3.1
snapshot from svn, and both have the same problem.

Thanks.


Re: help: DIH and Multivalue does not work

2010-12-23 Thread Ahmet Arslan
There are several typos:

multiValued="true"   (you wrote multivalue)
sourceColName        (you wrote sourceColcolumn)

By the way, after you correct those and restart Tomcat, you can debug on
/admin/dataimport.jsp
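Putting those corrections together, the fixed snippets would look roughly like this (reconstructed from the thread; the surrounding attributes are taken from the original post):

```xml
<!-- schema.xml: multiValued, not multivalue -->
<field name="channel" type="string" indexed="true" stored="true"
       multiValued="true"/>

<!-- DIH data config: sourceColName, not sourceColcolumn -->
<entity name="autocomplete" query="select * from view_autocomplete"
        transformer="RegexTransformer">
  <field column="channel" sourceColName="channels" splitBy=","/>
</entity>
```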



Re: help: DIH and Multivalue does not work

2010-12-23 Thread lun zhong
Yup, it is working, thanks so much.




Re: Solr index directory '/solr/data/index' doesn't exist. Creating new index... on Geronimo

2010-12-23 Thread Bac Hoang
Just to share with the Solr community that the problem has been resolved in
a simple way: move solr/data/index out of /opt/dev/config.

The root cause was permissions. It seems Geronimo doesn't allow write
access to /opt/dev/config and its sub-folders.


Cheers,
Bac Hoang



On 12/22/2010 6:25 PM, Bac Hoang wrote:

Hello Erick,

Could you kindly give a hand on my problem. Any ideas, hints, 
suggestions are highly appreciated. Many thanks


1. The problem: Solr index directory '/solr/data/index' doesn't exist. 
Creating new index...

2. Some other info.:

- using the Solr 1.4.1 example
- Geronimo 2.1.6
- solr home: /opt/dev/config/solr
- dataDir: /opt/dev/config/solr/data/index. I set read and write
rights on each and every folder, from opt, dev... down to the last one, index
(just to be sure ;))

- lockType:
- single/ simple: Cannot create directory: /solr/data/index at 
org.apache.solr.core.SolrCore.initIndex(SolrCore.java:397)
- native: Cannot create directory: /solr/data/index at 
org.apache.lucene.store.NativeFSLockFactory.acquireTestLock


- the Geronimo log:
===
2010-12-22 15:13:03,001 INFO  [SupportedModesServiceImpl] Portlet mode 
'edit' not found for portletId: '/console-base.WARModules!874780194|0'
2010-12-22 15:13:03,001 INFO  [SupportedModesServiceImpl] Portlet mode 
'help' not found for portletId: '/console-base.WARModules!874780194|0'
2010-12-22 15:13:07,941 INFO  [DirectoryMonitor] Hot deployer notified 
that an artifact was removed: default/solr2/1293005281314/war
2010-12-22 15:13:09,148 INFO  [SupportedModesServiceImpl] Portlet mode 
'edit' not found for portletId: '/console-base.WARModules!874780194|0'
2010-12-22 15:13:09,148 INFO  [SupportedModesServiceImpl] Portlet mode 
'help' not found for portletId: '/console-base.WARModules!874780194|0'
2010-12-22 15:13:14,139 INFO  [SupportedModesServiceImpl] Portlet mode 
'edit' not found for portletId: '/plugin.Deployment!227983155|0'
2010-12-22 15:13:18,795 WARN  [TomcatModuleBuilder] Web application . 
does not contain a WEB-INF/geronimo-web.xml deployment plan.  This may 
or may not be a problem, depending on whether you have things like 
resource references that need to be resolved.  You can also give the 
deployer a separate deployment plan file on the command line.
2010-12-22 15:13:19,040 INFO  [SolrResourceLoader] Using JNDI 
solr.home: /opt/dev/config/solr
2010-12-22 15:13:19,040 INFO  [SolrResourceLoader] Solr home set to 
'/opt/dev/config/solr/'
2010-12-22 15:13:19,051 INFO  [SolrDispatchFilter] 
SolrDispatchFilter.init()


2010-12-22 15:13:19,462 INFO  [IndexSchema] default search field is text
2010-12-22 15:13:19,463 INFO  [IndexSchema] query parser default 
operator is OR

2010-12-22 15:13:19,464 INFO  [IndexSchema] unique key field: id
2010-12-22 15:13:19,490 INFO  [JmxMonitoredMap] JMX monitoring is 
enabled. Adding Solr mbeans to JMX Server: 
com.sun.jmx.mbeanserver.jmxmbeanser...@144752d
2010-12-22 15:13:19,525 INFO  [SolrCore] Added SolrEventListener: 
org.apache.solr.core.QuerySenderListener{queries=[]}
2010-12-22 15:13:19,525 INFO  [SolrCore] Added SolrEventListener: 
org.apache.solr.core.QuerySenderListener{queries=[{q=solr 
rocks,start=0,rows=10}, {q=static firstSearcher warming query from 
solrconfig.xml}]}
2010-12-22 15:13:19,533 WARN  [SolrCore] Solr index directory 
'/solr/data/index' doesn't exist. Creating new index...
2010-12-22 15:13:19,599 ERROR [SolrDispatchFilter] Could not start 
SOLR. Check solr/home property
java.lang.RuntimeException: java.io.IOException: Cannot create 
directory: /solr/data/index

at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:397)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:545)
at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)

...
2010-12-22 15:13:19,601 INFO  [SolrDispatchFilter] 
SolrDispatchFilter.init() done

2010-12-22 15:13:19,601 INFO  [SolrServlet] SolrServlet.init()
2010-12-22 15:13:19,602 INFO  [SolrResourceLoader] Using JNDI 
solr.home: /opt/dev/config/solr

2010-12-22 15:13:19,602 INFO  [SolrServlet] SolrServlet.init() done
2010-12-22 15:13:19,606 INFO  [SolrResourceLoader] Using JNDI 
solr.home: /opt/dev/config/solr
2010-12-22 15:13:19,606 INFO  [SolrUpdateServlet] 
SolrUpdateServlet.init() done
2010-12-22 15:13:19,721 INFO  [SupportedModesServiceImpl] Portlet mode 
'edit' not found for portletId: '/plugin.Deployment!227983155|0'


===

With regards,
Bac Hoang




error in html???

2010-12-23 Thread satya swaroop
Hi All,

 I am able to get the response in the success case in JSON format by
stating wt=json in the query. But in case of any errors I am getting it in
HTML format.
 1) Is there any particular reason the errors come back in HTML?
 2) Can't we get the error result in JSON format?

Regards,
satya


Re: Configuration option for disableReplication

2010-12-23 Thread Francis Rhys-Jones
Hi,

We're running a cloud-based cluster of servers and it's not that easy to
get a list of the current slaves. Since my problem is only around the
restart/redeployment of the master, it seems an unnecessary
complication to have to start interacting with slaves as part of the
scripts that do this.

As you say, there seems to be a proliferation of features you can
enable and disable for the replication handler. Setting enabled=false
for the master turns off all the features relating to the instance
being a master. This is slightly different from calling the
'disablereplication' command, which simply causes the 'indexversion'
command to return 0, which effectively stops the slaves from knowing if
there is a new version and hence from trying to replicate it.

I'm not entirely clear whether this distinction is actually a useful
one; combining them would be a fairly reasonable refactoring of the
update handler, and would probably have an effect on backwards
compatibility.

Having the replicateAfter parameter set to just 'commit' (i.e. not on
startup) has a similar effect to the 'disablereplication' command
until you do the first commit after startup. So this is a workable
solution for me, as the process that pushes updates and commits to
the index can also check and swap the cores before it does any work.

However it feels like a bit of a tenuous way of disabling replication,
particularly as there is an explicit mechanism for doing so; it's just
not configurable on startup.

I have a patch; I was looking for a bit of feedback as to whether I
should submit it.

Thanks,

Francis
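For reference, the replicateAfter workaround described above corresponds to a master configuration roughly like this (a sketch: everything except the replicateAfter line is a guess at a typical setup, not taken from this thread):

```xml
<!-- solrconfig.xml on the master: no replicateAfter=startup, so the
     slaves see nothing new to fetch until the first commit after restart -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>
```

The indexing process can then verify and swap the cores first, e.g. via a CoreAdmin call such as /solr/admin/cores?action=SWAP&core=main-core&other=rebuild-core, before issuing the commit that makes replication resume.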



On 22 December 2010 21:30, Upayavira u...@odoko.co.uk wrote:
 I've just done a bit of playing here, because I've spent a lot of time
 reading the SolrReplication wiki page[1], and have often wondered how
 some features interact.

 Unfortunately, if you specify <str name="enable">false</str> in your
 replication request handler for your master, you cannot re-enable it
 with a call to /solr/replication?command=enablereplication

 Therefore, it would seem your best bet is to call
 /solr/replication?command=disablepolling on all of your slaves prior to
 upgrading. Then, when you're sure everything is right, call
 /solr/replication?command=enablepolling on each slave, and you should be
 good to go.

 I tried this, watching the request log on my master, and the incoming
 replication requests did actually stop due to the disablepolling
 command, so you should be fine with this approach.

 Does this get you to where you want to be?

 Upayavira

 On Wed, 22 Dec 2010 17:10 +, Francis Rhys-Jones
 francis.rhys-jo...@guardian.co.uk wrote:
 Hi,

 I am looking into using a multi core configuration to allow us to
 fully rebuild our index while still applying updates.

 I have two cores main-core and rebuild-core. I push the whole dataset
 into the rebuild core, during which time I can happily keep pushing
 updates into the main-core. Once the rebuild is complete I swap the
 cores and delete *:* from the rebuild core.

 This works fine however there are a couple of edge cases:

 On server restart solr needs to remember which core has been swapped
 in to be the main core, this can be solved by adding the
 persistent=true attribute to the solr config, however this does
 require the solr.xml to be writeable.

 While deploying a new version of our application we overwrite the
 solr.xml, as the new version could potentially have legitimate changes
 to the solr.xml that need to be rolled out, again leaving the cores
 out of sync.

 My proposed solution is to have the indexing process do some sanity
 checking at the start of each run, and swap in the correct core if
 necessary.

 This works however there is the potential for the slaves to start
 replicating the empty index before the correct index is swapped in.

 To get round this problem I would like to have replication disabled on
 start up.

 Removing replicateAfter=startup has this effect, but it would be more
 future-proof to be able to specify a default for the
 replicationEnabled field (see SOLR-1175) in the ReplicationHandler,
 stopping replication until I explicitly turn it on.

 The change looks fairly simple.
 ---
 Enterprise Search Consultant at Sourcesense UK,
 Making Sense of Open Source


Please consider the environment before printing this email.
--
Visit guardian.co.uk - newspaper website of the year
www.guardian.co.uk  www.observer.co.uk

To save up to 33% when you subscribe to the Guardian and the Observer
visit http://www.guardian.co.uk/subscriber

-

This e-mail and all attachments are confidential and may also
be privileged. If you are not the named recipient, please notify
the sender and delete the e-mail and all attachments immediately.
Do not disclose the contents to another person. You may not use
the information for any purpose, or store, or copy, it in any 

Item categorization problem.

2010-12-23 Thread Hasnain

Hi all, 

   I am using Solr in my web application for search purposes. However, I
am having a problem with the default behaviour of the Solr search.

From my understanding, if I query for a keyword, let's say "Laptop",
preference is given to result rows having more occurrences of the search
keyword "Laptop" in the field name. This, however, produces
undesirable scenarios, for example:

1. I index an item A with name value "Sony Laptop".
2. I index another item B with name value "Laptop bags for laptops".
3. I search for the keyword "Laptop".

With the default behaviour, precedence would be given to item B,
since the keyword appears more times in the name field for that item.

In my schema, I have another field by the name of Category and, for
example's sake, let's assume that my application supports only two
categories: Computers and Accessories. Now, what I require is a mechanism to
assign the correct categories to the items during indexing so that this
field can be used to better filter the search results: item A would belong
to the Computers category and item B to the Accessories category. So
then, searching for "Laptop" would only look for items in the Computers
category and return item A only.

I would like to point out here that setting the category field manually is
not an option since the data might be in the vicinity of thousands of
records. I am not asking for an in-depth algorithm; just a high-level design
would be sufficient to set me in the right direction.

thanks.  


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Item-catagorization-problem-tp2136415p2136415.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Configuration option for disableReplication

2010-12-23 Thread Upayavira
Having played with it, I can see that it would be extremely useful to be
able to disable replication in the solrconfig.xml, and then enable it
with a URL.

So, as to your patch, I'd say yes, submit it. But do try to make it
backwards compatible. It'll make it much more likely to get accepted.

Upayavira


Solr 1.4.1 stats component count not matching facet count for multi valued field

2010-12-23 Thread Johannes Goll
Hi,

I have a facet field called option which may be multi-valued and
a weight field which is single-valued.

When I use the Solr 1.4.1 stats component with a facet field, i.e.

q=*:*&version=2.2&stats=true&stats.field=weight&stats.facet=option

I get a conflicting result for the stats count:

<long name="count">1</long>

when compared with the facet counts obtained by

q=*:*&version=2.2&facet=true&facet.field=option

I would expect the same count from either method.

This happens when multiple values are stored in the option field.

It seems that for multiple values only the last entered value is
considered by the stats component. What am I doing wrong here?

Thanks,
Johannes


Total number of groups after collapsing

2010-12-23 Thread samarth s
Hi,

I have been using collapsing in my application. I have a requirement of
finding the number of groups matching some filter criteria,
something like a COUNT(DISTINCT columnName). The only solution I can
currently think of is the query:

q=*:*&rows=Integer.MAX_VALUE&start=0&fl=score&collapse.field=abc&collapse.threshold=1&collapse.type=normal

I get the number of groups from 'numFound', but this seems like a bad
solution in terms of performance. Is there a cleaner way?

Thanks,
Samarth


Using remote Nutch Server to crawl, then merging results into local index

2010-12-23 Thread Dietrich
I want to use Solr to index two types of documents:
- local documents in Drupal (ca. 10M)
- a large number of web sites to be crawled through Nutch (ca. 100M)

Our data center does not have the necessary bandwidth to crawl all the
external sites, so we want to use a hosting provider to do the
crawling for us, but we want the actual serving of results to happen
locally.
It seems it would probably be easiest to delegate all the
indexing to a remote server and replicate those indexes to a slave in
our data center using built-in Solr replication, but then the indexing
of our internal sites would have to happen remotely too, which I
would like to avoid.

I think Hadoop/MapReduce would be overkill for this scenario, so what
other options are there?
I was considering:
- using Solr merge to merge the Drupal & Nutch indexes
- having Nutch post the crawled results to the local Solr index

Any suggestions would be highly appreciated.

Dietrich Schmidt
http://www.linkedin.com/in/dietrichschmidt


Custom match scoring

2010-12-23 Thread Nelson Branco
Hi,

 

I'm implementing a search that has peculiar scoring rules which, as far as I
can see, aren't supported natively.

The rules are like:

- Given a set of tokens, the final score is the sum of the scores of all
tokens, but each token can only be scored for its best match over a set of
fields that it might match.

i.e. "restaurant food" (2 tokens) must match "Category^10 Name^5
Description";

the token "restaurant" might match documents on all of the fields, but it
must only be given the score of the "Category" match;

the token "food" also counts towards the score, again with its best match
on any of the indicated fields.

Can anyone guide me toward a solution, or to an extension point where I can
capture only the best match for a given field?

 

Thanks in advance.

 

--

Nelson Branco





Re: full text search in multiple fields

2010-12-23 Thread PeterKerk

Correct! Thanks again, it now works! :)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/full-text-search-in-multiple-fields-tp1888328p2137284.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using remote Nutch Server to crawl, then merging results into local index

2010-12-23 Thread Dominique Bejean

Hi,

In order to crawl and index your web sites, maybe you can have a look at
www.crawl-anywhere.com. It includes a web crawler, a document processing
pipeline and a Solr indexer.


Dominique




Re: DIH for taxonomy faceting in Lucid webcast

2010-12-23 Thread Erick Erickson
SolrJ is often used when DIH doesn't do what you wish. Using SolrJ is
really quite easy, but you're doing the DB queries yourself, often with
the appropriate JDBC driver.

Within DIH, the transformers, as Chris says, *might* work for you.

Best
Erick

On Wed, Dec 22, 2010 at 6:16 PM, Andy angelf...@yahoo.com wrote:


 --- On Wed, 12/22/10, Chris Hostetter hossman_luc...@fucit.org wrote:

  : 2) Once I have the fully spelled out category path such as
  : NonFic/Science, how do I turn that into 0/NonFic
  : 1/NonFic/Science using the DIH?

  I don't have any specific suggestions for you -- I've never tried it in
  DIH myself. The ScriptTransformer might be able to help you out, but I'm
  not sure.

 Thanks Chris.

 What did you use to generate those encodings if not DIH?







Re: error in html???

2010-12-23 Thread Erick Erickson
What html format? Solr responds in XML, not HTML. Any HTML
has to be created somewhere in the chain. Your browser
may not be set up to render XML, so you could be seeing problems
because of that.

If this is off-base, could you explain your issue in a bit more detail?

Best
Erick



Re: Item categorization problem.

2010-12-23 Thread Erick Erickson
What you're asking for appears to me to be auto-categorization, and
there's nothing built into Solr to do this. Somehow you need to analyze
the documents at index time and add the proper categories, but I have
no clue how. This is especially hard with short fields since most
auto-categorization algorithms try to do some statistical analysis
of the document to figure this out.

Best
Erick



Re: Custom match scoring

2010-12-23 Thread Erick Erickson
Hmmm, have you looked at dismax? If I'm reading your message correctly, it
sounds like this may already be there. Of course, I've missed the point of
messages before.

Best
Erick



Re: Item precedence search problem

2010-12-23 Thread Gora Mohanty
On Wed, Dec 22, 2010 at 3:53 PM, Hasnain hasn...@hotmail.com wrote:
[...]
 In my schema, i have another field by the name of Category and, for
 example's sake, let's assume that my application supports only two
 categories: computers and accessories. Now, what i require is a mechanism to
 assign correct categories to the items during item indexing so that this
 field can be used to better filter the search results. Continuing from the
 example in my original post, item A would belong to Computer category and
 item B would belong to Accessories category. So then, searching for
 Laptop would only look for items in the Computers category and return
 item A only.

 I would like to point out here that setting the category field manually is
 not an option since the data might be in the vicinity of thousands of
 records. I am not asking for an in-depth algorithm. Just a high level design
 would be sufficient to set me in the right direction.
[...]

How do you do your indexing? You would need to have the indexer decide
on what the proper category for a document should be, and add that value
to the category field.

Depending on your requirements, it might be possible to use synonyms in
Solr to arrive at something like this. Other than that, Solr has no mechanism
to automatically assign a category. You could possibly look at things like
Apache Mahout to help you here.
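As a minimal illustration of "have the indexer decide", here is a hypothetical keyword-to-category lookup the indexing code could apply before posting each document to Solr. The category names and keyword sets are invented for this thread's example; a real application would maintain (and tune) these rules itself:

```python
# Hypothetical rule table: categories are checked in order, so the more
# specific "Accessories" terms win over generic "Computers" terms
# (a "laptop bag" is a bag, not a laptop).
CATEGORY_RULES = [
    ("Accessories", {"bag", "bags", "case", "charger"}),
    ("Computers", {"laptop", "laptops", "desktop", "notebook"}),
]

def assign_category(name):
    """Pick the first category whose keyword set intersects the item name."""
    tokens = set(name.lower().split())
    for category, keywords in CATEGORY_RULES:
        if tokens & keywords:
            return category
    return "Uncategorized"

assert assign_category("Sony Laptop") == "Computers"
assert assign_category("Laptop bags for laptops") == "Accessories"
```

The point of the ordering is exactly the tricky case from the original post: both items mention "laptop", so a plain occurrence count would mis-file the bag.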

Regards,
Gora


Re: error in html???

2010-12-23 Thread Markus Jelsma
These

HTTP Status 500 - null java.lang.NullPointerException at 
java.io.StringReader.init(StringReader.java:50) at 

are returned in HTML. I use Nginx to detect the HTTP error code and return a 
JSON-encoded body with the appropriate content type. Maybe it could be done in 
the servlet container but I never tried.

 What html format? Solr responds in XML, not HTML. Any HTML
 has to be created somewhere in the chain. Your browser
 may not be set up to render XML, so you could be seeing problems
 because of that.
 
 If this is off-base, could you explain your issue in a bit more detail?
 
 Best
 Erick
 
 On Thu, Dec 23, 2010 at 6:30 AM, satya swaroop 
satya.yada...@gmail.comwrote:
  Hi All,
  
  I am able to get the response in the success case in json format
  by
  
  stating wt=json in the query. But in case of any errors I am getting them
  in HTML format.
  
   1) Is there any particular reason they come in HTML format?
   2) Can't we get the error result in JSON format?
  
  Regards,
  satya


Re: Using remote Nutch Server to crawl, then merging results into local index

2010-12-23 Thread Erick Erickson
Merging the indexes seems problematical. It's easy enough to
do, but I'm not sure it would produce the results you want. And it
supposes that your schemas are identical (or at least compatible)
between the crawled data and your local data, which I wonder about...

Instead, I'd think about cores. A core can be thought of as a
virtual Solr index accessible from a single Solr instance. I'd guess that
your requirements for handling the crawled data are different enough
from the local documents that this might be what you want to do anyway.

Federating these would probably involve two queries and some kind of
manual integration of them though.
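A rough sketch of that "two queries and manual integration" step, assuming each core's hits arrive as (id, score) pairs already sorted by score. Note that scores from different cores/indexes are generally not comparable, so a real federation would normalize scores or interleave round-robin rather than merge them naively:

```python
# Merge two per-core result lists by descending score. Assumes (and this is
# the weak point of naive federation) that the scores are comparable.
import heapq

local_hits = [("drupal:12", 4.2), ("drupal:7", 2.9)]    # (id, score)
crawled_hits = [("nutch:991", 3.5), ("nutch:23", 1.1)]

# heapq.merge requires each input already sorted by the key (here: -score).
merged = list(heapq.merge(local_hits, crawled_hits, key=lambda h: -h[1]))

assert [h[0] for h in merged] == ["drupal:12", "nutch:991", "drupal:7", "nutch:23"]
```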

 Best
Erick

On Thu, Dec 23, 2010 at 10:27 AM, Dietrich diet...@gmail.com wrote:

 I want to use Solr to index two types of documents:
 - local documents in Drupal (ca. 10M)
 - a large number of web sites to be crawled thru Nutch (ca 100M)

 Our data center does not have the necessary bandwith to crawl all the
 external sites and we want to use a hosting provider to do the
 crawling for us, but we want the actual serving of results to happen
 locally.
 It seems as if it would probably be easiest to delegate all the
 indexing to a remote server and replicate those indexes to a slave in
 our data center using built-in Solr replication, but then the indexing
 of our internal sites would have to happen remotely, too, which I
 would like to avoid.

 I think Hadoop/MapReduce would be overkill for this scenario, so what
 other options are there?
 I was considering
 - using Solr merge to merge the Drupal & Nutch indexes
 - have Nutch post the crawled results to the local Solr index

 Any suggestions would be highly appreciated.

 Dietrich Schmidt
 http://www.linkedin.com/in/dietrichschmidt



Re: Item catagorization problem.

2010-12-23 Thread Dennis Gearon
Doesn't indexing/analyzing do this to some degree anyway?

Not sure of the algorithm, but something like: how often a term occurs, how 
near the top, how many different forms it takes, whether it is the subject or 
object of a sentence. That has to have some relevance to what category 
something is in.

The simplest extension to that would be something like a 'sub-vocabulary' cross 
listing: if such and such words have high relevance, then the subject is about 
this or that.

The smartest categorizer is your users, though. So the best way to make that 
list is to keep track of how close to the top of the search results a user 
responded to, what the words were, and how many search attempts it took. That's 
what Netflix does. Their goal is to have users get something in the top three 
off the first search attempt.
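The click-feedback idea above could be logged with something as simple as this (purely illustrative; the threshold and names are invented):

```python
# Log, per query, the rank of the result the user finally clicked and how
# many attempts it took; queries that routinely need deep clicks are
# candidates for better categorization or synonyms.
from collections import defaultdict

click_log = defaultdict(list)  # query -> list of (clicked_rank, attempts)

def record_click(query, clicked_rank, attempts):
    click_log[query].append((clicked_rank, attempts))

def needs_attention(query, max_avg_rank=3.0):
    """True if users' clicked results sit too far down the page on average."""
    ranks = [r for r, _ in click_log[query]]
    return sum(ranks) / len(ranks) > max_avg_rank

record_click("laptop", clicked_rank=8, attempts=2)
record_click("laptop", clicked_rank=6, attempts=1)
record_click("guitar", clicked_rank=1, attempts=1)

assert needs_attention("laptop") is True    # average clicked rank 7.0
assert needs_attention("guitar") is False   # average clicked rank 1.0
```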

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others' mistakes, so you do not have to make them 
yourself. From 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thu, December 23, 2010 10:00:05 AM
Subject: Re: Item catagorization problem.

What you're asking for appears to me to be auto-categorization, and
there's nothing built into Solr to do this. Somehow you need to analyze
the documents at index time and add the proper categories, but I have
no clue how. This is especially hard with short fields since most
auto-categorization algorithms try to do some statistical analysis
of the document to figure this out.

Best
Erick

On Thu, Dec 23, 2010 at 8:12 AM, Hasnain hasn...@hotmail.com wrote:


 Hi all,

    I am using Solr in my web application for search purposes. However, I
 am having a problem with the default behaviour of the Solr search.

 From my understanding, if I query for a keyword, let's say Laptop,
 preference is given to result rows having more occurrences of the search
 keyword Laptop in the name field. This, however, is producing
 undesirable scenarios, for example:

 1. I index an item A with name value Sony Laptop.
 2. I index another item B with name value: Laptop bags for laptops.
 3. I search for the keyword Laptop

 According to the default behaviour, precedence would be given to item B
 since the keyword appears more times in the name field for that item.

 In my schema, I have another field by the name of Category and, for
 example's sake, let's assume that my application supports only two
 categories: computers and accessories. Now, what I require is a mechanism to
 assign correct categories to the items during item indexing so that this
 field can be used to better filter the search results. Item A would belong
 to the Computers category and item B would belong to the Accessories
 category. So then, searching for Laptop would only look for items in the
 Computers category and return item A only.

 I would like to point out here that setting the category field manually is
 not an option since the data might be in the vicinity of thousands of
 records. I am not asking for an in-depth algorithm. Just a high level
 design
 would be sufficient to set me in the right direction.

 thanks.


 --
 View this message in context:
http://lucene.472066.n3.nabble.com/Item-catagorization-problem-tp2136415p2136415.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: full text search in multiple fields

2010-12-23 Thread PeterKerk

Sorry to bother you again, but it still doesn't seem to work all the time...

This (what you solved earlier) works:
q=title_search:PappegaydefType=lucenefl=id,title


But for another location, whose value in the DB is: de tuinkamer

When I query the id of that location:
q=id:431fl=id,title
the location is found, so it IS indexed...


But this query DOESN'T work:

q=title_search:tuinkamer*defType=lucenefl=id,title

And this one DOES:
q=title_search:tuin*defType=lucenefl=id,title

for me this is unexpected...what can it be?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/full-text-search-in-multiple-fields-tp1888328p2137983.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: full text search in multiple fields

2010-12-23 Thread Ahmet Arslan
 But for another location, which value in DB is: de
 tuinkamer
 
 When I query the id of that location:
 q=id:431fl=id,title
 the location is found, so it IS indexed...
 
 
 But this query DOESN'T work:
 
 q=title_search:tuinkamer*defType=lucenefl=id,title
 
 And this one DOES:
 q=title_search:tuin*defType=lucenefl=id,title
 
 for me this is unexpected...what can it be?

As you can verify from /solr/admin/analysis.jsp, tuinkamer is reduced to 
tuinkam by EnglishPorterFilterFactory. So it is expected/normal that 
q=title_search:tuinkamer* won't return that document. Remember that tuinkamer* 
is not analyzed and is tested against what is indexed. That said, if you plan 
on using wildcards, remove EnglishPorterFilterFactory from your analyzers.
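A toy model of the point above (this is not the real Porter stemmer): the analyzed field stores stemmed tokens, while a wildcard term skips analysis and is prefix-matched verbatim against those stemmed terms:

```python
def toy_stem(token):
    # Crude stand-in for EnglishPorterFilter: strip a trailing "er".
    return token[:-2] if token.endswith("er") else token

# What actually goes into the index for the stored value "de tuinkamer".
indexed = {toy_stem(t) for t in "de tuinkamer".split()}   # {"de", "tuinkam"}

def wildcard_match(prefix):
    # Wildcard terms are NOT analyzed; the raw prefix is tested against
    # the already-stemmed indexed terms.
    return any(term.startswith(prefix) for term in indexed)

assert wildcard_match("tuin")          # tuin* matches the stemmed "tuinkam"
assert not wildcard_match("tuinkamer") # tuinkamer* finds nothing
```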



  


Re: synonyms database

2010-12-23 Thread lee carroll
Hi ramzesua,
Synonym lists will often be application specific and will of course be
language specific. Given this, I don't think you can talk about a generic
Solr synonym list; it just won't be very helpful in lots of cases.

What are you hoping to achieve with your synonyms for your app?




On 23 December 2010 11:50, ramzesua michaelnaza...@gmail.com wrote:


 Hi all. Where can I get synonyms database for Solr?
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/synonyms-database-tp2136076p2136076.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: full text search in multiple fields

2010-12-23 Thread PeterKerk

@iorixxx: removing that line did solve the problem, thanks!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/full-text-search-in-multiple-fields-tp1888328p2138629.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Spellcheker automatically tokenizes on period marks

2010-12-23 Thread Sebastian M

Is it possible that the spellcheck query can be configured to stop tokenizing
on period marks through a parameter, rather than through the analyzer?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Spellcheker-automatically-tokenizes-on-period-marks-tp2131844p2138753.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 1.4.1 stats component count not matching facet count for multi valued field

2010-12-23 Thread Chris Hostetter
: I have a facet field called option which may be multi-valued and
: a weight field which is single-valued.
: 
: When I use the Solr 1.4.1 stats component with a facet field, i.e.
...
: I get conflicting results for the stats count result

a jira search for solr stats multivalued would have given you...

https://issues.apache.org/jira/browse/SOLR-1782

-Hoss


Re: DIH for taxonomy faceting in Lucid webcast

2010-12-23 Thread Lance Norskog
The DIH lets you code in JavaScript - you can do anything.

On 12/23/10, Erick Erickson erickerick...@gmail.com wrote:
 SolrJ is often used when DIH doesn't do what you wish. Using SolrJ is
 really quite easy, but you're doing the DB queries yourself, often with
 the appropriate jdbc driver.

 Within DIH, the transformers, as Chris says, *might* work for you.

 Best
 Erick

 On Wed, Dec 22, 2010 at 6:16 PM, Andy angelf...@yahoo.com wrote:


 --- On Wed, 12/22/10, Chris Hostetter hossman_luc...@fucit.org wrote:

  : 2) Once I have the fully spelled out category path, such as
  : NonFic/Science, how do I turn that into 0/NonFic and
  : 1/NonFic/Science using the DIH?
 
  I don't have any specific suggestions for you -- I've never tried it in
  DIH myself. The ScriptTransformer might be able to help you out, but I'm
  not sure.

 Thanks Chris.

 What did you use to generate those encodings if not DIH?








-- 
Lance Norskog
goks...@gmail.com


Problem of results ordering

2010-12-23 Thread Ruixiang Zhang
When I search guitar center 94305, it gives the results:

guitar center guitar center Hollywood
guitar center 94305
guitar center 94305 location

But I want results to be like this:

guitar center 94305
guitar center 94305 location
guitar center guitar center Hollywood

How can I make the results that match all keywords come first?
Or how can I reduce the weight of the word that appears the second or more
time?

Thanks
Ruixiang


Re: Problem of results ordering

2010-12-23 Thread Erick Erickson
What does your query look like? Especially what is the output
when you append debugQuery=on? You can examine the
scoring at the end of the response to gain more insight.

Best
Erick

On Thu, Dec 23, 2010 at 8:34 PM, Ruixiang Zhang rxzh...@gmail.com wrote:

 When I search guitar center 94305, it gives the results:

 guitar center guitar center Hollywood
 guitar center 94305
 guitar center 94305 location

 But I want results to be like this:

 guitar center 94305
 guitar center 94305 location
 guitar center guitar center Hollywood

 How can I make the results that match all keywords come first?
 Or how can I reduce the weight of the word that appears the second or more
 time?

 Thanks
 Ruixiang



Re: Problem of results ordering

2010-12-23 Thread Anurag

Try boosting 94305, as in: guitar center 94305^10
On Fri, Dec 24, 2010 at 9:23 AM, Erick Erickson [via Lucene] 
ml-node+2139685-1248268645-146...@n3.nabble.com wrote:

 What does your query look like? Especially what is the output
 when you append debugQuery=on? You can examine the
 scoring at the end of the response to gain more insight.

 Best
 Erick

  On Thu, Dec 23, 2010 at 8:34 PM, Ruixiang Zhang [hidden email] wrote:

  When I search guitar center 94305, it gives the results:
 
  guitar center guitar center Hollywood
  guitar center 94305
  guitar center 94305 location
 
  But I want results to be like this:
 
  guitar center 94305
  guitar center 94305 location
  guitar center guitar center Hollywood
 
  How can I make the results that match all keywords come first?
  Or how can I reduce the weight of the word that appears the second or
 more
  time?
 
  Thanks
  Ruixiang
 


 --
  View message @
 http://lucene.472066.n3.nabble.com/Problem-of-results-ordering-tp2139314p2139685.html






-- 
Kumar Anurag



-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-of-results-ordering-tp2139314p2139978.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr 1.4.1 stats component count not matching facet count for multi valued field

2010-12-23 Thread Jonathan Rochkind
Interesting, the wiki page on StatsComponent says multi-valued fields may be 
slow, and may use lots of memory. http://wiki.apache.org/solr/StatsComponent

Apparently it should also warn that multi-valued fields may not work at all? 
I'm going to add that with a link to the JIRA ticket. 

From: Chris Hostetter [hossman_luc...@fucit.org]
Sent: Thursday, December 23, 2010 7:22 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 1.4.1 stats component count not matching facet count for 
multi valued field

: I have a facet field called option which may be multi-valued and
: a weight field which is single-valued.
:
: When I use the Solr 1.4.1 stats component with a facet field, i.e.
...
: I get conflicting results for the stats count result

a jira search for solr stats multivalued would have given you...

https://issues.apache.org/jira/browse/SOLR-1782

-Hoss


RE: Solr 1.4.1 stats component count not matching facet count for multi valued field

2010-12-23 Thread Chris Hostetter

: Interesting, the wiki page on StatsComponent says multi-valued fields 
: may be slow , and may use lots of memory. 
: http://wiki.apache.org/solr/StatsComponent

*stats* over multivalued fields work, but use lots of memory -- that bug 
only hits you when you compute stats over any field that is faceted by a 
multivalued field.


-Hoss


RE: Solr 1.4.1 stats component count not matching facet count for multi valued field

2010-12-23 Thread Jonathan Rochkind
Aha! Thanks, sorry, I'll clarify on my wiki edit. 

From: Chris Hostetter [hossman_luc...@fucit.org]
Sent: Friday, December 24, 2010 12:11 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr 1.4.1 stats component count not matching facet count for 
multi valued field

: Interesting, the wiki page on StatsComponent says multi-valued fields
: may be slow , and may use lots of memory.
: http://wiki.apache.org/solr/StatsComponent

*stats* over multivalued fields work, but use lots of memory -- that bug
only hits you when you compute stats over any field that is faceted by a
multivalued field.


-Hoss


Re: error in html???

2010-12-23 Thread satya swaroop
Hi Erick,
   Every result comes in XML format, but any errors,
like HTTP 500 or HTTP 400, come back in HTML format. My question is:
can't we make that HTML error response JSON instead, or vice versa?

Regards,
satya


Map failed at getSearcher

2010-12-23 Thread Rok Rejc
Hi all,

I have created a new index (using the Solr trunk version from 17th December,
running on Windows 7 & Tomcat 6, 64-bit JVM) with around 1.1 billion
documents (index size around 550GB, mergeFactor=20).

After the (csv) import I have commited the data and got this error:

HTTP Status 500 - Severe errors in solr configuration. Check your log files
for more detailed information on what may be wrong.
-
java.lang.RuntimeException: java.io.IOException: Map failed at
org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1095) at
org.apache.solr.core.SolrCore.init(SolrCore.java:587) at
org.apache.solr.core.CoreContainer.create(CoreContainer.java:660) at
org.apache.solr.core.CoreContainer.load(CoreContainer.java:412) at
org.apache.solr.core.CoreContainer.load(CoreContainer.java:294) at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:243)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:86)
at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
at
org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:115)
at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4001)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4651)
at
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:546) at
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:637)
at
org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:563)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:498) at
org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277) at
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321)
at
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053) at
org.apache.catalina.core.StandardHost.start(StandardHost.java:785) at
org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045) at
org.apache.catalina.core.StandardEngine.start(StandardEngine.java:445) at
org.apache.catalina.core.StandardService.start(StandardService.java:519) at
org.apache.catalina.core.StandardServer.start(StandardServer.java:710) at
org.apache.catalina.startup.Catalina.start(Catalina.java:581) at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at
java.lang.reflect.Method.invoke(Unknown Source) at
org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289) at
org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414) Caused by:
java.io.IOException: Map failed at sun.nio.ch.FileChannelImpl.map(Unknown
Source) at
org.apache.lucene.store.MMapDirectory$MultiMMapIndexInput.init(MMapDirectory.java:327)
at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:209)
at
org.apache.lucene.index.CompoundFileReader.init(CompoundFileReader.java:68)
at
org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:208)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:529) at
org.apache.lucene.index.SegmentReader.get(SegmentReader.java:504) at
org.apache.lucene.index.DirectoryReader.init(DirectoryReader.java:123) at
org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:91) at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:623)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:86) at
org.apache.lucene.index.IndexReader.open(IndexReader.java:437) at
org.apache.lucene.index.IndexReader.open(IndexReader.java:316) at
org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1084) ... 33 more
Caused by: java.lang.OutOfMemoryError: Map failed at
sun.nio.ch.FileChannelImpl.map0(Native Method) ... 48 more

I can see that the error goes down into Lucene and Java, but I don't have
a clue what I should do... Any suggestions?

Thanks and merry Christmas :)

Rok


Re: Total number of groups after collapsing

2010-12-23 Thread samarth s
Hi,

I figured out a better way of doing it. The following query would be a
better option:
q=*:*start=2147483647rows=0collapse=truecollapse.field=abccollapse.threshold=1

Thanks,
Samarth
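Conceptually, the collapsed numFound is a COUNT(DISTINCT collapse.field): one hit per distinct field value, which the start=2147483647&rows=0 trick reads without fetching any documents. Simulated in miniature (field name abc taken from the query above):

```python
# Four documents, three distinct values of the collapse field "abc":
# collapsing to threshold=1 leaves one representative per group, so the
# reported numFound is the number of groups.
docs = [{"id": 1, "abc": "x"}, {"id": 2, "abc": "x"},
        {"id": 3, "abc": "y"}, {"id": 4, "abc": "z"}]

num_found = len({d["abc"] for d in docs})
assert num_found == 3   # rows=0 returns no docs but still reports this count
```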

On Thu, Dec 23, 2010 at 8:57 PM, samarth s samarth.s.seksa...@gmail.comwrote:

 Hi,

 I have been using collapsing in my application. I have a requirement of
 finding the number of groups matching some filter criteria,
 something like a COUNT(DISTINCT columnName). The only solution I can
 currently think of is using the query:


 q=*:*rows=Integer.MAX_VALUEstart=0fl=scorecollapse.field=abccollapse.threshold=1collapse.type=normal

 I get the number of groups from 'numFound', but this seems like a bad
 solution in terms of performance. Is there a cleaner way?

 Thanks,
 Samarth