Re: Open Too Many Files
or change the index to a compound index in solrconfig.xml: <useCompoundFile>true</useCompoundFile> so Solr creates one index file and not thousands. --- System: One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents, other Cores 100,000 - Solr1 for Search-Requests - commit every Minute - 4GB Xmx - Solr2 for Update-Requests - delta every 2 Minutes - 4GB Xmx
facet.mincount
Hi all, Even after setting facet.mincount=1, it is showing results with count = 0. Does anyone know why this is happening? -- Thanks & Regards, Isan Fulia.
Re: facet.mincount
Could you post the query you are submitting to Solr? On 3 February 2011 09:33, Isan Fulia isan.fu...@germinait.com wrote: Hi all, Even after setting facet.mincount=1, it is showing results with count = 0. Does anyone know why this is happening? -- Thanks & Regards, Isan Fulia.
Re: DataImportHandler: no queries when using entity=something
Add clean=false to the URL: http://solr:8983/solr/dataimport?command=full-import&entity=games&clean=false *clean*: (default 'true'). Tells whether to clean up the index before the indexing is started.
Re: DataImportHandler: no queries when using entity=something
check your log file you might have a connection problem
Re: DataImportHandler: no queries when using entity=something
On Thu, Feb 3, 2011 at 3:23 PM, Darx Oman darxo...@gmail.com wrote: add to url clean=false http://solr:8983/solr/dataimport?command=full-importentity=games; clean=false *clean* : (default 'true'). Tells whether to clean up the index before the indexing is started [...] Sorry, what does that have to do with the original poster's question? Regards, Gora
Re: Open Too Many Files
Or decrease the mergeFactor. Or change the index to a compound index in solrconfig.xml: <useCompoundFile>true</useCompoundFile> so Solr creates one index file and not thousands. --- System: One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents, other Cores 100,000 - Solr1 for Search-Requests - commit every Minute - 4GB Xmx - Solr2 for Update-Requests - delta every 2 Minutes - 4GB Xmx
Re: facet.mincount
Any query followed by facet=on&facet.date=aUpdDt&facet.date.start=2011-01-02T08:00:00.000Z&facet.date.end=2011-02-03T08:00:00.000Z&facet.date.gap=%2B1HOUR&facet.mincount=1 On 3 February 2011 15:14, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: could you post the query you are submitting to Solr? On 3 February 2011 09:33, Isan Fulia isan.fu...@germinait.com wrote: Hi all, Even after setting facet.mincount=1, it is showing results with count = 0. Does anyone know why this is happening? -- Thanks & Regards, Isan Fulia. -- Thanks & Regards, Isan Fulia.
Re: facet.mincount
Have you looked at your log file? What does it say; is there any exception? I have never seen facet.mincount=1 not work. What version of Solr are you using? - Thanx: Grijesh http://lucidimagination.com
Re: Open Too Many Files
The best option is to use <useCompoundFile>true</useCompoundFile>; decreasing the mergeFactor may slow indexing down. - Thanx: Grijesh http://lucidimagination.com
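For reference, a minimal sketch of where these two settings live in the stock Solr 1.4 solrconfig.xml (the values shown are illustrative only, not recommendations; the same elements also appear in the mainIndex section):

<indexDefaults>
  <useCompoundFile>true</useCompoundFile>
  <mergeFactor>10</mergeFactor>
  ...
</indexDefaults>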
Re: from long to tlong, compatible?
Thanks for the fast answer. Yeah, I was afraid that I needed to re-index for the precision to take effect in this case. - Original Message From: Yonik Seeley yo...@lucidimagination.com To: solr-user@lucene.apache.org Sent: Wed, February 2, 2011 10:12:42 PM Subject: Re: from long to tlong, compatible? On Wed, Feb 2, 2011 at 3:46 PM, Dan G diser...@yahoo.se wrote: My question is whether it would be possible to just change the field to the preferred type tlong with a precision of 8? Would this change be compatible with my indexed data or should I re-index the data (a pain with 800+M docs :))? I think you'll need to re-index, or range queries on that field will miss many of the documents you've already indexed with precisionStep=0 -Yonik http://lucidimagination.com
Re: facet.mincount
I don't think facet.mincount works with date faceting, see here: http://wiki.apache.org/solr/SimpleFacetParameters Dan On Thu, Feb 3, 2011 at 10:11 AM, Isan Fulia isan.fu...@germinait.com wrote: Any query followed by facet=on&facet.date=aUpdDt&facet.date.start=2011-01-02T08:00:00.000Z&facet.date.end=2011-02-03T08:00:00.000Z&facet.date.gap=%2B1HOUR&facet.mincount=1 On 3 February 2011 15:14, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: could you post the query you are submitting to Solr? On 3 February 2011 09:33, Isan Fulia isan.fu...@germinait.com wrote: Hi all, Even after setting facet.mincount=1, it is showing results with count = 0. Does anyone know why this is happening? -- Thanks & Regards, Isan Fulia. -- Thanks & Regards, Isan Fulia.
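As a side note, facet parameters can also be given per field using the f.<fieldname>. prefix; if mincount were honoured for date facets, the request would look roughly like the sketch below (whether Solr 1.4 actually applies it to date facets is exactly what is in question in this thread):

...&facet=on&facet.date=aUpdDt&facet.date.start=2011-01-02T08:00:00.000Z&facet.date.end=2011-02-03T08:00:00.000Z&facet.date.gap=%2B1HOUR&f.aUpdDt.facet.mincount=1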
Re: facet.mincount
I am using solr1.4.1 release version I got the following error while using facet.mincount java.lang.IllegalStateException: STREAM at org.mortbay.jetty.Response.getWriter(Response.java:571) at org.apache.jasper.runtime.JspWriterImpl.initOut(JspWriterImpl.java:158) at org.apache.jasper.runtime.JspWriterImpl.flushBuffer(JspWriterImpl.java:151) at org.apache.jasper.runtime.PageContextImpl.release(PageContextImpl.java:208) at org.apache.jasper.runtime.JspFactoryImpl.internalReleasePageContext(JspFactoryImpl.java:144) at org.apache.jasper.runtime.JspFactoryImpl.releasePageContext(JspFactoryImpl.java:95) at org.apache.jsp.admin.index_jsp._jspService(org.apache.jsp.admin.index_jsp:397) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:80) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:373) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:464) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:358) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:367) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:268) at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126) at org.mortbay.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:431) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1098) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:286) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) On 3 February 2011 16:17, dan sutton danbsut...@gmail.com wrote: I don't think 
facet.mincount works with date faceting, see here: http://wiki.apache.org/solr/SimpleFacetParameters Dan On Thu, Feb 3, 2011 at 10:11 AM, Isan Fulia isan.fu...@germinait.com wrote: Any query followed by facet=onfacet.date=aUpdDtfacet.date.start=2011-01-02T08:00:00.000Zfacet.date.end=2011-02-03T08:00:00.000Zfacet.date.gap=%2B1HOURfacet.mincount=1 On 3 February 2011 15:14, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: could you post the query you are submitting to Solr? On 3 February 2011 09:33, Isan Fulia isan.fu...@germinait.com wrote: Hi all, Even after making facet.mincount=1 , it is showing the results with count = 0. Does anyone know why this is happening. -- Thanks Regards, Isan Fulia. -- Thanks Regards, Isan Fulia. -- Thanks Regards, Isan Fulia.
Re: facet.mincount
Hi Dan, I'm probably just not able to spot this, but where does the wiki page mention that facet.mincount is not applicable to date fields? On 3 February 2011 10:55, Isan Fulia isan.fu...@germinait.com wrote: I am using the solr1.4.1 release version. I got the following error while using facet.mincount: java.lang.IllegalStateException: STREAM at org.mortbay.jetty.Response.getWriter(Response.java:571) [...] On 3 February 2011 16:17, dan sutton danbsut...@gmail.com wrote: I don't think facet.mincount works with date faceting, see here: http://wiki.apache.org/solr/SimpleFacetParameters Dan On Thu, Feb 3, 2011 at 10:11 AM, Isan Fulia isan.fu...@germinait.com wrote: Any query followed by facet=on&facet.date=aUpdDt&facet.date.start=2011-01-02T08:00:00.000Z&facet.date.end=2011-02-03T08:00:00.000Z&facet.date.gap=%2B1HOUR&facet.mincount=1 On 3 February 2011 15:14, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: could you post the query you are submitting to Solr? On 3 February 2011 09:33, Isan Fulia isan.fu...@germinait.com wrote: Hi all, Even after making facet.mincount=1, it is showing results with count = 0. Does anyone know why this is happening? -- Thanks & Regards, Isan Fulia.
Re: facet.mincount
I also cannot find where the wiki mentions that facet.mincount will not work with date faceting. But I have checked with a query, and it is not working for me either. We may have to report a bug. - Thanx: Grijesh http://lucidimagination.com
How effective are faceted queries ?
Hi, I was wondering if there are any documented performance characteristics for facets. As I understand facets, they are subqueries that perform certain counts on the result set. This means that a facet will be evaluated on every shard along with the main query. But how will the facet query be evaluated? If the result set is sorted, will the facet query take advantage of that when evaluating? Example: a search is done for all documents within a given range of dates on the field createdDate. The result set is sorted by that field. Would a facet query then be able to use this sorting when it counts how many documents were created per week, or per day for that matter? Kind regards, Christian Sonne Jensen
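For concreteness, a sketch of the kind of request described above, assuming createdDate is a Solr date field (the range bounds and gap are illustrative):

q=*:*&fq=createdDate:[2011-01-01T00:00:00Z TO 2011-02-01T00:00:00Z]&sort=createdDate asc&facet=true&facet.date=createdDate&facet.date.start=2011-01-01T00:00:00Z&facet.date.end=2011-02-01T00:00:00Z&facet.date.gap=%2B1DAY

The facet counts are computed per range over the whole matching set, independently of how the results are sorted.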
Re: facet.mincount
facet.mincount is grouped only under the field value faceting parameters, not the date faceting parameters. On Thu, Feb 3, 2011 at 11:08 AM, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: Hi Dan, I'm probably just not able to spot this, but where does the wiki page mention that facet.mincount is not applicable to date fields? On 3 February 2011 10:55, Isan Fulia isan.fu...@germinait.com wrote: I am using the solr1.4.1 release version. I got the following error while using facet.mincount: java.lang.IllegalStateException: STREAM at org.mortbay.jetty.Response.getWriter(Response.java:571) [...] On 3 February 2011 16:17, dan sutton danbsut...@gmail.com wrote: I don't think facet.mincount works with date faceting, see here: http://wiki.apache.org/solr/SimpleFacetParameters Dan On Thu, Feb 3, 2011 at 10:11 AM, Isan Fulia isan.fu...@germinait.com wrote: Any query followed by facet=on&facet.date=aUpdDt&facet.date.start=2011-01-02T08:00:00.000Z&facet.date.end=2011-02-03T08:00:00.000Z&facet.date.gap=%2B1HOUR&facet.mincount=1 On 3 February 2011 15:14, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: could you post the query you are submitting to Solr? On 3 February 2011 09:33, Isan Fulia isan.fu...@germinait.com wrote: Hi all,
Re: Malformed XML with exotic characters
Hi I've seen almost all funky charsets but gothic is always trouble. I'm also unsure if its really a bug in Solr. It could well be the Xerces being unable to cope. Besides, most systems indeed don't go well with gothic. This mail client does, but my terminal can't find its cursor after (properly) displaying such text. http://got.wikipedia.org/wiki/%F0%90%8C%B7%F0%90%8C%B0%F0%90%8C%BF%F0%90%8C%B1%F0%90%8C%B9%F0%90%8C%B3%F0%90%8C%B0%F0%90%8C%B1%F0%90%8C%B0%F0%90%8C%BF%F0%90%8D%82%F0%90%8C%B2%F0%90%8D%83/Haubidabaurgs Thanks for the input. Cheers, On Tuesday 01 February 2011 19:59:33 Robert Muir wrote: Hi, it might only be a problem with your xml tools (e.g. firefox). the problem here is characters outside of the basic multilingual plane (in this case Gothic). XML tools typically fall apart on these portions of unicode (in lucene we recently reverted to a patched/hacked copy of xerces specifically for this reason). If you care about characters outside of the basic multilingual plane actually working, unfortunately you have to start being very very very particular about what software you use... you can assume most software/setups WON'T work. For example, if you were to use mysql's utf8 character set you would find it doesn't actually support all of UTF-8! in this case you would need to use the recent 'utf8mb4' or something instead, that is actually utf-8! Thats just one example of a well-used piece of software that suffers from issues like this, there are others. Its for reasons like these that if support for these languages is important to you, I would stick with the most simple/textual methods for input and output: e.g. using things like CSV and JSON if you can. I would also fully test every component/jar in your application individually and once you get it working, don't ever upgrade. In any case, if you are having problems with characters outside of the basic multilingual plane, and you suspect its actually a bug in Solr, please open a JIRA issue, especially if you can provide some way to reproduce it
Re: escaping parenthesis in search query don't work...
WordDelimiterFilterFactory is probably stripping out the parens. If you try running your terms through http://localhost:8983/solr/admin/analysis.jsp http://localhost:8983/solr/admin/analysis.jspyou'll see the effects of various tokenizers and filters, be sure to check the verbose checkbox. Here's a very good place to start understanding the intention of the various options: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters In particular, about WordDelimiterFilterFactory: split on intra-word delimiters (all non alpha-numeric characters). - Wi-Fi - Wi, Fi http://wiki.apache.org/solr/AnalyzersTokenizersTokenFiltersBest Erick On Tue, Feb 1, 2011 at 8:52 AM, Pierre-Yves LANDRON pland...@hotmail.comwrote: Hello !I've seen that in order to search term with parenthesis=2C those have to be=escaped as in title:\(term\).But it doesn't seem to work - parenthesis are=n't taken in account.here is the field type I'm using to index these data : fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/!-- in this example, we will only use synonyms at query time filter class=solr.SynonymFilterFactory synonyms=index_synonyms.txt ignoreCase=true expand=false/ -- !-- Case insensitive stop word removal. enablePositionIncrements=true ensures that a 'gap' is left to allow for accurate phrase queries. -- filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ !-- filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/ -- filter class=solr.SnowballPorterFilterFactory language=French / filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory words=stopwords.txt ignoreCase=true /filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ !-- filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/ -- filter class=solr.SnowballPorterFilterFactory language=French / filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer /fieldType How can I search parenthesis within my query ?Thanks,P.
Re: Terms and termscomponent questions
There are a couple of things going on here. First, WordDelimiterFilterFactory is splitting things up on letter/number boundaries. Take a look at: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters for a list of *some* of the available tokenizers. You may want to just use one of the others, or change the parameters to WordDelimiterFilterFilterFactory to not split as it is. See the page: http://localhost:8983/solr/admin/analysis.jsp and check the verbose box to see what the effects of the various elements in your analysis chain are. This is a very important page for understanding the analysis part of the whole operation. Second, if you've been trying different things out, you may well have some old stuff in your index. When you delete documents, the terms are still in the index until an optimize. I'd advise starting with a clean slate for your experiments each time. The cheap way to do this is stop your server and delete solr_home/data/index. Delete the index directory too, not just the contents. So it's possible your TermsComponent is returning data from previous attempts, because I sure don't see how the concatenated terms would be in this index given the definition you've posted. And if none of that works, well, we'll try something else G.. Best Erick On Tue, Feb 1, 2011 at 10:07 AM, openvictor Open openvic...@gmail.comwrote: Dear Erick, Thank you for your answer, here is my fieldtype definition. I took the standard one because I don't need a better one for this field fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.SnowballPorterFilterFactory language=English protected=protwords.txt/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.SnowballPorterFilterFactory language=English protected=protwords.txt/ /analyzer /fieldType Now my field : field name=p_field type=text indexed=true stored=true/ But I have a doubt now... Do I really put a space between words or is it just a coma... If I only put a coma then the whole process is going to be impacted ? What I don't really understand is that I find the separate words, but also their concatenation (but again in one direction only). Let me explain : if a have man bear pig I will find : manbearpig bearpig but never pigman or anyother combination in a different order. Thank you very much Best Regards, Victor 2011/2/1 Erick Erickson erickerick...@gmail.com Nope, this isn't what I'd expect. There are a couple of possibilities: 1 check out what WordDelimiterFilterFactory is doing, although if you're really sending spaces that's probably not it. 2 Let's see the field and fieldType definitions for the field in question. type=text doesn't say anything about analysis, and that's where I'd expect you're having trouble. 
In particular if your analysis chain uses KeywordTokenizerFactory for instance. 3 Look at the admin/schema browse page, look at your field and see what the actual tokens are. That'll tell you what TermsComponents is returning, perhaps the concatenation is happening somewhere else. Bottom line: Solr will not concatenate terms like this unless you tell it to, so I suspect you're telling it to, you just don't realize it G... Best Erick On Tue, Feb 1, 2011 at 1:33 AM, openvictor Open openvic...@gmail.com wrote: Dear Solr users, I am currently using SolR and TermsComponents to make an auto suggest for my website. I have a field called p_field indexed and stored with type=text in the schema xml. Nothing out of the usual. I feed to Solr a set of words separated by a coma and a space such as (for two documents) : Document 1: word11, word12, word13. word14 Document 2: word21, word22, word23. word24 When I use my newly designed field I get things for the prefix word1 : word11, word12, word13. word14 word11word12 word11word13 etc... Is it normal to have the concatenation of words and not only the words indexed ? Did I miss something about Terms ? Thank you very much, Best regards all, Victor
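For illustration, the auto-suggest style of TermsComponent request discussed in this thread looks roughly like the following, assuming the stock /terms handler from the example solrconfig.xml is enabled (the prefix and limit are illustrative):

http://localhost:8983/solr/terms?terms.fl=p_field&terms.prefix=word1&terms.limit=10

It simply enumerates indexed terms that start with the given prefix, which is why whatever the analysis chain produced at index time (including any concatenated tokens) is exactly what comes back.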
Re: chaning schema
Erik: Is this a Tomcat-specific issue? Because I regularly delete just the data/index directory on my Windows box running Jetty without any problems. (3_x and trunk) Mostly want to know because I just encouraged someone to just delete the index dir based on my experience... Thanks Erick On Tue, Feb 1, 2011 at 12:24 PM, Erik Hatcher erik.hatc...@gmail.comwrote: the trick is, you have to remove the data/ directory, not just the data/index subdirectory. and of course then restart Solr. or delete *:*?commit=true, depending on what's the best fit for your ops. Erik On Feb 1, 2011, at 11:41 , Dennis Gearon wrote: I tried removing the index directory once, and tomcat refused to sart up because it didn't have a segments file. - Original Message From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, February 1, 2011 5:04:51 AM Subject: Re: chaning schema That sounds right. You can cheat and just remove solr_home/data/index rather than delete *:* though (you should probably do that with the Solr instance stopped) Make sure to remove the directory index as well. Best Erick On Tue, Feb 1, 2011 at 1:27 AM, Dennis Gearon gear...@sbcglobal.net wrote: Anyone got a great little script for changing a schema? i.e., after changing: database, the view in the database for data import the data-config.xml file the schema.xml file I BELIEVE that I have to run: a delete command for the whole index *:* a full import and optimize This all sound right? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
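For reference, the delete-everything step mentioned above can be issued as a single update request; a sketch against the example instance (host, port and core path are the stock ones, adjust for your setup):

http://localhost:8983/solr/update?stream.body=<delete><query>*:*</query></delete>&commit=true

followed by a full-import and an optimize, as listed in the original message.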
Re: Partial matches don't work (solr.NGramFilterFactory
On Wed, Feb 2, 2011 at 4:44 PM, Script Head scripth...@gmail.com wrote: Yes, I have tried searching on text_ngrams as well and it produces no results. On a related note, since I have copyField source=text_ngrams dest=text/ wouldn't the ngrams produced by text_ngrams field definition also be available within the text field? No, look at: http://wiki.apache.org/solr/SchemaXml#Copy_Fields Solr will apply the corresponding analysis chain for each field. http://wiki.apache.org/solr/SchemaXml#Copy_FieldsAnyway, you should be able to find the document when doing queries like text_ngrams:hippo I can see you are storing the field text_ngrams, when you search for Hippopotamus (and find results), how do you see the field text_ngrams on the returned docs? you should see the NGrams there (the same data that you should see when using the analysis page of Solr admin) Tomás 2011/2/2 Tomás Fernández Löbbe tomasflo...@gmail.com: About this: copyField source=text_ngrams dest=text/ The NGrams are going to be indexed on the field text_ngrams, not on text. For the field text, Solr will apply the text analysis (which I guess doesn't have NGrams). You have to search on the text_ngrams field, something like text_ngrams:hippo or text_ngrams:potamu. Are you searching like this? Tomás On Wed, Feb 2, 2011 at 4:07 PM, Script Head scripth...@gmail.com wrote: Hello, I have the following definitions in my schema.xml: fieldType name=testedgengrams class=solr.TextField analyzer type=index tokenizer class=solr.LowerCaseTokenizerFactory/ filter class=solr.NGramFilterFactory minGramSize=3 maxGramSize=15/ /analyzer analyzer type=query tokenizer class=solr.LowerCaseTokenizerFactory/ /analyzer /fieldType ... field name=text_ngrams type=testedgengrams indexed=true stored=true/ ... copyField source=text_ngrams dest=text/ There is a document Hippopotamus is fatter than a Platypus indexed. When I search for Hippopotamus I receive the expected result. When I search for any partial such as Hippo or potamu I get nothing. I could use some guidance. Script Head
Re: value for maxFieldLength
This is not really very large; Solr should handle this easily (assuming you've given it enough memory), so I'd go with a large number, say 20M. If you start running out of memory, then you've probably given the JVM too little memory. But Solr should handle this without a burp. Best Erick On Wed, Feb 2, 2011 at 10:20 AM, McGibbney, Lewis John lewis.mcgibb...@gcu.ac.uk wrote: Hello list, I am aware that setting the value of maxFieldLength in solrconfig.xml too high may/will result in out-of-mem errors. I wish to provide content extraction on a number of pdf documents which are large, by large I mean 8-11MB (occasionally more), and I am also not sure how many terms reside in each field when it is indexed. My question is therefore what is a sensible number to set this value to in order to include the majority/all terms within documents of this size. Thank you Lewis
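For reference, maxFieldLength is set in solrconfig.xml (in the indexDefaults section of the stock 1.4 example config); a sketch with the large value suggested above, purely as an illustration:

<maxFieldLength>20000000</maxFieldLength>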
Re: geodist and spacial search
Further down that very page <G>... Here's an example of sorting by distance ascending: ...q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist() asc (i.e. http://localhost:8983/solr/select?wt=json&indent=true&fl=name,store&q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist()%20asc ). The key is just the sort=geodist(); I'm pretty sure that's independent of the bbox, but I could be wrong. Best Erick On Wed, Feb 2, 2011 at 11:18 AM, Eric Grobler impalah...@googlemail.com wrote: Hi In http://wiki.apache.org/solr/SpatialSearch there is an example of a bbox filter and a geodist function. Is it possible to do a bbox filter and sort by distance - combine the two? Thanks Ericz
Re: Reg filter criteria on multivalued attribute
Hmmm, why doesn't +relationship:DEF_BY -relationship:BEL_TO work? Then I don't think the second part matters... Best Erick On Wed, Feb 2, 2011 at 12:09 PM, bbarani bbar...@gmail.com wrote: Hi, I have a question on filters on a multivalued attribute. Is there a way to filter a multivalued attribute based on a particular value inside that attribute? Consider the example below. <arr name="relationship"><str>DEF_BY</str><str>BEL_TO</str></arr> I want to do a search which returns only the results that have the relationship DEF_BY and not BEL_TO. Currently if I do a normal search for DEF_BY, documents which contain DEF_BY along with other relationships are also returned, whereas I want only the documents that contain just DEF_BY under relationship to be returned. Also, is there a way to make SOLR return documents based on the number of elements in a multivalued attribute? If that's possible I can first make SOLR return those documents and then filter against that for my search on top of the results returned. Is there a way to write a query to do this? Any pointers or help in this regard would be appreciated. Thanks, Barani
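One practical note on trying that from a browser or curl: in a URL query string a literal + is decoded as a space, so the leading + of the required clause has to be percent-encoded. A sketch of the query parameter, assuming the field really is named relationship:

q=%2Brelationship:DEF_BY+-relationship:BEL_TO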
Re: DataImportHandler: no queries when using entity=something
Here's a magic URL, not available from the admin page that may help debugging: /solr/admin/dataimport.jsp Best Erick On Wed, Feb 2, 2011 at 7:38 PM, Jon Drukman j...@cluttered.com wrote: So I'm trying to update a single entity in my index using DataImportHandler. http://solr:8983/solr/dataimport?command=full-importentity=games It ends near-instantaneously without hitting the database at all, apparently. Status shows: str name=Total Requests made to DataSource0/str str name=Total Rows Fetched0/str str name=Total Documents Processed0/str str name=Total Documents Skipped0/str str name= Indexing completed. Added/Updated: 0 documents. Deleted 0 documents. /str str name=Committed2011-02-02 16:24:13/str str name=Optimized2011-02-02 16:24:13/str str name=Time taken 0:0:0.20/str The query isn't that extreme. It returns 8771 rows in about 3 seconds. How can I debug this?
RE: value for maxFieldLength
Thank you Erick Lewis -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 03 February 2011 13:25 To: solr-user@lucene.apache.org Subject: Re: value for maxFieldLength This is not really very large; Solr should handle this easily (assuming you've given it enough memory), so I'd go with a large number, say 20M. If you start running out of memory, then you've probably given the JVM too little memory. But Solr should handle this without a burp. Best Erick On Wed, Feb 2, 2011 at 10:20 AM, McGibbney, Lewis John lewis.mcgibb...@gcu.ac.uk wrote: Hello list, I am aware that setting the value of maxFieldLength in solrconfig.xml too high may/will result in out-of-mem errors. I wish to provide content extraction on a number of pdf documents which are large, by large I mean 8-11MB (occasionally more), and I am also not sure how many terms reside in each field when it is indexed. My question is therefore what is a sensible number to set this value to in order to include the majority/all terms within documents of this size. Thank you Lewis
Re: facet.mincount
Ahh, I see your point. Well, if that's true, then facet.missing/facet.method are also not supported? I'm not sure if this is the case, or whether the Date Faceting Parameters = Field Value Faceting Parameters + the extra ones. Maybe the page author(s) can clarify. On 3 February 2011 11:32, dan sutton danbsut...@gmail.com wrote: facet.mincount is grouped only under the field value faceting parameters, not the date faceting parameters. On Thu, Feb 3, 2011 at 11:08 AM, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: Hi Dan, I'm probably just not able to spot this, but where does the wiki page mention that facet.mincount is not applicable to date fields? On 3 February 2011 10:55, Isan Fulia isan.fu...@germinait.com wrote: I am using the solr1.4.1 release version. I got the following error while using facet.mincount: java.lang.IllegalStateException: STREAM at org.mortbay.jetty.Response.getWriter(Response.java:571) [...] On 3 February 2011 16:17, dan sutton danbsut...@gmail.com wrote: I don't think facet.mincount works with date faceting, see here: http://wiki.apache.org/solr/SimpleFacetParameters Dan On Thu, Feb 3, 2011 at 10:11 AM, Isan Fulia
Re: My spellchecker experiment
On Thu, Feb 3, 2011 at 8:55 AM, Emmanuel Espina espinaemman...@gmail.com wrote: It uses fuzzy queries instead of an ngram query, and then I rank the results by word frequency in the text with the aid of a python script (all that is explained in the post). I got pretty good results (between 50% and 90% improvements), but slower (about double the time). Hi Emmanuel: I think it's great you are evaluating different techniques here, our spelling could use some help :) By the way: we added a new spellchecking technique that sounds quite similar to what you describe (DirectSpellChecker), but hopefully without the performance issues. It's only available in trunk (http://svn.apache.org/repos/asf/lucene/dev/trunk/) I tried to do a very rough evaluation on its jira issue: https://issues.apache.org/jira/browse/LUCENE-2507, but nothing as serious or in-depth as what it looks like you did. Anyway, if you want to play you can experiment with it either at the lucene level (it's in contrib/spellchecker) or via solr, by using DirectSolrSpellChecker... though I think the parameters in the example solrconfig are likely not the best :) I have an app using this more fleshed-out config (in combination with the new collation options), and it seems to be reasonable:
<!-- a spellchecker that uses no auxiliary index -->
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">text</str>
  <str name="classname">solr.DirectSolrSpellChecker</str>
  <str name="minPrefix">1</str>
  <str name="maxEdits">2</str>
  <str name="maxInspections">25</str>
  <!-- probably way too high for most apps though -->
  <str name="minQueryLength">3</str>
  <str name="comparatorClass">freq</str>
  <str name="thresholdTokenFrequency">1</str>
  <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
</lst>
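For completeness, a sketch of the request parameters that exercise such a spellchecker, assuming the spellcheck search component is attached to the handler being queried (parameter names are the standard SpellCheckComponent ones; values are illustrative):

...&spellcheck=true&spellcheck.dictionary=default&spellcheck.count=5&spellcheck.collate=true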
Re: Terms and termscomponent questions
Dear Erick, You were totally right about the fact that I didn't use any space to separate words, cause SolR to concatenate words ! Everything is solved now. Thank you very much for your help ! Best regards, Victor Kabdebon 2011/2/3 Erick Erickson erickerick...@gmail.com There are a couple of things going on here. First, WordDelimiterFilterFactory is splitting things up on letter/number boundaries. Take a look at: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters for a list of *some* of the available tokenizers. You may want to just use one of the others, or change the parameters to WordDelimiterFilterFilterFactory to not split as it is. See the page: http://localhost:8983/solr/admin/analysis.jsp and check the verbose box to see what the effects of the various elements in your analysis chain are. This is a very important page for understanding the analysis part of the whole operation. Second, if you've been trying different things out, you may well have some old stuff in your index. When you delete documents, the terms are still in the index until an optimize. I'd advise starting with a clean slate for your experiments each time. The cheap way to do this is stop your server and delete solr_home/data/index. Delete the index directory too, not just the contents. So it's possible your TermsComponent is returning data from previous attempts, because I sure don't see how the concatenated terms would be in this index given the definition you've posted. And if none of that works, well, we'll try something else G.. Best Erick On Tue, Feb 1, 2011 at 10:07 AM, openvictor Open openvic...@gmail.com wrote: Dear Erick, Thank you for your answer, here is my fieldtype definition. I took the standard one because I don't need a better one for this field fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.SnowballPorterFilterFactory language=English protected=protwords.txt/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.SnowballPorterFilterFactory language=English protected=protwords.txt/ /analyzer /fieldType Now my field : field name=p_field type=text indexed=true stored=true/ But I have a doubt now... Do I really put a space between words or is it just a coma... If I only put a coma then the whole process is going to be impacted ? What I don't really understand is that I find the separate words, but also their concatenation (but again in one direction only). Let me explain : if a have man bear pig I will find : manbearpig bearpig but never pigman or anyother combination in a different order. Thank you very much Best Regards, Victor 2011/2/1 Erick Erickson erickerick...@gmail.com Nope, this isn't what I'd expect. 
There are a couple of possibilities: 1 check out what WordDelimiterFilterFactory is doing, although if you're really sending spaces that's probably not it. 2 Let's see the field and fieldType definitions for the field in question. type=text doesn't say anything about analysis, and that's where I'd expect you're having trouble. In particular if your analysis chain uses KeywordTokenizerFactory for instance. 3 Look at the admin/schema browse page, look at your field and see what the actual tokens are. That'll tell you what TermsComponents is returning, perhaps the concatenation is happening somewhere else. Bottom line: Solr will not concatenate terms like this unless you tell it to, so I suspect you're telling it to, you just don't realize it G... Best Erick On Tue, Feb 1, 2011 at 1:33 AM, openvictor Open openvic...@gmail.com wrote: Dear Solr users, I am currently using SolR and TermsComponents to make an auto suggest for my website. I have a field called p_field indexed and stored with type=text in the schema xml. Nothing out of the usual. I feed to Solr a set of words separated by a coma and a space such as (for two documents) : Document 1: word11, word12, word13. word14 Document
Re: facet.mincount
Hi, facet.mincount does not work with the facet.date option AFAIK. There is an issue for it, SOLR-343, which has been resolved. Applying the patch provided as a solution in that issue may solve the problem. The fix version for this may be 1.5. - Thanx: Grijesh http://lucidimagination.com
Re: Using terms and N-gram
Thank you, I will do that and hopefully it will be handy! But can someone explain to me the difference between CommonGramsFilterFactory and NGramFilterFactory? (Maybe the solution is there.) Thank you all, best regards 2011/2/3 Grijesh pintu.grij...@gmail.com Use analysis.jsp to see what is happening at index time and query time with your input data. You can use highlighting to see if a match is found. - Thanx: Grijesh http://lucidimagination.com
Re: geodist and spacial search
Hi Erick, Thanks, I saw that example, but I am trying to sort by distance AND specify the max distance in one query. The reason is: running bbox on 2 million documents with a 20km distance takes only 200ms. Sorting 2 million documents by distance takes over 1.5 seconds! So it will be much faster for Solr to first filter down to the 20km documents and then sort them. Regards Ericz On Thu, Feb 3, 2011 at 1:27 PM, Erick Erickson erickerick...@gmail.com wrote: Further down that very page <G>... Here's an example of sorting by distance ascending: ...q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist() asc (i.e. http://localhost:8983/solr/select?wt=json&indent=true&fl=name,store&q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist()%20asc ) The key is just the sort=geodist(); I'm pretty sure that's independent of the bbox, but I could be wrong. Best Erick On Wed, Feb 2, 2011 at 11:18 AM, Eric Grobler impalah...@googlemail.com wrote: Hi In http://wiki.apache.org/solr/SpatialSearch there is an example of a bbox filter and a geodist function. Is it possible to do a bbox filter and sort by distance - combine the two? Thanks Ericz
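For what it's worth, the two can be combined in a single request by using the bbox as a filter query and geodist() only for the sort; a sketch using the example field names from the wiki page (the distance value is illustrative):

...q=*:*&sfield=store&pt=45.15,-93.85&d=20&fq={!bbox}&sort=geodist() asc

This restricts the match set to documents within roughly 20 km first, and the distance sort then only applies to that subset.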
Re: chaning schema
Well, the nice thing is that I have an Amazon based dev server, and it's AMI stored. So if I screw something up, I just throw away that server and get a fresh one all configured and full of dev data and BAM back to where I was. So I'll try it again with the -rf flags. I did shut down the server and I am using Tomcat. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Gora Mohanty g...@mimirtech.com To: solr-user@lucene.apache.org Sent: Thu, February 3, 2011 6:56:29 AM Subject: Re: chaning schema On Thu, Feb 3, 2011 at 6:47 PM, Erick Erickson erickerick...@gmail.com wrote: Erik: Is this a Tomcat-specific issue? Because I regularly delete just the data/index directory on my Windows box running Jetty without any problems. (3_x and trunk) Mostly want to know because I just encouraged someone to just delete the index dir based on my experience... Thanks Erick On Tue, Feb 1, 2011 at 12:24 PM, Erik Hatcher erik.hatc...@gmail.comwrote: the trick is, you have to remove the data/ directory, not just the data/index subdirectory. and of course then restart Solr. or delete *:*?commit=true, depending on what's the best fit for your ops. Erik On Feb 1, 2011, at 11:41 , Dennis Gearon wrote: I tried removing the index directory once, and tomcat refused to sart up because it didn't have a segments file. [...] I have seen this error with Tomcat, but in my experience, this has been due to doing a rm data/index/* rather than rm -rf /data/index, or due to doing this without first shutting down Tomcat. Regards, Gora
Re: Open Too Many Files
Try it: ulimit -n20 2011/2/3 Grijesh pintu.grij...@gmail.com The best option is to use <useCompoundFile>true</useCompoundFile>; decreasing the mergeFactor may slow indexing down. - Thanx: Grijesh http://lucidimagination.com
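For anyone hitting the same limit: ulimit -n with no argument prints the current per-process file-descriptor limit, and a sketch of raising it for the current shell would be ulimit -n 65535 (the value is illustrative; the hard limit and any permanent change are governed by the OS, e.g. /etc/security/limits.conf on Linux).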
Re: Terms and termscomponent questions
Ah, good. Good luck with the rest of your app! WordDelimiterFilterFactory is powerful, but tricky G... Best Erick On Thu, Feb 3, 2011 at 9:51 AM, openvictor Open openvic...@gmail.comwrote: Dear Erick, You were totally right about the fact that I didn't use any space to separate words, cause SolR to concatenate words ! Everything is solved now. Thank you very much for your help ! Best regards, Victor Kabdebon 2011/2/3 Erick Erickson erickerick...@gmail.com There are a couple of things going on here. First, WordDelimiterFilterFactory is splitting things up on letter/number boundaries. Take a look at: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters for a list of *some* of the available tokenizers. You may want to just use one of the others, or change the parameters to WordDelimiterFilterFilterFactory to not split as it is. See the page: http://localhost:8983/solr/admin/analysis.jsp and check the verbose box to see what the effects of the various elements in your analysis chain are. This is a very important page for understanding the analysis part of the whole operation. Second, if you've been trying different things out, you may well have some old stuff in your index. When you delete documents, the terms are still in the index until an optimize. I'd advise starting with a clean slate for your experiments each time. The cheap way to do this is stop your server and delete solr_home/data/index. Delete the index directory too, not just the contents. So it's possible your TermsComponent is returning data from previous attempts, because I sure don't see how the concatenated terms would be in this index given the definition you've posted. And if none of that works, well, we'll try something else G.. Best Erick On Tue, Feb 1, 2011 at 10:07 AM, openvictor Open openvic...@gmail.com wrote: Dear Erick, Thank you for your answer, here is my fieldtype definition. I took the standard one because I don't need a better one for this field fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.SnowballPorterFilterFactory language=English protected=protwords.txt/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.SnowballPorterFilterFactory language=English protected=protwords.txt/ /analyzer /fieldType Now my field : field name=p_field type=text indexed=true stored=true/ But I have a doubt now... Do I really put a space between words or is it just a coma... If I only put a coma then the whole process is going to be impacted ? What I don't really understand is that I find the separate words, but also their concatenation (but again in one direction only). 
Let me explain: if I have "man bear pig" I will find "manbearpig" and "bearpig", but never "pigman" or any other combination in a different order. Thank you very much Best Regards, Victor 2011/2/1 Erick Erickson erickerick...@gmail.com Nope, this isn't what I'd expect. There are a couple of possibilities: 1 check out what WordDelimiterFilterFactory is doing, although if you're really sending spaces that's probably not it. 2 Let's see the field and fieldType definitions for the field in question. type=text doesn't say anything about analysis, and that's where I'd expect you're having trouble. In particular if your analysis chain uses KeywordTokenizerFactory for instance. 3 Look at the admin/schema browse page, look at your field and see what the actual tokens are. That'll tell you what TermsComponents is returning, perhaps the concatenation is happening somewhere else. Bottom line: Solr will not concatenate terms like this unless you tell it to, so I suspect you're telling it to, you just don't realize it G... Best Erick On Tue, Feb 1, 2011 at 1:33 AM, openvictor Open openvic...@gmail.com wrote: Dear Solr users, I am currently using SolR and TermsComponents to make an autocomplete system
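For anyone else hitting the same symptom: Erick's point about changing the WordDelimiterFilterFactory parameters refers mainly to the catenate* flags, which glue split word parts back together into extra tokens. A minimal index-time sketch with catenation switched off (the field type name is illustrative, not from the thread):

<fieldType name="text_no_catenate" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- split on case change and letter/number boundaries, but do not emit catenated tokens -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>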
Solr for finding similar word between two documents
Is there a way to use Solr to get the similar words between two documents (files)? Any ideas? Regards Rohan
What is the best protocol for data transfer rate HTTP or RMI?
Hello, I am doing a comparative study of Lucene and Solr and wish to obtain more concrete data on data transfer, comparing the Lucene RemoteSearch, which uses RMI, with the data transfer of Solr, which uses the HTTP protocol. Gustavo Maia
Use Parallel Search
Hello, Let me give a brief description of my scenario. Today I am only using Lucene 2.9.3. I have an index of 30 million documents distributed across three machines, each machine with 6 HDs (15k rpm). The server queries the search index using the remote search class, and each machine searches using parallel search (searching the 6 HDs simultaneously). So during a search we are using the three machines and 18 HDs simultaneously, which gives me a very good response time. Now I am studying Solr and am interested in knowing more about searching and the use of distributed parallel search on the same machine. What would be the best scenario using Solr that improves on what I already have today with Lucene alone? Note: do I need to have 6 Solr instances installed on each machine, one for each HD? Or is there some alternative way for me to use the 6 HDs without having 6 instances of the Solr server? Another question: does Solr have some limit on index size per hard drive? It would be good to keep each index from growing too large, because the larger the index gets, the longer the search takes. Thanks for everything. Gustavo Maia
Re: changing schema
It could be related to Tomcat. I've had inconsistent experiences there too; I _thought_ I could delete just the contents of the data/ directory, but at some point I realized that wasn't working, confusing me as to whether I was remembering correctly that deleting just the contents ever worked. At the moment, on my setup, I definitely need to delete the whole data/ directory. At one point I switched my setup from Jetty to Tomcat, but at about the same point I switched my setup from single core to multi-core too. So it could be a multi-core thing too (which seems somewhat more likely than Jetty vs Tomcat making a difference). Or it could be something completely else that none of us know; I just report my limited observations from experience. :) Jonathan On 2/3/2011 8:17 AM, Erick Erickson wrote: Erik: Is this a Tomcat-specific issue? Because I regularly delete just the data/index directory on my Windows box running Jetty without any problems. (3_x and trunk) Mostly want to know because I just encouraged someone to just delete the index dir based on my experience... Thanks Erick On Tue, Feb 1, 2011 at 12:24 PM, Erik Hatcher erik.hatc...@gmail.com wrote: the trick is, you have to remove the data/ directory, not just the data/index subdirectory. and of course then restart Solr. or delete *:*?commit=true, depending on what's the best fit for your ops. Erik On Feb 1, 2011, at 11:41, Dennis Gearon wrote: I tried removing the index directory once, and Tomcat refused to start up because it didn't have a segments file. - Original Message From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, February 1, 2011 5:04:51 AM Subject: Re: changing schema That sounds right. You can cheat and just remove <solr_home>/data/index rather than delete *:* though (you should probably do that with the Solr instance stopped). Make sure to remove the directory index as well. Best Erick On Tue, Feb 1, 2011 at 1:27 AM, Dennis Gearon gear...@sbcglobal.net wrote: Anyone got a great little script for changing a schema? i.e., after changing: the database, the view in the database for data import, the data-config.xml file, and the schema.xml file, I BELIEVE that I have to run: a delete command for the whole index (*:*), a full import, and optimize. Does this all sound right? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
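For completeness, the "delete *:*" alternative mentioned by Erik is an ordinary XML update message posted to the update handler; a minimal sketch (host, port and core path are illustrative):

<!-- POST to http://localhost:8983/solr/update -->
<delete><query>*:*</query></delete>
<!-- then make the deletion visible -->
<commit/>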
RE: Using terms and N-gram
I don't suppose it's something silly like the fact that your indexing chain includes 'words=stopwords.txt', and your query chain does not? Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com _ Early COSUGI birds get the worm! Register by 15 February and get a one time viewing of the three course Circulation Basics self-paced training suite. http://www.cosugi.org/ -Original Message- From: openvictor Open [mailto:openvic...@gmail.com] Sent: Thursday, February 03, 2011 12:02 AM To: solr-user@lucene.apache.org Subject: Using terms and N-gram Dear all, I am trying to implement an autocomplete system for research. But I am stuck on some problems that I can't solve. Here is my problem: I give text like "the cat is black" and I want to explore all 1-grams to 8-grams for all the text that is passed: the, cat, is, black, the cat, cat is, is black, etc... In order to do that I have defined the following fieldtype in my schema:

<!-- Custom fieldtype -->
<fieldType name="ngram_field" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true" maxGramSize="8" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" ignoreCase="true" maxGramSize="8" minGramSize="1"/>
  </analyzer>
</fieldType>

Then the following field:

<field name="p_title_ngram" type="ngram_field" indexed="true" stored="true"/>

Then I feed Solr with some phrases, and I was really surprised to see that Solr didn't behave as expected. I went to the schema browser to see the result for the very profound query "the cat is black and it rains". The results are quite disappointing: first, 1-grams are not found; some 2-grams are found, like the_cat, and_it, etc... but not what I expected. Is there something I am missing here? (By the way, I also tried removing minGramSize and maxGramSize, and even the words attribute.) Thank you, Victor Kabdebon
Re: Using terms and N-gram
First, you'll get a lot of insight by defining something simply and looking at the analysis page from the Solr admin. That's a very valuable page. To your question: CommonGrams are shingles that work between stopwords and other words. For instance, "this is some text" gets analyzed into this, this_is, is, is_some, some, text. Note that the stopwords are the only things that get combined with the text after. NGrams form on letters. It's too long to post the whole thing, but the above phrase gets analyzed as t, h, i, s, th, hi, is, i, s, is, s, o, m, e, so, om, me... It splits a single token into grams, whereas CommonGrams essentially combines tokens when they're stopwords. Have you looked at shingles? See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory Best Erick On Thu, Feb 3, 2011 at 10:15 AM, openvictor Open openvic...@gmail.com wrote: Thank you, I will do that and hopefully it will be handy! But can someone explain me the difference between CommonGramsFilterFactory and NGramFilterFactory? (Maybe the solution is there.) Thank you all, best regards 2011/2/3 Grijesh pintu.grij...@gmail.com Use analysis.jsp to see what is happening at index time and query time with your input data. You can use highlighting to see if a match is found. - Thanx: Grijesh http://lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/Using-terms-and-N-gram-tp2410938p2411244.html Sent from the Solr - User mailing list archive at Nabble.com.
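A minimal field type along the lines Erick suggests, using ShingleFilterFactory (the type name and shingle size are illustrative; check the wiki page above for the exact attributes supported by your version):

<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit word n-grams ("shingles") up to 8 tokens long, keeping the single tokens too -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="8" outputUnigrams="true"/>
  </analyzer>
</fieldType>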
Re: Using terms and N-gram
Thank you for these inputs. I was silly asking for ngrams because I already knew it. I think I was tired yesterday... Thank you Eric Erickson, once again you gave me a more than useful comment. Indeed Shingles seems to be the perfect fit for the work I want to do. I will try to implement that tonight and I will come back to see if it's working. Regards, Victor 2011/2/3 Erick Erickson erickerick...@gmail.com First, you'll get a lot of insight by defining something simply and looking at the analysis page from solr admin. That's a very valuable page. To your question: commongrams are shingles that work between stopwords and other words. For instance, this is some text gets analyzed into this, this_is, is, is_some, some text. Note that the stopwords are the only things that get combined with the text after. NGrams form on letters. It's too long to post the whole thing, but the above phrase gets analyzed as t, h, i, s, th, hi, is, i, s, is, s, o, m, e, so, om, me.. It splits a single token into grams whereas commongrams essentially combines tokens when they're stopwords. Have you looked at shingles? See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory Best Erick On Thu, Feb 3, 2011 at 10:15 AM, openvictor Open openvic...@gmail.com wrote: Thank you, I will do that and hopefuly it will be handy ! But can someone explain me difference between CommonGramFIlterFactory et NGramFilterFactory ? ( Maybe the solution is there) Thank you all, best regards 2011/2/3 Grijesh pintu.grij...@gmail.com Use analysis.jsp to see what happening at index time and query time with your input data.You can use highlighting to see if match found. - Thanx: Grijesh http://lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/Using-terms-and-N-gram-tp2410938p2411244.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr for finding similar word between two documents
On Thu, Feb 3, 2011 at 11:32 PM, rohan rai hiroha...@gmail.com wrote: Is there a way to use solr and get similar words between two document (files). [...] This is *way* too vague to make any sense out of. Could you elaborate, as I could have sworn that what you seem to want is the essential function of a search engine. Regards, Gora
Re: Solr for finding similar word between two documents
Rohan : what you want to do can be done with fairly little effort if your documents have a limited size (up to a few MB), using common and basic structures like a HashMap. Do you have any additional information on your problem so that we can give you more useful input? 2011/2/3 Gora Mohanty g...@mimirtech.com On Thu, Feb 3, 2011 at 11:32 PM, rohan rai hiroha...@gmail.com wrote: Is there a way to use solr and get similar words between two document (files). [...] This is *way* too vague to make any sense out of. Could you elaborate, as I could have sworn that what you seem to want is the essential function of a search engine. Regards, Gora
Re: Use Parallel Search
Hello Gustavo, well, I did not use Nutch at all, but I have some experience with using Solr. In Solr you could use a multicore setup where each core points to a different hard drive of your server. As with separate Solr servers, each core is a separate index, so to query all drives of one server you have to make a distributed request to get the results from all cores (indices). You get a little HTTP overhead, because you have to send six HTTP requests per server to get your results. You could also set up 6 Solr instances per box, or 3 with two cores per instance, but I do not see any reason to do so. Could you please explain what you mean by "remote class search"? Is it a Nutch-specific thing I have never heard of before? There is no difference between a Lucene index created by Solr and a Lucene index created by Nutch or Lucene itself. Solr is just a server implementation of the Lucene framework. Regards On 03.02.2011 19:06, Gustavo Maia wrote: Hello, Let me give a brief description of my scenario. Today I am only using Lucene 2.9.3. I have an index of 30 million documents distributed on three machines and each machine with 6 hds (15k rmp). The server queries the search index using the remote class search. And each machine is made to search using the parallel search (search simultaneously in 6 hds). So during the search are simulating using the three machines and 18 hds, returning me to a very good response time. Today I am studying the SOLR and am interested in knowing more about the searches and use of distributed parallel search on the same machine. What would be the best scenario using SOLR that is better than I already am using today only with lucene? Note: I need to have installed on each machine 6 SOLR instantiate from my server? One for each hd? Or would some other alternative way for me to use the 6 hds without having 6 instances of SORL server? Another question would be if the SOLR would have some limiting size index for Hard drive? It would be interesting not index too big because when the index increased the longer the search. Thanks for everything. Gustavo Maia
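To make the distributed request Em describes concrete: with one Solr instance hosting several cores (one per drive), a single query can be fanned out over all of them with the shards parameter. A sketch, assuming cores named core1..core6 on the default port (names, host and port are illustrative):

http://localhost:8983/solr/core1/select?q=your+query&shards=localhost:8983/solr/core1,localhost:8983/solr/core2,localhost:8983/solr/core3,localhost:8983/solr/core4,localhost:8983/solr/core5,localhost:8983/solr/core6

Solr merges the per-core results into a single response, so the client itself only issues one request.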
Re: Solr for finding similar word between two documents
Let's say I have a document (file) which is large and contains words inside it, and the 2nd document is also a text file. The problem is to find all those words in the 2nd document which are present in the first document, when both of the files are large. Regards Rohan On Fri, Feb 4, 2011 at 1:01 AM, openvictor Open openvic...@gmail.com wrote: Rohan : what you want to do can be done with quite little effort if your document has a limited size (up to some Mo) with common and basic structures like Hasmap. Do you have any additional information on your problem so that we can give you more useful inputs ? 2011/2/3 Gora Mohanty g...@mimirtech.com On Thu, Feb 3, 2011 at 11:32 PM, rohan rai hiroha...@gmail.com wrote: Is there a way to use solr and get similar words between two document (files). [...] This is *way* too vague to make any sense out of. Could you elaborate, as I could have sworn that what you seem to want is the essential function of a search engine. Regards, Gora
Re: geodist and spacial search
Use a filter query? See the {!geofilt} stuff on the wiki page. That gives you your filter to restrict down your result set, then you can sort by exact distance to get your sort of just those docs that make it through the filter. On Feb 3, 2011, at 10:24 AM, Eric Grobler wrote: Hi Erick, Thanks I saw that example, but I am trying to sort by distance AND specify the max distance in 1 query. The reason is: running bbox on 2 million documents with a 20km distance takes only 200ms. Sorting 2 million documents by distance takes over 1.5 seconds! So it will be much faster for solr to first filter the 20km documents and then to sort them. Regards Ericz On Thu, Feb 3, 2011 at 1:27 PM, Erick Erickson erickerick...@gmail.com wrote: Further down that very page G... Here's an example of sorting by distance ascending: - ...q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist() asc http://localhost:8983/solr/select?wt=json&indent=true&fl=name,store&q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist()%20asc The key is just the sort=geodist(), I'm pretty sure that's independent of the bbox, but I could be wrong. Best Erick On Wed, Feb 2, 2011 at 11:18 AM, Eric Grobler impalah...@googlemail.com wrote: Hi In http://wiki.apache.org/solr/SpatialSearch there is an example of a bbox filter and a geodist function. Is it possible to do a bbox filter and sort by distance - combine the two? Thanks Ericz -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem docs using Solr/Lucene: http://www.lucidimagination.com/search
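Putting Grant's suggestion together with Erick's example, a combined filter-then-sort request might look like this (field name, point and radius are illustrative; the {!geofilt} syntax is the one documented on the SpatialSearch wiki page, with d given in kilometres):

http://localhost:8983/solr/select?q=*:*&fq={!geofilt}&sfield=store&pt=45.15,-93.85&d=20&sort=geodist()%20asc

The fq restricts the result set to documents within 20 km of the point first, and geodist() then only has to sort the documents that survive the filter.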
DataImportHandler usage with RDF database
Hello List, I am very interested in DataImportHandler. I have data stored in an RDF db and wish to use this data to boost query results via Solr. I wish to keep this data stored in db as I have a web app which directly maintains this db. Is it possible to use a DataImportHandler to read RDF data from db in memory, without sending an index commit to Solr. As far as I can see DataImportHandler currently supports full and delta imports which mean I would be indexing. So far I have yet to find a requestHandler which is able to read then store data in memory, then use this data elsewhere prior to returning documents via queryResponseWriter. Can anyone provide their thoughts/insight Thank you Lewis Glasgow Caledonian University is a registered Scottish charity, number SC021474 Winner: Times Higher Education's Widening Participation Initiative of the Year 2009 and Herald Society's Education Initiative of the Year 2009. http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html Winner: Times Higher Education's Outstanding Support for Early Career Researchers of the Year 2010, GCU as a lead with Universities Scotland partners. http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html
Re: Use Parallel Search
Can you describe a bit more what you are searching (types of docs) and what your query rate looks like? Also, what features are you using? Faceting? Sorting? ... On Feb 3, 2011, at 1:06 PM, Gustavo Maia wrote: Hello, Let me give a brief description of my scenario. Today I am only using Lucene 2.9.3. I have an index of 30 million documents distributed on three machines and each machine with 6 hds (15k rmp). The server queries the search index using the remote class search. And each machine is made to search using the parallel search (search simultaneously in 6 hds). So during the search are simulating using the three machines and 18 hds, returning me to a very good response time. Today I am studying the SOLR and am interested in knowing more about the searches and use of distributed parallel search on the same machine. What would be the best scenario using SOLR that is better than I already am using today only with lucene? Note: I need to have installed on each machine 6 SOLR instantiate from my server? No, you generally treat Solr like a database and provision it separately from you app. 30M docs may very well all fit nicely on one machine depending on some of your answers above (I've certainly seen bigger) One for each hd? Or would some other alternative way for me to use the 6 hds without having 6 instances of SORL server? I'd probably start simple and see what I can do in 1 instance of Solr and what query/indexing throughput you can get. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem docs using Solr/Lucene: http://www.lucidimagination.com/search
Index Not Matching
Greetings! My organization is new to SOLR, so please bear with me. At times, we experience an out-of-sync condition between the SOLR index files and our database. We resolved that by clearing the index file and performing a full crawl of the database. The last time we noticed an out-of-sync condition, we went through our procedure of deleting and crawling, but this time it did not fix it. For example, searching for swim on the DB we get 440 products, yet SOLR states we have 214 products. Has anyone experienced anything like this? Does anyone have any suggestions on a trace we can turn on? Again, we are new to SOLR, so any help you can provide is greatly appreciated. Thanks! Will
HTTP ERROR 400 undefined field: *
Hey Guys, I was working on a checkout of the 3.x branch from about 6 months ago. Everything was working pretty well, but we decided that we should update and get what was at the head. However, after upgrading, I am now getting this error through the admin: HTTP ERROR 400 undefined field: * If I clear the fl parameter (the default is set to *, score) then it works fine, with one big problem: no score data. If I try and set fl=score I get the same error, except it says undefined field: score?! This works great in the older version; what changed? I've googled for about an hour now and I can't seem to find anything. Jed.
Re: Index Not Matching
Hello, Are you definitely positive your database isn't updated after you index your data? Are you querying against the same field(s) specifying the same criteria both in Solr and in the database? Any chance you might be pointing to a dev/test instance of Solr ? Regards, - Savvas On 3 February 2011 20:17, Esclusa, Will william.escl...@bonton.com wrote: Greetings! My organization is new to SOLR, so please bare with me. At times, we experience an out of sync condition between SOLR index files and our Database. We resolved that by clearing the index file and performing a full crawl of the database. Last time we noticed an out of sync condition, we went through our procedure of deleting and crawling, but this time it did not fix it. For example, search for swim on the DB and we get 440 products, but yet SOLR states we have 214 products. Has anyone experience anything like this? Does anyone have any suggestions on a trace we can turn on? Again, we are new to SOLR so any help you can provide is greatly appreciated. Thanks! Will
Re: Function Question
Thoughts? On Wed, Feb 2, 2011 at 10:38 PM, Bill Bell billnb...@gmail.com wrote: This is posted as an enhancement on SOLR-2345. I am willing to work on it. But I am stuck. I would like to loop through the lat/long values when they are stored in a multiValued list, but it appears that I cannot figure out how to do that. For example: sort=geodist() asc This should grab the closest point in the multiValued list and return the distance so that it can be scored. The problem is I cannot find a way to get the multiValued list. The function in src/java/org/apache/solr/search/function/distance/HaversineConstFunction.java has code similar to:

VectorValueSource p2;
this.p2 = vs;
List<ValueSource> sources = p2.getSources();
ValueSource latSource = sources.get(0);
ValueSource lonSource = sources.get(1);
DocValues latVals = latSource.getValues(context1, readerContext1);
DocValues lonVals = lonSource.getValues(context1, readerContext1);
double latRad = latVals.doubleVal(doc) * DistanceUtils.DEGREES_TO_RADIANS;
double lonRad = lonVals.doubleVal(doc) * DistanceUtils.DEGREES_TO_RADIANS;
etc...

It would be good if I could loop through sources.get(), but it only returns 2 sources even when there are 2 pairs of lat/long. The getSources() only returns the following: sources:[double(store_0_coordinate), double(store_1_coordinate)] How do I just get the 4 values in the function?
Re: Index Not Matching
that's odd..are you viewing the results through your application or the admin console? if you aren't, I'd suggest you use the admin console just to eliminate the possibility of an application bug. We had a similar problem in the past and turned out to be a mixup of our dev/test instances.. On 3 February 2011 21:41, Esclusa, Will william.escl...@bonton.com wrote: Hello Saavs, I am 100% sure we are not updating the DB after we index the data. We are specifying the same fields on both queries. Our prod boxes do not have access to QA or DEV, so I would expect a connection error when indexing if this is the case. No connection errors in the logs. -Original Message- From: Savvas-Andreas Moysidis [mailto:savvas.andreas.moysi...@googlemail.com] Sent: Thursday, February 03, 2011 4:26 PM To: solr-user@lucene.apache.org Subject: Re: Index Not Matching Hello, Are you definitely positive your database isn't updated after you index your data? Are you querying against the same field(s) specifying the same criteria both in Solr and in the database? Any chance you might be pointing to a dev/test instance of Solr ? Regards, - Savvas On 3 February 2011 20:17, Esclusa, Will william.escl...@bonton.com wrote: Greetings! My organization is new to SOLR, so please bare with me. At times, we experience an out of sync condition between SOLR index files and our Database. We resolved that by clearing the index file and performing a full crawl of the database. Last time we noticed an out of sync condition, we went through our procedure of deleting and crawling, but this time it did not fix it. For example, search for swim on the DB and we get 440 products, but yet SOLR states we have 214 products. Has anyone experience anything like this? Does anyone have any suggestions on a trace we can turn on? Again, we are new to SOLR so any help you can provide is greatly appreciated. Thanks! Will
DB2 and DataImportHandler
I get the following error when trying to index using a DataImportHandler with solr 1.4.1. I see that there is an open JIRA with no resolution. Do I have to write my own data import handler to work around this issue? Thanks, Mark Feb 3, 2011 5:21:09 PM org.apache.solr.handler.dataimport.JdbcDataSource closeConnection SEVERE: Ignoring Error when closing connection com.ibm.db2.jcc.b.SqlException: [jcc][t4][10251][10308][3.50.152] java.sql.Connection.close() requested while a transaction is in progress on the connection. The transaction remains active, and the connection cannot be closed. ERRORCODE=-4471, SQLSTATE=null at com.ibm.db2.jcc.b.wc.a(wc.java:55) at com.ibm.db2.jcc.b.wc.a(wc.java:119) at com.ibm.db2.jcc.b.eb.t(eb.java:996) at com.ibm.db2.jcc.b.eb.w(eb.java:1019) at com.ibm.db2.jcc.b.eb.u(eb.java:1005) at com.ibm.db2.jcc.b.eb.close(eb.java:989) at org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:399) at org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:390) at org.apache.solr.handler.dataimport.DataConfig$Entity.clearCache(DataConfig.java:173) at org.apache.solr.handler.dataimport.DataConfig.clearCaches(DataConfig.java:331) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:339) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Re: Index Not Matching
Make sure your index is completely commited. curl 'http://localhost:8983/solr/update?commit=true' http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22 for an overview: http://lucene.apache.org/solr/tutorial.html hth, Geert-Jan http://techgurulive.com/2010/11/22/apache-solr-commit-and-optimize/ 2011/2/3 Esclusa, Will william.escl...@bonton.com Both the application and the SOLR gui match (with the incorrect number of course :-) ) At first I thought it could be a schema problem, but we went though it with a fine comb and compared it to the one in our stage environment. What is really weird is that I grabbed one of the product ID that are not showing up in SOLR from the DB, search through the SOLR GUI and it found it. -Original Message- From: Savvas-Andreas Moysidis [mailto:savvas.andreas.moysi...@googlemail.com] Sent: Thursday, February 03, 2011 4:57 PM To: solr-user@lucene.apache.org Subject: Re: Index Not Matching that's odd..are you viewing the results through your application or the admin console? if you aren't, I'd suggest you use the admin console just to eliminate the possibility of an application bug. We had a similar problem in the past and turned out to be a mixup of our dev/test instances.. On 3 February 2011 21:41, Esclusa, Will william.escl...@bonton.com wrote: Hello Saavs, I am 100% sure we are not updating the DB after we index the data. We are specifying the same fields on both queries. Our prod boxes do not have access to QA or DEV, so I would expect a connection error when indexing if this is the case. No connection errors in the logs. -Original Message- From: Savvas-Andreas Moysidis [mailto:savvas.andreas.moysi...@googlemail.com] Sent: Thursday, February 03, 2011 4:26 PM To: solr-user@lucene.apache.org Subject: Re: Index Not Matching Hello, Are you definitely positive your database isn't updated after you index your data? Are you querying against the same field(s) specifying the same criteria both in Solr and in the database? Any chance you might be pointing to a dev/test instance of Solr ? Regards, - Savvas On 3 February 2011 20:17, Esclusa, Will william.escl...@bonton.com wrote: Greetings! My organization is new to SOLR, so please bare with me. At times, we experience an out of sync condition between SOLR index files and our Database. We resolved that by clearing the index file and performing a full crawl of the database. Last time we noticed an out of sync condition, we went through our procedure of deleting and crawling, but this time it did not fix it. For example, search for swim on the DB and we get 440 products, but yet SOLR states we have 214 products. Has anyone experience anything like this? Does anyone have any suggestions on a trace we can turn on? Again, we are new to SOLR so any help you can provide is greatly appreciated. Thanks! Will
RE: Index Not Matching
: At first I thought it could be a schema problem, but we went though it : with a fine comb and compared it to the one in our stage environment. : What is really weird is that I grabbed one of the product ID that are : not showing up in SOLR from the DB, search through the SOLR GUI and it : found it. unless i'm completely misunderstanding you, that means there is a record in Solr for that record in the DB -- which suggests the problem is not DB records getting indexed, it's analysis of some kind -- does a *:* (ie: return all docs) query to solr return the same number of results as a select count(*) query on the DB? there's really not enough info here to make any meaningful guesses as to the problem. -Hoss
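A quick way to run Hoss's count comparison on the Solr side (host, port and handler path are illustrative): request http://localhost:8983/solr/select?q=*:*&rows=0 and read the numFound attribute in the response, which is the total number of committed documents; compare that against a SELECT COUNT(*) on the source table.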
Re: Index Not Matching
which field type are you specifying in your schema.xml for the fields that you search upon? if you are using text then this causes your input text to be stemmed to a common root making your searches more flexible. For instance: if you have the term dreaming in one row/document and the term dream in another, then this could be stemmed to dreami or something like during indexing. This effectively causes both your documents to match when you search for dream in Solr but you would only return 1 result if you searched directly in your database. On 3 February 2011 22:37, Geert-Jan Brits gbr...@gmail.com wrote: Make sure your index is completely commited. curl 'http://localhost:8983/solr/update?commit=true' http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22 for an overview: http://lucene.apache.org/solr/tutorial.html hth, Geert-Jan http://techgurulive.com/2010/11/22/apache-solr-commit-and-optimize/ 2011/2/3 Esclusa, Will william.escl...@bonton.com Both the application and the SOLR gui match (with the incorrect number of course :-) ) At first I thought it could be a schema problem, but we went though it with a fine comb and compared it to the one in our stage environment. What is really weird is that I grabbed one of the product ID that are not showing up in SOLR from the DB, search through the SOLR GUI and it found it. -Original Message- From: Savvas-Andreas Moysidis [mailto:savvas.andreas.moysi...@googlemail.com] Sent: Thursday, February 03, 2011 4:57 PM To: solr-user@lucene.apache.org Subject: Re: Index Not Matching that's odd..are you viewing the results through your application or the admin console? if you aren't, I'd suggest you use the admin console just to eliminate the possibility of an application bug. We had a similar problem in the past and turned out to be a mixup of our dev/test instances.. On 3 February 2011 21:41, Esclusa, Will william.escl...@bonton.com wrote: Hello Saavs, I am 100% sure we are not updating the DB after we index the data. We are specifying the same fields on both queries. Our prod boxes do not have access to QA or DEV, so I would expect a connection error when indexing if this is the case. No connection errors in the logs. -Original Message- From: Savvas-Andreas Moysidis [mailto:savvas.andreas.moysi...@googlemail.com] Sent: Thursday, February 03, 2011 4:26 PM To: solr-user@lucene.apache.org Subject: Re: Index Not Matching Hello, Are you definitely positive your database isn't updated after you index your data? Are you querying against the same field(s) specifying the same criteria both in Solr and in the database? Any chance you might be pointing to a dev/test instance of Solr ? Regards, - Savvas On 3 February 2011 20:17, Esclusa, Will william.escl...@bonton.com wrote: Greetings! My organization is new to SOLR, so please bare with me. At times, we experience an out of sync condition between SOLR index files and our Database. We resolved that by clearing the index file and performing a full crawl of the database. Last time we noticed an out of sync condition, we went through our procedure of deleting and crawling, but this time it did not fix it. For example, search for swim on the DB and we get 440 products, but yet SOLR states we have 214 products. Has anyone experience anything like this? Does anyone have any suggestions on a trace we can turn on? Again, we are new to SOLR so any help you can provide is greatly appreciated. Thanks! Will
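To illustrate the stemming point: a stemmed field type typically runs a stemming filter at both index and query time, so variants such as dream and dreaming collapse to the same indexed term. A minimal sketch (the type name is illustrative):

<fieldType name="text_stemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- reduces "dreaming", "dreams" and "dream" to a common root at index and query time -->
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>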
RE: Index Not Matching
Hello Hoss, That is exactly what is going on. It seems to be failing in the analysis of the record. How do I get all the records from SOLR? http://localhost:8080/select?*.* ? Thanks! -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Thursday, February 03, 2011 5:42 PM To: solr-user@lucene.apache.org Subject: RE: Index Not Matching : At first I thought it could be a schema problem, but we went though it : with a fine comb and compared it to the one in our stage environment. : What is really weird is that I grabbed one of the product ID that are : not showing up in SOLR from the DB, search through the SOLR GUI and it : found it. unless i'm completely misunderstanding you, that means there is a record in Solr for that record in the DB -- which suggests the problem is not DB records getting indexed, it's analysis of some kind -- does a *:* (ie: return all docs) query to solr return the same number of results as a select count(*) query on the DB? there's really not enough info here to make any meaningful guesses as to the problem. -Hoss
response when using my own QParserPlugin
Hi, I wrote a QParserPlugin. When I hit solr and use this QParserPlugin, the response does not have the column names associated with the data such as: 0 29 0 {!tnav} faketn1 CA city san francisco US 10 - - 495,496,497 500,657,498,499 us:ca:san francisco faketn,fakeregression 037.74 -122.49 faketn1 faketn1 faketn1 faketn1 faketn1 99902837 +3774-12250|+3774-12250@1|+3772-12252@2 94116:us 495,496,497 fakecs,fakeatti,fakevenable 500,657,498,499 San Francisco 667 US 37.742369 -122.491240 boldMain Dishes/bold boldPancakes/bold faketn1 2.99 Enjoy best chinese food. faketn1 1;0:0:0:0:8:20% off.0:0:0:3:0.0 4158281775 94116 ACTION_MODEL TN CA 2350 Taraval St Enjoy best chinese food 40233 - 5;10:ACTION_MAP0:3:0.315:ACTION_DRIVE_TO0:3:0.517:ACTION_IMPRESSION0:6:0.005014:ACTION_PROFILE0:3:0.111:ACTION_CALL0:3:0.3 2027 - How do I get the data to be associated with the index columns so I can parse it and know the context of the data (such as this data is the business name, this data is the address, etc). --- i was hoping it return something like this or some sort of structure. ?xml version=1.0 encoding=UTF-8 ? - response - lstname=responseHeader intname=status0/int intname=QTime1/int - lstname=params strname=indenton/str strname=start0/str strname=qI_NAME_EXACT:faketn1/str strname=rows10/str strname=version2.2/str /lst /lst - resultname=responsenumFound=1start=0 - doc - arrname=I_BASE_ID str-/str str-/str /arr strname=I_BLOCK_CATEGORY_ID495,496,497/str strname=I_CATEGORY_ID500,657,498,499/str strname=I_CITY_DISTRICTus:ca:san francisco/str strname=I_KEYWORDfaketn,fakeregression/str strname=I_LAT_RANGE037.74/str strname=I_LON_RANGE-122.49/str strname=I_NAME_AS_KEYWORDfaketn1/str strname=I_NAME_ENUMfaketn1/str strname=I_NAME_EXACTfaketn1/str strname=I_NAME_NGRAMfaketn1/str strname=I_NAME_PACKfaketn1/str strname=I_POI_ID99902837/str strname=I_SPATIAL_BLOCK+3774-12250|+3774-12250@1|+3772-12252@2/str strname=I_ZIP_DISTRICT94116:us/str strname=S_BLOCK_CATEGORY_ID495,496,497/str strname=S_BLOCK_KEYWORDSfakecs,fakeatti,fakevenable/str strname=S_CATEGORY_ID500,657,498,499/str strname=S_CITYSan Francisco/str strname=S_COMPAIGN_ID667/str strname=S_COUNTRYUS/str str name=S_FAX/ strname=S_LATITUDE37.742369/str strname=S_LONGTITUDE-122.491240/str strname=S_MENUboldMain Dishes/bold boldPancakes/bold faketn1 2.99/str strname=S_MERCHANT_CONTENTEnjoy best chinese food./str strname=S_NAMEfaketn1/str strname=S_OFFERS1;0:0:0:0:8:20% off.0:0:0:3:0.0/str strname=S_PHONE_NUMBER4158281775/str strname=S_POSTALCODE94116/str strname=S_PRICEMODEACTION_MODEL/str strname=S_SOURCE_NAMETN/str str name=S_SPONSOREDTEXT/ strname=S_STATECA/str strname=S_STREET2350 Taraval St/str str name=S_STREET2/ str name=S_SUIT/ strname=S_TAGLINEEnjoy best chinese food/str strname=S_TARGET_DISTANCE_IN_METER40233/str strname=S_TA_ID-/str strname=S_USER_ACTIONS5;10:ACTION_MAP0:3:0.315:ACTION_DRIVE_TO0:3:0.517:ACTION_IMPRESSION0:6:0.005014:ACTION_PROFILE0:3:0.111:ACTION_CALL0:3:0.3/str strname=S_VENDOR_ID2027/str str name=S_WEBURL/ strname=S_YPC_ID-/str /doc /result /response Tri
Re: Scale out design patterns
I am thinking along the same lines. I could shard based on a field, but there are two practical difficulties. 1. If a normal user logs in, the results can be fetched from the corresponding search server, but if an admin user logs in, they may need to see all data; the query should then be issued across servers and the results consolidated. 2. Consider a scenario where I am sharding based on the user: I have a single search server and it is handling 1000 members. Now, as memory consumption is high, I add one more search server. New users can go to the second server, but what about the old users? Their data will still be added to server 1. How do I address this issue? Is rebuilding the index the only way? Could anyone share their experience of how they solved scale-out problems? Regards Ganesh - Original Message - From: Anshum ansh...@gmail.com To: java-u...@lucene.apache.org Sent: Friday, January 21, 2011 12:04 PM Subject: Re: Scale out design patterns Hi Ganesh, I'd suggest, if you have a particular dimension/field on which you could shard your data such that the query/data breakup gets predictable, that would be a good way to scale out e.g. if you have users which are equally active/searched then you may want to split their data on a simple mod of some numeric (auto increment) userid. This works well under normal cases unless your partitioning is not predictable. -- Anshum Gupta http://ai-cafe.blogspot.com On Fri, Jan 21, 2011 at 10:52 AM, Ganesh emailg...@yahoo.co.in wrote: Hello all, Could you any one guide me what all the various ways we could scale out? 1. Index: Add data to the nodes in round-robin. Search: Query all the nodes and cluster the results using carrot2. 2.Horizontal partitioning and No shared architecture, Index: Split the data based on userid and index few set of users data in each node. Search: Have a mapper kind of application which could tell which userid is mapped to node, redirect the search traffic to corresponding node. Which one is best? Did you guys tried any of these approach. Please share your thoughts. Regards Ganesh Send free SMS to your Friends on Mobile from your Yahoo! Messenger. Download Now! http://messenger.yahoo.com/download.php - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org Send free SMS to your Friends on Mobile from your Yahoo! Messenger. Download Now! http://messenger.yahoo.com/download.php
Re: Function Question
I like it. You would think it would be easy to get the values from a multiValue field in the geodist() function, but I guess it was not built for that. If anyone has done something similar, let me know. Thanks. Bill On Thu, Feb 3, 2011 at 3:18 PM, Geert-Jan Brits gbr...@gmail.com wrote: I don't have a direct answer to your question, but you could consider having fields: latCombined and LongCombined where you pairwise combine the latitudes and longitudes, e.g: latCombined: 48.0-49.0-50.0 longcombined: 2.0-3.0-4.0 Than in your custom scorer above split latCombined and longcombined and calculate the closests distance to the user-defined point. hth, Geert-Jan 2011/2/3 William Bell billnb...@gmail.com Thoughts? On Wed, Feb 2, 2011 at 10:38 PM, Bill Bell billnb...@gmail.com wrote: This is posted as an enhancement on SOLR-2345. I am willing to work on it. But I am stuck. I would like to loop through the lat/long values when they are stored in a multiValue list. But it appears that I cannot figure out to do that. For example: sort=geodist() asc This should grab the closest point in the MultiValue list, and return the distance so that is can be scored. The problem is I cannot find a way to get the MultiValue list? In function: src/java/org/apache/solr/search/function/distance/HaversineConstFunction.ja va Has code similar to: VectorValueSource p2; this.p2 = vs ListValueSource sources = p2.getSources(); ValueSource latSource = sources.get(0); ValueSource lonSource = sources.get(1); DocValues latVals = latSource.getValues(context1, readerContext1); DocValues lonVals = lonSource.getValues(context1, readerContext1); double latRad = latVals.doubleVal(doc) * DistanceUtils.DEGREES_TO_RADIANS; double lonRad = lonVals.doubleVal(doc) * DistanceUtils.DEGREES_TO_RADIANS; etc... It would be good if I could loop through sources.get() but it only returns 2 sources even when there are 2 pairs of lat/long. The getSources() only returns the following: sources:[double(store_0_coordinate), double(store_1_coordinate)] How do I just get the 4 values in the function?
Re: response when using my own QParserPlugin
Are you looking at your output in a browser? Which browser are you using? If Chrome, then look at the page source (view source); it will give you the desired XML output. Or switch to another browser to see the XML output. - Thanx: Grijesh http://lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/response-when-using-my-own-QParserPlugin-tp2419367p2421499.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: HTTP ERROR 400 undefined field: *
How did you upgrade? Did you change everything (all jars, data, config), or are you still using anything from the older version? - Thanx: Grijesh http://lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/HTTP-ERROR-400-undefined-field-tp2417938p2421569.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: HTTP ERROR 400 undefined field: *
: I was working on an checkout of the 3.x branch from about 6 months ago. : Everything was working pretty well, but we decided that we should update and : get what was at the head. However after upgrading, I am now getting this FWIW: please be specific. head of what? the 3x branch? or trunk? what revision in svn does that correspond to? (the svnversion command will tell you) : HTTP ERROR 400 undefined field: * : : If I clear the fl parameter (default is set to *, score) then it works fine : with one big problem, no score data. If I try and set fl=score I get the same : error except it says undefined field: score?! : : This works great in the older version, what changed? I've googled for about : an hour now and I can't seem to find anything. i can't reproduce this using either trunk (r1067044) or 3x (r1067045) all of these queries work just fine... http://localhost:8983/solr/select/?q=* http://localhost:8983/solr/select/?q=solr&fl=*,score http://localhost:8983/solr/select/?q=solr&fl=score http://localhost:8983/solr/select/?q=solr ...you'll have to provide us with a *lot* more details to help understand why you might be getting an error (like: what your configs look like, what the request looks like, what the full stack trace of your error is in the logs, etc...) -Hoss
Re: facet.mincount
Thanks to all On 3 February 2011 20:21, Grijesh pintu.grij...@gmail.com wrote: Hi, facet.mincount does not work with the facet.date option, AFAIK. There is an issue for it, SOLR-343, but it has been resolved. Trying to apply the patch provided as a solution in that issue may solve the problem. The fix version for this may be 1.5. - Thanx: Grijesh http://lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/facet-mincount-tp2411930p2414232.html Sent from the Solr - User mailing list archive at Nabble.com. -- Thanks Regards, Isan Fulia.
Re: Use Parallel Search
I am having similar kind of problem. I need to scale out. Could you explain how you have done distributed indexing and search using Lucene. Regards Ganesh - Original Message - From: Gustavo Maia gust...@goshme.com To: solr-user@lucene.apache.org Sent: Thursday, February 03, 2011 11:36 PM Subject: Use Parallel Search Hello, Let me give a brief description of my scenario. Today I am only using Lucene 2.9.3. I have an index of 30 million documents distributed on three machines and each machine with 6 hds (15k rmp). The server queries the search index using the remote class search. And each machine is made to search using the parallel search (search simultaneously in 6 hds). So during the search are simulating using the three machines and 18 hds, returning me to a very good response time. Today I am studying the SOLR and am interested in knowing more about the searches and use of distributed parallel search on the same machine. What would be the best scenario using SOLR that is better than I already am using today only with lucene? Note: I need to have installed on each machine 6 SOLR instantiate from my server? One for each hd? Or would some other alternative way for me to use the 6 hds without having 6 instances of SORL server? Another question would be if the SOLR would have some limiting size index for Hard drive? It would be interesting not index too big because when the index increased the longer the search. Thanks for everything. Gustavo Maia Send free SMS to your Friends on Mobile from your Yahoo! Messenger. Download Now! http://messenger.yahoo.com/download.php
Re: Using terms and N-gram
Okay so as suggested Shingle works perfectly well for what I need ! Thank you Erick 2011/2/3 openvictor Open openvic...@gmail.com Thank you for these inputs. I was silly asking for ngrams because I already knew it. I think I was tired yesterday... Thank you Eric Erickson, once again you gave me a more than useful comment. Indeed Shingles seems to be the perfect fit for the work I want to do. I will try to implement that tonight and I will come back to see if it's working. Regards, Victor 2011/2/3 Erick Erickson erickerick...@gmail.com First, you'll get a lot of insight by defining something simply and looking at the analysis page from solr admin. That's a very valuable page. To your question: commongrams are shingles that work between stopwords and other words. For instance, this is some text gets analyzed into this, this_is, is, is_some, some text. Note that the stopwords are the only things that get combined with the text after. NGrams form on letters. It's too long to post the whole thing, but the above phrase gets analyzed as t, h, i, s, th, hi, is, i, s, is, s, o, m, e, so, om, me.. It splits a single token into grams whereas commongrams essentially combines tokens when they're stopwords. Have you looked at shingles? See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory Best Erick On Thu, Feb 3, 2011 at 10:15 AM, openvictor Open openvic...@gmail.com wrote: Thank you, I will do that and hopefuly it will be handy ! But can someone explain me difference between CommonGramFIlterFactory et NGramFilterFactory ? ( Maybe the solution is there) Thank you all, best regards 2011/2/3 Grijesh pintu.grij...@gmail.com Use analysis.jsp to see what happening at index time and query time with your input data.You can use highlighting to see if match found. - Thanx: Grijesh http://lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/Using-terms-and-N-gram-tp2410938p2411244.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR 1.4 and Lucene 3.0.3 index problem
Hi, I would not try to change the Lucene version in Solr 1.4.1 from 2.9.x to 3.0.x. As Koji said, the best solution is to get the 3.x branch or the trunk and build it. You need svn and ant.

1. Create a working directory
$ mkdir ~/solr

2. Get the source
$ cd ~/solr
$ svn co http://svn.apache.org/repos/asf/lucene/dev/trunk
or
$ svn co http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x

3. Build
$ cd ~/solr/modules
$ ant compile
$ cd ~/solr/lucene
$ ant dist
$ cd ~/solr/modules
$ ant dist

Dominique On 02/02/11 12:47, Churchill Nanje Mambe wrote: thanks guys I will try the trunk as for unpacking the war and changing the lucene... I am not an expert and this may get complicated for me, maybe over time when I am comfortable Mambe Churchill Nanje 237 33011349, AfroVisioN Founder, President,CEO http://www.afrovisiongroup.com | http://mambenanje.blogspot.com skypeID: mambenanje www.twitter.com/mambenanje On Wed, Feb 2, 2011 at 8:03 AM, Grijesh pintu.grij...@gmail.com wrote: You can extract the solr.war using Java's jar -xvf solr.war command, replace the lucene-2.9.jar with your lucene-3.0.3.jar in the WEB-INF/lib directory, then use jar -cvf solr.war * to pack the war again. Deploy that war. Hope that works. - Thanx: Grijesh -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-1-4-and-Lucene-3-0-3-index-problem-tp2396605p2403542.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr faceting on score
No, you cannot get facets on score; score is not a field defined in the schema, and facets can only be computed on fields defined in the schema, as far as I know. - Thanx: Grijesh http://lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-faceting-on-score-tp2422076p2422121.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr faceting on score
Thanks for reply -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-faceting-on-score-tp2422076p2422147.html Sent from the Solr - User mailing list archive at Nabble.com.
Problem in faceting
Dear sir, I have a problem with faceting. I am searching for the text "water treatment plant" on Solr using the dismax request handler. The final query which goes to Solr is here:

<str name="parsedquery_toString">+((TDR_SUBIND_PROD_NAMES:water^2.5 | TDR_SUBIND_LOC_ZIP:water^2.5 | TDR_SUBIND_COMP_NAME:water^1.5 | TDR_SUBIND_TENDER_NO:water | TDR_SUBIND_SUBTDR_SHORT:water^3.0 | TDR_SUBIND_SUBTDR_DETAILS:water^2.0 | TDR_SUBIND_LOC_CITY:water^3.0 | TDR_SUBIND_LOC_STATE:water^3.0 | TDR_SUBIND_NAME:water^1.5)~0.2 (TDR_SUBIND_PROD_NAMES:treatment^2.5 | TDR_SUBIND_LOC_ZIP:treatment^2.5 | TDR_SUBIND_COMP_NAME:treatment^1.5 | TDR_SUBIND_TENDER_NO:treatment | TDR_SUBIND_SUBTDR_SHORT:treatment^3.0 | TDR_SUBIND_SUBTDR_DETAILS:treatment^2.0 | TDR_SUBIND_LOC_CITY:treatment^3.0 | TDR_SUBIND_LOC_STATE:treatment^3.0 | TDR_SUBIND_NAME:treatment^1.5)~0.2 (TDR_SUBIND_PROD_NAMES:plant^2.5 | TDR_SUBIND_LOC_ZIP:plant^2.5 | TDR_SUBIND_COMP_NAME:plant^1.5 | TDR_SUBIND_TENDER_NO:plant | TDR_SUBIND_SUBTDR_SHORT:plant^3.0 | TDR_SUBIND_SUBTDR_DETAILS:plant^2.0 | TDR_SUBIND_LOC_CITY:plant^3.0 | TDR_SUBIND_LOC_STATE:plant^3.0 | TDR_SUBIND_NAME:plant^1.5)~0.2) (TDR_SUBIND_SUBTDR_DETAILS:water treatment plant^10.0 | TDR_SUBIND_COMP_NAME:water treatment plant^20.0 | TDR_SUBIND_SUBTDR_SHORT:water treatment plant^15.0)~0.2</str>

Now I want to facet over only those results which have the complete text "water treatment plant" in them, i.e. the records which match "water treatment plant" completely. I do not want to facet on results which match only 1 or 2 words, like "water" or "treatment". The main problem: there is a field FACET_CITY in my schema.xml, and I want to find only those cities for which the complete text "water treatment plant" matches. I do not want those cities for which only the words "water" or "treatment" match. I have two possibilities to achieve this functionality: 1. Either, somehow, get the list of cities for which the complete text matches, i.e. facet only on the documents matching the complete text, OR 2. Facet over only the first 100 documents (for example, the 100 documents with the highest score) for the city list. Please suggest how I can achieve this. -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-in-faceting-tp2422182p2422182.html Sent from the Solr - User mailing list archive at Nabble.com.
Facet Query
Hi, Do the facet.query and fq parameters work only for range queries? Can I make a general query with them, like facet.query=city:mumbai, and get results back? Please suggest. When I made this query I only got a count back for it. How can I get the documents for it? -- View this message in context: http://lucene.472066.n3.nabble.com/Facet-Query-tp2422212p2422212.html Sent from the Solr - User mailing list archive at Nabble.com.