Porting from Solr 1.3 to 3.5

2012-06-07 Thread Ramprakash Ramamoorthy
I am porting my app from Lucene 2.x (Solr 1.3) to Lucene 3.x (Solr 3.5). The
following is my issue.

This was valid in 2.x, but 3.5 throws an error:

IndexReader reader = IndexReader.open("/home/path/to/my/dataDir");

2.x accepted a string, but 3.5 strictly wants a Directory object. I find
Directory to be abstract, and the only way to instantiate it seems to be
RAMDirectory().

How do I go about this, and how do I point my reader at the desired
directory?


P.S.: Our application needs custom logic this way, hence we do it this way
instead of going with cores.

-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
Engineer Trainee,
Zoho Corporation.
+91 9626975420


Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Spadez
Hi,

My current method of searching involves communicating with Solr using Python.
The client's browser communicates with the search API using jQuery/JSON.
However, although this works, I don't like the dependency on JavaScript.

Either I can keep this method and have a backup system in place that
works when JavaScript is disabled, or, better yet, I can use a system that
works both with and without JavaScript.

So I was thinking: instead of using the API and returning JSON to be
interpreted by JavaScript, I could create a new handler to render the search
results on the server and use POST to submit the query to the server.

So, if I wanted a fast and efficient method of querying results from Solr
and returning the results, all without JavaScript enabled, what choices do I
have?
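The server-side approach described above boils down to URL-encoding the submitted form value, building the Solr request on the server, and rendering the response as HTML. A minimal sketch of the URL-building step (host, port, core path, and parameter values are assumptions, not from the thread):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/**
 * Minimal sketch of a server-side Solr query: the web app builds the
 * request URL itself from the submitted form value, fetches the response,
 * and renders HTML on the server -- no JavaScript involved.
 */
public class SolrQueryUrl {

    static String buildSelectUrl(String baseUrl, String userQuery, int start, int rows) {
        try {
            // Encode the raw user input so special characters survive the round trip.
            String q = URLEncoder.encode(userQuery, "UTF-8");
            return baseUrl + "/select?q=" + q
                    + "&start=" + start
                    + "&rows=" + rows
                    + "&wt=xml"; // ask Solr for XML and render it server-side
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(buildSelectUrl("http://localhost:8983/solr", "fish & chips", 0, 10));
    }
}
```

The server then fetches that URL, parses the XML, and writes the results page in the response, so the same handler works whether or not the browser has JavaScript.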

Your thoughts would be hugely appreciated because I'm new to this stuff.

James

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr, I have performance problem for indexing.

2012-06-07 Thread Lee Carroll
What is your db schema? Do you need to import all of the schema? (128
joined tables??)
Or are the tables all independent? (If so, dump them out and import
them using CSV.)

cheers lee c

On 7 June 2012 02:32, Jihyun Suh jhsuh.ourli...@gmail.com wrote:
 Each table has 35,000 rows (35 thousand).
 I will check the log for each step of indexing.

 I run Solr 3.5.


 2012/6/6 Jihyun Suh jhsuh.ourli...@gmail.com

 I have 128 tables in MySQL 5.x and each table has 35,000 rows.
 When I start dataimport (indexing) in Solr, it takes 5 minutes for one
 table.
 But when Solr indexes the 20th table, it takes around 10 minutes for one table.
 And then when it indexes the 40th table, it takes around 20 minutes for one
 table.

 Does Solr have some performance problem with too many documents?
 Should I set some configuration?




Re: Porting from Solr 1.3 to 3.5

2012-06-07 Thread Ramprakash Ramamoorthy
On Thu, Jun 7, 2012 at 1:18 PM, Ramprakash Ramamoorthy 
youngestachie...@gmail.com wrote:

 I am porting my app from Lucene 2.x (Solr 1.3) to Lucene 3.x (Solr 3.5). The
 following is my issue.

 This one was valid in 2.X, but 3.5 throws me an error.

 IndexReader reader = IndexReader.open("/home/path/to/my/dataDir");

 2.x accepted a string, but 3.5 strictly wants a Directory object. I find
 Directory to be abstract, and the only way to instantiate it seems to be
 RAMDirectory().

 How do I go about this and how do I point my reader to the desired
 directory?


 P.S : Our application needs a custom logic this way and hence instead of
 going with cores, we do it this way.

 --
 With Thanks and Regards,
 Ramprakash Ramamoorthy,
 Engineer Trainee,
 Zoho Corporation.
 +91 9626975420


I was able to do it. I just did it this way:

IndexReader reader = IndexReader.open(new SimpleFSDirectory(new File("my/desired/path")));

Thanks for your time.

-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
Engineer Trainee,
Zoho Corporation.
+91 9626975420


Re: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Spadez
Further to my last reply, how about I do the following:

Send the request to the server using the GET method and then return the
results in XML rather than JSON. Does this sound logical?


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988128.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Exception when optimizing index

2012-06-07 Thread Rok Rejc
Hi Jack,

it's a virtual machine running on VMware vSphere 5 Enterprise Plus.
The machine has 30 GB vRAM, an 8-core vCPU at 3.0 GHz, and 2 TB SATA RAID-10
over iSCSI. The operating system is CentOS 6.2 64-bit.

Here is the Java info:


   - catalina.base = /usr/share/tomcat6
   - catalina.home = /usr/share/tomcat6
   - catalina.useNaming = true
   - common.loader = ${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar
   - file.encoding = UTF-8
   - file.encoding.pkg = sun.io
   - file.separator = /
   - java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
   - java.awt.printerjob = sun.print.PSPrinterJob
   - java.class.path =
     /usr/share/tomcat6/bin/bootstrap.jar
     /usr/share/tomcat6/bin/tomcat-juli.jar
     /usr/share/java/commons-daemon.jar
   - java.class.version = 50.0
   - java.endorsed.dirs =
   - java.ext.dirs =
     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/ext
     /usr/java/packages/lib/ext
   - java.home = /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre
   - java.io.tmpdir = /var/cache/tomcat6/temp
   - java.library.path =
     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/server
     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64
     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/../lib/amd64
     /usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
   - java.naming.factory.initial = org.apache.naming.java.javaURLContextFactory
   - java.naming.factory.url.pkgs = org.apache.naming
   - java.runtime.name = OpenJDK Runtime Environment
   - java.runtime.version = 1.6.0_22-b22
   - java.specification.name = Java Platform API Specification
   - java.specification.vendor = Sun Microsystems Inc.
   - java.specification.version = 1.6
   - java.util.logging.config.file = /usr/share/tomcat6/conf/logging.properties
   - java.util.logging.manager = org.apache.juli.ClassLoaderLogManager
   - java.vendor = Sun Microsystems Inc.
   - java.vendor.url = http://java.sun.com/
   - java.vendor.url.bug = http://java.sun.com/cgi-bin/bugreport.cgi
   - java.version = 1.6.0_22
   - java.vm.info = mixed mode
   - java.vm.name = OpenJDK 64-Bit Server VM
   - java.vm.specification.name = Java Virtual Machine Specification
   - java.vm.specification.vendor = Sun Microsystems Inc.
   - java.vm.specification.version = 1.0
   - java.vm.vendor = Sun Microsystems Inc.
   - java.vm.version = 20.0-b11
   - javax.sql.DataSource.Factory = org.apache.commons.dbcp.BasicDataSourceFactory
   - line.separator =
   - os.arch = amd64
   - os.name = Linux
   - os.version = 2.6.32-220.13.1.el6.x86_64
   - package.access = sun.,org.apache.catalina.,org.apache.coyote.,org.apache.tomcat.,org.apache.jasper.,sun.beans.
   - package.definition = sun.,java.,org.apache.catalina.,org.apache.coyote.,org.apache.tomcat.,org.apache.jasper.
   - path.separator = :
   - server.loader =
   - shared.loader =
   - sun.arch.data.model = 64
   - sun.boot.class.path =
     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/resources.jar
     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/rt.jar
     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/sunrsasign.jar
     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/jsse.jar
     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/jce.jar
     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/charsets.jar
     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/netx.jar
     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/plugin.jar
     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/rhino.jar
     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/modules/jdk.boot.jar
     /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/classes
   - sun.boot.library.path = /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64
   - sun.cpu.endian = little
   - sun.cpu.isalist =
   - sun.io.unicode.encoding = UnicodeLittle
   - sun.java.command = org.apache.catalina.startup.Bootstrap start
   - sun.java.launcher = SUN_STANDARD
   - sun.jnu.encoding = UTF-8
   - sun.management.compiler = HotSpot 64-Bit Tiered Compilers
   - sun.os.patch.level = unknown
   - tomcat.util.buf.StringCache.byte.enabled = true
   - user.country = US
   - user.dir = /usr/share/tomcat6
   - user.home = /usr/share/tomcat6
   - user.language = en
   - user.name = tomcat
   - user.timezone = Europe/Ljubljana




As far as I can see from the JIRA issue, I already have the patch (as mentioned,
I have a trunk version from May 12). Any ideas?

Many thanks!



On Wed, Jun 6, 2012 at 2:49 PM, Jack Krupansky j...@basetechnology.comwrote:

 It could be related to
 https://issues.apache.org/jira/browse/LUCENE-2975.
 At least the exception comes from the same function.


 Caused by: java.io.IOException: Invalid vInt detected (too many bits)
   at org.apache.lucene.store.DataInput.readVInt(DataInput.java:112)

 What hardware and Java version are you running?

 -- Jack Krupansky

 -Original Message- From: Rok Rejc
 Sent: Wednesday, 

Re: Solr, I have perfomance problem for indexing.

2012-06-07 Thread Erick Erickson
You haven't really told us much about what you're doing here. As Lee
hints, we don't know much about the details of *how* you are doing this.

But unless you're doing something odd, Solr shouldn't be the bottleneck
here. Often when a database import is slow, the problem is in the
data-acquisition bit. That is, your SQL query for some reason gets
slow. That said, with DIH it can be hard to know exactly.

You might want to consider using SolrJ instead of DIH. We've found that
as the import process gets more complex, using SolrJ is often easier. See:
http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/
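To make the SolrJ-instead-of-DIH suggestion concrete: in that approach your own code reads the database and pushes documents to Solr's update handler. A hedged sketch of just the payload-building step, using Solr's XML update format (the HTTP POST itself, via SolrJ or HttpURLConnection, is omitted, and the field names are assumptions):

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of indexing outside DIH: render one document as the
 * <add><doc>...</doc></add> XML update message that Solr's /update
 * handler accepts. Transport (SolrJ or a plain HTTP POST) is omitted.
 */
public class AddDocXml {

    // Escape the characters that are special in XML text content.
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String buildAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(e.getKey()).append("\">")
              .append(xmlEscape(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("id", "1");
        doc.put("title_t", "Fish & chips");
        System.out.println(buildAddXml(doc));
    }
}
```

The point of the design is control: you decide the batch size, the SQL paging, and the commit points yourself, so a slowdown in the SQL side is visible in your own loop rather than hidden inside DIH.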


Best
Erick

On Thu, Jun 7, 2012 at 5:26 AM, Lee Carroll
lee.a.carr...@googlemail.com wrote:
 What is your db schema? Do you need to import all of the schema? (128
 joined tables??)
 Or are the tables all independent? (If so, dump them out and import
 them using CSV.)

 cheers lee c

 On 7 June 2012 02:32, Jihyun Suh jhsuh.ourli...@gmail.com wrote:
 Each table has 35,000 rows (35 thousand).
 I will check the log for each step of indexing.

 I run Solr 3.5.


 2012/6/6 Jihyun Suh jhsuh.ourli...@gmail.com

 I have 128 tables in MySQL 5.x and each table has 35,000 rows.
 When I start dataimport (indexing) in Solr, it takes 5 minutes for one
 table.
 But when Solr indexes the 20th table, it takes around 10 minutes for one table.
 And then when it indexes the 40th table, it takes around 20 minutes for one
 table.

 Does Solr have some performance problem with too many documents?
 Should I set some configuration?




Solr, db connections remain after indexing a table.

2012-06-07 Thread Jihyun Suh
I index many tables, which are written as entities in data-config.xml.
But after indexing one table, the db connection remains open
even though I set 'holdability=CLOSE_CURSORS_AT_COMMIT'.

How can I close the connection after indexing a table?


<dataConfig>
  <dataSource type="JdbcDataSource"
   driver="com.mysql.jdbc.Driver"
   url="jdbc:mysql://hostname/dbname"
   batchSize="2000"
   user="id"
   password="passwd"
   readOnly="true"
   transactionIsolation="TRANSACTION_READ_COMMITTED"
   holdability="CLOSE_CURSORS_AT_COMMIT"
   connectionTimeout="1" readTimeout="24" />
  <document name="doc">
    <entity name="testTbl_0"
            transformer="RegexTransformer"
            onError="continue"
            query="SELECT Title, url, DocID, substring_index(body,' ',2048)
                   description FROM testTbl_0 WHERE status in ('1','s')">
      <field column="DocID" name="id" />
      <field column="Title" name="title_t" />
      <field column="description" name="contents_txt" />
      <field column="url" name="url" />
    </entity>
    <entity name="testTbl_1"
            transformer="RegexTransformer"
            onError="continue"
            query="SELECT Title, url, DocID, substring_index(body,' ',2048)
                   description FROM testTbl_1 WHERE status in ('1','s')">
      <field column="DocID" name="id" />
      <field column="Title" name="title_t" />
      <field column="description" name="contents_txt" />
      <field column="url" name="url" />
    </entity>
  </document>
</dataConfig>


+-------+------+----------------+------+---------+------+-------+------+
| Id    | User | Host           | db   | Command | Time | State | Info |
+-------+------+----------------+------+---------+------+-------+------+
| 88757 | id   | hostname:38843 | tmp  | Sleep   | 2268 |       | NULL |
| 88758 | id   | hostname:38844 | tmp  | Sleep   | 2196 |       | NULL |
| 88759 | id   | hostname:38845 | tmp  | Sleep   | 2134 |       | NULL |
| 88760 | id   | hostname:47822 | tmp  | Sleep   | 2074 |       | NULL |
| 88761 | id   | hostname:47823 | tmp  | Sleep   | 2013 |       | NULL |
| 88762 | id   | hostname:47824 | tmp  | Sleep   | 1953 |       | NULL |
| 88763 | id   | hostname:47825 | tmp  | Sleep   | 1896 |       | NULL |
| 88764 | id   | hostname:47826 | tmp  | Sleep   | 1838 |       | NULL |
| 88765 | id   | hostname:39795 | tmp  | Sleep   | 1778 |       | NULL |
| 88766 | id   | hostname:39796 | tmp  | Sleep   | 1717 |       | NULL |
| 88767 | id   | hostname:39797 | tmp  | Sleep   | 1658 |       | NULL |
| 88768 | id   | hostname:39798 | tmp  | Sleep   | 1594 |       | NULL |
| 88769 | id   | hostname:39799 | tmp  | Sleep   | 1535 |       | NULL |
| 88770 | id   | hostname:50275 | tmp  | Sleep   | 1470 |       | NULL |
| 88771 | id   | hostname:50276 | tmp  | Sleep   | 1411 |       | NULL |
| 88772 | id   | hostname:50277 | tmp  | Sleep   | 1352 |       | NULL |
| 88773 | id   | hostname:50278 | tmp  | Sleep   | 1291 |       | NULL |
| 88774 | id   | hostname:57385 | tmp  | Sleep   | 1165 |       | NULL |
| 88775 | id   | hostname:57386 | tmp  | Sleep   | 1044 |       | NULL |
| 88776 | id   | hostname:57387 | tmp  | Sleep   |  923 |       | NULL |
| 88777 | id   | hostname:53484 | tmp  | Sleep   |  801 |       | NULL |
| 88778 | id   | hostname:53485 | tmp  | Sleep   |  682 |       | NULL |
| 88779 | id   | hostname:58343 | tmp  | Sleep   |  560 |       | NULL |
| 88780 | id   | hostname:58344 | tmp  | Sleep   |  438 |       | NULL |
| 88781 | id   | hostname:58345 | tmp  | Sleep   |  314 |       | NULL |
| 88782 | id   | hostname:50474 | tmp  | Sleep   |  193 |       | NULL |
| 88783 | id   | hostname:50475 | tmp  | Sleep   |   72 |       | NULL |
...


Re: Levenstein Distance

2012-06-07 Thread Tommaso Teofili
During the analysis phase you could add payloads to the terms using
LevensteinDistance and then use that in conjunction with a
PayloadSimilarity class (see [1] for an example), or just use a custom
Similarity class that uses LevensteinDistance for scoring.
HTH,
Tommaso

[1] :
http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
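For reference, the edit distance this thread is discussing is the standard dynamic-programming Levenshtein distance; a minimal sketch follows (note that, as I understand it, Solr's strdist returns a normalized similarity in [0,1] rather than this raw edit count):

```java
/**
 * Plain Levenshtein edit distance in its standard two-row
 * dynamic-programming form; illustration only, not Lucene's code.
 */
public class EditDistance {

    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;  // distance from empty prefix
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1,   // insertion
                                           prev[j] + 1),     // deletion
                                  prev[j - 1] + cost);       // substitution
            }
            int[] tmp = prev; prev = cur; cur = tmp;         // reuse the two rows
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // 3
    }
}
```

Precomputing these distances once over a synonym file, as Gau suggests below, and turning them into boosts avoids paying the per-document cost at query time.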


2012/6/6 Gau gauravshe...@gmail.com

 I have a list of synonyms which is being expanded at query time. This
 yields a lot of results (in the millions). My use case is name search.

 I want to sort the results by Levenshtein distance. I know this can be done
 with the strdist function. But sorting is inefficient, and the overhead of
 the Solr function call adds to its woes, killing performance. I want the
 results to be returned as quickly as possible.

 One way I think Levenshtein can work is to apply strdist to the synonym
 file and get a score for each synonym, and then use these scores to boost
 the results appropriately; it should be equivalent to the Levenshtein
 distance. But I am not sure how to do this in Solr, or in fact whether
 Solr supports this.


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Levenstein-Distance-tp3988026.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr, db connections remain after indexing a table.

2012-06-07 Thread Jihyun Suh
I read someone's question and answer about db connections.
They said a db connection stays alive for 10 minutes.
But I started indexing (dataimport) over an hour ago, and all of the db
connections have remained for an hour.


| 88757 | id | localhost:38843 | tmp  | Sleep   | 3696 |   | NULL
   |
| 88758 | id | localhost:38844 | tmp  | Sleep   | 3624 |   | NULL
   |


2012/6/7 Jihyun Suh jhsuh.ourli...@gmail.com

 I index many tables which are written with entities in data-config.xml.
 But after indexing one table, db connection remains
 even though I set 'holdability=CLOSE_CURSORS_AT_COMMIT'.

 How can I remove the connection after indexing a table?


 ...







Re: filtering number and repeated contents

2012-06-07 Thread Mark , N
Thanks Jack, I will try the update processor.

By the way, does Solr store tokenized content in fields if the field has the
property stored="true"?







On Tue, Jun 5, 2012 at 8:23 PM, Jack Krupansky j...@basetechnology.comwrote:

 My (very limited) understanding of boilerpipe in Tika is that it strips
 out short text, which is great for all the menu and navigation text, but
 the typical disclaimer at the bottom of an email is not very short and
 frequently can be longer than the email message body itself. You may have
 to resort to a custom update processor that is programmed with some
 disclaimer signature text strings to be removed from field values.

 -- Jack Krupansky

 -Original Message- From: Mark , N
 Sent: Tuesday, June 05, 2012 8:28 AM
 To: solr-user@lucene.apache.org
 Subject: filtering number and repeated contents


 Is it possible to filter out numbers and disclaimers (repeated content)
 while indexing to Solr?
 This is all surplus information and we do not want to index it.

 I have tried using the boilerpipe algorithm as well to remove surplus
 information from web pages, such as navigational elements, templates, and
 advertisements. I think it works well, but I'm looking forward to seeing if I
 could filter out disclaimer information too, mainly in email texts.
 --
 Thanks,

 *Nipen Mark *




-- 
Thanks,

*Nipen Mark *


Re: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Spadez
Final comment from me, then I'll let someone else speak.

The solution we seem to be looking at is to send a GET request to Solr and then
send back a rendered page, so we are basically creating the results page
on the server rather than the client side.

I would really like to hear what people have to say about this. Is this a
good idea? Are there any major disadvantages?

It seems like the only way to go to have a reliable search site that works
without JavaScript.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988158.html
Sent from the Solr - User mailing list archive at Nabble.com.


how to work with solr

2012-06-07 Thread sdssfour
Hi all

Can anybody suggest how to work with Solr in a web application?

Please send me the information.


Regards
Raja

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-work-with-solr-tp3988154.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ERROR 400 undefined field

2012-06-07 Thread Michael Kuhlmann

Am 07.06.2012 09:55, schrieb sheethal shreedhar:

http://localhost:8983/solr/select/?q=fruit&version=2.2&start=0&rows=10&indent=on

I get

HTTP ERROR 400

Problem accessing /solr/select/. Reason:

 undefined field text


Look at your schema.xml. You'll find a line like this:

<defaultSearchField>text</defaultSearchField>

Replace "text" with a field that is defined somewhere in schema.xml.

Or change your query to something with a field name like this:

http://localhost:8983/solr/select/?q=somefield:fruit

Or use the (e)dismax handler and configure it accordingly. See 
http://wiki.apache.org/solr/DisMaxRequestHandler.
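Putting the first two suggestions together, the relevant pieces of schema.xml might look roughly like this (field and type names are illustrative, not taken from the original post):

```xml
<!-- A field that actually exists in the schema... -->
<field name="title" type="text" indexed="true" stored="true"/>

<!-- ...and a default search field that points at it, so a bare
     query like q=fruit has something to search against. -->
<defaultSearchField>title</defaultSearchField>
```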


Greetings,
Kuli


Hiring multiple Lucene/Solr Engineers, Leads, and Architects

2012-06-07 Thread SV
Hi,

Best Buy is building new Search Platform/Eco-System powered by Lucene/Solr.
We are hiring multiple Lucene/Solr engineers, tech leads, and architects,
both full-time and consulting based in Minneapolis, MN. This is a long term
project and the team is fun to work with.

Please reach out to me if you are interested @ venkat.amb...@bestbuy.com

Thanks,
Venkat Ambati
Sr. Manager, Digital Commerce Tower, GBS IT
Best Buy


RE: issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults

2012-06-07 Thread Markus Jelsma
Hi

The search is distributed over all shards. The problem exists locally as well.

Thanks,
 
-Original message-
 From:Jack Krupansky j...@basetechnology.com
 Sent: Wed 06-Jun-2012 17:07
 To: solr-user@lucene.apache.org
 Subject: Re: issues with spellcheck.maxCollationTries and 
 spellcheck.collateExtendedResults
 
 Do single-word queries return hits?
 
 Is this a multi-shard environment? Does the request list all the shards 
 needed to give hits for all the collations you expect? Maybe the queries are 
 being done locally and don't have hits for the collations locally.
 
 -- Jack Krupansky
 
 -Original Message- 
 From: Markus Jelsma
 Sent: Wednesday, June 06, 2012 6:21 AM
 To: solr-user@lucene.apache.org
 Subject: issues with spellcheck.maxCollationTries and 
 spellcheck.collateExtendedResults
 
 Hi,
 
 We've had some issues with a bad zero-hits collation being returned for a
 two-word query where one word was only one edit away from the required
 collation. With spellcheck.maxCollations set to a reasonable number we saw
 the various suggestions without the required collation. We decreased
 thresholdTokenFrequency to make it appear in the list of collations.
 However, with collateExtendedResults=true the hits field for each collation
 was zero, which is incorrect.
 
 Required collation=huub stapel (two hits) and q=huup stapel
 
   "collation":{
     "collationQuery":"heup stapel",
     "hits":0,
     "misspellingsAndCorrections":{
       "huup":"heup"}},
   "collation":{
     "collationQuery":"hugo stapel",
     "hits":0,
     "misspellingsAndCorrections":{
       "huup":"hugo"}},
   "collation":{
     "collationQuery":"hulp stapel",
     "hits":0,
     "misspellingsAndCorrections":{
       "huup":"hulp"}},
   "collation":{
     "collationQuery":"hup stapel",
     "hits":0,
     "misspellingsAndCorrections":{
       "huup":"hup"}},
   "collation":{
     "collationQuery":"huub stapel",
     "hits":0,
     "misspellingsAndCorrections":{
       "huup":"huub"}},
   "collation":{
     "collationQuery":"huur stapel",
     "hits":0,
     "misspellingsAndCorrections":{
       "huup":"huur"}
 
 Now, with maxCollationTries set to 3 or higher, we finally get the required
 collation, which is the only collation able to return results. How can we
 determine the best value for maxCollationTries given the decreased
 thresholdTokenFrequency? Why are hits always zero?
 
 This is with a today's build and distributed search enabled.
 
 Thanks,
 Markus 
 
 


Re: Solr 4.0 Clean Commit for production use

2012-06-07 Thread TheNova
Thanks everyone!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-Clean-Commit-for-production-use-tp3987852p3988183.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults

2012-06-07 Thread Markus Jelsma
Hello!

-Original message-
 From:Dyer, James james.d...@ingrambook.com
 Sent: Wed 06-Jun-2012 17:23
 To: solr-user@lucene.apache.org
 Subject: RE: issues with spellcheck.maxCollationTries and 
 spellcheck.collateExtendedResults
 
 Markus,
 
 With maxCollationTries=0, it is not going out and querying the collations
 to see how many hits they each produce.  So it doesn't know the # of hits.
 That is why if you also specify collateExtendedResults=true, all the hit
 counts are zero.  It would probably be better in this case if it would not
 report hits in the extended response at all.  (On the other hand, if you're
 seeing zeros and maxCollationTries > 0, then you've hit a bug!)

I see. It would indeed make sense to get rid of the hits field when it's always 
zero anyway if maxCollationTries=0. Despite your recent explanations it raises 
some confusion.

 
 thresholdTokenFrequency in my opinion is a pretty blunt instrument for 
 getting rid of bad suggestions.  It takes out all of the rare terms, 
 presuming that if a term is rare in the data it either is a mistake or isn't 
 worthy to be suggested ever.  But if you're using maxCollationTries the 
 suggestions that don't fit will be filtered out automatically, making 
 thresholdTokenFrequency to be needed less.  (On the other hand, if you're 
 using IndexBasedSpellChecker, thresholdTokenFrequency will make the 
 dictionary smaller and spellcheck.build run faster...  This is solved 
 entirely in 4.0 with DirectSolrSpellChecker...) 

I forgot to mention this is with the DirectSolrSpellChecker. I guess we'll just
have to try working with the thresholdTokenFrequency. It's difficult, however,
because the index will grow, and chances are that at some point the rare, but
correct, token drops below the threshold and is not suggested anymore. We also
see the benefit of the threshold since our index is human-edited and
contains rare but misspelled words.

 
 For the apps here, I've been using maxCollationTries=10 and have been 
 getting good results.  Keep in mind that even though you're allowing it to 
 try up to 10 queries to find a viable collation, so long as you're setting 
 maxCollations to something low it will (hopefully) seldom need to try more 
 than a couple before finding one with hits.  (I always ask for only 1 
 collation as we just re-apply the spelling correction automatically if the 
 original query returned nothing).  Also, if spellcheck.count is low it 
 might not have enough terms available to try, so you might need to raise this 
 value also if raising maxCollationTries.

We have a similar set-up and require only one collation to be returned. I can 
increase maxCollationTries.

 
 The worst problem, in my opinion, is that it won't ever suggest words
 if they're already in the index (even when using thresholdTokenFrequency to
 remove them from the dictionary).  For that there is
 https://issues.apache.org/jira/browse/SOLR-2585, which is part of Solr 4.  The
 only other workaround is onlyMorePopular, which has its own issues (see
 http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount).

We don't really like onlyMorePopular since more hits is not always a better
suggestion. We decided to turn it off quite some time ago, also because of
SOLR-2555. alternativeTermCount may indeed be a solution.

Thanks, we'll manage for now.
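For reference, the parameters discussed in this thread are typically set as request-handler defaults in solrconfig.xml; a rough sketch with illustrative values (not a recommendation, and the handler name is an assumption):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <!-- try up to 10 test queries to find a collation that actually hits -->
    <str name="spellcheck.maxCollationTries">10</str>
    <!-- but only ask for one collation back -->
    <str name="spellcheck.maxCollations">1</str>
    <!-- raise count so maxCollationTries has enough suggestions to combine -->
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```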

 
 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311
 
 
 -Original Message-
 From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
 Sent: Wednesday, June 06, 2012 5:22 AM
 To: solr-user@lucene.apache.org
 Subject: issues with spellcheck.maxCollationTries and 
 spellcheck.collateExtendedResults
 
 Hi,
 
 We've had some issues with a bad zero-hits collation being returned for a
 two-word query where one word was only one edit away from the required
 collation. With spellcheck.maxCollations set to a reasonable number we saw
 the various suggestions without the required collation. We decreased
 thresholdTokenFrequency to make it appear in the list of collations.
 However, with collateExtendedResults=true the hits field for each collation
 was zero, which is incorrect.
 
 Required collation=huub stapel (two hits) and q=huup stapel
 
   "collation":{
     "collationQuery":"heup stapel",
     "hits":0,
     "misspellingsAndCorrections":{
       "huup":"heup"}},
   "collation":{
     "collationQuery":"hugo stapel",
     "hits":0,
     "misspellingsAndCorrections":{
       "huup":"hugo"}},
   "collation":{
     "collationQuery":"hulp stapel",
     "hits":0,
     "misspellingsAndCorrections":{
       "huup":"hulp"}},
   "collation":{
     "collationQuery":"hup stapel",
     "hits":0,
     "misspellingsAndCorrections":{
       "huup":"hup"}},
   "collation":{
     "collationQuery":"huub stapel",
     "hits":0,
     "misspellingsAndCorrections":{
       "huup":"huub"}},
   "collation":{
     "collationQuery":"huur stapel",

Re: how to work with solr

2012-06-07 Thread Jack Krupansky

What language environment are you using? PHP, Python, Ruby, other?

Each has its own interface.

But ultimately Solr is just another web service with an HTTP and XML or JSON 
interface. So, it is mostly a question of how your client environment 
accesses web services that have an HTTP and XML or JSON interface.


There is a little info here:
http://www.lucidimagination.com/blog/2010/01/14/solr-search-user-interface-examples/

-- Jack Krupansky

-Original Message- 
From: sdssfour

Sent: Thursday, June 07, 2012 7:38 AM
To: solr-user@lucene.apache.org
Subject: how to work with solr

Hi all

Can anybody suggest how to work with Solr in a web application?

Please send me the information.


Regards
Raja

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-work-with-solr-tp3988154.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: filtering number and repeated contents

2012-06-07 Thread Jack Krupansky
Solr (Lucene actually) stores the source form of the data that was fed to 
Solr, so it is not yet tokenized and will include all punctuation and 
whitespace.


-- Jack Krupansky

-Original Message- 
From: Mark , N

Sent: Thursday, June 07, 2012 7:45 AM
To: solr-user@lucene.apache.org
Subject: Re: filtering number and repeated contents

thanks Jack  , I will try updateProcessor

By the way, does Solr store tokenized content in fields if the field has
the property stored=true?







On Tue, Jun 5, 2012 at 8:23 PM, Jack Krupansky 
j...@basetechnology.comwrote:



My (very limited) understanding of boilerpipe in Tika is that it strips
out short text, which is great for all the menu and navigation text, but
the typical disclaimer at the bottom of an email is not very short and
frequently can be longer than the email message body itself. You may have
to resort to a custom update processor that is programmed with some
disclaimer signature text strings to be removed from field values.

-- Jack Krupansky

-Original Message- From: Mark , N
Sent: Tuesday, June 05, 2012 8:28 AM
To: solr-user@lucene.apache.org
Subject: filtering number and repeated contents


Is it possible to filter out numbers and disclaimers (repeated content)
while indexing to Solr?
This is all surplus information and we do not want to index it.

I have tried using the boilerpipe algorithm as well, to remove surplus
information from web pages such as navigational elements, templates, and
advertisements. I think it works well, but I am looking to see if I
could filter out disclaimer information too, mainly in email texts.
--
Thanks,

*Nipen Mark *





--
Thanks,

*Nipen Mark * 



Re: Boost by Nested Query / Join Needed?

2012-06-07 Thread naleiden
Thanks for your reply.

I think the number could eventually get very large (~1B) as our
customer-base grows, since each customer could possibly have a preference
for each candy, but currently we're looking at around 50M.

I've looked at the Solr-2272 patch for joins, which looks as though it might
fit the bill, but don't want to ignore an underlying scalability issue if my
schema organization doesn't make sense.

Also, it has recently been brought to my attention that it might be
problematic if preferences are updated frequently, which they will be
('candy' records will not be). If it helps things at all, I never have to do
any *direct* searches (just indirect/join-type referencing) on the
preference data.

Does it make more sense to try to index preference data in a separate core
and use another (non-nested) query to obtain them?

I had thought of trying a nested query with the query Function Query, but I
need the 'candy' id from the initial query, which amounts to join-like
behavior.

Thanks again for your guidance,
-Nick

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boost-by-Nested-Query-Join-Needed-tp3987818p3988210.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Walter Underwood
This is a bad idea. Solr is not designed to be exposed to arbitrary internet 
traffic and attacks. The best design is to have a front end server make 
requests to Solr, then use those to make HTML pages.

wunder

On Jun 7, 2012, at 4:49 AM, Spadez wrote:

 Final comment from me, then I'll let someone else speak.
 
 The solution we seem to be looking at is to send a GET request to Solr and then
 send back a rendered page, so we are basically creating the results page
 on the server rather than on the client side.
 
 I would really like to hear what people have to say about this. Is this a
 good idea? Are there any major disadvantages? 
 
 It seems like the only way to have a reliable search site that works
 without JavaScript.
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988158.html
 Sent from the Solr - User mailing list archive at Nabble.com.






Re: solr replication lag

2012-06-07 Thread Michael Della Bitta
Hello, Boris,

If I remember correctly, older versions of Solr report the version of
the as-yet-uncommitted core on the replication page. So if you did
a commit on the master and then a replication, you'd see that version
on the client.

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Thu, Jun 7, 2012 at 3:53 AM, Boris Vorotnikov bori...@auto.ru wrote:
 Hello,

 My name is Boris Vorotnikov. I am a developer on the project parts.auto.ru. My 
 team uses Solr v3.5 in this project. Several days ago I noticed that 
 replication between master and slave had a time lag. There were no such lags 
 before. I tried to find the source of the trouble, but was unsuccessful. All 
 I found is that the master reports a more recent index version than it actually has. And 
 when I press the Replicate now button on the slave, it receives information about 
 an index it already has and does nothing.

 This is configuration of master and slave replication:
 <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
        <str name="enable">${enable.master:false}</str>
        <str name="replicateAfter">startup</str>
        <str name="replicateAfter">optimize</str>
        <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt,spellings.txt</str>
    </lst>
    <lst name="slave">
        <str name="enable">${enable.slave:false}</str>
        <str name="masterUrl">http://__solr.master_url__:__solr.port__/solr/parts/replication</str>
        <str name="pollInterval">01:00:00</str>
    </lst>
    <str name="maxNumberOfBackups">3</str>
 </requestHandler>

 enable.master is a command-line parameter. It works.
 Solr replicates only once a day, when we do index optimization via cron.
 Is there any parameter responsible for keeping the index version, or something 
 else?




 Best regards,
 Boris Vorotnikov
 Developer auto.ru

 Tel: +7 (499) 780 3780 # 424
 E-mail: bori...@auto.ru








Re: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Michael Della Bitta
And keep Solr behind a firewall or authentication or even better,
both! People *will* find and exploit your Solr installation.

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Thu, Jun 7, 2012 at 10:31 AM, Walter Underwood wun...@wunderwood.org wrote:
 This is a bad idea. Solr is not designed to be exposed to arbitrary internet 
 traffic and attacks. The best design is to have a front end server make 
 requests to Solr, then use those to make HTML pages.

 wunder

 On Jun 7, 2012, at 4:49 AM, Spadez wrote:

 Final comment from me then Ill let someone else speak.

 The solution we seem to be looking at is send a GET request to SOLR and then
 send back a renderized page, so we are basically creating the results page
 on the server rather than the client side.

 I would really like to hear what people have to say about this. Is this a
 good idea? Are there any major disadvantages?

 It seems like the only way to go to have a reliable search site which works
 without Javascript.

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988158.html
 Sent from the Solr - User mailing list archive at Nabble.com.






Re: Exception when optimizing index

2012-06-07 Thread Jack Krupansky
Is the index otherwise usable for queries? And it is only the optimize that 
is failing?


I suppose it is possible that the index could be corrupted, but it is also 
possible that there is a bug in Lucene.


I would suggest running Lucene CheckIndex next. See what it has to say.

See:
https://builds.apache.org/job/Lucene-trunk/javadoc/core/org/apache/lucene/index/CheckIndex.html#main(java.lang.String[])
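
CheckIndex is a command-line tool run with java against the index directory. A hedged sketch of assembling that invocation; the jar name and index path are assumptions, so point them at the Lucene core jar and data directory of your installation:

```python
# Build the command line for Lucene's CheckIndex tool.
# The jar file name and index directory below are illustrative assumptions.
def checkindex_cmd(index_dir, core_jar="lucene-core-3.6.0.jar", fix=False):
    cmd = ["java", "-ea:org.apache.lucene...",  # enable Lucene assertions
           "-cp", core_jar,
           "org.apache.lucene.index.CheckIndex", index_dir]
    if fix:
        # -fix removes corrupt segments (and their documents): back up first!
        cmd.append("-fix")
    return cmd

# Typically executed via subprocess.run(checkindex_cmd(...)) once paths are real.
print(" ".join(checkindex_cmd("/var/solr/data/index")))
```

Run it read-only first; only add -fix after reviewing the report, since repairs discard documents in corrupt segments.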

-- Jack Krupansky

-Original Message- 
From: Rok Rejc

Sent: Thursday, June 07, 2012 5:50 AM
To: solr-user@lucene.apache.org
Subject: Re: Exception when optimizing index

Hi Jack,

It's a virtual machine running on VMware vSphere 5 Enterprise Plus. The
machine has 30 GB vRAM, an 8-core vCPU at 3.0 GHz, and 2 TB SATA RAID-10 over iSCSI.
The operating system is CentOS 6.2 64-bit.

Here are java infos:


   - catalina.base = /usr/share/tomcat6
   - catalina.home = /usr/share/tomcat6
   - catalina.useNaming = true
   - common.loader = ${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar
   - file.encoding = UTF-8
   - file.encoding.pkg = sun.io
   - file.separator = /
   - java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
   - java.awt.printerjob = sun.print.PSPrinterJob
   - java.class.path = /usr/share/tomcat6/bin/bootstrap.jar:/usr/share/tomcat6/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar
   - java.class.version = 50.0
   - java.endorsed.dirs =
   - java.ext.dirs = /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/ext:/usr/java/packages/lib/ext
   - java.home = /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre
   - java.io.tmpdir = /var/cache/tomcat6/temp
   - java.library.path = /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/server:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
   - java.naming.factory.initial = org.apache.naming.java.javaURLContextFactory
   - java.naming.factory.url.pkgs = org.apache.naming
   - java.runtime.name = OpenJDK Runtime Environment
   - java.runtime.version = 1.6.0_22-b22
   - java.specification.name = Java Platform API Specification
   - java.specification.vendor = Sun Microsystems Inc.
   - java.specification.version = 1.6
   - java.util.logging.config.file = /usr/share/tomcat6/conf/logging.properties
   - java.util.logging.manager = org.apache.juli.ClassLoaderLogManager
   - java.vendor = Sun Microsystems Inc.
   - java.vendor.url = http://java.sun.com/
   - java.vendor.url.bug = http://java.sun.com/cgi-bin/bugreport.cgi
   - java.version = 1.6.0_22
   - java.vm.info = mixed mode
   - java.vm.name = OpenJDK 64-Bit Server VM
   - java.vm.specification.name = Java Virtual Machine Specification
   - java.vm.specification.vendor = Sun Microsystems Inc.
   - java.vm.specification.version = 1.0
   - java.vm.vendor = Sun Microsystems Inc.
   - java.vm.version = 20.0-b11
   - javax.sql.DataSource.Factory = org.apache.commons.dbcp.BasicDataSourceFactory
   - line.separator =
   - os.arch = amd64
   - os.name = Linux
   - os.version = 2.6.32-220.13.1.el6.x86_64
   - package.access = sun.,org.apache.catalina.,org.apache.coyote.,org.apache.tomcat.,org.apache.jasper.,sun.beans.
   - package.definition = sun.,java.,org.apache.catalina.,org.apache.coyote.,org.apache.tomcat.,org.apache.jasper.
   - path.separator = :
   - server.loader =
   - shared.loader =
   - sun.arch.data.model = 64
   - sun.boot.class.path = /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/resources.jar:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/rt.jar:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/sunrsasign.jar:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/jsse.jar:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/jce.jar:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/charsets.jar:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/netx.jar:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/plugin.jar:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/rhino.jar:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/modules/jdk.boot.jar:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/classes
   - sun.boot.library.path = /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64
   - sun.cpu.endian = little
   - sun.cpu.isalist =
   - sun.io.unicode.encoding = UnicodeLittle
   - sun.java.command = org.apache.catalina.startup.Bootstrap start
   - sun.java.launcher = SUN_STANDARD
   - sun.jnu.encoding = UTF-8
   - sun.management.compiler = HotSpot 64-Bit Tiered Compilers
   - sun.os.patch.level = unknown
   - tomcat.util.buf.StringCache.byte.enabled = true
   - user.country = US
   - user.dir = /usr/share/tomcat6
   - user.home = /usr/share/tomcat6
   - user.language = en
   - user.name = tomcat
   - user.timezone = Europe/Ljubljana




As far as I see from the JIRA issue I have the patch attached (as mentioned
I have a trunk version from May 12). Any ideas?

Many thanks!



On Wed, Jun 6, 2012 at 2:49 PM, Jack 

Re: How to cap facet counts beyond a specified limit

2012-06-07 Thread Jack Krupansky

Sounds like an interesting improvement to propose.

It will also depend on various factors, such as number of unique terms in a 
field, field type, etc.


Which field types are giving you the most trouble and how many unique values 
do they have? And do you specify a facet.method or just let it default?


What release of Solr are you on? Are you using trie for numeric fields? 
Are these mostly string fields? Any boolean fields?


-- Jack Krupansky

-Original Message- 
From: Andrew Laird

Sent: Thursday, June 07, 2012 4:01 AM
To: solr-user@lucene.apache.org
Subject: How to cap facet counts beyond a specified limit

We have an index with ~100M documents and I am looking for a simple way to 
speed up faceted searches.  Is there a relatively straightforward way to 
stop counting the number of matching documents beyond some specifiable 
value?  For our needs we don't really need to know that a particular facet 
has exactly 14,203,527 matches - just knowing that there are more than a 
million is enough.  If I could somehow limit the hit counts to a million 
(say) it seems like that could decrease the work required to compute the 
values (just stop counting after the limit is reached) and potentially 
improve faceted search time - especially when we have 20-30 fields to facet 
on.  Has anyone else tried to do something like this?


Many thanks for comments and info,

Sincerely,


andy laird | gettyimages | 206.925.6728







return *all* words at Levenshtein distance <= N from query word

2012-06-07 Thread Giovanni Gherdovich
Hi all,

I am wondering if Solr can return all the words in my text corpus
that are within a given Levenshtein distance of my query word.

Possible?
Difficult?

Cheers,
Giovanni


Re: return *all* words at Levenshtein distance <= N from query word

2012-06-07 Thread Paul Libbrecht
I would debug somewhere close to FuzzyQuery.
Lucene does exactly that (just as it does for PrefixQuery): it expands a 
FuzzyQuery (or PrefixQuery) into a disjunction of term queries for the words that 
match the fuzzy or prefix query.

Maybe it helps you start?

paul

Le 7 juin 2012 à 18:15, Giovanni Gherdovich a écrit :

 Hi all,
 
 I am wandering if SOLR can return me all words in my text corpus
 that have a given levenstein distance with my query word.
 
 Possible?
 Difficult?
 
 Cheers,
 Giovanni
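
Conceptually, the expansion Paul describes boils down to scanning the term dictionary and keeping every term within the edit-distance bound. A brute-force sketch over a toy vocabulary; a real implementation would walk Lucene's term index instead, and the vocabulary here is a stand-in assumption:

```python
# Classic dynamic-programming Levenshtein distance, then a linear scan
# of the vocabulary keeping words within distance n of the query.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

def words_within(query, vocabulary, n):
    """All vocabulary words at Levenshtein distance <= n from query."""
    return [w for w in vocabulary if levenshtein(query, w) <= n]

vocab = ["heup", "huub", "huur", "hulp", "hup", "hugo"]
print(words_within("huup", vocab, 1))
```

The linear scan is O(vocabulary size), which is why Lucene's fuzzy term enumeration over the sorted term dictionary is the practical route for a large corpus.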



Re: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Spadez
Thank you for the reply, but I'm afraid I don't understand :(

This is how things are set up. On my Python website, I have a keyword box and a
location box. When the search is submitted, it queries the server via a JavaScript GET
request, which then sends back the data as JSON.

I'm saying that I don't want to be reliant on JavaScript. So I'm confused
about the best way to not only send the request to the Solr server, but also
how to receive the data.

My guess is that a GET request without JavaScript is the right way to send
the request to the Solr server, but then what should Solr be spitting out
at the other end, just an XML file? Then is the idea that my Python site would
receive this XML data and display it on the site?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988246.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Boost by Nested Query / Join Needed?

2012-06-07 Thread naleiden
For posterity, I think we're going to remove 'preference' data from Solr
indexing and go in the custom Function Query direction with a key-value
store.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boost-by-Nested-Query-Join-Needed-tp3987818p3988255.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Ben Woods
I'm new to Solr... but this is more of a web programming question... so I can get 
in on this :).

Your only option to get the data from Solr sans JavaScript is to use Python 
to pull the results BEFORE the client loads the page.

So, if you are asking whether you can get AJAX-like results (an already-loaded page 
pulling info from your Solr server) without using JavaScript... no, you 
cannot do that. You might be able to hack something ugly together using 
iframes, but trust me, you don't want to. It will look bad, it won't work well, 
and interacting with data in an iframe is nightmarish.

So, basically, if you don't want to use JavaScript, your only option is a total 
page reload every time you need to query Solr (which you then query on the 
Python side.)

-Original Message-
From: Spadez [mailto:james_will...@hotmail.com]
Sent: Thursday, June 07, 2012 11:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Help! Confused about using Jquery for the Search query - Want to 
ditch it

Thank you for the reply, but I'm afraid I don't understand :(

This is how things are setup. On my Python website, I have a keyword and 
location box. When clicked, it queries the server via a javascript GET
request, it then sends back the data via Json.

I'm saying that I dont want to be reliant on Javascript. So I'm confused about 
the best way to not only send the request to the Solr server, but also how to 
receive the data.

My guess is that a GET request without javascript is the right way to send 
the request to the Solr server, but then what should Solr be spitting out the 
other end, just an XML file? Then is the idea that my Python site would receive 
this XML data and display it on the site?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988246.html
Sent from the Solr - User mailing list archive at Nabble.com.

Quincy and its subsidiaries do not discriminate in the sale of advertising in 
any medium (broadcast, print, or internet), and will accept no advertising 
which is placed with an intent to discriminate on the basis of race or 
ethnicity.



RE: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Spadez
Hi Ben,

Thank you for the reply. So, If I don't want to use Javascript and I want
the entire page to reload each time, is it being done like this?

1. User submits form via GET
2. Solr server queried via GET
3. Solr server completes query
4. Solr server returns XML output
5. XML data put into results page
6. User shown new results page

Is this basically how it would work if we wanted Javascript out of the
equation?

Regards,

James



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988272.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Nick Chase



On 6/7/2012 1:53 PM, Spadez wrote:

Hi Ben,

Thank you for the reply. So, If I don't want to use Javascript and I want
the entire page to reload each time, is it being done like this?

1. User submits form via GET
2. Solr server queried via GET
3. Solr server completes query
4. Solr server returns XML output
5. XML data put into results page
6. User shown new results page

Is this basically how it would work if we wanted Javascript out of the
equation?


Seems to me that you'd still have to have Javascript turn the XML into 
HTML -- unless you use the XsltResponseWriter 
(http://wiki.apache.org/solr/XsltResponseWriter) to use XSLT to turn the 
raw XML into your actual results HTML.


The other option is to create a python page that does the call to Solr 
and spits out just the HTML for your results, then call THAT rather than 
calling Solr directly.


  Nick


replication start notification

2012-06-07 Thread Jon Kirton
Is there a programmatic way or otherwise to become aware when the
replication operation starts?  In looking at the source for
ReplicationHandler, there aren't log statements to indicate that it started.

Thanks, Jon


Re: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Michael Della Bitta
On Thu, Jun 7, 2012 at 1:59 PM, Nick Chase nch...@earthlink.net wrote:
 The other option is to create a python page that does the call to Solr and 
 spits out just the HTML for your results, then call THAT rather than calling 
 Solr directly.

This is the *only* option if you're listening to Walter and me. Don't
give end users direct access to your Solr box!

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


Re: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Nick Chase
+1 on that!  If you do want to provide direct results, ALWAYS send 
requests through a proxy that can verify that a) all requests are coming 
from your web app, and b) only acceptable queries are being passed on.


  Nick

On 6/7/2012 2:50 PM, Michael Della Bitta wrote:

On Thu, Jun 7, 2012 at 1:59 PM, Nick Chasench...@earthlink.net  wrote:

The other option is to create a python page that does the call to Solr and 
spits out just the HTML for your results, then call THAT rather than calling 
Solr directly.


This is the *only* option if you're listening to Walter and I. Don't
give end users direct access to your Solr box!


RE: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Ben Woods
Yes (or, at least, I think I understand what you are saying, haha.) Let me 
clarify.

1. Client sends GET request to web server
2. Web server (via Python, in your case, if I remember correctly) queries Solr 
Server
3. Solr server sends response to web server
4. You take that data and put it into the page you are creating server-side
5. Server returns static page to client

-Original Message-
From: Spadez [mailto:james_will...@hotmail.com]
Sent: Thursday, June 07, 2012 12:53 PM
To: solr-user@lucene.apache.org
Subject: RE: Help! Confused about using Jquery for the Search query - Want to 
ditch it

Hi Ben,

Thank you for the reply. So, If I don't want to use Javascript and I want the 
entire page to reload each time, is it being done like this?

1. User submits form via GET
2. Solr server queried via GET
3. Solr server completes query
4. Solr server returns XML output
5. XML data put into results page
6. User shown new results page

Is this basically how it would work if we wanted Javascript out of the equation?

Regards,

James



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988272.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Spadez
Thank you, that helps. The bit I am still confused about is how the Solr server sends
the response back to the web server. I get the impression that there are
different ways this could be done, but is sending an XML response back
to the Python server the best way to do it?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988302.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Ben Woods
As far as I know, it is the only way to do this. Look around a bit: Python (or 
PHP, or C, etc.) is able to act as an HTTP client; in fact, that is the 
most common way that web services are consumed. But we are definitely beyond 
the scope of the Solr list at this point.

-Original Message-
From: Spadez [mailto:james_will...@hotmail.com]
Sent: Thursday, June 07, 2012 2:09 PM
To: solr-user@lucene.apache.org
Subject: RE: Help! Confused about using Jquery for the Search query - Want to 
ditch it

Thank you, that helps. The bit I am still confused about how the server sends 
the response to the server though. I get the impression that there are 
different ways that this could be done, but is sending an XML response back to 
the Python server the best way to do this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988302.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Ben Woods
But, check out things like httplib2 and urllib2.

-Original Message-
From: Spadez [mailto:james_will...@hotmail.com]
Sent: Thursday, June 07, 2012 2:09 PM
To: solr-user@lucene.apache.org
Subject: RE: Help! Confused about using Jquery for the Search query - Want to 
ditch it

Thank you, that helps. The bit I am still confused about how the server sends 
the response to the server though. I get the impression that there are 
different ways that this could be done, but is sending an XML response back to 
the Python server the best way to do this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988302.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: timeAllowed flag in the response

2012-06-07 Thread Walter Underwood
Are you requesting a large number of rows? If so, request smaller chunks, like 
ten at a time. Then you can show those with a waiting note.

wunder

On Jun 7, 2012, at 1:14 PM, Laurent Vaills wrote:

 Hi everyone,
 
 We have some grouping queries that are quite long to execute. Some are too
 long to execute and are not acceptable. We have setup timeout for the
 socket but with this we get no result and the query is still running on the
 Solr side.
 So, we are now using the timeAllowed parameter which is a good compromise.
 However, in the response, how can we know that the query was stopped
 because it was too long ?
 
 I need this information for monitoring and to tell the user that the
 results are not complete.
 
 Regards,
 Laurent
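
One possible hedge: some Solr releases mark timed-out responses with a partialResults flag in the responseHeader; whether your build sets it needs verifying against your version, so treat the field name as an assumption. A sketch of checking for it client-side:

```python
# Detect whether a Solr JSON response was cut short by timeAllowed.
# The partialResults flag in responseHeader is an assumption to verify
# against your Solr version; absent means the results are complete.
def is_partial(solr_response):
    header = solr_response.get("responseHeader", {})
    return bool(header.get("partialResults", False))

resp = {"responseHeader": {"status": 0, "QTime": 5000, "partialResults": True},
        "response": {"numFound": 42, "docs": []}}
print(is_partial(resp))
```

If your version does not expose such a flag, comparing QTime against the configured timeAllowed is a cruder fallback for monitoring.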







Re: replication start notification

2012-06-07 Thread Jack Krupansky

SOLR-1855 has a script that checks replication details:

/solr/${CORE}/replication?command=details

# Get the last time the core replicated correctly.
# Get the last time the core failed to replicate.
# Is this core replicating (aka pulling index from master) right now?

See:
https://issues.apache.org/jira/browse/SOLR-1855

-- Jack Krupansky

-Original Message- 
From: Jon Kirton

Sent: Thursday, June 07, 2012 2:30 PM
To: solr-user@lucene.apache.org
Subject: replication start notification

Is there a programmatic way or otherwise to become aware when the
replication operation starts?  In looking at the source for
ReplicationHandler, there aren't log statements to indicate that it started.

Thanks, Jon 
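
There is no push notification, but the details endpoint can be polled. A Python sketch; the core URL is an assumption, and the isReplicating field follows the ReplicationHandler "details" response, so verify it against your Solr version:

```python
# Poll the replication details endpoint to see whether a slave is
# currently pulling an index from the master. URL and field names are
# assumptions; check them against your version's details response.
import json
from urllib.request import urlopen

def is_replicating(details):
    """Inspect a parsed details response for the slave's isReplicating flag."""
    slave = details.get("details", {}).get("slave", {})
    return str(slave.get("isReplicating", "false")).lower() == "true"

def fetch_details(base="http://localhost:8983/solr/core0"):  # assumed core URL
    with urlopen(base + "/replication?command=details&wt=json") as resp:
        return json.load(resp)

sample = {"details": {"slave": {"isReplicating": "true"}}}
print(is_replicating(sample))
```

A cron or monitoring job calling fetch_details() on an interval can log transitions of that flag to approximate start/end notifications.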



PorterStemmerTokenizerFactory ?

2012-06-07 Thread Carrie Coy
I've read different suggestions on how to handle cases where synonyms 
are used and there are multiple
versions of the original word that need to point to the same set of 
synonyms (responsibility, responsibilities, obligation, duty).


The approach that seems most logical is to configure a 
SynonymFilterFactory to use a custom TokenizerFactory that stems 
synonyms by calling out to the PorterStemmer.


Does anyone know if a PorterStemmerTokenizerFactory already exists 
somewhere?


Thank you.
Carrie Coy


Re: PorterStemmerTokenizerFactory ?

2012-06-07 Thread Jack Krupansky

Look at the text_en field type in the Solr 3.6 example schema.

-- Jack Krupansky

-Original Message- 
From: Carrie Coy 
Sent: Thursday, June 07, 2012 5:04 PM 
To: solr-user@lucene.apache.org 
Subject: PorterStemmerTokenizerFactory ? 

I've read different suggestions on how to handle cases where synonyms 
are used and there are multiple
version of the original word that need to point to the same set of 
synonyms (/responsibility, responsibilities, obligation, duty/ ).


The approach that seems most logical is to configure a 
SynonymFilterFactory to use a custom TokenizerFactory that stems 
synonyms by calling out to the PorterStemmer.


Does anyone know if a PorterStemmerTokenizerFactory already exists 
somewhere?


Thank you.
Carrie Coy


Re: Filter query vs Facets

2012-06-07 Thread Jack Krupansky

You may want to read the faceting overview:
http://wiki.apache.org/solr/SolrFacetingOverview

-- Jack Krupansky

-Original Message- 
From: Swetha Shenoy 
Sent: Thursday, June 07, 2012 5:24 PM 
To: solr-user@lucene.apache.org 
Subject: Filter query vs Facets 


Hi,

I had a question regarding the filter query (fq) and faceted search. Both
are used to filter search results. Can someone tell me how they are
different and when you would use one over the other?

Thanks,
Swetha
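
In short, fq restricts the result set (and is cached separately, without affecting relevance scores), while facet parameters only count matching documents per value without filtering anything out. A small sketch contrasting the two request styles; the field names are illustrative assumptions:

```python
# fq filters which documents come back; facet parameters only add counts
# alongside the (unfiltered) results.
from urllib.parse import urlencode

# Restrict results to red items with a filter query:
filtered = urlencode({"q": "shoes", "fq": "color:red"})

# Keep all results but count them per color value with faceting:
faceted = urlencode([("q", "shoes"), ("facet", "true"), ("facet.field", "color")])

print(filtered)
print(faceted)
```

A common pattern combines the two: facet on a field to show counts, then apply the user's chosen facet value as an fq on the follow-up request.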


ContentStreamUpdateRequest method addFile in 4.0 release.

2012-06-07 Thread Koorosh Vakhshoori
In latest 4.0 release, the addFile() method has a new argument 'contentType':

addFile(File file, String contentType)

In context of Solr Cell how should addFile() method be called? Specifically
I refer to the Wiki example:

ContentStreamUpdateRequest up = new
ContentStreamUpdateRequest(/update/extract);
up.addFile(new File(mailing_lists.pdf));
up.setParam(literal.id, mailing_lists.pdf);
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
result = server.request(up);
assertNotNull(Couldn't upload mailing_lists.pdf, result);
rsp = server.query( new SolrQuery( *:*) );
Assert.assertEquals( 1, rsp.getResults().getNumFound() );

given at URL: http://wiki.apache.org/solr/ExtractingRequestHandler

Since Solr Cell calls Tika under the hood, isn't the file
content type already identified by Tika? Looking at the code, it seems
passing NULL would do the job; is that correct? Also, for Solr Cell, is the
ContentStreamUpdateRequest class the right one to use, or is there a
different class that is more appropriate here?

Thanks
 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/ContentStreamUpdateRequest-method-addFile-in-4-0-release-tp3988344.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using Data Import Handler to invoke a stored procedure with output (cursor) parameter

2012-06-07 Thread Niran Fajemisin
Thanks Michael and Lance! 

I decided to go with an Oracle Pipelined Table function and that took care of 
it. I think that's what Michael was referring to below. This enabled us to be 
able to make a simple SQL call.

Thanks again.





 From: Lance Norskog goks...@gmail.com
To: solr-user@lucene.apache.org 
Sent: Sunday, June 3, 2012 12:28 AM
Subject: Re: Using Data Import Handler to invoke a stored procedure with 
output (cursor) parameter
 
Right, or create a view.

On Fri, Jun 1, 2012 at 8:11 PM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
 Apologies for the terseness of this reply, as I'm on my mobile.

 To treat the result of a function call as a table in Oracle SQL, use the
 table() function, like this:

 select * from table(my_stored_func())

 HTH,

 Michael
 On Jun 1, 2012 8:01 PM, Niran Fajemisin afa...@yahoo.com wrote:

 So I was able to run some additional tests today on this. I tried to use a
 stored function instead of a stored procedure. The hope was that the stored
 function would simply be a wrapper for the stored procedure and would simply
 return the cursor as the return value. This unfortunately did not work.

 My test attempted to call the function from the query attribute of the
 entity tag as such:
 {call my_stored_func()}

 It raised an error stating that: 'my_stored_func' is not a procedure or is
 undefined.  This makes sense because the invocation format above is
 customarily reserved for a stored procedure.

 So then I tried the typical approach for invoking a function which would
 be:
 {call ? := my_stored_function()}

 And as expected this resulted in an error stating that: not all variables
 bound. Again, this is expected as the ? notation would be the
 placeholder parameter that would be bound to the OracleTypes.CURSOR
 constant in a typical JDBC program.

 Note that this function has been tested outside of DIH and it works when
 properly invoked.

 I think the bottom-line here is that there is no proper support for stored
 procedures (or functions for that matter) in DIH. This is really
 unfortunate because anyone thinking of doing any significant processing in
 the source RDBMS prior to data export would have to look elsewhere. Short
 of adding this functionality to the JdbcDataSource class of the DIH, I
 think I'm at a dead end.

 If anyone knows of any alternatives I would greatly appreciate hearing
 them.

 Thanks for the responses as usual.

 Cheers.




 
  From: Lance Norskog goks...@gmail.com
 To: solr-user@lucene.apache.org; Niran Fajemisin afa...@yahoo.com
 Sent: Thursday, May 31, 2012 3:09 PM
 Subject: Re: Using Data Import Handler to invoke a stored procedure with
 output (cursor) parameter
 
 Can you add a new stored procedure that uses your current one? It
 would operate like the DIH expects.
 
 I don't remember if DB cursors are a standard part of JDBC. If they
 are, it would be a great addition to the DIH if they work right.
 
 On Thu, May 31, 2012 at 10:44 AM, Niran Fajemisin afa...@yahoo.com
 wrote:
  Thanks for your response, Michael. Unfortunately changing the stored
 procedure is not really an option here.
 
  From what I'm seeing, it would appear that there's really no way of
 somehow instructing the Data Import Handler to get a handle on the output
 parameter from the stored procedure. It's a bit surprising though that no
 one has run into this scenario, but I suppose most people just work around
 it.
 
  Anyone else care to shed some more light on alternative approaches?
 Thanks again.
 
 
 
 
  From: Michael Della Bitta michael.della.bi...@appinions.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, May 31, 2012 9:40 AM
 Subject: Re: Using Data Import Handler to invoke a stored procedure
 with output (cursor) parameter
 
 I could be wrong about this, but Oracle has a table() function that I
 believe turns the output of a function into a table. So possibly you
 could wrap your procedure in a function that returns the cursor, or
 convert the procedure to a function.
 
 Michael Della Bitta
 
 
 Appinions, Inc. -- Where Influence Isn’t a Game.
 http://www.appinions.com
 
 
 On Thu, May 31, 2012 at 8:00 AM, Niran Fajemisin afa...@yahoo.com
 wrote:
  Hi all,
 
  I've seen a few questions asked around invoking stored procedures
 from within Data Import Handler but none of them seem to indicate what type
 of output parameters were being used.
 
  I have a stored procedure created in Oracle database that takes a
 couple input parameters and has an output parameter that is a reference
 cursor. The cursor is expected to be used as a way of iterating through the
 returned table rows. I'm using the following format to invoke my stored
 procedure in the Data Import Handler's data config XML:
 
  <entity name="entity_name" ... query="{call my_stored_proc(inParam1,
 inParam2)}" ... ></entity>
 
  I have tested that 

Solr 4.0 Master slave configuration in JBOSS 5.1.2

2012-06-07 Thread ursamit79
I have Solr 4.0 (apache-solr-4.0) and JBoss Application Server 5.1.2
installed in RHEL 6.2
machine. I was successful in integrating solr with JBoss and I am able to
view admin console (single core).

Now I would like to create a master/slave configuration for the Solr servers.

Can anyone help me?

Thanks

Amit

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-Master-slave-configuration-in-JBOSS-5-1-2-tp3988375.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Question on addBean and deleteByQuery

2012-06-07 Thread Nick Zadrozny
On Wed, Jun 6, 2012 at 8:51 PM, Darin Pope da...@planetpope.com wrote:

 When using SolrJ (1.4.1 or 3.5.0) and calling either addBean or
 deleteByQuery, the POST body has numbers before and after the XML (47 and 0
 as noted in the example below):


It looks like this is HTTP chunked transfer encoding. As to whether that's
configurable in SolrJ, I defer to the experts on the list.

http://en.wikipedia.org/wiki/Chunked_transfer_encoding
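To illustrate why those numbers appear: chunk sizes are hexadecimal, so the
"47" in the report means 0x47 = 71 bytes of payload, and the trailing "0"
marks the final chunk. A minimal decoder sketch (the delete-by-query payload
below is illustrative, not from the original request):

```java
// Minimal chunked-transfer-encoding decoder, for illustration only;
// real HTTP stacks handle this framing transparently.
public class ChunkedDecode {
    static String decode(String body) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < body.length()) {
            int crlf = body.indexOf("\r\n", pos);
            // Each chunk starts with its size in hex on its own line
            int size = Integer.parseInt(body.substring(pos, crlf).trim(), 16);
            if (size == 0) break; // a size of 0 marks the final chunk
            out.append(body, crlf + 2, crlf + 2 + size);
            pos = crlf + 2 + size + 2; // skip chunk data plus trailing CRLF
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // An illustrative delete-by-query body framed as a single chunk
        String payload = "<delete><query>id:123</query></delete>";
        String body = Integer.toHexString(payload.length())
                + "\r\n" + payload + "\r\n0\r\n\r\n";
        System.out.println(decode(body)); // the XML with the framing stripped
    }
}
```

So the numbers are not part of the XML at all; they are transfer framing added
by the HTTP layer.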

-- 
Nick Zadrozny

http://websolr.com — hassle-free hosted search, powered by Apache Solr