Restricting results based on user authentication

2009-01-12 Thread Manupriya

Hi,

I am using the DIH feature of Solr for indexing a database. The Solr
server is independent of my web application; I send an HTTP request
for searching and then process the returned result.

Now we have a requirement to filter the results further based
on security-level restrictions. For example, user id abc should not be
allowed to see a particular result. How could we achieve that?

I followed http://www.nabble.com/Restricted-views-of-an-index-td15088750.html#a15090791
It suggests something like -
Add a role or access class to each indexed item, then use that in the 
queries, probably in a filter specified in a request handler. That keeps 
the definition of the filter within Solr. 
For example, you can create a request handler named "admin", a field named 
"role", and add a filter of "role:admin". 

I could not follow this solution. Is there any example or resource that
explains how to use custom request handler with filtering?
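
For reference, here is a minimal sketch of what that suggestion amounts to in
solrconfig.xml; the handler name and the role value are assumptions, and
requests would select the handler with qt=admin:

<requestHandler name="admin" class="solr.SearchHandler">
  <lst name="appends">
    <!-- appended to every query sent through this handler -->
    <str name="fq">role:admin</str>
  </lst>
</requestHandler>

With that in place the filter lives entirely in Solr; the application only
chooses which handler to query.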

Thanks,
Manu






DataImportHandler: UTF-8 and Mysql

2009-01-12 Thread gwk

Hello,

First of all thanks to Jacob Singh for his reply to my mail last week; I 
completely forgot to reply. Multicore is perfect for my needs. I've got 
Solr running now with my new schema partially implemented and I've 
started to test importing data with DIH. I've run into a number of 
issues though and I hope someone here can help:


  1. Posting UTF-8 data through the example post-script works and I get
     the proper results back when I query using the admin page.
     However, for data imported through the DataImportHandler from a MySQL
     database (the database contains correct data, it's a copy of a
     production db and selecting through the client gives the correct
     characters) I get Ã³ instead of ó. I've tried several
     combinations of arguments to my datasource url
     (useUnicode=true&characterEncoding=UTF-8) but it does not seem to
     help. How do I get this to work correctly? (A config sketch follows
     this list.)
  2. On the wikipage for DataImportHandler, the deletedPkQuery has no
 real description, am I correct in assuming it should contain a
 query which returns the ids of items which should be removed from
 the index?
  3. Another question concerning the DataImportHandler wiki page: I'm
     not sure about the exact way the field tag works. From the first
     data-config.xml example for the full-import I can infer that the
     column attribute represents the column from the SQL query and
     the name attribute represents the name of the field in the
     schema the column should map to. However, further on in the
     RegexTransformer section there are column attributes which do not
     correspond to the SQL query result set, and it's the sourceColName
     attribute which actually represents that data (which comes from the
     RegexTransformer, I understand), so why then is the column
     attribute used instead of the name attribute? This has confused
     me somewhat; any clarification would be greatly appreciated.
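
For what it's worth, a hedged sketch of a data-config dataSource/entity
covering points 1 and 2; the table and column names are made up, and note
the & must be written as &amp; inside the XML attribute:

<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb?useUnicode=true&amp;characterEncoding=UTF-8"
            user="user" password="pass"/>
<document>
  <entity name="item" pk="id"
          query="select id, title from items"
          deletedPkQuery="select id from items_deleted_log">
    <field column="id" name="id"/>
    <field column="title" name="title"/>
  </entity>
</document>

As far as I understand it, deletedPkQuery is run during delta-import and
should return the primary keys of rows that have been removed, which matches
the reading in point 2.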

Regards,

gwk


Deletion of indexes.

2009-01-12 Thread Tushar_Gandhi

Hi,
   I am using Solr 1.3 and I am facing a problem deleting index entries.
I have a MySQL database. Some of the data has been deleted from the database, but the
index entries for those records are still present. Because of that I am getting those
records in search results, which I don't want. I want to
delete those index entries which are no longer present in the database. Also, I don't know
which records were deleted from the database but are still present in the index. Is there any
way to solve this problem? I think that re-indexing will not solve my
problem, because it will re-index only the records which are present in the
database and will not touch the index entries which no longer have a reference in the
database.

Does anyone have a solution for this?

Thanks,
Tushar



To get all indexed records.

2009-01-12 Thread Tushar_Gandhi

Hi,
   I am using Solr 1.3. I want to retrieve all records from the index.
How should I write the Solr query so that I will get all records?

Thanks,
Tushar.



Re: To get all indexed records.

2009-01-12 Thread Akshay
Use *:* as a query to get all records. Refer to
http://wiki.apache.org/solr/SolrQuerySyntax for more info.

On Mon, Jan 12, 2009 at 5:30 PM, Tushar_Gandhi 
tushar_gan...@neovasolutions.com wrote:


 Hi,
   I am using solr 1.3. I want to retrieve all records from index file.
 How should I write solr query so that I will get all records?

 Thanks,
 Tushar.




-- 
Regards,
Akshay Ukey.


Re: To get all indexed records.

2009-01-12 Thread Manupriya

Hi Tushar,

1. If you are using the SOLR admin console to search records, then the default query
'*:*' in the Query String search box will serve the purpose.

2. If you directly want to send an HTTP request for retrieving records, then
you can hit a URL similar to the following - 
http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

Note - Here, 'start' and 'rows' in the URL specify the first record returned
and the total number of records returned, respectively. (A SolrJ equivalent for
start/rows is sketched after point 3.)

3. If you are using SolrJ for querying from Java, the following code snippet
would be helpful - 

CommonsHttpSolrServer server =
    new CommonsHttpSolrServer("http://localhost:8983/solr");
SolrQuery query = new SolrQuery("*:*");
QueryResponse results = server.query(query);
SolrDocumentList list = results.getResults();
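
As a hedged follow-up to point 2: the same start/rows paging can also be set
on the SolrQuery object in SolrJ, along these lines:

SolrQuery query = new SolrQuery("*:*");
query.setStart(0);   // index of the first record to return
query.setRows(10);   // maximum number of records to return
QueryResponse results = server.query(query);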

Thanks,
Manu

Tushar_Gandhi wrote:
 
 Hi,
I am using solr 1.3. I want to retrieve all records from index file.
 How should I write solr query so that I will get all records?
 
 Thanks,
 Tushar.
 




Index is not created if my database table is large

2009-01-12 Thread Rahul Brid
Hii,

I am new to the SOLR world...
I am using the Solr multicore config in my webapp and I have been able to
configure Solr properly.
The problem is when I build the index using a full data-import:
if my database table has a small number of rows, say 10 to 25, the index is created
properly and search queries return proper results.
But when I create the index for a large table, the index is not properly
created and no results are returned for searches.
What's the problem? Can anybody help me out?


my data-config file looks like this:

<dataSource type="JdbcDataSource" name="ds-1" driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/retaildb?characterEncoding=UTF-8"
            user="kickuser" password="kickapps" />
<document name="countries">
  <entity dataSource="ds-1" name="zonesToCountry" pk="countryId"
          query="select * from countries limit 10"
          deltaQuery="select * from countries limit 10" docRoot="true">
    <field column="countries_id" name="id" />
    <field column="countries_name" name="countries_name" />
    <field column="countries_iso_code_2" name="countries_iso_code_2" />
    <field column="countries_iso_code_3" name="countries_iso_code_3" />
    <entity dataSource="ds-1" name="zones" pk="zone_id"
            query="select * from zones z where z.zone_country_id='${zonesToCountry.countries_id}'">
      <field column="zone_code" name="zone_code" />
      <field column="zone_name" name="zone_name" />
    </entity>
  </entity>
</document>


-- 
Thanks and Regards
Rahul G.Brid


Re: Greater than conditions in Solr.

2009-01-12 Thread Erik Hatcher


On Jan 12, 2009, at 7:13 AM, Tushar_Gandhi wrote:

  Is it possible to write a query like id > 0?


Sure... id:[1 TO *]

See here for lots more details: http://wiki.apache.org/solr/SolrQuerySyntax.
Be sure to follow the link to the Lucene query syntax for fuller details.
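
The same open-ended range syntax works on other field types as well; for
example (field names assumed):

price:[* TO 100]
timestamp:[2008-01-01T00:00:00Z TO *]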


Erik



Re: Index is not created if my database table is large

2009-01-12 Thread gwk

Hi,

I'm not sure that this is the same issue but I had a similar problem 
with importing a large table from MySQL; on the DataImportHandler FAQ 
(http://wiki.apache.org/solr/DataImportHandlerFaq) the first issue 
mentions memory problems. Try adding the batchSize="-1" attribute to 
your dataSource, it fixed the problem for me.
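
To be concrete, the attribute goes on the dataSource element in
data-config.xml; a hedged sketch (connection details made up):

<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb" user="user" password="pass"
            batchSize="-1"/>

With the MySQL driver, batchSize="-1" makes DIH stream rows instead of
buffering the whole result set in memory.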


Regards,

gwk


Re: Index is not created if my database table is large

2009-01-12 Thread Rahul Brid
Hi, thanks for the reply... but can you tell me where to set this batchSize?
In data-config.xml?

On Mon, Jan 12, 2009 at 8:48 AM, gwk g...@eyefi.nl wrote:

 Hi,

 I'm not sure that this is the same issue but I had a similar problem with
 importing a large table from Mysql, on the DataImportHandler FAQ (
 http://wiki.apache.org/solr/DataImportHandlerFaq) the first issue mentions
 memory problems. Try adding the batchSize=-1 attribute to your datasource,
 it fixed the problem for me.

 Regards,

 gwk




-- 
Thanks and Regards
Rahul G.Brid


Re: Index is not created if my database table is large

2009-01-12 Thread Rahul Brid
Hey... I tried using batchSize="-1" but it doesn't work. I am not getting any
memory problem as such...
http://127.0.0.1/search/products/dataimport?command=full-import&debug=on&verbose=true
runs without error and gives me the response,
but when I query using the admin page it does not return any result set. This
happens when the database table has a large number of rows.

On Mon, Jan 12, 2009 at 9:17 AM, Rahul Brid rahul.b...@balajisoftware.in wrote:

 Hi,thnx for the reply ...but an you tell me where to set this batchSize???
 in dataconfig.xml


 On Mon, Jan 12, 2009 at 8:48 AM, gwk g...@eyefi.nl wrote:

 Hi,

 I'm not sure that this is the same issue but I had a similar problem with
 importing a large table from Mysql, on the DataImportHandler FAQ (
 http://wiki.apache.org/solr/DataImportHandlerFaq) the first issue
 mentions memory problems. Try adding the batchSize=-1 attribute to your
 datasource, it fixed the problem for me.

 Regards,

 gwk




 --
 Thanks and Regards
 Rahul G.Brid




Re: Database permissions integration and Sub documents

2009-01-12 Thread Stephen Weiss


On Jan 11, 2009, at 10:08 PM, Mike Shredder wrote:


Hi,
  I'm new to Solr... I've been able to get Solr up & running, but I've
got some quick questions.

 1) How do I filter results based on permissions from an external
database system?
  -- Should I implement a query filter which will look up
permissions in the DB for each doc returned?
  -- Or should I handle this in a request handler?



I have one project that has permissions in a db.  What I do is index  
the permissions group ids along with the documents, so that I can use  
a simple query parameter appended to the users' search strings.  The  
only drawback is that when the permissions change, the documents must  
be (entirely) reindexed, which can be a pain (like when one change  
affects half your index), but it's a small price to pay for the speed  
improvements vs. constantly querying the database.
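
As a hedged illustration of that approach (field name and group ids made up),
the appended parameter ends up looking something like:

&fq=perm_group:(17 OR 42 OR 108)

where the list is whatever permission groups the current user belongs to.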


  2) I need to support sub-documents & documents. So I was planning
to make my sub-documents into Solr docs.  But depending on query types I need
to dup out sub-documents and return only one document for all sub-docs in a
result set. Which interface do I need to implement to achieve this?


Check out SOLR-236.  I'm using it for this purpose (using the ivan-3  
patch).  Works well for me although faceting can be a bit strange.

https://issues.apache.org/jira/browse/SOLR-236




  3) If I do duping, my total result count will be off; what is the
right way to return an estimated total doc count?


The doc count returned from solr-236 would be accurate, just the facet  
counts are off.


--
Steve



Improving Readability of Hit Highlighting

2009-01-12 Thread Terence Gannon
I'm indexing text from an OCR of an old document.  Many words get read
perfectly, but they're typically embedded in a lot of junk.  I would
like the hit highlighting to show only the 'good' words, in the order
in which they appeared in the original document.  Is it possible to
use the output of the filter classes as the text used in hit highlighting?
 Or do you have to do all the text cleanup outside of Solr and present it
with two fields to index, one with the original text and one with the
cleaned-up text?  The objective of the hit highlighting is to give the
user a *sense* of the original context, even if it's not provided
verbatim from the original document.  Thanks in advance.

TerryG


Re: Query regarding Spelling Suggestions

2009-01-12 Thread Grant Ingersoll
Solr 1.3 doesn't use Log4J, it uses Java Utility Logging (JUL).  I  
believe the info level in the logs is sufficient.  Let's start by  
posting what you have?


Also, are you able to get the sample spellchecking to work?

On Jan 12, 2009, at 2:16 AM, Deshpande, Mukta wrote:


Hi,

Could you please send me the needful entries in log4j.properties to
enable logging, explicitly for SpellCheckComponent.

My current log4j.properties looks like:

log4j.rootLogger=INFO,console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
log4j.logger.org.apache.solr=DEBUG

With these settings I can only see the INFO level logs.

I tried to change the log level for SpellCheckComponent to FINE using
the admin logging page http://localhost:8080/solr/admin/logging but did
not see any difference in logging.

Thanks,
~Mukta

-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Monday, January 12, 2009 3:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Query regarding Spelling Suggestions

Can you send the full log?

On Jan 11, 2009, at 1:51 PM, Deshpande, Mukta wrote:


I am using the example schema that comes with the Solr installation
downloaded from http://www.mirrorgeek.com/apache.org/lucene/solr/.
I have added the "word" field with the textSpell fieldtype in the
schema.xml file, as specified in the mail below.

My spelling index exists under SOLR_HOME/data/. If I open my index in
Luke I can see the entries against the "word" field.

Thanks,
~Mukta




From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Fri 1/9/2009 8:29 AM
To: solr-user@lucene.apache.org
Subject: Re: Query regarding Spelling Suggestions



Can you put the full log (as short as possibly demonstrates the
problem) somewhere where I can take a look?  Likewise, can you share
your schema?

Also, does the spelling index exist under SOLR HOME/data/index?  If
you open it w/ Luke, does it have entries?

Thanks,
Grant

On Jan 8, 2009, at 11:30 PM, Deshpande, Mukta wrote:



Yes. I send the build command as:
http://localhost:8080/solr/select/?q=documnet&spellcheck=true&spellcheck.build=true&spellcheck.count=2&spellcheck.q=parfect&spellcheck.dictionary=dict

The Tomcat log shows:
Jan 9, 2009 9:55:19 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select/
params={spellcheck=true&q=documnet&spellcheck.q=parfect&spellcheck.dictionary=dict&spellcheck.count=2&spellcheck.build=true} hits=0 status=0 QTime=141

Even after sending the build command I do not get any suggestions.
Can you please check.

Thanks,
~Mukta

-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Thursday, January 08, 2009 7:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Query regarding Spelling Suggestions

Did you send in the build command?  See
http://wiki.apache.org/solr/SpellCheckComponent

On Jan 8, 2009, at 5:14 AM, Deshpande, Mukta wrote:


Hi,

I am using Wordnet dictionary for spelling suggestions.

The dictionary is converted to a Solr index with only one field,
"word", and stored in the location solr-home/data/syn_index, using the
syns2Index.java program available at
http://www.tropo.com/techno/java/lucene/wordnet.html

I have added the "word" field in my schema.xml as:

<field name="word" type="textSpell" indexed="true" stored="true"/>

My application data indexes are in solr-home/data

I am trying to use solr.IndexBasedSpellChecker to get spelling
suggestions.

My spell check component is configured as:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">dict</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">word</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./syn_index</str>
  </lst>
</searchComponent>

I have added this component to my standard request handler as:

<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

With the above configuration, I do not get any spelling suggestions.
Can somebody help ASAP?

Thanks,
~Mukta













--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












Re: Deletion of indexes.

2009-01-12 Thread Ryan Grange
I got around this problem by using a trigger on the table I index that 
records the values of deleted items in a queue table so when my next 
Solr update rolls around it sends a remove request for that record's 
ID.  Once the Solr deletion is done, I remove that ID from the queue 
table.  Of course, you have to be on MySQL 5.0 or above to have that 
available to you.  Otherwise, you'll have to manually add something to 
your deletion queries to record all the IDs you're about to delete to a 
queue table.
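
A hedged sketch of the trigger half of that setup; table and column names are
made up, and it assumes MySQL 5.0+:

CREATE TABLE solr_delete_queue (item_id INT PRIMARY KEY);

-- remember the id of every deleted row so the next Solr update run
-- can send a <delete><id>...</id></delete> for it, then clear the queue
CREATE TRIGGER queue_deleted_items AFTER DELETE ON items
FOR EACH ROW INSERT IGNORE INTO solr_delete_queue (item_id) VALUES (OLD.id);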


Ryan T. Grange, IT Manager
DollarDays International, Inc.

Tushar_Gandhi wrote:

Hi,
   I am using solr 1.3. I am facing a problem to delete the index.
I have mysql database. Some of the data from database is deleted, but the
indexing for those records is still present. Due to that I am getting those
records in search result. I don't want this type of behavior. I want to
delete those indexes which are not present in database. Also, I don't know
which records are deleted from database and present in index. Is there any
way to solve this problem? Also I think that re indexing will not solve my
problem, because it will re index only the records which are present in
database and don't bother about the indexes which don't have reference in
database.

Can anyone have solution for this?

Thanks,
Tushar
  


Re: Restricting results based on user authentication

2009-01-12 Thread Chris Harris
Hi Manu,

I haven't made a custom request handler in a while, but I want to
clarify that, if you trust your application code, you don't actually
need a custom request handler to do this sort of authentication
filtering. At indexing time, you can add a role field to each object
that you index, as described in the thread. At query time, you could
simply have your application code add an appropriate filter query to
each Solr request. So, if you're using the standard XML query
interface, instead of sending URLs like

  http://.../solr/select?q=foo...

you can have your application code send URLs like

  http://.../solr/select?q=foo&fq=role:admin...

If I understand the custom request handler approach, then it basically
amounts to the same thing as the above; the only difference is that
the filter query gets added internally by Solr, rather than at the
application level.

Sorry if you already understand all this; I'm throwing these comments
out just in case.
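
If it helps, the same thing expressed with SolrJ is just a filter query added
to the request; a hedged sketch where server and currentUserRole come from
your own code:

SolrQuery query = new SolrQuery("foo");
// restrict results to documents this user's role may see
query.addFilterQuery("role:" + currentUserRole);
QueryResponse rsp = server.query(query);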

Cheers,
Chris

On Mon, Jan 12, 2009 at 1:54 AM, Manupriya manupriya.si...@gmail.com wrote:

 Hi,

 I am using DIH feature of Solr for indexing a database. I am using Solr
 server and it is independent of my web application. I send a http request
 for searching and then process the returned result.

 Now we have a requirement that we have to filter the results further based
 on security level restrictions?  For example, user id abc should not be
 allowed to see a particular result.  How could we achieve that?

 I
 followed,http://www.nabble.com/Restricted-views-of-an-index-td15088750.html#a15090791
 It suggests something like -
 Add a role or access class to each indexed item, then use that in the
 queries, probably in a filter specified in a request handler. That keeps
 the definition of the filter within Solr.
 For example, you can create a request handler named admin, a field named
 role, and add a filter of role:admin. 

 I could not follow this solution. Is there any example or resource that
 explains how to use custom request handler with filtering?

 Thanks,
 Manu







Custom Transformer to handle Timestamp

2009-01-12 Thread con

Hi all

I am using Solr to index data from my database.
In my database there is a timestamp field whose data is in the form
15-09-08 06:28:38.44200 AM. The column is of type TIMESTAMP in the
Oracle db.
So in schema.xml I have declared it as:
   <field name="LOGIN_TIMESTAMP" type="date" indexed="true" stored="true" /> 

While indexing data in debug mode I get this timestamp value as
<arr>
  <str>oracle.sql.TIMESTAMP:oracle.sql.timest...@f536e8</str>
</arr>

And when I do a search this value is not displayed, while all the other
fields indexed along with it are displayed.

1) So do I need to write a custom transformer to add these values to the
index?
2) And if yes, I am confused about how to do it. Is there sample code
somewhere? (See the sketch after this message.)
I have tried the sample TrimTransformer and it works. But how can I convert
this string to a valid date format? (I am not a Java expert... :-( )

Expecting your reply
Thanks in advance
Con
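
On question 2, a hedged sketch of what such a transformer might look like;
the class name is made up, and it relies on oracle.sql.TIMESTAMP exposing
timestampValue() to get a java.sql.Timestamp (which extends java.util.Date,
so the date field type can index it). It would be registered with
transformer="com.example.OracleTimestampTransformer" on the entity.

package com.example;

import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class OracleTimestampTransformer extends Transformer {
  @Override
  public Object transformRow(Map<String, Object> row, Context context) {
    Object value = row.get("LOGIN_TIMESTAMP");
    if (value instanceof oracle.sql.TIMESTAMP) {
      try {
        // replace the Oracle-specific value with a plain java.sql.Timestamp
        row.put("LOGIN_TIMESTAMP", ((oracle.sql.TIMESTAMP) value).timestampValue());
      } catch (java.sql.SQLException e) {
        // leave the original value in place if the conversion fails
      }
    }
    return row;
  }
}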





Single index - multiple SOLR instances

2009-01-12 Thread ashokc

Hello,

Is it possible to have the index created by a single SOLR instance, but have
several SOLR instances field the search queries? Or do I HAVE to replicate
the index for each SOLR instance that I want to answer queries? I need to
set up a fail-over instance. Thanks

- ashok



Re: Single index - multiple SOLR instances

2009-01-12 Thread Otis Gospodnetic
Ashok,

You can put your index on any kind of shared storage - SAN, NAS, NFS (this one 
is not recommended).  That will let you point all your Solr instances to a 
single copy of your index.  Of course, you will want to test performance to 
ensure the network is not slowing things down too much, if there is network in 
the picture.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: ashokc ash...@qualcomm.com
 To: solr-user@lucene.apache.org
 Sent: Monday, January 12, 2009 3:05:40 PM
 Subject: Single index - multiple SOLR instances
 
 
 Hello,
 
 Is it possible to have the index created by a single SOLR instance, but have
 several SOLR instances field the search queries. Or do I HAVE to replicate
 the index for each SOLR instance that I want to answer queries? I need to
 set up a fail-over instance. Thanks
 
 - ashok



Re: Getting only fields that match

2009-01-12 Thread Otis Gospodnetic
Norbert,

Other than through the explain query method, I don't think we have any mechanism to 
figure out which field(s) exactly a query matched.
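
For the record, the explain output can be requested on any query; a minimal
example against the default example port:

http://localhost:8983/solr/select?q=test&debugQuery=on

The explain section of the response then shows, per matching document, which
terms (and therefore which fields) contributed to the score.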

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Norbert Hartl norb...@hartl.name
 To: solr-user@lucene.apache.org
 Sent: Sunday, January 11, 2009 6:41:12 PM
 Subject: Re: Getting only fields that match
 
 Hi,
 
 On Sun, 2009-01-11 at 17:07 +0530, Shalin Shekhar Mangar wrote:
  On Sun, Jan 11, 2009 at 4:02 PM, Norbert Hartl wrote:
  
  
   I like the search result to include only the fields
   that matched the search. Is this possible? I only
   saw the field spec where you can have a certain set
   of fields or all.
  
  
  Are you looking for highlighting (snippets)?
  
  http://wiki.apache.org/solr/HighlightingParameters
  
  A Field can be indexed (searchable) or stored (retrievable) or both. When
  you make a query to Solr, you yourself specify which fields it needs to
  search on. If they are stored, you can ask to retrieve those fields only.
  Not sure if that answers your question.
  
 no, it doesn't. I want to have the following:
 
 Doc1
   field one = super test text
   field two = something 
   field three = another thing
 
 Doc2
   field one = even other stuff
   field zzz = this is a test
 
 Searching for test I want to retrieve
 
 Doc1
   field one 
 Doc2
   field zzz
 
 So I want only retrieve the fields that match the search
 (test in this case)
 
 I hope this makes it clear.
 
 Norbert



Re: Single index - multiple SOLR instances

2009-01-12 Thread ashokc

Thanks, Otis. That is great, as I plan to place the index on NAS and make it
writable to a single Solr instance (write load is not heavy) and readable by
many Solr instances, to handle fail-over and also share the query load (query
load can be high).

- ashok

Otis Gospodnetic wrote:
 
 Ashok,
 
 You can put your index on any kind of shared storage - SAN, NAS, NFS (this
 one is not recommended).  That will let you point all your Solr instances
 to a single copy of your index.  Of course, you will want to test
 performance to ensure the network is not slowing things down too much, if
 there is network in the picture.
 
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 - Original Message 
 From: ashokc ash...@qualcomm.com
 To: solr-user@lucene.apache.org
 Sent: Monday, January 12, 2009 3:05:40 PM
 Subject: Single index - multiple SOLR instances
 
 
 Hello,
 
 Is it possible to have the index created by a single SOLR instance, but
 have
 several SOLR instances field the search queries. Or do I HAVE to
 replicate
 the index for each SOLR instance that I want to answer queries? I need to
 set up a fail-over instance. Thanks
 
 - ashok
 
 
 




Re: Improving Readability of Hit Highlighting

2009-01-12 Thread Otis Gospodnetic
I'm not sure if I have a good suggestion, but I have a question. :)  What is 
considered junk?  Would it be possible to eliminate the junk before it even 
goes into the index in order to avoid GIGO (Garbage In Garbage Out)?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Terence Gannon butzi0...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Monday, January 12, 2009 11:00:31 AM
 Subject: Improving Readability of Hit Highlighting
 
 I'm indexing text from an OCR of an old document.  Many words get read
 perfectly, but they're typically embedded in a lot of junk.  I would
 like the hit highlighting to show only the 'good' words, in the order
 in which they appeared in the original document.  Is it possible to
 use output of the filter classes as the text used in hit highlighting?
 Or do you have to all the text cleanup outside of Solr and present it
 with two fields to index, one with the original text, and one with the
 cleaned up text.  The objective of the hit highlighting is to give the
 user a *sense* of the original context, even if it's not provided
 verbatim from the original document.  Thanks in advance.
 
 TerryG



Re: Improving Readability of Hit Highlighting

2009-01-12 Thread Terence Gannon
To answer your questions specifically, here is an example of the raw OCR output;

CONTRACTORINMPRIMENTAYIVE : mom Ale ACCEPT INFORMATIONON TOUR SHEET TO ea

to which I would like to see;

mom ale access tour sheet to

in the hit highlight.  My schema for this field is pretty much
standard, as follows:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" .../>
<filter class="solr.WordDelimiterFilterFactory" .../>
<filter class="solr.LowerCaseFilterFactory" .../>
<filter class="solr.EnglishPorterFilterFactory" .../>
<filter class="solr.RemoveDuplicatesTokenFilterFactory" .../>

When I examine the effect of each of these with the Analyzer, it seems
like if I could use the output after LowerCaseFilterFactory in the hit
highlight, I would come close to achieving what I want.

I'm not averse to doing the text cleanup external to Solr before the
indexing, but only if it's *not* redundant to what the filter
factories are going to do anyway.  Thanks for your help!

TerryG


Re: Single index - multiple SOLR instances

2009-01-12 Thread Otis Gospodnetic
OK.  Of course, you'll have to make sure everything on the SAN is redundant 
(down to controllers, power supplies, etc.) and that the disks can handle that 
high query load/IO.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: ashokc ash...@qualcomm.com
 To: solr-user@lucene.apache.org
 Sent: Monday, January 12, 2009 3:37:41 PM
 Subject: Re: Single index - multiple SOLR instances
 
 
 Thanks, Otis. That is great, as I plan to place the index on NAS and make it
 writable to a single solr instance (write load is not heavy) and readable by
 many solr instances to handle fail-over and also share the query load (query
 load can be high)
 
 - ashok
 
 Otis Gospodnetic wrote:
  
  Ashok,
  
  You can put your index on any kind of shared storage - SAN, NAS, NFS (this
  one is not recommended).  That will let you point all your Solr instances
  to a single copy of your index.  Of course, you will want to test
  performance to ensure the network is not slowing things down too much, if
  there is network in the picture.
  
  
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
  
  
  
  - Original Message 
  From: ashokc 
  To: solr-user@lucene.apache.org
  Sent: Monday, January 12, 2009 3:05:40 PM
  Subject: Single index - multiple SOLR instances
  
  
  Hello,
  
  Is it possible to have the index created by a single SOLR instance, but
  have
  several SOLR instances field the search queries. Or do I HAVE to
  replicate
  the index for each SOLR instance that I want to answer queries? I need to
  set up a fail-over instance. Thanks
  
  - ashok
  
  
  
 



Highlighting Trouble With Bigram Shingle Index

2009-01-12 Thread Chris Harris
I'm running into some highlighting issues that appear to arise only
when I'm using a bigram shingle (ShingleFilterFactory) analyzer.

I started with a bigram-free situation along these lines:

   <field name="body" type="noshingleText" indexed="false" stored="false" />
   <!-- Stored text for use with highlighting: -->
   <field name="kwic" type="noshingleText" indexed="false"
       stored="true" compressed="true" multiValued="false" />
   <copyField source="body" dest="kwic" maxLength="10" />

<fieldType name="noshingleText" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

For performance reasons, though, I wanted to turn on bigram shingle
indexing on the body field. (For more information see
http://www.nabble.com/Using-Shingles-to-Increase-Phrase-Search-Performance-td19015758.html#a19015758)
In particular, I wanted to use this field type:

<fieldType name="shingleText" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" outputUnigrams="true" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" outputUnigrams="false"
        outputUnigramIfNoNgram="true" />
  </analyzer>
</fieldType>

(Regarding outputUnigramsIfNoNgram parameter, see
http://issues.apache.org/jira/browse/SOLR-744.)

I wasn't sure if I should want to define my kwic field (the one I use
for highlighting) as type shingleText, to match the body field, or
type noshingleText. So I tried both. Neither work quite as desired.

[kwic as type shingleText]

If I have both body and kwic as type shingleText, then highlighting
more or less works, but there are some anomalies. The main thing is
that it really likes to pick fragments where the highlighted term
(e.g. car) is the last term in the fragment:

... la la la la la <em>car</em> ...
... foo foo foo foo foo <em>car</em> ...

This should obviously happen some of the time, but this is happening
with like 95% of my fragments, which is statistically unexpected. And
unfortunate. And it doesn't happen if I turn off shingling.

Another issue is that, if there are two instances of a highlighted
term within a given fragment, it will often highlight not just those
instances, but all the terms in between, like this:

... boo boo bar <em>car la la la car</em> bar bar bar ...

This too doesn't seem to happen if I disable bigram indexing.

I haven't figured out why this is the case. One potential issue is that
the TokenGroup abstraction doesn't necessarily make sense if you have
a token stream of alternating unigrams and bigrams like this:

  the, the cat, cat, cat went, went, went for, for, ...

Even if you could have a TokenGroup abstraction that makes sense, the current
implementation of TokenGroup.isDistinct looks like this:

  return token.startOffset() >= endOffset;

and it returns false most of the time in this case. (I can give some
explanation of why, but maybe I'll save that for later.)

I'm not sure if the highlighter can easily be made to accomodate
sequences of alternating unigrams and bigrams, or if highlighting
should really only be attempted on bigram-free token streams.

[kwic with type noshingleText]

If I set kwic to be of type noshingleText, then the above symptoms go
away. Some things are not quite right, though. The particular symptom
now is that if I do a quoted query like

  "big dog"

then the correct results get returned, but no preview fragments are returned.

The underlying reason this happens is that an inappropriate Query
object is being passed
to the constructor for QueryScorer. The query that gets passed is

  TermQuery:big dog

That is the Query that should be used for *searching* on my bigram body
field, but it's *not* the Query that should be used for *highlighting*; the
Query that should be used for highlighting is something like

  PhraseQuery:"big dog"~0

What apparently is going on is that the highlighter is using the Query
object generated by the *search* component to do highlighting.
One possibility is that the highlighter should
instead create a separate Query object for each hl.fl parameter; each
one would use the analyzer particular to the given *highlighting* field,
rather than the one for the default search field. There might be reasons why
that would be crazy, though.

Sorry this post is a little half-baked, but I'd really 

Summing the results in a collapse

2009-01-12 Thread John Martyniak
I have been using the Collapse extension, and have it working pretty
well.


However I would like to find out if there is a way to show the
collapsed results, and then sum up a field across the remaining
results.  For example:


I display Result 1 (There are 20 results, totalling $50.00), where the
20 would be the number of items returned from the collapse, and the
$50.00 would be the sum of the fee field in the 20 collapsed results.


Any help would be greatly appreciated.

Thank you,

-John




Multiple result fields in a collapse or subquery

2009-01-12 Thread John Martyniak
Is there any way to have multiple collapse.field directives in the
search string?


What I am trying to accomplish is the following

Result 1 (20 results)
EU (5 results)
USD (15 results)

Result 2 (10 results)
EU (5 results)
USD (5 results)

I thought that this could be done with faceting but with faceting you  
get the sum total for each keyword.  So for the above I get:

EU (10 results)
USD (20 results)

Which works well for guiding a search into deeper, more meaningful results.

However I would like to have additional data that is tailored to each
result row.


Any help would be greatly appreciated.

Thank you,

-John



Re: Improving Readability of Hit Highlighting

2009-01-12 Thread Otis Gospodnetic
Hi,

Quick note: please include a copy of the previous email when replying, so people can 
be reminded of the context.

You mentioned junk getting highlighted.  In your case is 
CONTRACTORINMPRIMENTAYIVE getting highlighted?  And that is junk? If so, why 
not augment your indexing to throw out junk tokens, if you have some rules for 
what constitutes a junk token? (e.g. token not in dictionary)
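
For what it's worth, a hedged sketch of that kind of cleanup done outside Solr
before indexing, keeping only tokens found in a word list; the dictionary
contents are whatever list you have available:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class OcrCleaner {
  private final Set<String> dictionary;

  public OcrCleaner(Set<String> dictionary) {
    this.dictionary = dictionary;
  }

  /** Keep only lowercased tokens that appear in the dictionary, preserving order. */
  public String clean(String rawOcrText) {
    StringBuilder out = new StringBuilder();
    for (String token : rawOcrText.toLowerCase().split("\\W+")) {
      if (dictionary.contains(token)) {
        if (out.length() > 0) out.append(' ');
        out.append(token);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    Set<String> dict = new HashSet<String>(
        Arrays.asList("mom", "ale", "accept", "tour", "sheet", "to"));
    // prints: mom ale accept tour sheet to
    System.out.println(new OcrCleaner(dict).clean(
        "CONTRACTORINMPRIMENTAYIVE : mom Ale ACCEPT INFORMATIONON TOUR SHEET TO ea"));
  }
}

The cleaned text would go into the stored field used for highlighting, while
the original text could still be indexed separately if needed.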


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Terence Gannon butzi0...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Monday, January 12, 2009 4:07:57 PM
 Subject: Re: Improving Readability of Hit Highlighting
 
 To answer your questions specifically, here is an example of the raw OCR 
 output;
 
 CONTRACTORINMPRIMENTAYIVE : mom Ale ACCEPT INFORMATIONON TOUR SHEET TO ea
 
 to which I would like to see;
 
 mom ale access tour sheet to
 
 in the hit highlight.  My schema for this field is pretty much
 standard, as follows;
 
 
 
 
 
 
 
 
 When I examine the effect of each of these with the Analyzer, it seems
 like if I could use the output after LowerCaseFilterFactory in the hit
 highlight, I would come close to achieving what I want.
 
 I'm not averse to doing the text cleanup external to Solr before the
 indexing, but only if it's *not* redundant to what the filter
 factories are going to do anyway.  Thanks for your help!
 
 TerryG



Re: Restricting results based on user authentication

2009-01-12 Thread Manupriya

Thanks Chris,

I agree with your approach. I also don't want to add anything at the
application level. I want authentication to be handled internally at the
Solr level itself. 

Can you please explain a little more about how to add a role field to
each object at indexing time? Is there any resource/example available
explaining this?

Thanks,
Manu


ryguasu wrote:
 
 Hi Manu,
 
 I haven't made a custom request handler in a while, but I want to
 clarify that, if you trust your application code, you don't actually
 need a custom request handler to do this sort of authentication
 filtering. At indexing time, you can add a role field to each object
 that you index, as described in the thread. At query time, you could
 simply have your application code add an appropriate filter query to
 each Solr request. So, if you're using the standard XML query
 interface, instead of sending URLs like
 
   http://.../solr/select?q=foo...
 
 you can have your application code send URLs like
 
   http://.../solr/select?q=foo&fq=role:admin...
 
 If I understand the custom request handler approach, then it basically
 amounts to the same thing as the above; the only difference is that
 the filter query gets added internally by Solr, rather than at the
 application level.
 
 Sorry if you already understand all this; I'm throwing these comments
 out just in case.
 
 Cheers,
 Chris
 
 On Mon, Jan 12, 2009 at 1:54 AM, Manupriya manupriya.si...@gmail.com
 wrote:

 Hi,

 I am using DIH feature of Solr for indexing a database. I am using Solr
 server and it is independent of my web application. I send a http request
 for searching and then process the returned result.

 Now we have a requirement that we have to filter the results further
 based
 on security level restrictions?  For example, user id abc should not be
 allowed to see a particular result.  How could we achieve that?

 I
 followed,http://www.nabble.com/Restricted-views-of-an-index-td15088750.html#a15090791
 It suggests something like -
 Add a role or access class to each indexed item, then use that in the
 queries, probably in a filter specified in a request handler. That keeps
 the definition of the filter within Solr.
 For example, you can create a request handler named admin, a field
 named
 role, and add a filter of role:admin. 

 I could not follow this solution. Is there any example or resource that
 explains how to use custom request handler with filtering?

 Thanks,
 Manu





 
 




Re: Restricting results based on user authentication

2009-01-12 Thread Chris Harris
On Mon, Jan 12, 2009 at 9:31 PM, Manupriya manupriya.si...@gmail.com wrote:

 Thanks Chris,

 I agree with your approach. I also dont want to add anything at the
 application level. I want authentication to be handled internally at the
 Solr level itself.

The application layer needs to be involved somehow, right, because I
assume the application level is the code that knows what the current
user id is. I'm not clear exactly what you want to keep out of the
application level.

In any case, if you don't like the idea of the application layer
adding a filter query, I think I'll defer to people with more
expertise on what your options are.

 Can you please explain me little more about how to add a role field to
 each object at indexing time? Is there any resource/example available
 explaining this?

You mentioned you're using the DataImportHandler. If your data source
is a single SQL table, the easiest approach might be to add a role
column to that table, and populate it appropriately for each object.
(How to do this of course depends on your application.) If your data
import code joins multiple tables, you'd need to think about which
table would be most appropriate for storing the role data.

Or perhaps your select statement could fill out a role based on
testing values of other fields; in SQL Server anyway you can write
something that looks more or less like this (the real syntax is
slightly different):

SELECT OrderID, Date, Company, CASE Company = 'CIA' THEN 'admin' ELSE
'user' END CASE as Role

(The idea here is to require admin access to view orders from the CIA.)
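
For completeness, the actual SQL Server form of that expression would be
roughly the following (table name assumed):

SELECT OrderID, Date, Company,
       CASE WHEN Company = 'CIA' THEN 'admin' ELSE 'user' END AS Role
FROM Orders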


 Thank,
 Manu


 ryguasu wrote:

 Hi Manu,

 I haven't made a custom request handler in a while, but I want to
 clarify that, if you trust your application code, you don't actually
 need a custom request handler to do this sort of authentication
 filtering. At indexing time, you can add a role field to each object
 that you index, as described in the thread. At query time, you could
 simply have your application code add an appropriate filter query to
 each Solr request. So, if you're using the standard XML query
 interface, instead of sending URLs like

   http://.../solr/select?q=foo...

 you can have your application code send URLs like

   http://.../solr/select?q=foo&fq=role:admin...

 If I understand the custom request handler approach, then it basically
 amounts to the same thing as the above; the only difference is that
 the filter query gets added internally by Solr, rather than at the
 application level.

 Sorry if you already understand all this; I'm throwing these comments
 out just in case.

 Cheers,
 Chris

 On Mon, Jan 12, 2009 at 1:54 AM, Manupriya manupriya.si...@gmail.com
 wrote:

 Hi,

 I am using DIH feature of Solr for indexing a database. I am using Solr
 server and it is independent of my web application. I send a http request
 for searching and then process the returned result.

 Now we have a requirement that we have to filter the results further
 based
 on security level restrictions?  For example, user id abc should not be
 allowed to see a particular result.  How could we achieve that?

 I
 followed,http://www.nabble.com/Restricted-views-of-an-index-td15088750.html#a15090791
 It suggests something like -
 Add a role or access class to each indexed item, then use that in the
 queries, probably in a filter specified in a request handler. That keeps
 the definition of the filter within Solr.
 For example, you can create a request handler named admin, a field
 named
 role, and add a filter of role:admin. 

 I could not follow this solution. Is there any example or resource that
 explains how to use custom request handler with filtering?

 Thanks,
 Manu