Re: Solr 3.3 Sorting is not working for long fields

2011-11-14 Thread rajini maski
Field type is long and not multi valued.
Using the Solr 3.3 war file.
Tried on a Solr 1.4.1 index and a Solr 3.3 index; in both cases it's not working.

query :
http://localhost:8091/Group/select/?indent=on&q=studyid:120&sort=studyidasc,groupid asc,subjectid asc&start=0&rows=10

all the ID fields are long

Thanks & Regards
Rajani


On Sun, Nov 13, 2011 at 7:58 AM, Erick Erickson erickerick...@gmail.com wrote:

 Well, 3.3 has been around for quite a while, I'd suspect that
 something this fundamental would have been found...

 Is your field multi-valued? And what kind of field is
 studyid?

 You really have to provide more details, input, output, etc
 to get reasonable help. It might help to review:

 http://wiki.apache.org/solr/UsingMailingLists

 Best
 Erick

 On Fri, Nov 11, 2011 at 5:52 AM, rajini maski rajinima...@gmail.com
 wrote:
  Hi,
 
  I have upgraded my Solr from 1.4.1 to 3.3. Now I tried to sort
  on a long field and the documents are not getting sorted on that field.
 
  Sorting works when we sort on a facet, e.g. facet=on&facet.sort=studyid
 
  But when we do a simple sort on documents, sort=studyid, the sort doesn't
 happen.
  Is there any bug ?
 
 
 
  Regards,
  Rajani
 



Dismax, pf and qf

2011-11-14 Thread Andrea Gazzarini
Hi all,
In my dismax request handler I usually use both the qf and pf
parameters in order to do phrase and query search with different
boosting.

Now there are some scenarios where I want just pf active (without
qf). Other than surrounding my query with double quotes, is there
another way to do that? I mean, I would like to do the following

_query:{!dismax pf=author^100}vincent kwner

And that would fire a phrase search, not also

vincent OR knwer

while completely ignoring the qf settings. I saw that if I omit the qf
parameter Solr uses the default field and subsequently returns no
result, even if the pf query matches a record.
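
(For completeness, I think the full nested-query syntax would be something
like the line below, using the _query_ hook with local params; the qf value
here is only an illustrative guess on my part, not a confirmed phrase-only recipe:

q=_query_:"{!dismax qf=author pf=author^100}vincent kwner"
)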

Regards,
Andrea


Re: Solr 3.3 Sorting is not working for long fields

2011-11-14 Thread Michael Kuhlmann

On 14.11.2011 09:33, rajini maski wrote:

query :
http://localhost:8091/Group/select/?indent=on&q=studyid:120&sort=studyidasc,groupid asc,subjectid asc&start=0&rows=10


Is it a copy-and-paste error, or did you really sort on studyidasc?

I don't think you have a field studyidasc, and Solr should've given an 
exception that either asc or desc is missing.


-Kuli


Re: getting solr to expand Acronym

2011-11-14 Thread Tiernan OToole
thanks for the replies... the problem with Synonyms is that they would need
to be tracked... there could be new words entered that will need to be
added to the list on a regular basis...

@Otis: As for the option of a custom TokenFilter, how would that work? I
have not coded anything in Solr or written any custom TokenFilters myself... I
am sure there's documentation on this, but how do you think this should
work?

Thanks.

--Tiernan


On Fri, Nov 11, 2011 at 9:01 PM, Brandon Ramirez 
brandon_rami...@elementk.com wrote:

 Could this be simulated through synonyms?  Could you define CD as a
 synonym of Compact Disc or vice versa?  I'm not sure if that would work,
 just brainstorming here...
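
  Something like this is what I had in mind -- just a sketch of the analyzer
  change and a synonyms file, not a tested setup:

  <filter class="solr.SynonymFilterFactory" synonyms="acronyms.txt"
          ignoreCase="true" expand="true"/>

  # acronyms.txt
  cd, compact disc
  dvd, digital versatile disc
  cpu, central processing unit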


 Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848
 Software Engineer II | Element K | www.elementk.com


 -Original Message-
 From: Tiernan OToole [mailto:lsmart...@gmail.com]
 Sent: Friday, November 11, 2011 5:10 AM
 To: solr-user@lucene.apache.org
 Subject: getting solr to expand Acronym

  Don't know if this is possible, but I need to ask anyway... Say we have a
  list of acronyms in a database (CD, DVD, CPU) and also a list of their not
  so short names (Compact Disk, Digital Versatile Disk, Central Processing
  Unit), but they are not linked in any particular way (lots of items, some
  with full names, some using acronyms). Is it possible for Solr to figure out
  that CD is an acronym of Compact Disk? I know CD could also mean Central Data,
  or anything that begins with C and D, but is there a way to tell Solr to
  look for items that not only match CD, but have words next to each other
  that begin with C and D... Another example I can think of is IBM: it could
  be International Business Machines, or Irish Business Machines, or Irish
  Banking Machines...

  So, would that be possible?

 --
 Tiernan O'Toole
 blog.lotas-smartman.net
 www.geekphotographer.com
 www.tiernanotoole.ie




-- 
Tiernan O'Toole
blog.lotas-smartman.net
www.geekphotographer.com
www.tiernanotoole.ie


Re: delta-import of rich documents like word and pdf files!

2011-11-14 Thread Ahmet Arslan
 Thanks for your reply Mr. Erick
 All I want to do is that I have indexed some of my pdf
 files and doc files.
  Now, for any changes I make to them, I want a
  delta-import (incremental) so that
  I do not have to re-index the whole document with a full import.
  Only the changes made
  to these documents should get updated. I am using
 dataimporthandler. I
 have seen in forums but all of them have queried for delta
 import related to
 databases. I am just indexing some of my doc and pdf files
 for now.
 What should I do in order to achieve that?

Can you provide your data-config.xml? 


Re: Delete by Query with limited number of rows

2011-11-14 Thread mikr00
Hi Erick, hi Yury,

thanks to your input I found a perfect solution for my case. Even though
this is not a solr-only solution, I will just briefly describe how it works
since it might be of interest to others:

I have put up a mysql database holding two tables. The first only has a
primarykey with auto-increment and nothing else. The second has a primarykey
but without auto-increment and also fields for the content I store in solr. 

Now, before I add something to the Solr core, I add an entry to the first
mysql table. After the insertion, I get the primarykey for that action. I
check whether it is above my limit of documents. If so, I empty the first
mysql table and reset the auto-increment to zero. I then insert a mysql
entry into the second table using the primarykey taken from the first table
(if the primarykey already exists, I do not add an entry but update the
existing one). And finally I have a Solr core which holds my searchable data
and has a uniquekey field. Into this core I add a new document, using the
primarykey from the first mysql table as the uniquekey field.

The solution has two main benefits for me:

- I can precisely control the number of documents in my solr core.
- I do now also have a backup of my data in mysql

Thank you very much for your help!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Delete-by-Query-with-limited-number-of-rows-tp3503094p3506380.html
Sent from the Solr - User mailing list archive at Nabble.com.


Counting in facet results

2011-11-14 Thread LT.thomas
Hi,

By counting in facet results I mean solving the following problem:

I have 7 documents:

A1   B1   C1
A2   B1   C1
A3   B2   C1
A4   B2   C2
A5   B3   C2
A6   B3   C2
A7   B3   C2

If I make a facet query on field B, I get the result: B1=2, B2=2, B3=3.
A1   B1   C1
A2   B1   C1 2 - faceting by B
--===
A3   B2   C1
A4   B2   C2 2 - faceting by B
--===
A5   B3   C2
A6   B3   C2
A7   B3   C2 3 - faceting by B

I want to get additional information, something like a count within the results, by
field C. So, how can I query to get a result similar to the following:
A1   B1   C1
A2   B1   C1 2, 1 - faceting by B, count of C in facet results
--=
A3   B2   C1
A4   B2   C2 2, 2 - faceting by B, count of C in facet results
--=
A5   B3   C2
A6   B3   C2
A7   B3   C2 3, 1 - faceting by B, count of C in facet results


Thanks 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Counting-in-facet-results-tp3506382p3506382.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: delta-import of rich documents like word and pdf files!

2011-11-14 Thread neuron005
Thanks for your reply...my data-config.xml is
<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <entity name="f" pk="id" processor="FileListEntityProcessor"
            recursive="true"
            rootEntity="false"
            dataSource="null" baseDir="/var/data/solr"
            fileName=".*\.(DOC)|(PDF)|(XML)|(xml)|(JPEG)|(jpg)|(ZIP)|(zip)|(pdf)|(doc)"
            onError="skip">

      <entity name="tika-test" processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text" dataSource="bin" onError="skip">
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="text" name="text"/>
        <field column="id" name="id"/>
      </entity>
      <field column="file" name="fileName"/>
      <field column="fileAbsolutePath" name="links"/>
    </entity>
  </document>
</dataConfig>

--
View this message in context: 
http://lucene.472066.n3.nabble.com/delta-import-of-rich-documents-like-word-and-pdf-files-tp3502039p3506404.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Counting in facet results

2011-11-14 Thread Samuel García Martínez
Hi, I think what you are looking for is nested facets or
HierarchicalFaceting (http://wiki.apache.org/solr/HierarchicalFaceting):
Category A - Subcategory A1
Category A - Subcategory A1
Category B - Subcategory A1
Category B - Subcategory B2
Category A - Subcategory A2

Faceting by Category:
 A: 3
 B: 2

In addition, pivoting this query:
Cat: A=3
  SubCat: A1=2 and A2=1
Cat: B=2
  SubCat: A1=1 and B2=1

This makes sense?
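
With pivot faceting (available on Solr trunk/4.x, not in 3.x; the field names
below are just placeholders), the request would look roughly like:

q=*:*&facet=true&facet.pivot=category,subcategory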

On Mon, Nov 14, 2011 at 11:02 AM, LT.thomas t.latu...@itspree.pl wrote:

 Hi,

 By counting in facet results I mean resolve the problem:

 I have 7 documents:

 A1   B1   C1
 A2   B1   C1
 A3   B2   C1
 A4   B2   C2
 A5   B3   C2
 A6   B3   C2
 A7   B3   C2

 If I make the facet query by field B, get the result: B1=2, B2=2, B3=3.
 A1   B1   C1
 A2   B1   C1 2 - facing by B
 --===
 A3   B2   C1
 A4   B2   C2 2 - facing by B
 --===
 A5   B3   C2
 A6   B3   C2
 A7   B3   C2 3 - facing by B

 I wont to get additional information, something like count in results, by
 field C. So, how can I query to get a result similar to the following:
 A1   B1   C1
 A2   B1   C1 2, 1 - facing by B, count C in facet results
 --=
 A3   B2   C1
 A4   B2   C2 2, 2 - facing by B, count C in facet results
 --=
 A5   B3   C2
 A6   B3   C2
 A7   B3   C2 2, 1 - facing by B, count C in facet results


 Thanks

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Counting-in-facet-results-tp3506382p3506382.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Un saludo,
Samuel García.


Re: delta-import of rich documents like word and pdf files!

2011-11-14 Thread Ahmet Arslan

 Thanks for your reply...my
 data-config.xml is
 <dataConfig>
   <dataSource type="BinFileDataSource" name="bin"/>
   <document>
     <entity name="f" pk="id" processor="FileListEntityProcessor"
             recursive="true"
             rootEntity="false"
             dataSource="null" baseDir="/var/data/solr"
             fileName=".*\.(DOC)|(PDF)|(XML)|(xml)|(JPEG)|(jpg)|(ZIP)|(zip)|(pdf)|(doc)"
             onError="skip">
       <entity name="tika-test" processor="TikaEntityProcessor"
               url="${f.fileAbsolutePath}" format="text" dataSource="bin" onError="skip">
         <field column="Author" name="author" meta="true"/>
         <field column="title" name="title" meta="true"/>
         <field column="text" name="text"/>
         <field column="id" name="id"/>
       </entity>
       <field column="file" name="fileName"/>
       <field column="fileAbsolutePath" name="links"/>
     </entity>
   </document>
 </dataConfig>

According to the wiki, the only EntityProcessor which supports delta is
SqlEntityProcessor.

Maybe you can use the newerThan parameter of FileListEntityProcessor. Issuing a
full-import with clean=false may mimic a delta import.

You can pass value of this newerThan parameter in your request.

command=full-import&clean=false&myLastModifiedParam=NOW-3DAYS

http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters
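
A rough sketch of how the entity could pick that parameter up (untested; the
parameter name is just an example):

  <entity name="f" processor="FileListEntityProcessor"
          baseDir="/var/data/solr"
          fileName=".*\.(pdf)|(doc)"
          newerThan="${dataimporter.request.myLastModifiedParam}"
          rootEntity="false" dataSource="null">
    ...
  </entity>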




Re: TikaEntityProcessor not working?

2011-11-14 Thread kumar8anuj
The earlier issue has been resolved, but I'm stuck on something else. Can you tell
me which POI jar version would work with Tika 0.6? Currently I have
poi-3.7.jar. The error I am getting is this:

SEVERE: Exception while processing: js_logins document :
SolrInputDocument[{id=id(1.0)={100984},
complete_mobile_number=complete_mobile_number(1.0)={+91 9600067575},
emailid=emailid(1.0)={vkry...@gmail.com}, full_name=full_name(1.0)={Venkat
Ryali}}]:org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.NoSuchMethodError:
org.apache.poi.xwpf.usermodel.XWPFParagraph.&lt;init&gt;(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: java.lang.NoSuchMethodError:
org.apache.poi.xwpf.usermodel.XWPFParagraph.&lt;init&gt;(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V
at
org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator$MyXWPFParagraph.&lt;init&gt;(XWPFWordExtractorDecorator.java:163)
at
org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator$MyXWPFParagraph.&lt;init&gt;(XWPFWordExtractorDecorator.java:161)
at
org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractTableContent(XWPFWordExtractorDecorator.java:140)
at
org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.buildXHTML(XWPFWordExtractorDecorator.java:91)
at
org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:69)
at
org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:51)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
at
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
... 7 more


--
View this message in context: 
http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-tp856965p3506596.html
Sent from the Solr - User mailing list archive at Nabble.com.


TREC-style IR experiments

2011-11-14 Thread Ismo Raitanen
Hi,

I'm planning to do some information retrieval experiments with Solr.
I'd like to compare different IR methods. I have a test collection
with topics and judgements available. I'm considering using Solr (and
not Lemur/Indri etc.) for the tests, because Solr supports several
nice methods out-of-the-box, e.g. n-grams.

Finally, I plan to evaluate the different methods and their results
with trec_eval or similar program. What I need is a program, which
puts Solr results in a suitable format for trec_eval. I think I can
get the Solr search results in that format quite easily by using the
solr-php-client library.

Have any of you run TREC-style IR experiments with Solr and what are
your experiences with that? Do you have any suggestion for that kind
of tests with Solr?

Kind regards,
Ismo


Re: Using solr during optimization

2011-11-14 Thread Isan Fulia
Hi Mark,

In the above case, what if the index is optimized partially, i.e. by
specifying the max number of segments we want?
It has been observed that after optimizing (even a partial optimization),
indexing as well as searching is faster than with an
unoptimized index.
Decreasing the merge factor will affect performance, as it will
increase indexing time due to the more frequent merges.
So is it better to optimize partially (say once a month), rather than
decrease the merge factor and hurt indexing speed? Also, since we
will be sharding, that 100 GB index will be divided across different shards.

Thanks,
Isan Fulia.



On 14 November 2011 11:28, Kalika Mishra kalika.mis...@germinait.com wrote:

 Hi Mark,

 Thanks for your reply.

  What you're saying is interesting; so are you suggesting that optimizations
  should usually be done when there are not many updates? Also, can you please
  point out further under what conditions optimization might be beneficial.

 Thanks.

 On 11 November 2011 20:30, Mark Miller markrmil...@gmail.com wrote:

  I would not optimize - it's very expensive. With 11,000 updates a day, I
  think it makes sense to completely avoid optimizing.
 
  That should be your default move in any case. If you notice performance
  suffers more than is acceptable (good chance you won't), then I'd use a
  lower merge factor. It defaults to 10 - lower numbers will lower the
 number
  of segments in your index, and essentially amortize the cost of an
 optimize.
 
  Optimize is generally only useful when you will have a mostly static
 index.
 
  - Mark Miller
  lucidimagination.com
 
 
  On Nov 11, 2011, at 9:12 AM, Kalika Mishra wrote:
 
   Hi Mark,
  
   We are performing almost 11,000 updates a day, we have around 50
 million
   docs in the index (i understand we will need to shard) the core seg
 will
   get fragmented over a period of time. We will need to do optimize every
  few
   days or once in a month; do you have any reason not to optimize the
 core.
   Please let me know.
  
   Thanks.
  
   On 11 November 2011 18:51, Mark Miller markrmil...@gmail.com wrote:
  
   Do you have something forcing you to optimize, or are you just doing it
   for the heck of it?
  
   On Nov 11, 2011, at 7:50 AM, Kalika Mishra wrote:
  
   Hi,
  
   I would like to optimize solr core which is in Reader Writer mode.
  Since
   the Solr cores are huge in size (above 100 GB) the optimization takes
   hours
   to complete.
  
   When the optimization is going on say. on the Writer core, the
   application
   wants to continue using the indexes for both query and write
 purposes.
   What
   is the best approach to do this.
  
   I was thinking of using a temporary index (empty core) to write the
   documents and use the same Reader to read the documents. (Please note
   that
   temp index and the Reader cannot be made Reader Writer as Reader is
   already
   setup for the Writer on which optimization is taking place) But there
   could
   be some updates to the temp index which I would like to get reflected
  in
   the Reader. Whats the best setup to support this.
  
   Thanks,
   Kalika
  
   - Mark Miller
   lucidimagination.com
  
   --
   Thanks  Regards,
   Kalika
 


 --
 Thanks  Regards,
 Kalika




-- 
Thanks & Regards,
Isan Fulia.


Re: Solr 3.3 Sorting is not working for long fields

2011-11-14 Thread rajini maski
There is no error as such.

When I do a basic sort on a long field, the sort doesn't happen.


Query is :

http://blr-ws-195:8091/Solr3.3/select/?q=2%3A104+AND+526%3A27747&version=2.2&start=0&rows=10&indent=on&sort=469%20asc&fl=469

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">3</int>
  <lst name="params">
    <str name="fl">studyid</str>
    <str name="sort">studyid asc</str>
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">*:*</str>
    <str name="rows">100</str>
    <str name="version">2.2</str>
  </lst>
</lst>

<response>
  <result name="response" numFound="216" start="0">
    <doc><long name="studyid">53</long></doc>
    <doc><long name="studyid">18</long></doc>
    <doc><long name="studyid">14</long></doc>
    <doc><long name="studyid">11</long></doc>
    <doc><long name="studyid">7</long></doc>
    <doc><long name="studyid">63</long></doc>
    <doc><long name="studyid">35</long></doc>
    <doc><long name="studyid">70</long></doc>
    <doc><long name="studyid">91</long></doc>
    <doc><long name="studyid">97</long></doc>
  </result>
</response>


The same case works with Solr 1.4.1, but it is not working with Solr 3.3.


Regards,
Rajani

On Mon, Nov 14, 2011 at 2:23 PM, Michael Kuhlmann k...@solarier.de wrote:

  On 14.11.2011 09:33, rajini maski wrote:

 query :
  http://localhost:8091/Group/select/?indent=on&q=studyid:120&sort=studyidasc,groupid asc,subjectid asc&start=0&rows=10


  Is it a copy-and-paste error, or did you really sort on studyidasc?

 I don't think you have a field studyidasc, and Solr should've given an
 exception that either asc or desc is missing.

 -Kuli



Re: Counting in facet results

2011-11-14 Thread LT.thomas
I use Solandra that integrates Solr 3.4 with Cassandra. So, is there any way
to solve this problem with Solr 3.4 (without pivots)?

Your results are:
Cat: A=3
  SubCat: A1=2 and A2=1
Cat: B=2
  SubCat: A1=1 and B2=1

but I would like to have:
Cat: A=3
  SubCat: 2 (losing information about the numbers within A1 and A2, only
distinct count of subcategories)
Cat: B=2
  SubCat: 2 (losing information about the numbers within A1 and B2, only
distinct count of subcategories)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Counting-in-facet-results-tp3506382p3506848.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: TREC-style IR experiments

2011-11-14 Thread Ahmet Arslan
 I'm planning to do some information retrieval experiments
 with Solr.
 I'd like to compare different IR methods. I have a test
 collection
 with topics and judgements available. I'm considering using
 Solr (and
 not Lemur/Indri etc.) for the tests, because Solr supports
 several
 nice methods out-of-the-box, e.g. n-grams.
 
 Finally, I plan to evaluate the different methods and their
 results
 with trec_eval or similar program. What I need is a
 program, which
 puts Solr results in a suitable format for trec_eval. I
 think I can
 get the Solr search results in that format quite easily by
 using the
 solr-php-client library.
 
 Have any of you run TREC-style IR experiments with Solr
 and what are
 your experiences with that? Do you have any suggestion for
 that kind
 of tests with Solr?

There are some existing implementations in Lucene:

http://lucene.apache.org/java/3_0_2/api/contrib-benchmark/org/apache/lucene/benchmark/quality/trec/package-summary.html



Casesensitive search problem

2011-11-14 Thread jayanta sahoo
Hi,
Whenever I search with the words OfficeJet or officejet or
Officejet or oFiiIcejET, I get different results for each
search. I am not able to understand why this is happening.
   I want to solve this problem in such a way that the search becomes case
insensitive and I get the same result for any combination of capital and
small letters.

-- 
Jayanta Sahoo


Re: Solr 3.3 Sorting is not working for long fields

2011-11-14 Thread Ahmet Arslan
 When I do a basic sort on *long *field. the sort doesn't
 happen.
 
 
 Query is :
 
 http://blr-ws-195:8091/Solr3.3/select/?q=2%3A104+AND+526%3A27747&version=2.2&start=0&rows=10&indent=on&sort=469%20asc&fl=469

 <lst name="responseHeader">
   <int name="status">0</int>
   <int name="QTime">3</int>
   <lst name="params">
     <str name="fl">studyid</str>
     <str name="sort">studyid asc</str>
     <str name="indent">on</str>
     <str name="start">0</str>
     <str name="q">*:*</str>
     <str name="rows">100</str>
     <str name="version">2.2</str>
   </lst>
 </lst>

 <response>
   <result name="response" numFound="216" start="0">
     <doc><long name="studyid">53</long></doc>
     <doc><long name="studyid">18</long></doc>
     <doc><long name="studyid">14</long></doc>
     <doc><long name="studyid">11</long></doc>
     <doc><long name="studyid">7</long></doc>
     <doc><long name="studyid">63</long></doc>
     <doc><long name="studyid">35</long></doc>
     <doc><long name="studyid">70</long></doc>
     <doc><long name="studyid">91</long></doc>
     <doc><long name="studyid">97</long></doc>
   </result>
 </response>
 
 
 The same case works with Solr1.4.1 but it is not working
 solr 3.3

Can you try with the following type?

  <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8"
             omitNorms="true" positionIncrementGap="0"/>

And studyid must be marked as indexed=true.
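
For example, something like this in schema.xml (adjust to your schema; this is just a sketch):

  <field name="studyid" type="tlong" indexed="true" stored="true"/>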


Re: Casesensitive search problem

2011-11-14 Thread Parvin Gasimzade
Check this :
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LowerCaseFilterFactory
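
A typical setup applies it on both the index and query side of the field type
you search on -- a sketch (names are illustrative, adapt to your schema), followed
by a re-index:

  <fieldType name="text_general" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>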

On Mon, Nov 14, 2011 at 3:24 PM, jayanta sahoo jsahoo1...@gmail.com wrote:

 Hi,
 Whenever I am searching with the words OfficeJet or officejet or
 Officejet or oFiiIcejET. I am getting the different results for each
 search respectively. I am not able to understand why this is happening?
   I want to solve this problem such a way that search will become case
 insensitive and I will get same result for any combination of capital and
 small letters.

 --
 Jayanta Sahoo



Re: Using solr during optimization

2011-11-14 Thread Mark Miller

On Nov 14, 2011, at 8:27 AM, Isan Fulia wrote:

 Hi Mark,
 
 In the above case , what if  the index is optimized partly ie. by
 specifying the max no of segments we want.
 It has been observed that after optimizing(even partly optimization), the
 indexing as well as searching had been faster than in case of an
 unoptimized one.

Yes, this remains true - searching against fewer segments is faster than 
searching against many segments. Unless you have a really high merge factor, 
this is just generally not a big deal IMO.

It tends to be something like, a given query is say 10-30% slower. If you have 
good performance though, this should often be something like a 50ms query goes 
to 80 or 90ms. You really have to decide/test if there is a practical 
difference to your users.

You should also pay attention to how long that perf improvement lasts while you 
are continuously adding more documents. Is it a super high cost for a short 
perf boost?

 Decreasing the merge factor will affect  the performance as it will
 increase the indexing time due to the frequent merges.

True - it will essentially amortize the cost of reducing segments. Have you 
tested lower merge factors though? Does it really slow down indexing to the 
point where you find it unacceptable? I've been surprised in the past. Usually 
you can find a pretty nice balance.
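
(For example, in the indexDefaults/mainIndex section of solrconfig.xml; the
value 4 here is only illustrative:

  <mergeFactor>4</mergeFactor>
)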

 So is it good that we optimize partly(let say once in a month), rather than
 decreasing the merge factor and affect  the indexing speed.Also since we
 will be sharding, that 100 GB index will be divided in different shards.

Partial optimize is a good option, and optimize is an option. They both exist 
for a reason ;) Many people pay the price because they assume they have to 
though, when they really have no practical need.

Generally, the best way to manage the number of segments in your index is 
through the merge policy IMO - not necessarily optimize calls.

I'm pretty sure optimize also blocks adds in previous versions of Solr as well - 
it grabs the commit lock. It won't do that in Solr 4, but that is another 
reason I wouldn't recommend it under normal circumstances.

I look at optimize as a last option, or when creating a static index personally.

 
 Thanks,
 Isan Fulia.
 
 
 
 On 14 November 2011 11:28, Kalika Mishra kalika.mis...@germinait.com wrote:
 
 Hi Mark,
 
 Thanks for your reply.
 
 What you saying is interesting; so are you suggesting that optimizations
 should be done usually when there not many updates. Also can you please
 point out further under what conditions optimizations might be beneficial.
 
 Thanks.
 
 On 11 November 2011 20:30, Mark Miller markrmil...@gmail.com wrote:
 
 I would not optimize - it's very expensive. With 11,000 updates a day, I
 think it makes sense to completely avoid optimizing.
 
 That should be your default move in any case. If you notice performance
 suffers more than is acceptable (good chance you won't), then I'd use a
 lower merge factor. It defaults to 10 - lower numbers will lower the
 number
 of segments in your index, and essentially amortize the cost of an
 optimize.
 
 Optimize is generally only useful when you will have a mostly static
 index.
 
 - Mark Miller
 lucidimagination.com
 
 
 On Nov 11, 2011, at 9:12 AM, Kalika Mishra wrote:
 
 Hi Mark,
 
 We are performing almost 11,000 updates a day, we have around 50
 million
 docs in the index (i understand we will need to shard) the core seg
 will
 get fragmented over a period of time. We will need to do optimize every
 few
 days or once in a month; do you have any reason not to optimize the
 core.
 Please let me know.
 
 Thanks.
 
 On 11 November 2011 18:51, Mark Miller markrmil...@gmail.com wrote:
 
 Do a you have something forcing you to optimize, or are you just doing
 it
 for the heck of it?
 
 On Nov 11, 2011, at 7:50 AM, Kalika Mishra wrote:
 
 Hi,
 
 I would like to optimize solr core which is in Reader Writer mode.
 Since
 the Solr cores are huge in size (above 100 GB) the optimization takes
 hours
 to complete.
 
 When the optimization is going on say. on the Writer core, the
 application
 wants to continue using the indexes for both query and write
 purposes.
 What
 is the best approach to do this.
 
 I was thinking of using a temporary index (empty core) to write the
 documents and use the same Reader to read the documents. (Please note
 that
 temp index and the Reader cannot be made Reader Writer as Reader is
 already
 setup for the Writer on which optimization is taking place) But there
 could
 be some updates to the temp index which I would like to get reflected
 in
 the Reader. Whats the best setup to support this.
 
 Thanks,
 Kalika
 
 - Mark Miller
 lucidimagination.com
 
 --
 Thanks  Regards,
 Kalika
 
 --
 Thanks  Regards,
 Kalika
 
 
 
 
 -- 
 Thanks & Regards,
 Isan Fulia.

- Mark Miller
lucidimagination.com

Re: Solr 3.3 Sorting is not working for long fields

2011-11-14 Thread rajini maski
I

On Mon, Nov 14, 2011 at 7:23 PM, Ahmet Arslan iori...@yahoo.com wrote:

  When I do a basic sort on *long *field. the sort doesn't
  happen.
 
 
  Query is :
 
   http://blr-ws-195:8091/Solr3.3/select/?q=2%3A104+AND+526%3A27747&version=2.2&start=0&rows=10&indent=on&sort=469%20asc&fl=469

   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">3</int>
     <lst name="params">
       <str name="fl">studyid</str>
       <str name="sort">studyid asc</str>
       <str name="indent">on</str>
       <str name="start">0</str>
       <str name="q">*:*</str>
       <str name="rows">100</str>
       <str name="version">2.2</str>
     </lst>
   </lst>

   <response>
     <result name="response" numFound="216" start="0">
       <doc><long name="studyid">53</long></doc>
       <doc><long name="studyid">18</long></doc>
       <doc><long name="studyid">14</long></doc>
       <doc><long name="studyid">11</long></doc>
       <doc><long name="studyid">7</long></doc>
       <doc><long name="studyid">63</long></doc>
       <doc><long name="studyid">35</long></doc>
       <doc><long name="studyid">70</long></doc>
       <doc><long name="studyid">91</long></doc>
       <doc><long name="studyid">97</long></doc>
     </result>
   </response>
 
 
  The same case works with Solr1.4.1 but it is not working
  solr 3.3

 Can you try with the following type?

   <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8"
              omitNorms="true" positionIncrementGap="0"/>

 And studyid must be marked as indexed=true.



I tried this one:   <fieldType name="tlong" class="solr.TrieLongField"
precisionStep="8" omitNorms="true" positionIncrementGap="0"/>

It didn't work :(

Sort didn't happen


Re: Solr 3.3 Sorting is not working for long fields

2011-11-14 Thread Ahmet Arslan
 I tried this one:   <fieldType name="tlong" class="solr.TrieLongField"
 precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
 
 It didn't work :(
 
 Sort didn't happen


Did you restart tomcat and perform re-index?


XSLT caching mechanism

2011-11-14 Thread vrpar...@gmail.com
Hello All,

I am using XSLT to transform the Solr XML response; when a search is made I get
the warning below:

WARNING [org.apache.solr.util.xslt.TransformerProvider] The
TransformerProvider's simplistic XSLT caching mechanism is not appropriate
for high load scenarios, unless a single XSLT transform is used and
xsltCacheLifetimeSeconds is set to a sufficiently high value.

How can I apply effective XSLT caching for Solr?



Thanks,
Vishal Parekh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/XSLT-caching-mechanism-tp3506979p3506979.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: XSLT caching mechanism

2011-11-14 Thread Erik Hatcher
Set the cache lifetime high, like it says.

Questions - why use the XSLT response writer?  What are you transforming the 
response into and digesting it with?

Erik

On Nov 14, 2011, at 09:31 , vrpar...@gmail.com wrote:

 Hello All,
 
 i am using xslt to transform solr xml response, when made search;getting
 below warning
 
 WARNING [org.apache.solr.util.xslt.TransformerProvider] The
 TransformerProvider's simplistic XSLT caching mechanism is not appropriate
 for high load scenarios, unless a single XSLT transform is used and
 xsltCacheLifetimeSeconds is set to a sufficiently high value.
 
 how can i apply effective xslt caching for solr ?
 
 
 
 Thanks,
 Vishal Parekh
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/XSLT-caching-mechanism-tp3506979p3506979.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: delta-import of rich documents like word and pdf files!

2011-11-14 Thread Erick Erickson
And you cannot update-in-place. That is, you can't update
just selected fields in a document, you have to re-index the
whole document.

Best
Erick

On Mon, Nov 14, 2011 at 6:11 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Thanks for your reply...my
 data-config.xml is
 <dataConfig>
   <dataSource type="BinFileDataSource" name="bin"/>
   <document>
     <entity name="f" pk="id" processor="FileListEntityProcessor"
             recursive="true"
             rootEntity="false"
             dataSource="null" baseDir="/var/data/solr"
             fileName=".*\.(DOC)|(PDF)|(XML)|(xml)|(JPEG)|(jpg)|(ZIP)|(zip)|(pdf)|(doc)"
             onError="skip">
       <entity name="tika-test" processor="TikaEntityProcessor"
               url="${f.fileAbsolutePath}" format="text" dataSource="bin" onError="skip">
         <field column="Author" name="author" meta="true"/>
         <field column="title" name="title" meta="true"/>
         <field column="text" name="text"/>
         <field column="id" name="id"/>
       </entity>
       <field column="file" name="fileName"/>
       <field column="fileAbsolutePath" name="links"/>
     </entity>
   </document>
 </dataConfig>

 According to wiki : the only EntityProcessor which supports delta is 
 SqlEntityProcessor.

 May be you can use newerThan parameter of FileListEntityProcessor. Issuing a 
 full-import with clean=false may mimic delta import.

 You can pass value of this newerThan parameter in your request.

 command=full-import&clean=false&myLastModifiedParam=NOW-3DAYS

 http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters





Re: Solr 3.3 Sorting is not working for long fields

2011-11-14 Thread rajini maski
Yes .

On 11/14/11, Ahmet Arslan iori...@yahoo.com wrote:
 I tried this one.   fieldType
 name=tlong class=solr.TrieLongField
 precisionStep=8 omitNorms=true
 positionIncrementGap=0/

 It didn't work :(

 Sort didn't happen


 Did you restart tomcat and perform re-index?



Re: Solr 3.3 Sorting is not working for long fields

2011-11-14 Thread Ahmet Arslan
 Yes .


  Did you restart tomcat and perform re-index?
 
 

Okay, one thing left: HTTP caching may cause a stale response. Clear your
browser's cache if you are using a browser to query Solr.


Re: XSLT caching mechanism

2011-11-14 Thread Chantal Ackermann
In solrconfig.xml, change the xsltCacheLifetimeSeconds property of the
XSLTResponseWriter to the desired value (6000 seconds in this example):

<queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
  <int name="xsltCacheLifetimeSeconds">6000</int>
</queryResponseWriter>



On Mon, 2011-11-14 at 15:31 +0100, vrpar...@gmail.com wrote:
 Hello All,
 
 i am using xslt to transform solr xml response, when made search;getting
 below warning
 
 WARNING [org.apache.solr.util.xslt.TransformerProvider] The
 TransformerProvider's simplistic XSLT caching mechanism is not appropriate
 for high load scenarios, unless a single XSLT transform is used and
 xsltCacheLifetimeSeconds is set to a sufficiently high value.
 
 how can i apply effective xslt caching for solr ?
 
 
 
 Thanks,
 Vishal Parekh
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/XSLT-caching-mechanism-tp3506979p3506979.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Easy way to tell if there are pending documents

2011-11-14 Thread Latter, Antoine
Hi Solr,

Does anyone know of an easy way to tell if there are pending documents waiting 
for commit?

Our application performs operations that are never safe to perform while 
commits are pending. We make this work by making sure that all indexing 
operations end in a commit, and stop the unsafe operations from running while a 
commit is running.

This works great most of the time, except when we have enough disk space to add 
documents to the pending area, but not enough disk space to do a commit - then 
the indexing operations only error out after they've done all of their adds.

It would be nice if the unsafe operation could somehow detect that there are 
pending documents and abort.

In the interim I'll have the unsafe operation perform a commit when it starts, 
but I've been weeding out useless commits from my app recently and I don't like 
them creeping back in.

Thanks,
Antoine


get a total count

2011-11-14 Thread U Anonym
Hello everyone,

A newbie question: how do I find out how many documents have been indexed
across all shards?

Thanks much!


memory usage keep increase

2011-11-14 Thread Yongtao Liu
Hi all,

I am seeing an issue where RAM usage keeps increasing when we run queries.
After looking at the code, it looks like Lucene uses MMapDirectory to map index files
into RAM.

According to 
http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/store/MMapDirectory.html
 comments, it will use a lot of memory.
NOTE: memory mapping uses up a portion of the virtual memory address space in 
your process equal to the size of the file being mapped. Before using this 
class, be sure your have plenty of virtual address space, e.g. by using a 64 
bit JRE, or a 32 bit JRE with indexes that are guaranteed to fit within the 
address space.

So, my understanding is that Solr requires physical RAM equal to the index file size.
Is that right?

Yongtao



Re: TREC-style IR experiments

2011-11-14 Thread Ismo Raitanen
 I'm planning to do some information retrieval experiments with Solr.

 There some existing implementations in Lucene
 http://lucene.apache.org/java/3_0_2/api/contrib-benchmark/org/apache/lucene/benchmark/quality/trec/package-summary.html

Have you used that with Solr? How?

//Ismo


Help! - ContentStreamUpdateRequest

2011-11-14 Thread Tod

Could someone take a look at this page:

http://wiki.apache.org/solr/ContentStreamUpdateRequestExample

... and tell me what code changes I would need to make to be able to 
stream a LOT of files at once rather than just one?  It has to be 
something simple like a collection of some sort but I just can't get it 
figured out.  Maybe I'm using the wrong class altogether?



TIA


Re: Question about solr caches and warming

2011-11-14 Thread Chris Hostetter

: Although I don't have statistics to back my claim, I suspect that the really
: nasty filters don't have as high a hitcount as the ones that are more simple.
: Typically the really nasty filters are used when an employee logs into the
: site.  Employees have access to a lot more than customers do, but the search
: still needs to be filtered to be appropriate for whatever search options are
: active.

A low impact change to consider would be to leverage the cache=false 
local param feature that was added in Solr 3.4...

  https://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters

...you could add this localparam anytime you know the query is coming from 
an employee -- or anytime you know the filter query is esoteric
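
For example (the field and value here are just placeholders):

  fq={!cache=false}acl_group:employee_only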

A higher impact change would be to create a dedicated query slave 
machine (or just an alternate core name that polls the same master) that 
is *only* used by employees and has much lower sizes on the caches -- this 
is the approach i have advocated and seen work very well since the 
pre-apache days of Solr: dedicated instances for each major user base 
with key settings (ie: replication frequencies, cache sizes, cache 
warming, static warming of sorts, etc...) tuned for that user base.  

-Hoss


Getting 411 Length required when adding docs

2011-11-14 Thread Darniz
Hello All, 
I am seeing this strange HTTP 411 Length Required error. My Solr is hosted
with a third-party hosting company and it was working fine all this while.
I really don't understand why this happened. Attached is the stack trace; any
help will be appreciated.

org.apache.solr.common.SolrException: Length Required
Length Required

request: http://www.listing-social.com/solr/update?wt=javabinversion=1
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:64)
at org.apache.solr.client.solrj.SolrServer.addBean(SolrServer.java:68)
at
com.listings.solr.service.impl.BulkIndexingServiceImpl.startBulkIndexing(BulkIndexingServiceImpl.java:55)
at
com.listings.action.BulkIndexingAction.execute(BulkIndexingAction.java:42)
at
org.apache.struts.chain.commands.servlet.ExecuteAction.execute(ExecuteAction.java:53)
at
org.apache.struts.chain.commands.AbstractExecuteAction.execute(AbstractExecuteAction.java:64)
at
org.apache.struts.chain.commands.ActionCommandBase.execute(ActionCommandBase.java:48)
at org.apache.commons.chain.impl.ChainBase.execute(ChainBase.java:190)
at
org.apache.commons.chain.generic.LookupCommand.execute(LookupCommand.java:304)
at org.apache.commons.chain.impl.ChainBase.execute(ChainBase.java:190)
at
org.apache.struts.chain.ComposableRequestProcessor.process(ComposableRequestProcessor.java:280)
at 
org.apache.struts.action.ActionServlet.process(ActionServlet.java:1858)
at org.apache.struts.action.ActionServlet.doGet(ActionServlet.java:446)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:362)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)

Thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-411-Length-required-when-adding-docs-tp3508372p3508372.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Keyword counts

2011-11-14 Thread Chris Hostetter

: Thanks for the reply. There are many keyword terms (1000?) and not sure if
: Solr would choke on a query string that long. Perhaps solr is not built to

Did you try it?

1000 facet.query params is not a strain for Solr -- but you may find 
problems with your servlet container if you try specifying them all in a 
GET request.

if this list isn't going to change very often it sounds like a perfect use
case for specifying them as "appends" request params on the request
handler declaration in your solrconfig.xml

see the comments in solrconfig.xml for examples.
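
A sketch of the idea (handler name and queries are placeholders):

  <requestHandler name="/keywordcounts" class="solr.SearchHandler">
    <lst name="appends">
      <str name="facet.query">keywords:foo</str>
      <str name="facet.query">keywords:bar</str>
      <!-- ...one str per keyword term... -->
    </lst>
  </requestHandler>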

-Hoss


Index format difference between 4.0 and 3.4

2011-11-14 Thread roz dev
Hi All,

We are using Solr 1.4.1 in production and are considering an upgrade to
a newer version.

It seems that Solr 3.x requires a complete rebuild of index as the format
seems to have changed.

Is Solr 4.0 index file format compatible with Solr 3.x format?

Please advise.

Thanks
Saroj


File based wordlists for spellchecker

2011-11-14 Thread Tomasz Wegrzanowski
Hi,

I have a very large index, and I'm trying to add a spell checker for it.
I don't want to copy all the text in the index to an extra spell field, since that would
be prohibitively big, and the index is already close to as big as it can
reasonably be,
so I just want to extract word frequencies as I index, for offline processing.

After some filtering I get something like this (word, frequency):

a   122958495
aa  834203
aaa 175206
22389
aaab1522
aaai1050
aaas6384
aab 8109
aabb1906
aac 35100
aacc1692
aachen  11723

I wanted to use FileBasedSpellChecker, but it doesn't support frequencies,
so its recommendations are consistently horrible. Increasing frequency cutoff
won't really help that much - it will still suggest less frequent
words over equally
similar more frequent words.

What's the easiest way to get this working?
Presumably I'd need to create a separate index with just these words.
How do I get frequencies there, without actually creating 11723 records with
aachen in them etc.?

I can do some small Java coding if need be.
I'm already using 3.x branch (mostly for edismax, plus some unrelated
minor patches).

Thanks,
Tomasz


Re: Casesensitive search problem

2011-11-14 Thread jsahoo1...@gmail.com
Hi,
Even though I have tried every possibility, like <filter
class="solr.LowerCaseFilterFactory"/>, I am still getting the same problem. If
anyone has faced the same problem before, please let me know how you solved it.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Casesensitive-search-problem-tp3506883p3508765.html
Sent from the Solr - User mailing list archive at Nabble.com.