Re: questions on query format

2011-10-24 Thread Ahmet Arslan
 2. If I send Solr the following query:
   q=*:*
 
   I get nothing back, just:
    <response>
      <result name="response" numFound="0" start="0" maxScore="0.0"/>
      <lst name="highlighting"/>
    </response>
 
 Would appreciate some insight into what is going on.

If you are using dismax as the query parser, then *:* won't function as a match-all-docs 
query. To retrieve all docs - with dismax - use the q.alt=*:* parameter. Also, 
adding debugQuery=on will display information about the parsed query.
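For example, a minimal request (hypothetical host and core; the parameters 
themselves are standard):

  curl "http://localhost:8983/solr/select?defType=dismax&q.alt=*:*&debugQuery=on"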


Re: Dismax and phrases

2011-10-24 Thread Hyttinen Lauri

On 10/23/2011 09:34 PM, Erick Erickson wrote:

Hmmm dismax is, indeed, different. Note that dismax doesn't respect
the default operator at all, so don't be misled there.

Could you paste the debug output for both the queries? Perhaps something
will jump out at us.

Best
Erick

Thank you Erick. I've tried to paste the query results here.
The first one is the query with quotes around the terms; it returns 6888 results.
I've hidden the explain parts of most of the results (and the timing) just to 
keep the email reasonably short.

If you need to see them let me know.
+ designates hidden subtree.

Best regards,
Lauri


<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">91</int>

  <lst name="params">
    <str name="explainOther"/>
    <str name="indent">on</str>
    <str name="hl.fl"/>
    <str name="wt">standard</str>
    <str name="version">2.2</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
    <str name="debugQuery">on</str>
    <str name="start">0</str>
    <str name="q">"asuntojen hinnat"</str>
    <str name="qt">dismax</str>
    <str name="fq"/>
  </lst>
</lst>

+<result name="response" numFound="6888" start="0" maxScore="3.0879765">

<lst name="debug">
  <lst name="queryBoosting">
    <str name="q">asuntojenhinnat</str>
    <null name="match"/>
  </lst>
  <str name="rawquerystring">"asuntojen hinnat"</str>
  <str name="querystring">"asuntojen hinnat"</str>

  <str name="parsedquery">+DisjunctionMaxQuery((table.title_t:"asuntojen
  hinnat"^2.0 | title_t:"asuntojen hinnat"^2.0 | ingress_t:"asuntojen
  hinnat" | (text_fi:asunto text_fi:hinta) | (table.description_fi:asunto
  table.description_fi:hinta) | table.description_t:"asuntojen hinnat" |
  graphic.title_t:"asuntojen hinnat"^2.0 | ((graphic.title_fi:asunto
  graphic.title_fi:hinta)^2.0) | ((table.title_fi:asunto
  table.title_fi:hinta)^2.0) | table.contents_t:"asuntojen hinnat" |
  text_t:"asuntojen hinnat" | (ingress_fi:asunto ingress_fi:hinta) |
  (table.contents_fi:asunto table.contents_fi:hinta) | ((title_fi:asunto
  title_fi:hinta)^2.0))~0.01) () type:tie^6.0 type:kuv^2.0 type:tau^2.0
  FunctionQuery((1.0/(3.16E-11*float(ms(const(1319437912691),date(date.modified_dt)))+1.0))^100.0)</str>

  <str name="parsedquery_toString">+(table.title_t:"asuntojen hinnat"^2.0
  | title_t:"asuntojen hinnat"^2.0 | ingress_t:"asuntojen hinnat" |
  (text_fi:asunto text_fi:hinta) | (table.description_fi:asunto
  table.description_fi:hinta) | table.description_t:"asuntojen hinnat" |
  graphic.title_t:"asuntojen hinnat"^2.0 | ((graphic.title_fi:asunto
  graphic.title_fi:hinta)^2.0) | ((table.title_fi:asunto
  table.title_fi:hinta)^2.0) | table.contents_t:"asuntojen hinnat" |
  text_t:"asuntojen hinnat" | (ingress_fi:asunto ingress_fi:hinta) |
  (table.contents_fi:asunto table.contents_fi:hinta) | ((title_fi:asunto
  title_fi:hinta)^2.0))~0.01 () type:tie^6.0 type:kuv^2.0 type:tau^2.0
  (1.0/(3.16E-11*float(ms(const(1319437912691),date(date.modified_dt)))+1.0))^100.0</str>

  <lst name="explain">
    <str name="/media/nss/DATA2/data/wwwprod/til/ashi/2011/07/ashi_2011_07_2011-08-26_tie_001_fi.html">

3.1653805 = (MATCH) sum of:
  1.9299976 = (MATCH) max plus 0.01 times others of:
1.9211313 = weight(title_t:asuntojen hinnat^2.0 in 5891), product of:
  0.26658234 = queryWeight(title_t:asuntojen hinnat^2.0), product of:
2.0 = boost
14.413042 = idf(title_t: asuntojen=250 hinnat=329)
0.009247955 = queryNorm
  7.206521 = fieldWeight(title_t:asuntojen hinnat in 5891), 
product of:

1.0 = tf(phraseFreq=1.0)
14.413042 = idf(title_t: asuntojen=250 hinnat=329)
0.5 = fieldNorm(field=title_t, doc=5891)
0.03292808 = (MATCH) sum of:
  0.016520109 = (MATCH) weight(text_fi:asunto in 5891), product of:
0.044221584 = queryWeight(text_fi:asunto), product of:
  4.781769 = idf(docFreq=3251, maxDocs=142742)
  0.009247955 = queryNorm
0.3735757 = (MATCH) fieldWeight(text_fi:asunto in 5891), 
product of:

  1.0 = tf(termFreq(text_fi:asunto)=1)
  4.781769 = idf(docFreq=3251, maxDocs=142742)
  0.078125 = fieldNorm(field=text_fi, doc=5891)
  0.016407972 = (MATCH) weight(text_fi:hinta in 5891), product of:
0.03705935 = queryWeight(text_fi:hinta), product of:
  4.0073023 = idf(docFreq=7054, maxDocs=142742)
  0.009247955 = queryNorm
0.44274852 = (MATCH) fieldWeight(text_fi:hinta in 5891), 
product of:

  1.4142135 = tf(termFreq(text_fi:hinta)=2)
  4.0073023 = idf(docFreq=7054, maxDocs=142742)
  0.078125 = fieldNorm(field=text_fi, doc=5891)
0.34379265 = (MATCH) sum of:
  0.19207533 = (MATCH) weight(graphic.title_fi:asunto in 5891), 
product of:

0.10662244 = queryWeight(graphic.title_fi:asunto), product of:
  5.76465 = idf(docFreq=1216, maxDocs=142742)
  0.01849591 = queryNorm
1.8014531 = (MATCH) fieldWeight(graphic.title_fi:asunto in 
5891), product of:

  1.0 = tf(termFreq(graphic.title_fi:asunto)=1)
  5.76465 = idf(docFreq=1216, maxDocs=142742)
  0.3125 = fieldNorm(field=graphic.title_fi, doc=5891)
  0.15171732 = (MATCH) weight(graphic.title_fi:hinta in 5891), 
product of:

0.09476117 = 

Re: Want to support did you mean xxx but is Chinese

2011-10-24 Thread Floyd Wu
Hi Li Li,

Thanks for your detailed explanation. Basically I have a similar
implementation to yours. I just wanted to know if there is a better,
more complete solution. I'll keep trying, and I'll share any improvements
with you and the community.

Any ideas or advice are welcome.

Floyd



2011/10/21 Li Li fancye...@gmail.com:
We have implemented did-you-mean and prefix suggestion for Chinese.
 But we based our work on Solr 1.4 and made many modifications, so it
 will take time to integrate it into the current solr/lucene.

 Here is our solution; glad to hear any advice.

 1. Offline word and phrase discovery.
   We discover new words and new phrases by mining query logs.

 2. Online matching algorithm.
   For each word, e.g. 贝多芬, we convert it to pinyin (bei duo fen), then we
 index it using n-grams, which means gram3:bei gram3:eid ...
   To get the did-you-mean result, we convert the query 背朵分 into n-grams;
 it's a boolean OR query, so there are many results (the words whose pinyin
 is similar to the query will be ranked top).
   Then we rerank the top 500 results with a fine-grained algorithm.
   We use edit distance to align query and result, and we also take the
 characters themselves into consideration. E.g. for the query 十度, the
 matches 十渡 and 是度 have exactly the same pinyin, but 十渡 is better than
 是度 because 十 occurs in both query and match.
   You also need to consider the hotness (popularity) of different
 words/phrases, which can be learned from query logs.

 Another issue is converting Chinese into pinyin, because some characters
 have more than one pinyin, e.g. 长沙 vs. 长大 (长's pinyin is chang in
 长沙). You should segment the query and words/phrases first; word
 segmentation is a basic problem in Chinese IR.
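 As a rough sketch of step 2 in Solr schema terms (a hypothetical field
 type; it assumes the pinyin string, e.g. beiduofen, is produced before
 indexing):

   <fieldType name="pinyin_3gram" class="solr.TextField">
     <analyzer>
       <!-- whole pinyin string in; 3-gram tokens out: bei, eid, idu, duo, ... -->
       <tokenizer class="solr.KeywordTokenizerFactory"/>
       <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="3"/>
     </analyzer>
   </fieldType>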


 2011/10/21 Floyd Wu floyd...@gmail.com

 Does anybody know how to implement this idea in Solr? Please kindly
 point me in a direction.

 For example, a user enters a keyword in Chinese, 贝多芬 (this is
 Beethoven in Chinese),
 but keys in a wrong combination of characters, 背多分 (this is
 pronounced the same as the previous keyword 贝多芬).

 The token 贝多芬 actually exists in the Solr index. How can documents
 where 贝多芬 exists be hit when 背多分 is entered?

 This is a basic function of commercial search engines, especially in
 Chinese processing. I wonder how to implement it in Solr and where
 the starting point is.

 Floyd




Using CURL to index directory

2011-10-24 Thread Jagdish Kumar

Hi
 
I have been using curl for indexing individual files. Does anyone know 
how to index an entire directory using curl?
 
Thanks
Jagdish   

Re: Can Solr handle large text files?

2011-10-24 Thread Peter Spam
Thanks for the reminder - I had that set to 214xxx... (the max), but perf was 
terrible when I injected large files.

So what's the max recommended field size in kb?  I can try chopping up the 
syslogs into arbitrarily small pieces, but would love to know where to start.

Thanks!

Sent from my iPhone

On Oct 23, 2011, at 2:01 PM, Erick Erickson erickerick...@gmail.com wrote:

 Also be aware that by default Solr is configured to only index the
 first 10,000 tokens
 of a field. See maxFieldLength in solrconfig.xml
 
 Best
 Erick
 
 On Fri, Oct 21, 2011 at 7:34 PM, Peter Spam ps...@mac.com wrote:
 Thanks for your note, Anand.  What was the maximum chunk size for you?  
 Could you post the relevant portions of your configuration file?
 
 
 Thanks!
 Pete
 
 On Oct 21, 2011, at 4:20 AM, anand.ni...@rbs.com wrote:
 
 Hi,
 
 I was also facing the issue of highlighting large text files. I applied 
 the solution proposed here and it worked. But I am getting the following error:
 
 
 Basically 'hitGrouped.vm' is not found. I am using solr-3.4.0. Where can I 
 get this file from? It is referenced in browse.vm:
 
 <div class="results">
   #if($response.response.get('grouped'))
     #foreach($grouping in $response.response.get('grouped'))
       #parse("hitGrouped.vm")
     #end
   #else
     #foreach($doc in $response.results)
       #parse("hit.vm")
     #end
   #end
 </div>
 
 
 HTTP Status 500 - Can't find resource 'hitGrouped.vm' in classpath or 
 'C:\caprice\workspace\caprice\dist\DEV\solr\.\conf/', 
 cwd=C:\glassfish3\glassfish\domains\domain1\config 
 java.lang.RuntimeException: Can't find resource 'hitGrouped.vm' in 
 classpath or 'C:\caprice\workspace\caprice\dist\DEV\solr\.\conf/', 
 cwd=C:\glassfish3\glassfish\domains\domain1\config at 
 org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:268)
  at 
 org.apache.solr.response.SolrVelocityResourceLoader.getResourceStream(SolrVelocityResourceLoader.java:42)
  at org.apache.velocity.Template.process(Template.java:98) at 
 org.apache.velocity.runtime.resource.ResourceManagerImpl.loadResource(ResourceManagerImpl.java:446)
  at
 
 Thanks & Regards,
 Anand
 Anand Nigam
 RBS Global Banking & Markets
 Office: +91 124 492 5506
 
 
 -Original Message-
 From: karsten-s...@gmx.de [mailto:karsten-s...@gmx.de]
 Sent: 21 October 2011 14:58
 To: solr-user@lucene.apache.org
 Subject: Re: Can Solr handle large text files?
 
 Hi Peter,
 
 highlighting in large text files cannot be fast without dividing the 
 original text into small pieces.
 So take a look at
 http://xtf.cdlib.org/documentation/under-the-hood/#Chunking
 and at
 http://www.lucidimagination.com/blog/2010/09/16/2446/
 
 This means that you should divide your files and use Result Grouping / 
 Field Collapsing to list only one hit per original document.
 
 (xtf would also solve your problem out of the box, but xtf does not use 
 solr).
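 A sketch of what such a grouped query could look like, assuming each chunk
 carries a (hypothetical) parent_doc_id field identifying the original file:
 
   curl "http://localhost:8983/solr/select?q=error+mail&group=true&group.field=parent_doc_id&group.limit=1"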
 
 Best regards
  Karsten
 
  Original-Nachricht 
 Datum: Thu, 20 Oct 2011 17:59:04 -0700
 Von: Peter Spam ps...@mac.com
 An: solr-user@lucene.apache.org
 Betreff: Can Solr handle large text files?
 
 I have about 20k text files, some very small, but some up to 300MB,
 and would like to do text searching with highlighting.
 
 Imagine the text is the contents of your syslog.
 
 I would like to type in some terms, such as error and mail, and
 have Solr return the syslog lines with those terms PLUS two lines of 
 context.
 Pretty much just like Google's highlighting.
 
 1) Can Solr handle this?  I had extremely long query times when I
 tried this with Solr 1.4.1 (yes, I was using TermVectors, etc.).  I
 tried breaking the files into 1MB pieces, but searching would be wonky
 and return the wrong number of documents (i.e. if one file had a term 5
 times, and that was the only file that had the term, I want 1 result, not 
 5 results).
 
 2) What sort of tokenizer would be best?  Here's what I'm using:
 
   <field name="body" type="text_pl" indexed="true" stored="true"
     multiValued="false" termVectors="true" termPositions="true"
     termOffsets="true" />

   <fieldType name="text_pl" class="solr.TextField">
     <analyzer>
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.WordDelimiterFilterFactory"
         generateWordParts="0" generateNumberParts="0" catenateWords="0"
         catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
     </analyzer>
   </fieldType>
 
 
 Thanks!
 Pete
 

Re: Using CURL to index directory

2011-10-24 Thread Dirceu Vieira
Hi,

Try the attached post-text.sh file.
It was not written by me, it's part of a great tutorial written by Avi
Rappoport that you can find at:
http://www.lucidimagination.com/devzone/technical-articles/whitepapers/indexing-text-and-html-files-solr

Regards,

On Mon, Oct 24, 2011 at 9:13 AM, Jagdish Kumar 
jagdish.thapar...@hotmail.com wrote:


 Hi

 I have been using curl for indexing individual files. Does anyone know
 how to index an entire directory using curl?

 Thanks
 Jagdish




-- 
Dirceu Vieira Júnior
---
+47 9753 2473
dirceuvjr.blogspot.com
twitter.com/dirceuvjr


post-text.sh
Description: Bourne shell script
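For reference, a minimal sketch of what such a script boils down to (this is 
not the attached post-text.sh; URL, file pattern, and id scheme are assumptions):

  #!/bin/sh
  # Post every file in a directory to Solr Cell (ExtractingRequestHandler),
  # using the file name as the document id; commit once at the end.
  SOLR=http://localhost:8983/solr
  for f in /path/to/docs/*; do
    curl "$SOLR/update/extract?literal.id=$(basename "$f")" -F "myfile=@$f"
  done
  curl "$SOLR/update" --data-binary '<commit/>' -H 'Content-Type: text/xml; charset=utf-8'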


Re: Painfully slow indexing

2011-10-24 Thread Pranav Prakash
Hey guys,

Your responses are welcome, but I still haven't gained much improvement.

*Are you posting through HTTP/SOLRJ?*
I am using the RSolr gem, which internally uses Ruby's HTTP lib to POST documents
to Solr.

*Your script time 'T' includes the time between sending the POST request and
fetching the response after a successful request, right?*
Correct. It also includes the time taken to convert all those documents from
a Ruby Hash to XML.


 *generate the ready-for-indexing XML documents on a file system*
Alain, I have somewhere around 6M documents for indexing. You mean to say that I
should convert all of them into one XML file and then index that?

*are you calling commit after your batches or do an optimize by any chance?*
I am not optimizing, but I am performing an autocommit every 10 docs.

*Pranav Prakash*

temet nosce

Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com |
Google http://www.google.com/profiles/pranny


On Fri, Oct 21, 2011 at 16:32, Simon Willnauer 
simon.willna...@googlemail.com wrote:

 On Wed, Oct 19, 2011 at 3:58 PM, Pranav Prakash pra...@gmail.com wrote:
  Hi guys,
 
  I have set up a Solr instance and upon attempting to index document, the
  whole process is painfully slow. I will try to put as much info as I can
 in
  this mail. Pl. feel free to ask me anything else that might be required.
 
  I am sending documents in batches not exceeding 2,000. The size of each
 of
  them depends but usually is around 10-15MiB. My indexing script tells me
  that Solr took T seconds to add N documents of size S. For the same data,
  the Solr Log add QTime is QT. Some of the sample data are:
 
      N        |        S         |   T    |  QT (ms)
  -------------+------------------+--------+---------
    390 docs   |  3,478,804 bytes | 14.5s  |  2297
    852 docs   |  6,039,535 bytes | 25.3s  |  4237
   1345 docs   | 11,147,512 bytes | 47s    |  8543
   1147 docs   |  9,457,717 bytes | 44s    |  2297
   1096 docs   | 13,058,204 bytes | 54.3s  |  8782
 
  The time T includes the time of converting an array of Hash objects into
  XML, POSTing it to Solr, and the response acknowledgement from Solr. Clearly,
  there is a huge difference between the times T and QT. After a lot of effort,
  I have no clue why these times do not match.
 
  The Server has 16 cores, 48GiB RAM. JVM options are -Xms5000M -Xmx5000M
  -XX:+UseParNewGC
 
  I believe my indexing is getting slow. Relevant portions from my config file
  are as follows. On a related note, every document has one dynamic field.
  At this rate, it takes me ~30 hrs to do a full index of my database.
  I would really appreciate the kindness of the community in helping me get
  this indexing faster.
 
  <indexDefaults>
    <useCompoundFile>false</useCompoundFile>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <int name="maxMergeCount">10</int>
      <int name="maxThreadCount">10</int>
    </mergeScheduler>
    <ramBufferSizeMB>2048</ramBufferSizeMB>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>300</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <maxBufferedDocs>5</maxBufferedDocs>
    <termIndexInterval>256</termIndexInterval>
    <mergeFactor>10</mergeFactor>
    <useCompoundFile>false</useCompoundFile>
    <!-- <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <int name="maxMergeAtOnceExplicit">19</int>
      <int name="segmentsPerTier">9</int>
    </mergePolicy> -->
  </indexDefaults>

  <mainIndex>
    <unlockOnStartup>true</unlockOnStartup>
    <reopenReaders>true</reopenReaders>
    <deletionPolicy class="solr.SolrDeletionPolicy">
      <str name="maxCommitsToKeep">1</str>
      <str name="maxOptimizedCommitsToKeep">0</str>
    </deletionPolicy>
    <infoStream file="INFOSTREAM.txt">false</infoStream>
  </mainIndex>

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>10</maxDocs>
    </autoCommit>
  </updateHandler>
 
 
  *Pranav Prakash*
 
  temet nosce
 
  Twitter http://twitter.com/pranavprakash | Blog 
 http://blog.myblive.com |
  Google http://www.google.com/profiles/pranny
 

 hey,

 are you calling commit after your batches or do an optimize by any chance?

 I would suggest you stream your documents to Solr and commit
 only if you really need to. Set your RAM buffer to something between
 256 and 320 MB and remove the maxBufferedDocs setting completely. You
 can also experiment with your merge settings a little; 10 merging
 threads seems like a lot. I know you have lots of CPU, but IO will be
 the bottleneck here.
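 In solrconfig.xml terms, that advice would look roughly like this (a
 sketch, not a tested configuration):

   <indexDefaults>
     <!-- 256-320 MB as suggested; no maxBufferedDocs element at all -->
     <ramBufferSizeMB>320</ramBufferSizeMB>
     <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
       <!-- fewer merge threads, since IO rather than CPU is the bottleneck -->
       <int name="maxMergeCount">3</int>
       <int name="maxThreadCount">3</int>
     </mergeScheduler>
   </indexDefaults>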

 simon



Re: Solr indexing plugin: skip single faulty document?

2011-10-24 Thread samuele.mattiuzzo
Thanks Erik! I'll be reading that issue, it's pretty much everything i need!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-plugin-skip-single-faulty-document-tp3427646p3447400.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr indexing plugin: skip single faulty document?

2011-10-24 Thread Erick Erickson
Don't get too excited, I don't know what state that patch is in. It's on my 
long TODO list to go back and look some more. If you want to work on it
and bring it up to snuff, please feel free to do so and submit a modernized patch!

Erick

On Mon, Oct 24, 2011 at 9:44 AM, samuele.mattiuzzo samum...@gmail.com wrote:
 Thanks Erik! I'll be reading that issue, it's pretty much everything i need!

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-indexing-plugin-skip-single-faulty-document-tp3427646p3447400.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: help needed on solr-uima integration

2011-10-24 Thread Xue-Feng Yang
Hi,

Where can I find test code for solr-uima component?

Thanks,

Xue-Feng




From: Xue-Feng Yang just4l...@yahoo.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Sunday, October 23, 2011 3:43:58 AM
Subject: help needed on solr-uima integration

Hi,

After googling online, some parts of the puzzle are still missing. The best would 
be a simple example showing the whole process. Is there any example like 
apache-uima/examples/descriptors/tutorial/ex3 (RoomNumber and DateTime) 
integrated into Solr?  In particular, how do I feed text into Solr for indexing 
when it has at least two fields?

Thanks,

Xue-Feng

RE: Using CURL to index directory

2011-10-24 Thread Jagdish Kumar

Thanks for the quick response.
 
I am working on a Windows machine, and I also need to post text, zip, pdf, image 
files etc. It would be great if you could help me out with multiple file types on Windows.
 
Thanks
Jagdish
 



Date: Mon, 24 Oct 2011 09:30:49 +0200
Subject: Re: Using CURL to index directory
From: dirceu...@gmail.com
To: solr-user@lucene.apache.org

Hi,


Try the attached post-text.sh file.
It was not written by me, it's part of a great tutorial written by Avi 
Rappoport that you can find at: 
http://www.lucidimagination.com/devzone/technical-articles/whitepapers/indexing-text-and-html-files-solr


Regards,


On Mon, Oct 24, 2011 at 9:13 AM, Jagdish Kumar jagdish.thapar...@hotmail.com 
wrote:


Hi

I have been using curl for indexing individual files. Does anyone know 
how to index an entire directory using curl?

Thanks
Jagdish   


-- 
Dirceu Vieira Júnior
---
+47 9753 2473
dirceuvjr.blogspot.com
twitter.com/dirceuvjr
  

Re: Solr indexing plugin: skip single faulty document?

2011-10-24 Thread samuele.mattiuzzo
Ok i'll surely check out what i can!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-plugin-skip-single-faulty-document-tp3427646p3447537.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: data-import problem

2011-10-24 Thread karsten-solr
Hi Radha Krishna,

try command full-import instead of fullimport
see
http://wiki.apache.org/solr/DataImportHandler#Commands
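i.e. (with the server name from your mail):

  curl "http://myservername/solr/dataimport?command=full-import"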


Best regards
  Karsten

 Original-Nachricht 
 Datum: Mon, 24 Oct 2011 11:10:22 +0530
 Von: Radha Krishna Reddy radhakrishn...@gmail.com
 An: solr-user@lucene.apache.org
 Betreff: data-import problem

 Hi,
 
 I am trying to configure Solr on an AWS Ubuntu instance. I have MySQL on a
 different server, so I created an SSH tunnel for MySQL on port 3309.
 
 I downloaded the MySQL JDBC driver and copied it to the lib folder.
 
 *I edited the example/solr/conf/solrconfig.xml*
...
 *When I tried to import the data:*
 
 http://myservername/solr/dataimport?command=fullimport
 
 *I am getting the following response:*
 
 <?xml version="1.0" encoding="UTF-8"?>
 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">5</int>
   </lst>
   <lst name="initArgs">
     <lst name="defaults">
       <str name="config">data-config.xml</str>
     </lst>
   </lst>
   <str name="command">fullimport</str>
   <str name="status">idle</str>
   <str name="importResponse"/>
   <lst name="statusMessages"/>
   <str name="WARNING">This response format is experimental.  It is likely to
 change in the future.</str>
 </response>
 
 
 Can someone help me with this? Also, where can I find the logs?
 
 Thanks and Regards,
 Radha Krishna.


Re: multiple document types in a core

2011-10-24 Thread lee carroll
Hi Erick,

You're right, I think. On resources we gain a little bit on:
disk (a production implementation with live data would save 500 MB
of disk usage on each slave and master), and
some reduction in network traffic on replication (we do a full
re-index every 24 hours at present).

On design we gain a little by being able to support searches at
various document levels (perform a destination search or hotel search
and return
documents at the correct level for the search without the need to
perform field collapsing).

But in the cold light of day I don't think we gain huge amounts
(leaving aside the index replication of a full index).

cheers lee c



On 23 October 2011 19:05, Erick Erickson erickerick...@gmail.com wrote:
 Yes, stored fields are placed verbatim for every doc. But I wonder
 at the utility of trying to share stored information. The stored
 info is put in certain files in the index, see:
 http://lucene.apache.org/java/3_0_2/fileformats.html#file-names

 and the files that store data are pretty much irrelevant to searching,
 the data in them is only referenced when assembling the document
 for return. So by adding this complexity you'll be saving a bit
 on file transfers when replicating your index, but not much else.

 Is it worth it? If so, why?

 Best
 Erick

 On Mon, Oct 17, 2011 at 11:07 AM, lee carroll
 lee.a.carr...@googlemail.com wrote:
 Just as a follow up

 it looks like stored fields are stored verbatim for every doc.

 hotel index and store dest attributes
 index size: 131M
 number of records 49147

 hotel index only dest attributes

 index size: 111m
 number of records 49147


 ~400 chars(bytes) of destination data * 49147 (number of hotel docs) = ~19m

 basically everything is being stored

 No difference in time to index (very rough and not scientific :-) )

 So it does seem an OK strategy to denormalise docs with indexed fields
 but normalise with stored fields?
 Or have I missed some problems with this?

 cheers lee c



 On 16 October 2011 11:54, lee carroll lee.a.carr...@googlemail.com wrote:
 Hi Chris thanks for the response

 It's an inverted index, so *terms* exist once (per segment) and those terms
 point to the documents -- so having the same terms (in the same fields)
 for multiple types of documents in one index is going to take up less
 overall space than having distinct collections for each type of document.

 I'm not asking about the indexed terms but rather the stored values.
 By having two doc types, are we gaining anything by storing
 attributes only for that doc type?

 cheers lee c





Re: help needed on solr-uima integration

2011-10-24 Thread Koji Sekiguchi

(11/10/24 17:42), Xue-Feng Yang wrote:

Hi,

Where can I find test code for solr-uima component?


You should find them under:

solr/contrib/uima/src/test

koji
--
Check out Query Log Visualizer for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/


indexing key value pair into lucene solr index

2011-10-24 Thread jame vaalet
hi,
in my use case I have a list of key-value pairs in each document object. If I
index them as separate index fields, then in the result doc object I will get
two arrays corresponding to my keys and values. The problem I face here is
that there won't be any mapping between those keys and values.

Is there any easy way to index this data in Solr? Thanks in advance ...

-- 

-JAME


Re: indexing key value pair into lucene solr index

2011-10-24 Thread karsten-solr
Hi Jame,

you can
 - generate one token for each pair (key, value) -- key_value (see the sketch below)
 - insert a gap between each pair and use phrase queries
 - use the key as the field name (if you have a restricted set of keys)
 - wait for joins in Solr 4.0 (http://wiki.apache.org/solr/Join)
 - use positions or payloads to connect key and value
 - tell the forum your exact use-case with examples
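A sketch of the first option (field name and pair values are hypothetical):
collapse each pair into one token in a multiValued string field, then query
exact pairs.

  <!-- indexing: one token per (key, value) pair -->
  <add>
    <doc>
      <field name="id">doc1</field>
      <field name="kv">color_red</field>
      <field name="kv">size_large</field>
    </doc>
  </add>
  <!-- querying: q=kv:color_red matches only docs where key "color" has value "red" -->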

Best regards
  Karsten

 Original-Nachricht 
 Datum: Mon, 24 Oct 2011 17:11:49 +0530
 Von: jame vaalet jamevaa...@gmail.com
 An: solr-user@lucene.apache.org
 Betreff: indexing key value pair into lucene solr index

 hi,
 in my use case i have list of key value pairs in each document object, if
 i
 index them as separate index fields then in the result doc object i will
 get
 two arrays corresponding to my keys and values. The problem i face here is
 that there wont be any mapping between those keys and values.
 
 do we have any easy to index these data in solr ? thanks in advance ...
 
 -- 
 
 -JAME


Re: indexing key value pair into lucene solr index

2011-10-24 Thread jame vaalet
thanks Karsten.
Can we preserve order within an index field? If yes, I can index them
separately and map them using their order.

On 24 October 2011 17:32, karsten-s...@gmx.de wrote:

 Hi Jame,

 you can
  - generate one token for each pair (key, value) -- key_value
  - insert a gap between each pair and us phrase queries
  - use key as field-name (if you have a restricted set of keys)
  - wait for joins in Solr 4.0 (http://wiki.apache.org/solr/Join)
  - use position or payloads to connect key and value
  - tell the forum your exact use-case with examples

 Best regrads
  Karsten

  Original-Nachricht 
  Datum: Mon, 24 Oct 2011 17:11:49 +0530
  Von: jame vaalet jamevaa...@gmail.com
  An: solr-user@lucene.apache.org
  Betreff: indexing key value pair into lucene solr index

  hi,
  in my use case i have list of key value pairs in each document object, if
  i
  index them as separate index fields then in the result doc object i will
  get
  two arrays corresponding to my keys and values. The problem i face here
 is
  that there wont be any mapping between those keys and values.
 
  do we have any easy to index these data in solr ? thanks in advance ...
 
  --
 
  -JAME




-- 

-JAME


Re: indexing key value pair into lucene solr index

2011-10-24 Thread karsten-solr
Hi Jame,

preserving order in index fields:

If you don't want to use phrase queries on keys or values, this order is the
token position.
If you use phrase queries but no value has more than 50 tokens, you could also
use positions and start each pair at position 100, 200, 300, ...
Otherwise you could use payloads.

IMHO there is no standard way to connect the positions of two fields;
you have to write your own Query.
My tip:
 Take org.apache.lucene.search.spans.TermSpans as a starting point and use the
queryparser module.

BTW:
normally there is a standard solution in Lucene for each problem,
so please tell us more about your use-case and somebody will have an answer
that doesn't require programming your own.

Best regards
  Karsten



 Original-Nachricht 
 Datum: Mon, 24 Oct 2011 17:53:26 +0530
 Von: jame vaalet jamevaa...@gmail.com
 An: solr-user@lucene.apache.org
 Betreff: Re: indexing key value pair into lucene solr index

 thanks karsten.
 can we preserve order within index field ? if yes, i can index them
 separately and map them using their order.
 
 On 24 October 2011 17:32, karsten-s...@gmx.de wrote:
 
  Hi Jame,
 
  you can
   - generate one token for each pair (key, value) -- key_value
   - insert a gap between each pair and us phrase queries
   - use key as field-name (if you have a restricted set of keys)
   - wait for joins in Solr 4.0 (http://wiki.apache.org/solr/Join)
   - use position or payloads to connect key and value
   - tell the forum your exact use-case with examples
 
  Best regrads
   Karsten
 
   Original-Nachricht 
   Datum: Mon, 24 Oct 2011 17:11:49 +0530
   Von: jame vaalet jamevaa...@gmail.com
   An: solr-user@lucene.apache.org
   Betreff: indexing key value pair into lucene solr index
 
   hi,
   in my use case i have list of key value pairs in each document object,
 if
   i
   index them as separate index fields then in the result doc object i
 will
   get
   two arrays corresponding to my keys and values. The problem i face
 here
  is
   that there wont be any mapping between those keys and values.
  
   do we have any easy to index these data in solr ? thanks in advance
 ...
  
   --
  
   -JAME
 
 
 
 
 -- 
 
 -JAME


Re: indexing key value pair into lucene solr index

2011-10-24 Thread Ken Krugler

On Oct 24, 2011, at 1:41pm, jame vaalet wrote:

 hi,
 in my use case i have list of key value pairs in each document object, if i
 index them as separate index fields then in the result doc object i will get
 two arrays corresponding to my keys and values. The problem i face here is
 that there wont be any mapping between those keys and values.
 
 do we have any easy to index these data in solr ? thanks in advance ...

As Karsten said, providing more detail re what you're actually trying to do 
usually makes for better and more helpful/accurate answers.

But I'm guessing you only want to search on the key, not the value, right?

If so, then:

1. Create a multi-valued field with a custom type, indexed and stored.
2. During indexing, add entries as key<tab>value.
3. In the custom type, set the analyzer to strip off the <tab>value so you 
only index the key. E.g.:

<fieldType name="key_value" class="solr.TextField" 
    positionIncrementGap="100" autoGeneratePhraseQueries="true" 
    omitTermFreqAndPositions="true" omitNorms="true">
  <analyzer type="index">
    <!-- Get rid of the <tab>value text at the end of each string -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" 
        pattern="\t\d+$" replacement=""/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" 
        generateNumberParts="1" catenateWords="1" catenateNumbers="1" 
        catenateAll="0" splitOnCaseChange="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" 
        generateNumberParts="1" catenateWords="0" catenateNumbers="0" 
        catenateAll="0" splitOnCaseChange="1"/>
  </analyzer>
</fieldType>

-- Ken

--
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr





Re: help needed on solr-uima integration

2011-10-24 Thread Xue-Feng Yang
Thanks Koji. I found it. I should find the solution there.

Xue-Feng




From: Koji Sekiguchi k...@r.email.ne.jp
To: solr-user@lucene.apache.org
Sent: Monday, October 24, 2011 7:30:01 AM
Subject: Re: help needed on solr-uima integration

(11/10/24 17:42), Xue-Feng Yang wrote:
 Hi,

 Where can I find test code for solr-uima component?

You should find them under:

solr/contrib/uima/src/test

koji
-- 
Check out Query Log Visualizer for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/

Re: java.lang.NoSuchMethodError: org.slf4j.spi.LocationAwareLogger.log

2011-10-24 Thread tgfisher
I am currently running into the exact same exception, but I'm not using
Maven. What are my options to fix the issue?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/java-lang-NoSuchMethodError-org-slf4j-spi-LocationAwareLogger-log-tp3435001p3447968.html
Sent from the Solr - User mailing list archive at Nabble.com.


Tag cloud from tweets

2011-10-24 Thread Rohit
I have saved tweets related to some keywords in solr, can Solr be used to
generate the tag cloud of important words from these tweets?

 

Regards,

Rohit 

 



Re: Tag cloud from tweets

2011-10-24 Thread Erik Hatcher
Sure.  Just facet on a tokenized field of the tweet text.  You'll want to tune 
the analysis configuration to suit your needs, but there's no problem getting counts 
back using a facet=on&facet.field=tweet_text kind of thing.
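For instance (hypothetical host and field name; rows=0 because only the facet 
counts are needed for a cloud):

  curl "http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=tweet_text&facet.limit=50"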

Erik

On Oct 24, 2011, at 13:14 , Rohit wrote:

 I have saved tweets related to some keywords in solr, can Solr be used to
 generate the tag cloud of important words from these tweets?
 
 
 
 Regards,
 
 Rohit 
 
 
 



RE: Optimization /Commit memory

2011-10-24 Thread Jaeger, Jay - DOT
I have not spent a lot of time researching it, but one would expect the OS 
RAM requirement for optimization of an index to be minimal.

My understanding is that during optimization an essentially new index is built. 
Once complete, Solr switches out the indexes and throws away the old one.  
(In Windows it may not throw away the old one until the next commit.)

JRJ

-Original Message-
From: Sujatha Arun [mailto:suja.a...@gmail.com] 
Sent: Friday, October 21, 2011 12:10 AM
To: solr-user@lucene.apache.org
Subject: Re: Optimization /Commit memory

Just one more thing: when we are talking about optimization, we
are referring to free HD space for replicating the index (2 or 3 times
the index size). What is the role of (OS) RAM here?

Regards
Suajtha

On Fri, Oct 21, 2011 at 10:12 AM, Sujatha Arun suja.a...@gmail.com wrote:

 Thanks that helps.

 Regards
 Sujatha


 On Thu, Oct 20, 2011 at 6:23 PM, Jaeger, Jay - DOT 
 jay.jae...@dot.wi.govwrote:

 Well, since the OS RAM includes the JVM RAM, that is part of your
 requirement, yes?  Aside from the JVM and normal OS requirements, all you
 need OS RAM for is file caching.  Thus, for updates, the OS RAM is not a
 major factor.  For searches, you want sufficient OS RAM to cache enough of
 the index to get the query performance you need, and to cache queries inside
 the JVM if you get a lot of repeat queries (see solrconfig.xml for the
 various caches: we have not played with them much).  So, the amount of RAM
 necessary for that is very much dependent upon the size of your index, so I
 cannot give you a simple number.

 You seem to believe that you have to have sufficient memory to have the
 entire index in memory.  Except where extremely high performance is
 required, I have not found that to be the case.

 This is just one of those your mileage may vary things.  There is not a
 single answer or formula that fits every situation.

 JRJ

 -Original Message-
 From: Sujatha Arun [mailto:suja.a...@gmail.com]
 Sent: Wednesday, October 19, 2011 11:58 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Optimization /Commit memory

 Thanks  Jay ,

 I was trying to compute the *OS RAM requirement*, *not JVM RAM*, for a 14 GB
 index [the cumulative index size of all instances]. And I put it thus -
 
 The operating system RAM requirement for an index of 14GB is: index size
 + 3 times the maximum index size of an individual instance, for optimize.
 
 That is to say, I have several instances whose combined index size is 14GB.
 The maximum individual index size is 2.5GB, so my requirement for OS RAM is
  14GB + 3 * 2.5GB ~= 22GB.
 
 Correct?

 Regards
 Sujatha



 On Thu, Oct 20, 2011 at 3:45 AM, Jaeger, Jay - DOT jay.jae...@dot.wi.gov
 wrote:

  Commit does not particularly spike disk or memory usage, unless you are
  adding a very large number of documents between commits.  A commit can
 cause
  a need to merge indexes, which can increase disk space temporarily.  An
  optimize is *likely* to merge indexes, which will usually increase disk
  space temporarily.
 
  How much disk space depends very much upon how big your index is in the
  first place.  A 2 to 3 times factor of the sum of your peak index file
 size
  seems safe, to me.
 
  Solr uses only modest amounts of memory for the JVM for this stuff.
 
  JRJ
 
  -Original Message-
  From: Sujatha Arun [mailto:suja.a...@gmail.com]
  Sent: Wednesday, October 19, 2011 4:04 AM
  To: solr-user@lucene.apache.org
  Subject: Optimization /Commit memory
 
  Do we require  2 or 3 Times OS RAM memory or  Hard Disk Space while
  performing Commit or Optimize or Both?
 
  what is the requirement in terms of  size of RAM and HD for commit and
  Optimize
 
  Regards
  Sujatha
 





some basic information on Solr

2011-10-24 Thread Dan Wu
 Hi all,

I am doing a student project on search engine research. Right now I have
some basic questions about Solr.

1. How many types of data file can Solr support (estimate)? i.e. the number of
file types Solr can look at for indexing and searching.

2. What is the estimated cost of incidents per year for Solr?

Since the numbers could vary across platforms, we would like
to know estimated answers for the general case.

Thanks



-- 
Dan Wu (Fiona Wu)  武丹
Master of Engineering Management Program Degree Candidate
Duke University, North Carolina, USA
Email: dan...@duke.edu
Tel: 919-599-2730


RE: some basic information on Solr

2011-10-24 Thread Jaeger, Jay - DOT
1.  Solr, proper, does not index files.  An adjunct called Solr Cell can.  See 
http://wiki.apache.org/solr/ExtractingRequestHandler .  That article describes 
which kinds of files Solr Cell can handle.

2.  I have no idea what you mean by incidents per year.  Please explain.

3.  Even though you didn't ask:  You are apparently a student at an advanced 
level.  At your level I would guess that your professors expect *YOU* to read 
thru the material available on the Internet on Solr and figure it out on your 
own, rather than just asking others to do your work for you.  ;^)

In particular, before asking further questions you should probably read thru 
http://wiki.apache.org/solr/FrontPage and 
http://lucene.apache.org/solr/tutorial.html .

JRJ

-Original Message-
From: Dan Wu [mailto:wudan1...@gmail.com] 
Sent: Monday, October 24, 2011 12:43 PM
To: solr-user@lucene.apache.org
Subject: some basic information on Solr

 Hi all,

I am doing a student project on search engine research. Right now I have
some basic questions about Slor.

1. How many types of data file Solr can support (estimate)? i.e. No. of
file types solr can look at for indexing and searching.

2. How much is estimated cost of incidents per year for Solr ?

Since the numbers could vary from different platforms, however we would like
to know the estimate answers regarding the general cases.

Thanks



-- 
Dan Wu (Fiona Wu)  武丹
Master of Engineering Management Program Degree Candidate
Duke University, North Carolina, USA
Email: dan...@duke.edu
Tel: 919-599-2730


RE: indexing key value pair into lucene solr index

2011-10-24 Thread Jaeger, Jay - DOT
Maybe put them in a single string field (or any other field type that is not 
analyzed -- certainly not text) using some character separator that will 
connect them, but won't confuse the Solr query parser?

So maybe you start out with key value pairs of

Key1 value1
Key2 value2
Key3 value3

Preprocess them for indexing, and then index (and search) for them as, for 
example, 

Key1$value1
Key2$value2
Key3$value3

(You could also store their individual values in a separate field, of course).
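A sketch of the resulting update message (the kv_pairs field name is an assumption):

  <add>
    <doc>
      <field name="id">doc1</field>
      <field name="kv_pairs">Key1$value1</field>
      <field name="kv_pairs">Key2$value2</field>
      <field name="kv_pairs">Key3$value3</field>
    </doc>
  </add>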

JRJ

-Original Message-
From: jame vaalet [mailto:jamevaa...@gmail.com] 
Sent: Monday, October 24, 2011 6:42 AM
To: solr-user@lucene.apache.org
Subject: indexing key value pair into lucene solr index

hi,
in my use case i have list of key value pairs in each document object, if i
index them as separate index fields then in the result doc object i will get
two arrays corresponding to my keys and values. The problem i face here is
that there wont be any mapping between those keys and values.

do we have any easy to index these data in solr ? thanks in advance ...

-- 

-JAME


Re: org.apache.pdfbox.pdmodel.PDPage Error

2011-10-24 Thread MBD
Is this really a stumper? This is my first experience with Solr, and having 
spent only an hour or so with it I hit this barrier (below). I'm sure *I* am 
doing something completely wrong; just hoping someone more familiar with the 
platform can help me identify & fix it.

For starters... what does "Could not initialize class ..." mean in Java, exactly? 
Maybe that the class (i.e. the code) itself doesn't exist? - so perhaps I haven't 
downloaded all the pieces of the project? Or could it be a hint that my kit is 
just not configured correctly? Sorry, I'm not a Java expert... but would like to 
get this stabilized... if possible.

If this is the wrong mailing list then just tell me and I'll go away...

Thanks!

On Oct 20, 2011, at 2:54 PM, MBD wrote:

 Hi, I'm new to Solr and trying to get it to index PDFs. Having trouble 
 getting started. Following the examples in the ExtractingRequestHandler wiki 
 http://wiki.apache.org/solr/ExtractingRequestHandler. 
 
 Got Solr running and it indexes html, xml & txt files just fine... but when I 
 try to feed it a .pdf it spits out an Error 500 Could not initialize class 
 org.apache.pdfbox.pdmodel.PDPage:
 
  $ curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F 
 myfile=@index.pdf
  <html>
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
  <title>Error 500 Could not initialize class org.apache.pdfbox.pdmodel.PDPage
 
  java.lang.NoClassDefFoundError: Could not initialize class 
 org.apache.pdfbox.pdmodel.PDPage
  ...
 
 I thought maybe it's because Tika isn't installed/included so I tried 
 downloading and installing Tika separately...but even the Tika install fails 
 with: 
 
  
 ---
  Test set: org.apache.tika.parser.pdf.PDFParserTest
  
 ---
  Tests run: 5, Failures: 0, Errors: 5, Skipped: 0, Time elapsed: 0.63 sec  
 FAILURE!
  testVarious(org.apache.tika.parser.pdf.PDFParserTest)  Time elapsed: 0.165 
 sec   ERROR!
  java.lang.NoClassDefFoundError: Could not initialize class 
 org.apache.pdfbox.pdmodel.PDPage
 
 I don't know Java (but hopefully won't need to in order to get basic indexing 
 up and running as ultimate goal is to query this via Sunspot from a Rails 
 app) so go easy on me. 
 
 Let me know if you want/need more of the error dump.
 
 Any help would be greatly appreciated!
 -Mike



Re: some basic information on Solr

2011-10-24 Thread Dan Wu
 JRJ,

We did check the official Solr website but found it really technical;
since we are not on the developer side, we just want some basic
information or numbers about its usage.

Thanks for your answer, anyway.



2011/10/24 Jaeger, Jay - DOT jay.jae...@dot.wi.gov

 1.  Solr, proper, does not index files.  An adjunct called Solr Cel can.
  See http://wiki.apache.org/solr/ExtractingRequestHandler .  That article
 describes which kinds of files it Solr Cel can handle.

 2.  I have no idea what you mean by incidents per year.  Please explain.

 3.  Even though you didn't ask:  You are apparently a student at an
 advanced level.  At your level I would guess that your professors expect
 *YOU* to read thru the material available on the Internet on Solr and figure
 it out on your own, rather than just asking others to do your work for you.
  ;^)

 In particular, before asking further questions you should probably read
 thru http://wiki.apache.org/solr/FrontPage and
 http://lucene.apache.org/solr/tutorial.html .

 JRJ

 -Original Message-
 From: Dan Wu [mailto:wudan1...@gmail.com]
 Sent: Monday, October 24, 2011 12:43 PM
 To: solr-user@lucene.apache.org
 Subject: some basic information on Solr

  Hi all,

 I am doing a student project on search engine research. Right now I have
 some basic questions about Slor.

 1. How many types of data file Solr can support (estimate)? i.e. No. of
 file types solr can look at for indexing and searching.

 2. How much is estimated cost of incidents per year for Solr ?

 Since the numbers could vary from different platforms, however we would
 like
 to know the estimate answers regarding the general cases.

 Thanks



joins and filter queries effecting scoring

2011-10-24 Thread Jason Toy
I have 2 types of docs, users and posts.
I want to view all the docs that belong to certain users by joining posts
and users together. I have to filter the users with a filter query of
is_active_boolean:true so that the score is not affected, but since I do a
join, I have to move the filter query to the query parameter so that I can
get the filter applied. The problem is that since is_active_boolean is
moved to the query, the score is affected, which returns an order that I
don't want.
  If I leave is_active_boolean:true in the fq parameter, I get no
results back.

My question is: how can I apply a filter query to users so that the score is
not affected?


Re: questions on query format

2011-10-24 Thread Memory Makers
Thanks,

?q.alt=*:* worked for me -- how do I make sure that the standard query
parser is configured?

Thanks.

MM.


On Mon, Oct 24, 2011 at 2:47 AM, Ahmet Arslan iori...@yahoo.com wrote:

  2. If send solr the following query:
q=*:*
 
    I get nothing back, just:
     <response>
       <result name="response" numFound="0" start="0" maxScore="0.0"/>
       <lst name="highlighting"/>
     </response>
 
  Would appreciate some insight into what is going on.

 If you are using dismax as query parser, then *:* won't function as match
 all docs query. To retrieve all docs - with dismax - use q.alt=*:*
 parameter. Also, adding debugQuery=on will display information about parsed
 query.



Is there a good web front end application / interface for solr

2011-10-24 Thread Memory Makers
Greetings guys,

Is there a good front end application / interface for solr?

Features I'm looking for are:
  configure the query interface (using non-programmatic features)
  configure pagination
  configure bookmarking of results
  export results of a query to CSV or another format (JSON, etc.)

  Is there any demand for such an application?

Thanks.


DataImportHandler Nested Entities

2011-10-24 Thread Andy Shimell
Hi,

I want to use Solr 3.1 to index the content of a website. Rather than using a 
web crawler to fetch the content and load it into Solr I want to use the DIH to 
get the data from the Content Management Database that supports the website.

It would be possible to write SQL to obtain a complete set of metadata (for 
example DC.subject or DC.type) for each page or binary document stored in the 
database, using the JDBCDataSource. One of the values obtained would be the 
HTTP URL of the actual page or document, and I would need to obtain and index 
this content as well.

Could you tell me if it's possible to nest entities that use a URLDataSource 
inside entities that use a JDBCDataSource?

Andy




Re: questions on query format

2011-10-24 Thread Ahmet Arslan
 ?q.alt=*:* worked for me -- how do I make sure that the
 standard query parser is configured?

You can append &defType=lucene to your search URL. 

A more permanent way is to set defType as a default parameter in solrconfig.xml.
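A sketch of the solrconfig.xml route (the handler name is an assumption; any 
request handler's defaults section works the same way):

  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="defType">lucene</str>
    </lst>
  </requestHandler>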


A sort-by-geodist question

2011-10-24 Thread Yung-chung Lin
Hi,

I've started to use Solr to build up a search service, but I have
encountered a problem here.

However, when I use this URL, it always returns "sort param could not be
parsed as a query, and is not a field that exists in the index: geodist()":

http://localhost:8080/solr/select/?indent=true&fl=name,coordinates&q=*:*&sfield=coordinates&pt=45.15,-93.85&sort=geodist()%20asc

It works only when I specify the coordinates in geodist():

http://localhost:8080/solr/select/?indent=true&fl=name,coordinates&q=*:*&sfield=coordinates&pt=45.15,-93.85&sort=geodist(45.15,-93.85)%20asc

And the returned documents don't seem to be ranked by distance according to
the criteria.

My Lucene version is 3.4. The field 'coordinates' is in geohash format.

Can anyone here give me some pointers?

Thank you very much.

Yung-chung Lin


Solr main query response input to facet query

2011-10-24 Thread solrdude
Hi,
I am implementing a Solr solution where I want to use some field values
from the main query output as input for building a facet. How do I do that?

Eg: 
Response from main query:

<doc>
  <str name="name">name1</str>
  <int name="prod_id">200</int>
</doc>
<doc>
  <str name="name">name1</str>
  <int name="prod_id">400</int>
</doc>

I want to build a facet for the query where prod_id:200 prod_id:400. Ideally
I'd like to do all this in a single query; if it can't be done in one query, I
am OK with 2 queries as well. Please help.

Thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-main-query-response-input-to-facet-query-tp3449938p3449938.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sorting fields with letters?

2011-10-24 Thread Peter Spam
Tried using the ord() function, but it was the same as the standard sort.

Do I just need to bite the bullet and reindex everything?


Thanks!
Pete

On Oct 21, 2011, at 5:26 PM, Tomás Fernández Löbbe wrote:

 I don't know if you'll find exactly what you need, but you can sort by any
 field or FunctionQuery. See http://wiki.apache.org/solr/FunctionQuery
 
 On Fri, Oct 21, 2011 at 7:03 PM, Peter Spam ps...@mac.com wrote:
 
 Is there a way to use a custom sorter, to avoid re-indexing?
 
 
 Thanks!
 Pete
 
 On Oct 21, 2011, at 2:13 PM, Tomás Fernández Löbbe wrote:
 
 Well, yes. You probably have a string field for that content, right? So the
 content is being compared as strings, not as numbers; that's why something
 like 1000 is lower than 2. Leading zeros would be an option. Another option
 is to separate the field into numeric fields and sort by those (this last
 option is only recommended if your data always looks similar).
 Something like 11C15 into field1:11, field2:C, field3:15. Then use
 sort=field1,field2,field3.
 
 Anyway, both of these options require reindexing.
 
 Regards,
 
 Tomás
 
 On Fri, Oct 21, 2011 at 4:57 PM, Peter Spam ps...@mac.com wrote:
 
 Hi everyone,
 
 I have a field that has a letter in it (for example, 1A1, 2A1, 11C15,
 etc.).  Sorting it seems to work most of the time, except for a few
 things,
 like 10A1 is lower than 8A100, and 10A100 is lower than 10A99.  Any
 ideas?
 I bet if my data had leading zeros (ie 10A099), it would behave better?
 (But I can't really change my data now, as it would take a few days to
 re-inject - which is possible but a hassle).
 
 
 Thanks!
 Pete